JSON parser grammar

Oliver Hunt oliver at apple.com
Tue Jun 2 20:59:29 PDT 2009


On Jun 2, 2009, at 7:26 PM, Mark S. Miller wrote:
> Since octal wasn't an official part of ES3, remains absent from  
> official ES5, and is now explicitly prohibited from ES5/strict, it  
> is good that it is not specified by JSON. I am surprised that  
> json2.js accepts the syntax, and even more surprised that it  
> interprets it as octal. Although the rfc says
>
>    A JSON parser transforms a JSON text into another representation.
>    A JSON parser MUST accept all texts that conform to the JSON grammar.
>    A JSON parser MAY accept non-JSON forms or extensions.
>
> I think the behavior you state of json2.js, IE8, and Chrome should  
> be considered a bug. I hesitate to make the same statement about  
> SpiderMonkey, because their behavior falls within both the letter  
> and spirit of the rfc, while maintaining the subset relationship  
> between JSON and ECMAScript.
I'm not talking about the RFC, I'm talking about the ES5 spec.  I  
guess it would be in the spirit of the RFC for the ES5 spec to define  
a JSON grammar that was more (or less) lax than the RFC's, but the  
ES5 spec itself should not allow variation between implementations  
that would all be considered "valid".  Historically, any place in ES  
that has left "valid" behaviour undefined has proved to be a  
compatibility problem later on.  Currently I can make a string  
containing a JSON object that will produce different output (or not  
produce output at all) across multiple implementations that are all  
"correct".  This seems like something that is just inviting disaster.
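
To make the concern concrete, here is a small Python sketch of how one text with a leading-zero number could come out three different ways. Python's json module stands in for a parser that follows the grammar strictly; the two int() calls merely imitate a lax parser that reads the digits as octal and one that reads them as decimal. None of this reproduces the actual code of json2.js or of any browser.

```python
import json

digits = "010"   # leading zero: not a number under the JSON grammar

# A parser that sticks to the grammar rejects the text outright.
try:
    json.loads("[" + digits + "]")
    strict_result = "accepted"
except json.JSONDecodeError:
    strict_result = "rejected"

# Two lax readings of the same digits (illustrative only).
as_octal = int(digits, 8)      # 8
as_decimal = int(digits, 10)   # 10
```

Three outcomes for one input (rejected, 8, 10) is the kind of divergence meant here.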

The json.org grammar allows the following set of characters in a string:
  * Any Unicode character except ", \, or a control character
  * \", \\, \/, \b, \f, \n, \r, \t, or \u four-hex-digits
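
As a rough illustration of the two rules above, here is a minimal Python sketch of a checker for the body of a string literal (the text between the quotes). The name is_valid_string_body is made up for this example, and "control character" is assumed to mean any character below 0x20, which json.org itself does not spell out.

```python
import re

# Escapes listed by the json.org grammar: \" \\ \/ \b \f \n \r \t and \uXXXX.
_ESCAPE = re.compile(r'\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4})')

def is_valid_string_body(body: str) -> bool:
    """Check the text between the quotes of a JSON string literal."""
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            m = _ESCAPE.match(body, i)
            if m is None:
                return False          # unknown or truncated escape
            i = m.end()
        elif ch == '"' or ord(ch) < 0x20:
            return False              # bare quote or control character (assumed < 0x20)
        else:
            i += 1
    return True
```

Under this sketch a literal tab inside the quotes is rejected, while the two-character escape \t is accepted.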

The ES5 spec is the same, except that it defines "control character" as  
any character less than 0x20 and drops escaped Unicode (\u  
four-hex-digits).  I'm inclined to believe that dropping the Unicode  
escaping is a typo-like error.  The exclusion of control characters  
seems deliberate, but it has the effect of disallowing literal tab  
characters (among others).  My testing suggests that Mozilla allows  
all control characters in a JSON string literal, including newlines,  
so I'd like clarification on what is actually allowed.
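
The two behaviours in question can be seen side by side in Python's standard json module, used here only as a stand-in (it is not one of the implementations discussed in this thread). Its strict flag switches between rejecting and accepting raw control characters (codes 0 to 31) inside strings.

```python
import json

text = '"a\tb"'   # a JSON string literal containing a raw tab (0x09)

# Default (strict=True): raw control characters inside a string are rejected.
try:
    json.loads(text)
    strict_accepts = True
except json.JSONDecodeError:
    strict_accepts = False

# strict=False: the same text is accepted, and a raw newline would be too.
lax_value = json.loads(text, strict=False)
```

Here strict_accepts ends up False while lax_value is the three-character string a, tab, b, so the same input is an error under one reading and a value under the other.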

--Oliver

