JSON parser grammar
oliver at apple.com
Tue Jun 2 20:59:29 PDT 2009
On Jun 2, 2009, at 7:26 PM, Mark S. Miller wrote:
> Since octal wasn't an official part of ES3, remains absent from
> official ES5, and is now explicitly prohibited from ES5/strict, it
> is good that it is not specified by JSON. I am surprised that
> json2.js accepts the syntax, and even more surprised that it
> interprets it as octal. Although the rfc says
>     A JSON parser transforms a JSON text into another representation.
>     A JSON parser MUST accept all texts that conform to the JSON grammar.
>     A JSON parser MAY accept non-JSON forms or extensions.
> I think the behavior you state of json2.js, ie8, and chrome should
> be considered a bug. I hesitate to make the same statement about
> SpiderMonkey, because their behavior falls within both the letter
> and spirit of the rfc, while maintaining the subset relationship
> between JSON and EcmaScript.
I'm not talking about the RFC; I'm talking about the ES5 spec. I
guess it would be in the spirit of the RFC for the ES5 spec to define
a JSON grammar that is more (or less) lax than the RFC's, but the
ES5 spec itself should not allow variation between implementations
that would all be considered "valid". Historically, any place in ES
where "valid" behaviour is left undefined has proved to be a
compatibility problem later on. Currently I can make a string
containing a JSON object that will produce different output (or no
output at all) across multiple implementations that are all
"correct". This seems like something that is just inviting disaster.
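To make the concern concrete, here is a minimal sketch of a probe that can be run in each implementation to compare results. The helper name probe and the sample inputs are purely illustrative and not taken from any spec; the only API assumed is the JSON.parse function from the ES5 draft. What each engine reports for the samples is exactly the open question, so no expected output is given.

```javascript
// Hypothetical helper: reports whether this engine's JSON.parse accepts
// a given text, and what value it produces, without throwing.
function probe(text) {
  try {
    return { text: text, accepted: true, value: JSON.parse(text) };
  } catch (err) {
    return { text: text, accepted: false, error: err.name };
  }
}

// Illustrative inputs: a leading-zero ("octal-looking") number,
// a raw tab inside a string, and an escaped Unicode character.
var samples = ['[010]', '["a\tb"]', '["\\u0041"]'];
for (var i = 0; i < samples.length; i++) {
  console.log(JSON.stringify(probe(samples[i])));
}
```

If two implementations print different lines for the same sample, then both cannot be following a single, fully specified grammar.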
The json.org grammar allows the following set of characters in a string:
* Any Unicode character except ", \, or a control character
* \", \\, \/, \b, \f, \n, \r, \t, or \u four-hex-digits
The ES5 spec is the same, except that it defines "control character" as
any character less than 0x20 and drops escaped Unicode. I'm inclined to
believe that dropping the Unicode escaping is a typo-esque error. The
exclusion of control characters seems deliberate, but it has the
effect of disallowing tab characters (among others). My testing seems
to imply that Mozilla allows all control characters in a JSON string
literal, including newlines, so I'd like clarification on what is
actually allowed.
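As a rough illustration of the string rules described above, the following sketch encodes them as a regular expression. It assumes that "control character" means any code unit below 0x20 and that the \u escape is kept, as in the json.org grammar; the names JSON_STRING and isJsonOrgString are made up for this example and come from no spec or library.

```javascript
// Sketch of a JSON string literal, under the stated assumptions:
//   - any character except ", \, or a code unit below 0x20
//   - or one of the escapes \" \\ \/ \b \f \n \r \t
//   - or \u followed by exactly four hex digits
var JSON_STRING = /^"(?:[^"\\\u0000-\u001F]|\\["\\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"$/;

// Hypothetical helper: true if `literal` (including its surrounding
// double quotes) is a string literal under the rules above.
function isJsonOrgString(literal) {
  return JSON_STRING.test(literal);
}
```

Under these rules the escaped form "a\tb" (backslash followed by t) is a valid string literal, while the same text containing a raw tab or a raw newline is not, which is the behaviour that appears to differ between implementations.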