JSON parser grammar
Allen.Wirfs-Brock at microsoft.com
Tue Jun 2 22:27:10 PDT 2009
>From: es-discuss-bounces at mozilla.org [mailto:es-discuss-
>bounces at mozilla.org] On Behalf Of Oliver Hunt
>Sent: Tuesday, June 02, 2009 8:59 PM
>To: Mark S.Miller
>Cc: es-discuss at mozilla.org
>Subject: Re: JSON parser grammar
>On Jun 2, 2009, at 7:26 PM, Mark S. Miller wrote:
>I'm not talking about the RFC, i'm talking about the ES5 spec. I
>guess it would be in the spirit of the RFC for the ES5 spec to define
>a JSON grammar that was more (or less) lax than the RFC, but the
>ES5 spec itself should not allow variation between implementations
>that would be considered "valid" as historically any place in ES that
>has undefined "valid" behaviour has proved to be a compatibility
>problem later on.
The intent was for the ES5 JSON grammar to exactly match the JSON RFC grammar. If you think it is different, then you may have found a bug so let's make sure...
The ES5 spec intentionally doesn't include the "JSON parser MAY accept non-JSON forms or extensions." language from the RFC, but the general extension allowances given in section 16 are probably sufficient to allow a conforming ES5 implementation of JSON.parse to accept non-JSON forms or extensions. See more below...
>Currently I can make a string containing a JSON
>object that will produce different output (or not produce output at
>all) across multiple implementations that are all "correct" -- this
>seems like something that is just inviting disaster.
Examples, please? The intent is that applying JSON.parse to a string containing a valid JSON form should produce an equivalent set of objects on all conforming ES5 implementations.
>The json.org grammar allows the following set of characters in a string
> * Any unicode character except ", \, or a control character
> * \", \\, \/, \b, \f, \n, \r, \t, or \u four-hex-digits
>The ES5 spec is the same, only it defines "control character" as any
>character less than 0x20,
The JSON RFC also defines control characters in this way: "All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F)."
>and drops escaped unicode.
No it doesn't (from the grammar in 15.12.1.1):
JSONSourceCharacter but not double-quote " or backslash \
>I'm inclined to
>believe that dropping the unicode escaping is likely to be a typo-
>esque error, the exclusion of control characters seems deliberate but
>has the effect of disallowing tab characters (among others).
It identically matches the RFC.
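To make the character rules under discussion concrete, here is a minimal sketch of a checker for the text between the quotes of a JSON string. The helper name isJsonStringBody is hypothetical and the regular expression is only my reading of the rules quoted above (unescaped characters other than ", \ and U+0000 through U+001F, plus the listed backslash escapes); it is not text from the spec or the RFC.

```javascript
// Hypothetical helper (not from the ES5 spec or the RFC): checks the text
// between the quotes of a JSON string against the rules quoted above.
function isJsonStringBody(body) {
  // Each unit is either
  //   - an unescaped character other than ", \ or U+0000..U+001F, or
  //   - a backslash escape: \" \\ \/ \b \f \n \r \t, or \u plus four hex digits.
  const pattern = /^(?:[^"\\\u0000-\u001F]|\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4}))*$/;
  return pattern.test(body);
}

// In the JavaScript literals below, "\\t" is a backslash followed by t,
// while "\t" is a real TAB character.
console.log(isJsonStringBody("a\\tb"));   // escaped tab
console.log(isJsonStringBody("\\u0041")); // escaped Unicode
console.log(isJsonStringBody("a\tb"));    // raw tab
```

Under this reading, the escaped tab and the \u escape are accepted, and only the raw tab character is rejected.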
>testing seems to imply that mozilla allows all control characters in a
>JSON string literal including newlines, so i'd like clarification on
>what is actually allowed.
Step 2 of 15.12.1 (JSON.parse) seems pretty clear in this regard:
2. Parse JText using the grammars in 15.12.1. Throw a SyntaxError exception if the JText did not conform to the JSON grammar for the goal symbol JSONValue.
A string containing control characters does not conform to JSONString so a SyntaxError should be thrown.
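As a quick way to check a given implementation against that reading, something like the following could be used. The function name parseOutcome is made up for illustration; an implementation that makes no use of the section 16 allowance would be expected to report "ok" for the escaped form and "SyntaxError" for the raw tab.

```javascript
// Hypothetical probe: returns "ok" if JSON.parse accepts the text,
// otherwise the name of the exception that was thrown.
function parseOutcome(text) {
  try {
    JSON.parse(text);
    return "ok";
  } catch (e) {
    return e.name;
  }
}

const escapedTab = '"a\\tb"'; // JSON text contains a backslash followed by t
const rawTab = '"a\tb"';      // JSON text contains a real TAB inside the string

console.log("escaped tab:", parseOutcome(escapedTab));
console.log("raw tab:", parseOutcome(rawTab));
```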
However, section 16 says: " all operations (...) that are allowed to throw SyntaxError are permitted to exhibit implementation-defined behaviour instead of throwing SyntaxError when they encounter an implementation-defined extension to the program syntax or regular expression pattern or flag syntax."
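If a caller needs the strict behaviour regardless of what an implementation chooses to do under section 16, one option is to pre-scan the text before handing it to JSON.parse. The sketch below is only an illustration under that assumption; strictParse is a hypothetical name, and the scan only looks for raw control characters inside string literals (tab, line feed and carriage return remain legal as whitespace between tokens).

```javascript
// Hypothetical wrapper: rejects raw control characters (below U+0020)
// inside JSON string literals, then defers to the built-in JSON.parse.
function strictParse(text) {
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (!inString) {
      if (ch === '"') inString = true;  // opening quote
    } else if (ch === "\\") {
      i++;                              // skip the character after a backslash
    } else if (ch === '"') {
      inString = false;                 // closing quote
    } else if (ch.charCodeAt(0) < 0x20) {
      throw new SyntaxError("raw control character in JSON string at index " + i);
    }
  }
  // Everything else about the grammar is left to the implementation.
  return JSON.parse(text);
}

// Whitespace between tokens is fine; escaped control characters are fine.
console.log(strictParse('{\n\t"a": "b\\tc"\n}'));
```

Any other malformed input is still left for JSON.parse itself to reject.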