JSON parser grammar

Allen Wirfs-Brock Allen.Wirfs-Brock at microsoft.com
Tue Jun 2 22:27:10 PDT 2009


See inline

>-----Original Message-----
>From: es-discuss-bounces at mozilla.org [mailto:es-discuss-
>bounces at mozilla.org] On Behalf Of Oliver Hunt
>Sent: Tuesday, June 02, 2009 8:59 PM
>To: Mark S.Miller
>Cc: es-discuss at mozilla.org
>Subject: Re: JSON parser grammar
>
>On Jun 2, 2009, at 7:26 PM, Mark S. Miller wrote:
>I'm not talking about the RFC, I'm talking about the ES5 spec.  I
>guess it would be in the spirit of the RFC for the ES5 spec to define
>a JSON grammar that was more (or less) lax than the RFC, but the
>ES5 spec itself should not allow variation between implementations
>that would be considered "valid" as historically any place in ES that
>has undefined "valid" behaviour has proved to be a compatibility
>problem later on.

The intent was for the ES5 JSON grammar to exactly match the JSON RFC grammar.  If you think it is different, then you may have found a bug so let's make sure...

The ES5 spec intentionally doesn't include the "A JSON parser MAY accept non-JSON forms or extensions." language from the RFC, but the general extension allowance given in section 16 is probably sufficient to allow a conforming ES5 implementation of JSON.parse to accept non-JSON forms or extensions.  See more below...

>Currently I can make a string containing a JSON
>object that will produce different output (or not produce output at
>all) across multiple implementations that are all "correct" -- this
>seems like something that is just inviting disaster.

Examples, please? The intent is that applying JSON.parse to a string containing a valid JSON form should produce an equivalent set of objects on all conforming ES5 implementations.


>
>The json.org grammar allows the following set of characters in a string
>  * Any unicode character except ", \, or a control character
>  * \", \\, \/, \b, \f, \n, \r, \t, or \u four-hex-digits
>
>The ES5 spec is the same, only it defines "control character" as any
>character less than 0x20, 

The JSON RFC also defines control characters in this way: "All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)."
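
To make that range concrete, here is a minimal sketch of a scanner for it. The helper name findRawControlChars is hypothetical (it comes from neither the RFC nor the ES5 draft); it only reports the offsets of code units in U+0000 through U+001F and does not otherwise validate a JSON string.

```javascript
// Hypothetical helper, not from any spec: returns the offsets of every
// code unit in U+0000..U+001F, i.e. the characters the RFC says must be
// escaped inside a JSON string.
function findRawControlChars(s) {
  var offsets = [];
  for (var i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) <= 0x1F) {
      offsets.push(i);
    }
  }
  return offsets;
}

console.log(findRawControlChars('a\tb\nc')); // tab at offset 1, newline at offset 3
console.log(findRawControlChars('a b'));     // space is U+0020, so nothing is reported
```

Note that tab (U+0009) and newline (U+000A) both fall inside the range, while space (U+0020) is the first character outside it.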

>and drops escaped unicode.

No, it doesn't. From the grammar in 15.12.1.1:

JSONStringCharacter ::
    JSONSourceCharacter but not double-quote " or backslash \
    \ JSONEscapeSequence

JSONEscapeSequence ::
    JSONEscapeCharacter
    UnicodeEscapeSequence   <------------
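
As a quick check of that production, the sketch below feeds each escape form through JSON.parse, including the \u four-hex-digit form. It assumes an implementation that follows the grammar above; decodeEscape and escapeCases are illustrative names only.

```javascript
// Each pair is [escape as written in the JSON source, expected character].
// In these JavaScript literals '\\' is one backslash, so '\\u0041' is the
// six characters \u0041 as they would appear in the JSON text.
var escapeCases = [
  ['\\"', '"'], ['\\\\', '\\'], ['\\/', '/'],
  ['\\b', '\b'], ['\\f', '\f'], ['\\n', '\n'],
  ['\\r', '\r'], ['\\t', '\t'],
  ['\\u0041', 'A'],  // UnicodeEscapeSequence for a printable character
  ['\\u0009', '\t']  // UnicodeEscapeSequence for a control character
];

// Hypothetical helper: wraps one escape in quotes and parses it as a JSON string.
function decodeEscape(esc) {
  return JSON.parse('"' + esc + '"');
}

var allMatch = escapeCases.every(function (pair) {
  return decodeEscape(pair[0]) === pair[1];
});
console.log(allMatch);
```

The last case shows how a control character such as tab can still appear in a parsed value: it has to be written in escaped form in the source text.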



>I'm inclined to
>believe that dropping the unicode escaping is likely to be a typo-
>esque error, the exclusion of control characters seems deliberate but
>has the effect of disallowing tab characters (among others).

It matches the RFC exactly.


>My
>testing seems to imply that mozilla allows all control characters in a
>JSON string literal including newlines, so i'd like clarification on
>what is actually allowed.

Step 2 of 15.12.1 (JSON.parse) seems pretty clear in this regard:
  2. Parse JText using the grammars in 15.12.1. Throw a SyntaxError exception if the JText did not conform to the JSON grammar for the goal symbol JSONValue.

A string containing control characters does not conform to JSONString, so a SyntaxError should be thrown.
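
A minimal sketch of what that means in practice, assuming an implementation that applies the grammar strictly and uses none of the extension allowance discussed next. The helper parseOutcome is a hypothetical name, not part of the spec.

```javascript
// Hypothetical helper: reports how JSON.parse treats a text.
// Returns "ok" when the text parses, or the error's name when it throws.
function parseOutcome(text) {
  try {
    JSON.parse(text);
    return 'ok';
  } catch (e) {
    return e.name;
  }
}

var rawTab = '"a' + String.fromCharCode(9) + 'b"';      // raw U+0009 inside the string
var rawNewline = '"a' + String.fromCharCode(10) + 'b"'; // raw U+000A inside the string
var escapedTab = '"a\\tb"';                             // the two characters \ and t

console.log(parseOutcome(rawTab));     // "SyntaxError" if the grammar is applied strictly
console.log(parseOutcome(rawNewline)); // "SyntaxError" if the grammar is applied strictly
console.log(parseOutcome(escapedTab)); // "ok"
```

An implementation that accepted the raw forms would print "ok" for the first two lines instead.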

However, section 16 says: "all operations (...) that are allowed to throw SyntaxError are permitted to exhibit implementation-defined behaviour instead of throwing SyntaxError when they encounter an implementation-defined extension to the program syntax or regular expression pattern or flag syntax."

We can probably debate whether this extension allowance includes, or should include, JSON.parse.  I could probably be convinced that it should not, but there is a strong history of JavaScript implementations tolerating almost-correct inputs, so I don't know whether we could get consensus on that.
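
One way to see where a given implementation stands is to probe it with a few almost-correct inputs. The sketch below is only illustrative: the list of forms and the names nonJsonForms and acceptedExtensions are assumptions made for the example, not anything defined by the ES5 draft or the RFC.

```javascript
// A few non-JSON forms that a tolerant parser might accept (illustrative list).
var nonJsonForms = {
  'single-quoted string': "'a'",
  'trailing comma in array': '[1,]',
  'unquoted property name': '{a:1}',
  'raw tab in string': '"a\tb"',
  'leading zero in number': '01'
};

// Hypothetical helper: returns the names of the forms JSON.parse accepts.
// Only SyntaxError counts as a rejection; anything else is rethrown.
function acceptedExtensions(forms) {
  var accepted = [];
  for (var name in forms) {
    if (Object.prototype.hasOwnProperty.call(forms, name)) {
      try {
        JSON.parse(forms[name]);
        accepted.push(name);
      } catch (e) {
        if (!(e instanceof SyntaxError)) {
          throw e;
        }
      }
    }
  }
  return accepted;
}

console.log(acceptedExtensions(nonJsonForms)); // empty on an implementation with no extensions
```

Any name that shows up in the result marks an input whose output can differ between implementations that all claim conformance.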

  

