JSON parser grammar

Allen Wirfs-Brock Allen.Wirfs-Brock at microsoft.com
Wed Jun 3 21:48:37 PDT 2009


Given that the output produced by stringify is specified algorithmically, I don't see any reason to clutter the grammar with [not for output] annotations. If we decide we want to quote <LS> or <PS> in output string literals, we can specify that in the stringify Quote algorithm.
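To illustrate handling this on the output side only, here is a minimal sketch of a Quote-style routine. The name quoteJSONString and the escapeLineSeparators flag are hypothetical and chosen for illustration. This is not the text of the spec's Quote operation, only an approximation of the idea that escaping <LS> and <PS> can be decided in the algorithm rather than in the grammar.

```javascript
// Hypothetical helper (not the spec's Quote text): wraps a string in double
// quotes and escapes ", \, control characters below U+0020, and, when
// escapeLineSeparators is true, <LS> (U+2028) and <PS> (U+2029).
const SHORT_ESCAPES = new Map([
  ['\b', '\\b'],
  ['\f', '\\f'],
  ['\n', '\\n'],
  ['\r', '\\r'],
  ['\t', '\\t'],
]);

function quoteJSONString(value, escapeLineSeparators = true) {
  let product = '"';
  for (let i = 0; i < value.length; i++) {
    const c = value[i];
    const code = value.charCodeAt(i);
    if (c === '"' || c === '\\') {
      // Quote and backslash get a backslash prefix.
      product += '\\' + c;
    } else if (SHORT_ESCAPES.has(c)) {
      // Control characters with a short two-character escape.
      product += SHORT_ESCAPES.get(c);
    } else if (
      code < 0x20 ||
      (escapeLineSeparators && (code === 0x2028 || code === 0x2029))
    ) {
      // Remaining control characters, and optionally <LS>/<PS>, as \uXXXX.
      product += '\\u' + code.toString(16).padStart(4, '0');
    } else {
      product += c;
    }
  }
  return product + '"';
}
```

Either setting of the flag yields text that JSON.parse reads back to the original string. The flag changes only whether <LS> and <PS> appear raw or as \u2028 and \u2029 in the output.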

Unescaped <LS> and <PS> are already allowed by the grammar for JSONString.  In fact, all Unicode characters except double quote, backslash, and U+0000-U+001F can occur unescaped in a JSONString.
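A short check of that rule against a conforming JSON.parse, as a sketch. The helper name parsesAsJSON is made up for this example.

```javascript
// Hypothetical helper: reports whether JSON.parse accepts the given text.
function parsesAsJSON(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

// Raw <LS> (U+2028) and <PS> (U+2029) inside a JSON string literal.
// The JavaScript escapes below put the actual characters into the JSON text.
const withLineSeparators = '"a\u2028b\u2029c"';
const lsPsAccepted = parsesAsJSON(withLineSeparators);   // true

// Raw characters in U+0000-U+001F are not allowed inside a JSON string.
const rawLineFeedAccepted = parsesAsJSON('"a\nb"');      // false
const rawTabAccepted = parsesAsJSON('"a\tb"');           // false

// The same characters written as JSON escape sequences are fine.
const escapedLineFeedAccepted = parsesAsJSON('"a\\nb"'); // true
```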

I don't think I've yet seen a real justification for allowing <CR> or <LF> as a JSONString char.

I'm quite dubious of the unquoted property name proposal.
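For reference, a sketch of what strict behavior looks like in practice: a JSON.parse that follows the grammar without the proposed extension accepts only quoted member names. The helper name tryParse is hypothetical.

```javascript
// Hypothetical helper: returns the parsed value or the name of the error thrown.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

// Quoted member name: matches JSONString : JSONValue.
const quoted = tryParse('{"a": 1}');

// Unquoted member name: would need the proposed IdentifierName alternative.
const unquoted = tryParse('{a: 1}');
```

Here quoted.ok is true and unquoted.ok is false, with unquoted.error reported as "SyntaxError".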

>-----Original Message-----
>From: es5-discuss-bounces at mozilla.org [mailto:es5-discuss-
>bounces at mozilla.org] On Behalf Of David-Sarah Hopwood
>Sent: Wednesday, June 03, 2009 7:47 PM
>To: es5-discuss at mozilla.org
>Subject: Re: JSON parser grammar
>
>Mark S. Miller wrote:
>> On Wed, Jun 3, 2009 at 12:59 PM, Douglas Crockford
>> <douglas at crockford.com> wrote:
>>>> 2) Do we want to permit conforming implementations to extend the
>>>> JSON grammar that they recognize?
>>> No. An implementation has the license to support other formats (such
>>> as an XML object or a SuperJSON object). But the JSON object should
>>> recognize only the JSON forms described by ES5. There should be no
>>> Chapter 16 squishiness here.
>>
>> Crock, is your position that ES5 should specify a validating JSON
>> parse exactly equivalent to the parse specified in the RFC (i.e.,
>> waiving the escape clause), but with JSON <value> as the start symbol?
>> If so, then I agree.
>
>I am 100% in favour of tightly specifying what the ES5 JSON parser
>must accept, and what it must generate -- subject to the following
>caveat:
>
>For practical interoperability, I think that the ES5 JSON
>parser must be able to accept a slightly larger variant of
>JSON than it generates. This is unfortunate; it could probably
>have been avoided if JSON had originally been specified a
>little more carefully, but that is water under the bridge.
>
>The constructs that must be accepted but not generated are:
>
> - unescaped LineTerminators (at least <LS> and <PS>, possibly
>   also <CR> and <LF>) in string literals.
> - unquoted property names, matching <IdentifierName>, in
>   object literals.
>
>It is possible that I have missed something; if anyone knows
>of other constructs that might need to be accepted for
>interoperability, now is the time to speak up.
>
>The language to be accepted should *not* include constructs
>that happen to be accepted by some existing parsers, but are
>not produced by existing JSON generators, and are not likely
>to occur in hand-produced JSON. I believe octal numeric
>literals and octal character escapes are in this category.
>
>Although this approach technically means specifying two grammars,
>I think this is best done by annotating just the alternatives
>that are "not for output".
>
>
>Proposed changes in detail:
>
>Change the following production in section 15.12.1.1 and
>in Annex A:
>
>  JSONStringCharacter ::
>    JSONSourceCharacter but not " or \ or LineTerminator
>    \ JSONEscapeSequence
>    [not for output] LineTerminator
>
>Change the following production in section 15.12.1.2 and
>in Annex A:
>
>  JSONMember :
>    JSONString : JSONValue
>    [not for output] IdentifierName : JSONValue
>
>Add at the end of section 5.1.6:
>
>  Some alternatives in JSON lexical and syntactic grammar
>  productions are annotated with [not for output]. These
>  alternatives shall be accepted by JSON.parse (15.12.2), but
>  shall not be generated by JSON.stringify (15.12.3).
>
>  For example, the production
>
>    JSONMember :
>      JSONString : JSONValue
>      [not for output] IdentifierName : JSONValue
>
>  means that the alternative IdentifierName : JSONValue shall
>  be accepted but not generated, while the alternative
>  JSONString : JSONValue is both accepted and generated.
>
>Change the text of section 15.12.1 to:
>
>  JSON.parse accepts a string that conforms to the following
>  JSON grammar (including alternatives that are annotated with
>  [not for output]). JSON.stringify produces a string that
>  conforms to the JSON grammar (excluding alternatives
>  annotated with [not for output]).
>
>In section 15.12.2, change the sentence
>
># JSON uses a more limited set of white space characters than
># WhiteSpace and allows Unicode code points U+2028 and U+2029 to
># directly appear in JSONString literals without using an escape
># sequence.
>
>to
>
>  JSON uses a more limited set of white space characters than
>  WhiteSpace. For the purpose of JSON.parse (but not for
>  JSON.stringify), the characters <CR>, <LF>, <LS> and <PS>
>  may directly appear in JSONString literals without using an
>  escape sequence.
>
>Also, delete the NOTE in section 15.12.2 and replace step 3 with:
>
>  3. Let unfiltered be the result of parsing and evaluating
>     JText as if it was the source text of an ECMAScript program,
>     except that JSONString is used in place of StringLiteral.
>
>(NOTEs are supposed to be informative, but this is a normative
>requirement.)
>
>Change step 2.c in the algorithm for abstract operation
>Quote(value) in section 15.12.3, to
>
>  c. Else If C is a control character having a code unit value
>     less than that of <SP>, or if C is <LS> or <PS>
>
>
>[No other specification changes are needed in order to
>accept but not generate unquoted property names, because
>step 8.b.i of abstract operation JO(value) already applies
>Quote to property names.]
>
>--
>David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com
>
>_______________________________________________
>es5-discuss mailing list
>es5-discuss at mozilla.org
>https://mail.mozilla.org/listinfo/es5-discuss

