JSON parser grammar

David-Sarah Hopwood david-sarah at jacaranda.org
Wed Jun 3 19:47:27 PDT 2009


Mark S. Miller wrote:
> On Wed, Jun 3, 2009 at 12:59 PM, Douglas Crockford
> <douglas at crockford.com> wrote:
>>> 2) Do we want to permit conforming implementations to extend the JSON
>>> grammar that they recognize?
>> No. An implementation has the license to support other formats (such as an
>> XML object or a SuperJSON object). But the JSON object should recognize only
>> the JSON forms described by ES5. There should be no Chapter 16 squishiness
>> here.
> 
> Crock, is your position that ES5 should specify a validating JSON
> parse exactly equivalent to the parse specified in the RFC (i.e.,
> waiving the escape clause), but with JSON <value> as the start symbol?
> If so, then I agree.

I am 100% in favour of tightly specifying what the ES5 JSON parser
must accept, and what it must generate -- subject to the following
caveat:

For practical interoperability, I think that the ES5 JSON
parser must be able to accept a slightly larger variant of
JSON than it generates. This is unfortunate; it could probably
have been avoided if JSON had originally been specified a
little more carefully, but that is water under the bridge.

The constructs that must be accepted but not generated are:

 - unescaped LineTerminators (at least <LS> and <PS>, possibly
   also <CR> and <LF>) in string literals.
 - unquoted property names, matching <IdentifierName>, in
   object literals.

It is possible that I have missed something; if anyone knows
of other constructs that might need to be accepted for
interoperability, now is the time to speak up.

The language to be accepted should *not* include constructs
that happen to be accepted by some existing parsers, but are
not produced by existing JSON generators, and are not likely
to occur in hand-produced JSON. I believe octal numeric
literals and octal character escapes are in this category.
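The sketch below shows why both forms already fall outside the JSON grammar. It is a Python transcription of the RFC 4627 number and escape rules; the names `JSON_NUMBER`, `JSON_ESCAPE`, `is_json_number` and `is_json_escape` are hypothetical. A leading 0 followed by more digits cannot match the number rule, and an escape such as \012 is not one of the permitted escape forms.

```python
import re

# JSONNumber (RFC 4627): optional minus, then either a single 0 or a
# nonzero digit followed by digits, then optional fraction and exponent.
# "012" cannot match, because after a leading 0 only '.', 'e' or 'E' may follow.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")

# JSON escape sequences: \" \\ \/ \b \f \n \r \t or \u followed by four hex digits.
# Octal escapes such as \012 are not in this set.
JSON_ESCAPE = re.compile(r'\\(?:["\\/bfnrt]|u[0-9A-Fa-f]{4})')


def is_json_number(token):
    """Return True if the whole token is a JSONNumber."""
    return JSON_NUMBER.fullmatch(token) is not None


def is_json_escape(token):
    """Return True if the whole token is a single JSON escape sequence."""
    return JSON_ESCAPE.fullmatch(token) is not None
```

For example, `is_json_number("-12.5e+3")` is true, while `is_json_number("012")` and `is_json_escape("\\012")` are both false.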

Although this approach technically means specifying two grammars,
I think this is best done by annotating just the alternatives
that are "not for output".


Proposed changes in detail:

Change the following production in section 15.12.1.1 and
in Annex A:

  JSONStringCharacter ::
    JSONSourceCharacter but not " or \ or LineTerminator
    \ JSONEscapeSequence
    [not for output] LineTerminator
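A minimal Python sketch of how a scanner might apply this production. It assumes that LineTerminator means the four ES5 characters <LF>, <CR>, <LS> and <PS>, that other control characters below U+0020 must still be escaped (as RFC 4627 requires), and that the escape set is the RFC 4627 one. The function `scan_json_string` and its `lenient` flag are hypothetical: `lenient=True` models what JSON.parse would accept, `lenient=False` the output-only subset.

```python
# ES5 LineTerminator characters: <LF>, <CR>, <LS>, <PS>.
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}

# Single-character escapes (RFC 4627), mapped to the character they denote.
SIMPLE_ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
                  "f": "\f", "n": "\n", "r": "\r", "t": "\t"}

HEX_DIGITS = "0123456789abcdefABCDEF"


def scan_json_string(text, pos, lenient=True):
    """Scan a JSONString whose opening quote is at text[pos].

    Returns (decoded_value, index_after_closing_quote).
    Raises ValueError on malformed input.
    """
    if pos >= len(text) or text[pos] != '"':
        raise ValueError("expected '\"' at %d" % pos)
    out = []
    i = pos + 1
    while i < len(text):
        c = text[i]
        if c == '"':
            return "".join(out), i + 1
        if c == "\\":
            if i + 1 >= len(text):
                break
            e = text[i + 1]
            if e in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[e])
                i += 2
            elif e == "u":
                digits = text[i + 2:i + 6]
                if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
                    raise ValueError("bad \\u escape at %d" % i)
                out.append(chr(int(digits, 16)))
                i += 6
            else:
                # Any other escape, including octal forms like \012, is rejected.
                raise ValueError("bad escape at %d" % i)
            continue
        if c in LINE_TERMINATORS:
            # The [not for output] alternative: accepted only when lenient.
            if not lenient:
                raise ValueError("unescaped line terminator at %d" % i)
        elif ord(c) < 0x20:
            raise ValueError("unescaped control character at %d" % i)
        out.append(c)
        i += 1
    raise ValueError("unterminated string")
```

With these assumptions, `scan_json_string('"a\u2028b"', 0)` returns `("a\u2028b", 5)`, while the same call with `lenient=False` raises ValueError.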

Change the following production in section 15.12.1.2 and
in Annex A:

  JSONMember :
    JSONString : JSONValue
    [not for output] IdentifierName : JSONValue
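A hypothetical Python sketch of the member-name part of this production. Quoted names are matched by a regular expression and decoded with the standard `json` module; the IdentifierName alternative is accepted only in lenient (parse) mode. The identifier pattern is a loud simplification: real ES5 IdentifierName also allows Unicode letters and \uXXXX escapes, which this ASCII-only pattern does not. Reserved words such as `true` are valid IdentifierNames, so the pattern accepts them.

```python
import json
import re

# Simplified IdentifierName: ASCII letters, digits, '$' and '_' only (assumption).
IDENTIFIER_NAME = re.compile(r"[A-Za-z$_][A-Za-z0-9$_]*")
# A quoted JSONString; its contents are validated by json.loads below.
JSON_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')
# JSON white space: space, tab, CR, LF.
WS = re.compile(r"[ \t\r\n]*")


def scan_member_name(text, pos, lenient=True):
    """Scan the name of a JSONMember at text[pos].

    Returns (name, index_after_name). The IdentifierName alternative is
    [not for output], so it is accepted only when lenient is True.
    """
    m = JSON_STRING.match(text, pos)
    if m:
        return json.loads(m.group()), m.end()
    m = IDENTIFIER_NAME.match(text, pos)
    if m and lenient:
        return m.group(), m.end()
    raise ValueError("expected property name at %d" % pos)


def scan_member_prefix(text, pos, lenient=True):
    """Scan `name :` and return (name, index where the JSONValue starts)."""
    name, i = scan_member_name(text, pos, lenient)
    i = WS.match(text, i).end()
    if i >= len(text) or text[i] != ":":
        raise ValueError("expected ':' at %d" % i)
    return name, WS.match(text, i + 1).end()
```

For example, `scan_member_prefix('{foo : 2}', 1)` returns `("foo", 7)`, while `scan_member_prefix('{foo:2}', 1, lenient=False)` raises ValueError.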

Add at the end of section 5.1.6:

  Some alternatives in JSON lexical and syntactic grammar
  productions are annotated with [not for output]. These
  alternatives shall be accepted by JSON.parse (15.12.2), but
  shall not be generated by JSON.stringify (15.12.3).

  For example, the production

    JSONMember :
      JSONString : JSONValue
      [not for output] IdentifierName : JSONValue

  means that the alternative IdentifierName : JSONValue shall
  be accepted but not generated, while the alternative
  JSONString : JSONValue is both accepted and generated.

Change the text of section 15.12.1 to:

  JSON.parse accepts a string that conforms to the following
  JSON grammar (including alternatives that are annotated with
  [not for output]). JSON.stringify produces a string that
  conforms to the JSON grammar (excluding alternatives
  annotated with [not for output]).
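One way to picture "one grammar, two readings" is to keep the productions as data with a per-alternative flag and derive the parse and stringify grammars from it. The Python sketch below does this for the two productions changed above; the `Alternative` type, the symbol labels and both helper functions are hypothetical illustrations, not spec text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    symbols: tuple               # right-hand side as a sequence of symbol labels
    not_for_output: bool = False  # True for alternatives annotated [not for output]


# The two productions changed by the proposal, encoded as data.
GRAMMAR = {
    "JSONStringCharacter": [
        Alternative(("JSONSourceCharacter but not \" or \\ or LineTerminator",)),
        Alternative(("\\", "JSONEscapeSequence")),
        Alternative(("LineTerminator",), not_for_output=True),
    ],
    "JSONMember": [
        Alternative(("JSONString", ":", "JSONValue")),
        Alternative(("IdentifierName", ":", "JSONValue"), not_for_output=True),
    ],
}


def parse_grammar(grammar):
    """Grammar used by JSON.parse: every alternative, annotated or not."""
    return {lhs: list(alts) for lhs, alts in grammar.items()}


def stringify_grammar(grammar):
    """Grammar JSON.stringify may produce: [not for output] alternatives removed."""
    return {lhs: [a for a in alts if not a.not_for_output]
            for lhs, alts in grammar.items()}
```

Because the output grammar is obtained only by deleting alternatives, every string JSON.stringify can produce is also accepted by JSON.parse.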

In section 15.12.2, change the sentence

# JSON uses a more limited set of white space characters than
# WhiteSpace and allows Unicode code points U+2028 and U+2029 to
# directly appear in JSONString literals without using an escape
# sequence.

to

  JSON uses a more limited set of white space characters than
  WhiteSpace. For the purpose of JSON.parse (but not for
  JSON.stringify), the characters <CR>, <LF>, <LS> and <PS>
  may directly appear in JSONString literals without using an
  escape sequence.

Also, delete the NOTE in section 15.12.2 and replace step 3 with:

  3. Let unfiltered be the result of parsing and evaluating
     JText as if it were the source text of an ECMAScript program,
     except that JSONString is used in place of StringLiteral.

(NOTEs are supposed to be informative, but this is a normative
requirement.)

Change step 2.c in the algorithm for abstract operation
Quote(value) in section 15.12.3 to:

  c. Else if C is a control character having a code unit value
     less than that of <SP>, or if C is <LS> or <PS>
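A Python sketch of Quote(value) with the proposed step 2.c. Steps 2.a, 2.b and 2.d follow my reading of the ES5 algorithm (escape `"` and `\`, use the short forms \b \f \n \r \t, otherwise copy the character); emitting lowercase hex digits is an assumption. Python iterates over code points rather than UTF-16 code units, but for the characters this step is concerned with the two coincide.

```python
def quote(value):
    """Sketch of ES5 Quote(value) with the proposed step 2.c, which also
    escapes <LS> (U+2028) and <PS> (U+2029) as \\uHHHH."""
    short = {"\b": "b", "\f": "f", "\n": "n", "\r": "r", "\t": "t"}
    product = ['"']                                         # step 1
    for c in value:                                         # step 2
        if c in ('"', "\\"):                                # step 2.a
            product.append("\\" + c)
        elif c in short:                                    # step 2.b
            product.append("\\" + short[c])
        elif ord(c) < 0x20 or c in ("\u2028", "\u2029"):    # proposed step 2.c
            product.append("\\u%04x" % ord(c))
        else:                                               # step 2.d
            product.append(c)
    product.append('"')                                     # step 3
    return "".join(product)
```

With this change, `quote("a\u2028b")` yields the six-character escape `\u2028` between `a` and `b`, so the output never contains the [not for output] LineTerminator alternative.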


[No other specification changes are needed in order to
accept but not generate unquoted property names, because
step 8.b.i of abstract operation JO(value) already applies
Quote to property names.]
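To illustrate the bracketed point: a hypothetical Python `jo` that, like step 8.b.i as described above, passes every property name through Quote, so a name is always serialized quoted regardless of how it appeared in the input. This is a deliberately small sketch, not the full JO/Str algorithm: it handles only dicts, strings, ints, booleans and None, with no arrays, indentation, toJSON or replacer support.

```python
def quote(value):
    # Compact Quote(value) including the proposed step 2.c (<LS>, <PS> escaped).
    short = {'"': '\\"', "\\": "\\\\", "\b": "\\b", "\f": "\\f",
             "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for c in value:
        if c in short:
            out.append(short[c])
        elif ord(c) < 0x20 or c in ("\u2028", "\u2029"):
            out.append("\\u%04x" % ord(c))
        else:
            out.append(c)
    return '"' + "".join(out) + '"'


def jo(obj):
    """Serialize a dict; every property name goes through quote()."""
    members = []
    for key, val in obj.items():
        members.append(quote(str(key)) + ":" + serialize(val))
    return "{" + ",".join(members) + "}"


def serialize(val):
    """Serialize the small set of value types this sketch supports."""
    if isinstance(val, dict):
        return jo(val)
    if isinstance(val, str):
        return quote(val)
    if val is True:          # checked before int, since bool is a subclass of int
        return "true"
    if val is False:
        return "false"
    if val is None:
        return "null"
    if isinstance(val, int):
        return str(val)
    raise TypeError("unsupported value: %r" % (val,))
```

So an object that JSON.parse built from `{foo: 1}` comes back out of this sketch as `{"foo":1}`.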

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com