JSON parser grammar
Allen.Wirfs-Brock at microsoft.com
Wed Jun 3 12:42:46 PDT 2009
I want to bring this discussion around to focus on concrete points that we need to make decisions on.
1) There is a bug in the ES5 candidate spec. in that it says that:
SourceCharacter but not U+0000 thru U+001F
This is pretty clearly bogus, as it means that tab and newline characters cannot occur anywhere in JSON source text (not just in string literals). I'll probably fix it by simply equating JSONSourceCharacter to SourceCharacter.
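A minimal sketch of why the restriction is too broad, assuming an engine whose JSON.parse treats tab, line feed, carriage return, and space as insignificant whitespace between tokens (the sample text and variable names are illustrative only):

```javascript
// JSON text that uses tab, LF and CR purely as whitespace between tokens.
// None of these control characters sits inside a string literal.
const prettyText = '{\n\t"a": [1,\r\n\t\t2]\n}';

// Under the production as currently written, this text would be illegal;
// a parser that permits whitespace between tokens reads it without complaint.
const prettyValue = JSON.parse(prettyText);

console.log(prettyValue.a.length); // 2
```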
2) Do we want to permit conforming implementations to extend the JSON grammar that they recognize? This could probably be done by extending the syntax error extension allowance in section 16 to include the JSON grammar. If we allow this, then most of the variation observed among the emerging implementations we have been discussing would probably be acceptable extensions.
My inclination is to say we should disallow such open-ended extensions. As I suggested earlier, an implementation can always provide a non-standard extended parse function if it wants to support an extended grammar.
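To make the alternative concrete, here is a rough sketch of what such a non-standard function might look like, assuming the only extension wanted is trailing commas. The name `parseLenient` is hypothetical and is not an API of any engine or library; it rewrites the text and then defers to the standard JSON.parse, which stays strict.

```javascript
// Hypothetical helper: accepts trailing commas, nothing else.
function parseLenient(text) {
  let out = "";
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      out += ch;
      if (ch === "\\") {
        // Copy the escaped character verbatim so that an escaped quote
        // does not end the string.
        i++;
        if (i < text.length) out += text[i];
      } else if (ch === '"') {
        inString = false;
      }
      continue;
    }
    if (ch === '"') {
      inString = true;
      out += ch;
      continue;
    }
    if (ch === ",") {
      // Look past whitespace for a closing bracket or brace.
      let j = i + 1;
      while (j < text.length && " \t\n\r".includes(text[j])) j++;
      if (text[j] === "]" || text[j] === "}") continue; // drop the comma
    }
    out += ch;
  }
  // The standard parser still does the real work and rejects everything else.
  return JSON.parse(out);
}

console.log(parseLenient('{"a": [1, 2, ], }'));
```

This sketch is deliberately crude: it also turns `[,]` into `[]`, so a serious version would need a real tokenizer. The point is only that the leniency lives in a separately named function rather than in JSON.parse itself.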
3) If we disallow JSON grammar extensions (for JSON.parse) should we extend the existing grammar with some Postel's Law flexibility?
I could accept this for cases where we have some evidence that there are actual JSON encoders in the wild that violate/extend the JSON grammar in the identified manner.
Here are the individual cases that I know of to consider:
a) Allow strings, numbers, Booleans, and null in addition to objects and arrays as top level JSON text.
The ES5 spec. already has this although it isn't in the RFC. I haven't heard any suggestions that we remove it.
b) Permit leading zeros on numbers either with or without octal implications.
Does anyone know of any encoders or uses that actually insert leading 0's?
c) Trailing commas in objects and arrays
Are there encoders that do this or are we just anticipating that there might be manually generated files where this is convenient?
I could go either way on this one, but would prefer some supporting evidence.
d) Holes in arrays, e.g. [1,,3]
I don't think we should allow this unless we know there are encoders that generate it because it was acceptable to legacy eval-based parsers.
e) Allow some/all control characters to appear unescaped in JSON string literals. Which ones?
Might be plausible. Crock, why did you originally forbid them? Are there known encoders that pass through such characters without escaping them?
f) Allow single quotes within JSON text as string delimiters
I'm not really suggesting we allow this, but I'm told that at least one major web site has done this.
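Cases a) through f) can be probed against any given engine with a small table-driven check. This is only a sketch, assuming an engine whose JSON.parse allows top-level primitives and adopts none of the other relaxations; `accepts` and the sample inputs are illustrative names and values, not taken from the spec or from any implementation.

```javascript
// Returns true when the engine's JSON.parse accepts the text,
// false when it throws.
function accepts(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// One or more sample inputs per case in the list above.
const cases = [
  ["a) top-level string", '"abc"'],
  ["a) top-level number", "42"],
  ["a) top-level Boolean", "true"],
  ["a) top-level null", "null"],
  ["b) leading zero", "[012]"],
  ["c) trailing comma in array", "[1,2,]"],
  ["c) trailing comma in object", '{"a":1,}'],
  ["d) hole in array", "[1,,3]"],
  ["e) raw tab inside a string", '["a\tb"]'],      // real U+0009 in the JSON text
  ["e) escaped tab inside a string", '["a\\tb"]'], // backslash + t in the JSON text
  ["f) single-quoted string", "['abc']"],
];

const results = cases.map(([label, text]) => [label, accepts(text)]);
for (const [label, ok] of results) {
  console.log(label + ": " + (ok ? "accepted" : "rejected"));
}
```

On an engine that takes none of the relaxations b) through f), only the top-level primitives and the escaped-tab string are reported as accepted.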
Any other possible Postelisms? I have to say that, going through this list, I don't find many of them very compelling.
>From: Rob Sayre [mailto:rsayre at mozilla.com]
>Sent: Wednesday, June 03, 2009 11:32 AM
>To: Oliver Hunt
>Cc: Allen Wirfs-Brock; Robert Sayre; Mark S. Miller; es-discuss at mozilla.org
>Subject: Re: JSON parser grammar
>On 6/3/09 2:25 PM, Oliver Hunt wrote:
>>>>> 2.) trailing commas in objects and arrays are allowed
>>>> V8's JSON implementation also accepts [1,,,2]
>>> What does it produce? An array with holes, or an array with null
>> An array with holes -- in so far as i can tell V8's json object
>> exactly matches the result of eval(string), just prohibiting arbitrary
>> code execution.
>I don't think we want to live with this. You should never get a
>hole/undefined out of JSON.
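The difference between the two behaviours in the quoted exchange can be sketched as follows. This assumes a JavaScript engine with a strict JSON.parse; the variable names are illustrative, and indirect eval is used only so the text is evaluated as ordinary global code.

```javascript
// What an eval-based parser produces for [1,,,2]: an array with holes.
const evaluated = (0, eval)("[1,,,2]");
console.log(evaluated.length);  // 4
console.log(1 in evaluated);    // false: index 1 is a hole, not a stored value
console.log(evaluated[1]);      // undefined

// Serializing that array cannot express a hole, so each one becomes null.
const roundTripped = JSON.stringify(evaluated);
console.log(roundTripped);      // [1,null,null,2]

// A strict JSON.parse refuses the same text outright.
let holeRejected = false;
try {
  JSON.parse("[1,,,2]");
} catch (e) {
  holeRejected = e instanceof SyntaxError;
}
console.log(holeRejected);      // true
```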