JSON parser grammar

David-Sarah Hopwood david-sarah at jacaranda.org
Fri Jun 5 09:21:03 PDT 2009


Douglas Crockford wrote:
> Mark S. Miller wrote:
>> On Wed, Jun 3, 2009 at 9:48 PM, Allen Wirfs-Brock
>> <Allen.Wirfs-Brock at microsoft.com> wrote:
>>> Given that the output produced by stringify is specified
>>> algorithmically I don't see any reason to clutter the grammar with
>>> [not for output] annotations.  If we decide we want to quote <LS> or
>>> <PS> in outputted string literals we can specify such in the
>>> stringify Quote algorithm.
>>
>> Even at this late date, I think this change to the spec is called for.
>> I agree we should change the Quote algorithm so that JSON.stringify
>> does not generate unescaped <LS> or <PS> characters. JSON.parse would
>> still accept them of course, and so the language accepted by
>> JSON.parse (full JSON <value>) would not be a subset of ES5
>> <expression>. But with this change, the subset of JSON <value> emitted
>> by JSON.stringify would also be a subset of ES5 <expression>. Small
>> cost with high payoff.
> 
> I disagree with your estimation of the payoff. And the escaping of \<LS>
> violates the JSON rules.

No-one suggested escaping <LS> as \<LS>. As for escaping <LS> as \u2028
and <PS> as \u2029, json2.js already does that. In fact it \u-escapes
all of the characters in this regexp:

var escapable =
/[\\\"\x00-\x1f\x7f-\x9f\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g;

Since these are almost exactly the characters that various browsers
reject in JavaScript source, and that a naive eval-based JSON parser on
those browsers will therefore not accept in strings, the ES5 Quote
algorithm should probably \u-escape the same set (as Oliver Hunt has
just pointed out).
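
To make the escaping concrete, here is a minimal sketch of a quoting
function built around the regexp quoted above. It is loosely modelled on
the approach in json2.js but is not a copy of it, and it is not the ES5
Quote algorithm; the names quote and meta are chosen for illustration.

```javascript
// Sketch only: the character class is the one quoted from json2.js above.
var escapable =
/[\\\"\x00-\x1f\x7f-\x9f\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g;

// Short escapes defined by JSON; every other match falls back to \uXXXX.
var meta = {
    '\b': '\\b', '\t': '\\t', '\n': '\\n', '\f': '\\f', '\r': '\\r',
    '"': '\\"', '\\': '\\\\'
};

// Returns a JSON string literal for `string` (illustrative helper).
function quote(string) {
    return '"' + string.replace(escapable, function (ch) {
        var shortForm = meta[ch];
        if (typeof shortForm === 'string') {
            return shortForm;
        }
        // Pad the hex code unit to four digits, e.g. 2028 or 00ad.
        return '\\u' + ('0000' + ch.charCodeAt(0).toString(16)).slice(-4);
    }) + '"';
}
```

With this sketch, a string containing a raw <LS> comes out with a
six-character \u2028 escape in its place, and JSON.parse of the result
gives back the original string.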

Incidentally, the format-control characters \u06DD and \u200B and the
noncharacters \uFDD0..\uFDEF should probably be added to the above
set of escapable characters. (The surrogate codes should not be added,
since JSON is not tied to UTF-16.) Note that \u0604 is unassigned, and
probably doesn't need to be in this set.
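
As a rough sketch of what the adjusted set might look like (taking the
Arabic format-control character to be U+06DD, ARABIC END OF AYAH, and
using the hypothetical names escapableExtended and needsEscape):

```javascript
// Sketch only: the json2.js class with the changes suggested above.
//   added:   \u06dd, \u200b (merged into \u200b-\u200f), \ufdd0-\ufdef
//   dropped: \u0604 (range narrowed to \u0600-\u0603)
// No /g flag, so test() carries no lastIndex state between calls.
var escapableExtended =
/[\\\"\x00-\x1f\x7f-\x9f\u00ad\u0600-\u0603\u06dd\u070f\u17b4\u17b5\u200b-\u200f\u2028-\u202f\u2060-\u206f\ufdd0-\ufdef\ufeff\ufff0-\uffff]/;

// True if the single code unit `ch` would be \u-escaped under this set.
function needsEscape(ch) {
    return escapableExtended.test(ch);
}
```

Surrogate code units such as \ud800 are deliberately left out of the
class, in line with the point that JSON is not tied to UTF-16.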

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com


