JSON parser grammar

David-Sarah Hopwood david-sarah at jacaranda.org
Fri Jun 5 16:24:54 PDT 2009

John Cowan wrote:
> David-Sarah Hopwood scripsit:
>> var escapable =
>> /[\\\"\x00-\x1f\x7f-\x9f\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff]/g
>> Incidentally, the format-control characters \u06FF and \u200B and the
>> noncharacters \uFDD0..FDEF should probably be added to the above
>> set of escapable characters.
> Agree on U+200B and the non-characters.  U+06FF is a letter,
> ARABIC LETTER HEH WITH INVERTED V; I assume you don't mean that one.

Typo. I meant U+06DD ARABIC END OF AYAH.
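
For concreteness, here is a rough sketch of what the amended set might look
like with U+06DD, U+200B and U+FDD0..U+FDEF added. The names
escapableExtended, escapeChar and escapeString are made up for illustration.
The replacement function always emits \uXXXX rather than short escapes such
as \n; that is a simplifying assumption, not a claim about what any
particular implementation does. The \u0600-\u0604 range is left as quoted.

```javascript
// Sketch only: the quoted set plus U+06DD, U+200B and U+FDD0..U+FDEF.
// All three names below are hypothetical.
var escapableExtended =
  /[\\\"\x00-\x1f\x7f-\x9f\u00ad\u0600-\u0604\u06dd\u070f\u17b4\u17b5\u200b-\u200f\u2028-\u202f\u2060-\u206f\ufdd0-\ufdef\ufeff\ufff0-\uffff]/g;

// Turn one matched code unit into its six-character \uXXXX escape.
function escapeChar(c) {
  return '\\u' + ('0000' + c.charCodeAt(0).toString(16)).slice(-4);
}

// Escape every code unit of `s` that falls in the set above.
function escapeString(s) {
  return s.replace(escapableExtended, escapeChar);
}
```

With this set, U+06DD is escaped while U+06FF (a letter) passes through
unchanged.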

>> (The surrogate codes should not be added, since JSON is not tied to UTF-16.)
> Presumably, ES5 strings with unpaired surrogates cannot be JSONized, since
> unpaired surrogates don't represent Unicode characters.

Right, but the main reason not to escape surrogate codes is that it
would be wrong for correctly paired surrogates, which *can* be JSONized.

This has the side-effect that an ES5 string with unpaired surrogates will
be encoded as JSON with unpaired surrogates -- garbage-in, garbage-out.
I'm not sure it's worth requiring an error here.
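
If an error (or just a warning) were wanted after all, the check could be
done separately from the escaping. A minimal sketch, assuming a hypothetical
helper named hasUnpairedSurrogate; nothing like it is required by the draft,
and it leaves correctly paired surrogates alone:

```javascript
// Hypothetical helper: returns true if `s` contains a surrogate code unit
// that is not part of a well-formed high+low pair.
function hasUnpairedSurrogate(s) {
  for (var i = 0; i < s.length; i++) {
    var c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: must be immediately followed by a low surrogate.
      var next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a valid pair
      } else {
        return true;
      }
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return true;
    }
  }
  return false;
}
```

A caller could run this before serializing and decide for itself whether an
unpaired surrogate is an error.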

>> Note that \u0604 is unassigned, and probably doesn't need to be in this set.
> +1

Another character I wasn't sure about is U+202F NARROW NO-BREAK SPACE.
It is harmless to escape it, but it isn't in the Jacaranda (or Cajita, IIRC)
lists of known characters that are not accepted in JavaScript source by
some JS implementations, and it isn't a format-control character.

David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com
