JSON parser grammar

Waldemar Horwat waldemar at google.com
Thu Jun 11 16:20:03 PDT 2009


John Cowan wrote:
> David-Sarah Hopwood scripsit:
> 
>> This has the side-effect that an ES5 string with unpaired surrogates will
>> be encoded as JSON with unpaired surrogates -- garbage-in, garbage-out.
>> I'm not sure it's worth requiring an error here.
> 
> The trouble is that, as we discussed earlier, ES strings aren't necessarily
> Unicode (they can be a random sequence of uint16's), whereas JSON documents
> *are* necessarily Unicode.  Since the JSON encoder has to process surrogates
> anyhow, it might as well check for unpaired ones and barf.

I don't like the idea of having valid native ES strings that cannot be serialized.  The sensible thing to do is to just escape surrogates, whether they are paired or not.
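Escaping every surrogate means any string of 16-bit code units, well-formed or not, can be serialized. The result contains no surrogate code points, so it is always well-formed Unicode text. Below is a minimal sketch of that policy. It assumes an ES string is modeled as a Python sequence of ints in 0..0xFFFF, and the helper names `is_surrogate` and `quote_es_string` are hypothetical. It is not taken from any actual ES5 implementation.

```python
from typing import Sequence

def is_surrogate(unit: int) -> bool:
    """True for any UTF-16 surrogate code unit, high (D800-DBFF) or low (DC00-DFFF)."""
    return 0xD800 <= unit <= 0xDFFF

def quote_es_string(units: Sequence[int]) -> str:
    """Serialize an ES string, given as 16-bit code units, to a JSON string literal.

    Hypothetical helper: every surrogate is written as a \\uXXXX escape whether
    or not it is paired, so no input makes the serializer fail.
    """
    short = {0x22: '\\"', 0x5C: '\\\\', 0x08: '\\b', 0x0C: '\\f',
             0x0A: '\\n', 0x0D: '\\r', 0x09: '\\t'}
    out = ['"']
    for unit in units:
        if not 0 <= unit <= 0xFFFF:
            raise ValueError(f"not a 16-bit code unit: {unit!r}")
        if unit in short:
            out.append(short[unit])
        elif unit < 0x20 or is_surrogate(unit):
            # Control characters must be escaped in JSON; surrogates are escaped
            # here by choice so the output never contains a raw surrogate.
            out.append(f"\\u{unit:04x}")
        else:
            out.append(chr(unit))
    out.append('"')
    return ''.join(out)
```

For example, `quote_es_string([0x61, 0xD800, 0x62])` yields the JSON text `"a\ud800b"`. A parser that tolerates lone-surrogate escapes, such as Python's `json.loads`, reads the lone surrogate back unchanged. A properly paired surrogate such as `[0xD83D, 0xDE00]` becomes `"\ud83d\ude00"`, which JSON parsers recombine into one supplementary character.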

This is an issue not just for surrogates.  There are 66 other code points that are not Unicode characters.  For example:

\uFFFE
\uFFFF

These are covered by the same Unicode conformance clause as unpaired surrogates, so we must treat them the same way.  Do any JSON serializers or parsers reject such escape sequences?
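Treating noncharacters the same way as unpaired surrogates means a serializer or validator needs a single test that covers both. The sketch below assumes an ES string is a sequence of 16-bit code units, and the helpers `is_noncharacter` and `flag_suspect_units` are hypothetical. Only 34 of the 66 noncharacters occupy a single code unit: U+FDD0..U+FDEF, U+FFFE and U+FFFF. The other 32 lie in the supplementary planes and appear in an ES string as surrogate pairs, so the scanner decodes pairs before testing them.

```python
from typing import List, Sequence

def is_noncharacter(cp: int) -> bool:
    """Unicode noncharacters: U+FDD0..U+FDEF plus the last two code points
    of every plane (U+xxFFFE and U+xxFFFF), 66 in total."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def flag_suspect_units(units: Sequence[int]) -> List[int]:
    """Return indices of code units that do not encode a Unicode character:
    unpaired surrogates, and every unit of a noncharacter (BMP or supplementary).
    Hypothetical helper; a serializer could escape exactly these indices."""
    flagged: List[int] = []
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Well-formed pair: decode it to a supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            if is_noncharacter(cp):
                flagged.extend((i, i + 1))
            i += 2
            continue
        if 0xD800 <= u <= 0xDFFF or is_noncharacter(u):
            # Unpaired surrogate (high or low) or a BMP noncharacter.
            flagged.append(i)
        i += 1
    return flagged
```

For instance, in `[0x61, 0xFFFE, 0xD800, 0x62]` the scanner flags indices 1 (a noncharacter) and 2 (an unpaired high surrogate). Whether a real parser should escape, accept, or reject such units is exactly the open question. The sketch only shows that one check covers both cases.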

    Waldemar

