JSON parser grammar

Douglas Crockford douglas at crockford.com
Fri Jun 12 17:32:05 PDT 2009


John Cowan wrote:
> Douglas Crockford scripsit:
> 
>>     JSON text SHALL be encoded in Unicode.
>>
>> This should be understood as meaning Unicode, and not BIG5 or
>> Latin-1. It does not mean that Unicode's various not-a-character
>> classifications render characters as unrepresentable.
> 
> But that's just my point: all *characters* are representable.  Unpaired
> surrogates aren't characters and aren't representable -- text containing
> them is ill-formed per Unicode 4.0 and later.  Noncharacters are
> representable, are not characters, and shouldn't be interchanged.

JSON shares JavaScript's and Java's pre-UTF-16 notion of characters and their
obliviousness to surrogate pairs. If a string makes sense in JavaScript, then it
makes sense in JSON. JavaScript does not fuss about mismatched pairs.
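
As a rough illustration of the point above, the following sketch shows a JavaScript engine treating a lone surrogate as an ordinary 16-bit code unit. It assumes a modern engine with `JSON.parse` and `String.prototype.codePointAt`; the variable names are made up for the example.

```javascript
// A lone high surrogate with no low surrogate following it.
const lone = "\uD800";

// JavaScript strings are sequences of 16-bit code units, so this is
// an ordinary one-unit string as far as the language is concerned.
const length = lone.length;

// JSON.parse accepts the escaped form without complaint and yields
// the same single code unit.
const parsed = JSON.parse('"\\uD800"');
const sameUnit = parsed.charCodeAt(0) === 0xD800;

// For comparison, a properly matched pair is two code units that
// together encode one code point (U+1D11E MUSICAL SYMBOL G CLEF).
const pair = JSON.parse('"\\uD834\\uDD1E"');
const pairLength = pair.length;
const codePoint = pair.codePointAt(0);

console.log(length, sameUnit, pairLength, codePoint.toString(16));
```

Whether such a string should be interchanged is a separate question; the sketch only shows that neither the string type nor the parser rejects it.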

