JSON parser grammar
douglas at crockford.com
Fri Jun 12 16:10:05 PDT 2009
John Cowan wrote:
> Waldemar Horwat scripsit:
>> I don't like the idea of having valid native ES strings that cannot
>> be serialized. The sensible thing to do is to just escape surrogates,
>> whether they are paired or not.
> Unfortunately, RFC 4627 says plainly in section 3:
> JSON text SHALL be encoded in Unicode.
> The cited version is Unicode 4.1. As of Unicode 4.0, UTF-* documents
> are ill-formed if they contain unpaired surrogates; only the codepoints
> U+0000 to U+D7FF and U+E000 to U+10FFFF are encodable. The fact that
> the ABNF seems to allow U+D800 to U+DFFF is irrelevant.
>> This is an issue not just for surrogates. There are 66 other code
>> units that are not Unicode characters. For example:
>> These are covered by the same Unicode conformance clause as unpaired
>> surrogates, so we must treat them the same way.
> Section 2.5 of the RFC says that all Unicode characters may appear
> within quotes except those that must be escaped: by clear implication,
> non-characters may not appear within quotes. We are also told
> "Any character may be escaped", but there is no permission to escape
> non-characters. This is appropriate, because JSON is an interchange
> format (per the Abstract), and non-characters should never be used
> in interchange.
> In short, ES5 JSON encoders should check for non-characters and unpaired
> surrogates and refuse to encode them.
I think that is a serious misreading of the intent of the RFC.
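Waldemar's proposal in the quoted text is to escape every surrogate, paired or not, so that any native ES string can be serialized. Here is a minimal sketch of that idea. `escapeSurrogates` and `serialize` are hypothetical helpers, not part of the ES5 `JSON` API. The sketch post-processes `JSON.stringify` output, assuming no indentation argument is passed, so that the only non-ASCII text is inside string literals.

```typescript
// Hypothetical helper (not a standard API): rewrite every UTF-16 surrogate
// code unit, paired or unpaired, as a \uXXXX escape.
function escapeSurrogates(json: string): string {
  // With no indentation argument, JSON.stringify output is pure ASCII outside
  // string literals, so any surrogate found here belongs to a string value or key.
  // Without the `u` flag the class matches single code units, so each half
  // of a surrogate pair is escaped separately.
  return json.replace(/[\uD800-\uDFFF]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0"),
  );
}

// Serialize a value so that the resulting JSON text contains no surrogate code units.
function serialize(value: unknown): string {
  return escapeSurrogates(JSON.stringify(value));
}

console.log(serialize("\uD83D\uDE00")); // "\ud83d\ude00"
console.log(serialize("a\uD800b"));     // "a\ud800b"
```

Engines that implement the ES2019 "well-formed JSON.stringify" change already emit lone surrogates as `\uXXXX` escapes. This helper additionally escapes paired surrogates, in the `\uD834\uDD1E` style that RFC 4627 section 2.5 itself uses for characters outside the Basic Multilingual Plane.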
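Cowan's point that a well-formed UTF-* document cannot contain an unpaired surrogate can be observed with any conforming UTF-8 encoder. This sketch assumes a runtime that provides the WHATWG `TextEncoder` (modern browsers, Node.js 11 and later). `utf8Hex` is a hypothetical display helper.

```typescript
// Assumes a global WHATWG TextEncoder (always UTF-8).
const encoder = new TextEncoder();

// Hypothetical helper: show the UTF-8 bytes of a string as lowercase hex.
function utf8Hex(s: string): string {
  return Array.from(encoder.encode(s), (b) => b.toString(16).padStart(2, "0")).join(" ");
}

// A properly paired surrogate encodes as one 4-byte sequence (U+1F600).
console.log(utf8Hex("\uD83D\uDE00")); // f0 9f 98 80
// A lone surrogate has no UTF-8 encoding, so the encoder substitutes
// U+FFFD (ef bf bd) and the original code unit is lost.
console.log(utf8Hex("\uD800"));       // ef bf bd
```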
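The figure of 66 quoted above matches the Unicode definition of noncharacters: U+FDD0 through U+FDEF, plus the last two code points (U+xxFFFE and U+xxFFFF) of each of the 17 planes. The sketch below checks that count with a hypothetical `isNoncharacter` predicate. Of the 66, 32 lie outside the Basic Multilingual Plane. In an ES string, each of those occupies a surrogate pair (two code units) rather than a single code unit.

```typescript
// Hypothetical predicate: U+FDD0..U+FDEF, or any code point whose low
// 16 bits are FFFE or FFFF (masking off bit 0 folds both into FFFE).
function isNoncharacter(cp: number): boolean {
  return (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
}

// Count noncharacters over the whole code space U+0000..U+10FFFF.
let count = 0;
let outsideBmp = 0;
for (let cp = 0; cp <= 0x10ffff; cp++) {
  if (isNoncharacter(cp)) {
    count++;
    if (cp > 0xffff) outsideBmp++;
  }
}
console.log(count, outsideBmp); // 66 32
```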
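Cowan's conclusion is that an ES5 encoder should refuse unpaired surrogates and noncharacters. It might look like the sketch below. `strictStringify` and `checkString` are hypothetical names, not part of the `JSON` API. The check is hooked into the standard replacer argument of `JSON.stringify`, which is called for every key/value pair, including the root (with key `""`). The sketch is written for an ES2015+ engine because it uses `for...of` over strings.

```typescript
// Hypothetical predicate: Unicode noncharacters (66 code points).
function isNoncharacter(cp: number): boolean {
  return (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
}

// Throw if a string contains an unpaired surrogate or a noncharacter.
function checkString(s: string): void {
  // for...of iterates by code point: a valid pair yields one code point
  // above U+FFFF, while a lone surrogate is yielded on its own.
  for (const ch of s) {
    const cp = ch.codePointAt(0)!;
    if (cp >= 0xd800 && cp <= 0xdfff) {
      throw new RangeError(`unpaired surrogate U+${cp.toString(16).toUpperCase()}`);
    }
    if (isNoncharacter(cp)) {
      throw new RangeError(`noncharacter U+${cp.toString(16).toUpperCase()}`);
    }
  }
}

// Strict encoder: the replacer sees every property key and every value
// before serialization, so both object keys and string values are checked.
function strictStringify(value: unknown): string {
  return JSON.stringify(value, (key: string, val: unknown) => {
    checkString(key);
    if (typeof val === "string") checkString(val);
    return val;
  });
}

console.log(strictStringify({ a: "ok" })); // {"a":"ok"}
try {
  strictStringify(["\uD800"]);
} catch (e) {
  console.log(e instanceof RangeError ? e.message : e); // unpaired surrogate U+D800
}
```

This encodes one side of the disagreement in this thread. Whether an encoder should reject such input at all is exactly what the reply above disputes.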