New full Unicode for ES6 idea
allen at wirfs-brock.com
Sun Feb 19 16:24:28 PST 2012
On Feb 19, 2012, at 1:34 PM, Brendan Eich wrote:
> Wes Garland wrote:
>> Is there a proposal for interaction with JSON?
> From http://www.ietf.org/rfc/rfc4627, 2.5:
> To escape an extended character that is not in the Basic Multilingual
> Plane, the character is represented as a twelve-character sequence,
> encoding the UTF-16 surrogate pair. So, for example, a string
> containing only the G clef character (U+1D11E) may be represented as
I think it is actually more complex than just the above. 2.5 also says:
"All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)." (emphasis added)
and 3. says:
"JSON text SHALL be encoded in Unicode. The default encoding is UTF-8." and then goes on to talk about how to detect UTF-8, 16, and 32 LE and BE encodings. So all those are legal.
With the BRS, JSON.parse and JSON.stringify could encounter non-BMP characters in the JS strings they are processing, and those would presumably also pass through transparently. The one requirement of RFC 4627 that would be impacted by the BRS is the 12-character escape sequence mentioned above. Currently, JSON.parse implementations encode those as UTF-16 surrogate pairs in the generated strings. If the BRS is flipped, the RFC seems to require that they generate a single string element. Because the JSON.stringify spec does not escape anything other than the quotation mark, the reverse solidus, and control characters, any non-BMP characters it encounters would pass through unencoded. This implies that JSON.parse input of the form "\uD834\uDD1E" would probably round trip back out via JSON.stringify as a JSON string containing the single unencoded G clef character. Logically equivalent, but not the identical JSON text.
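To make the round trip concrete, here is a minimal sketch that can be run in a current engine, i.e. without the BRS. The variable names are just for illustration. It only shows that the escaped text and the stringified text differ while denoting the same string value. With the BRS flipped, the parsed string would presumably have one element instead of two, but that part is an assumption about the proposal and is not demonstrated here.

```javascript
// JSON text containing the twelve-character escape for U+1D11E (G clef).
// The doubled backslashes keep the escapes literal in the JSON text.
const escapedJson = '"\\uD834\\uDD1E"';

// JSON.parse decodes the two \uXXXX escapes into a UTF-16 surrogate pair.
const parsed = JSON.parse(escapedJson);
const units = parsed.length;             // number of 16-bit string elements
const codePoint = parsed.codePointAt(0); // code point formed by the pair

// JSON.stringify leaves a well-formed surrogate pair unescaped, so the
// output holds the raw character between the quotation marks.
const roundTripped = JSON.stringify(parsed);

// Different JSON text, same string value once parsed again.
const sameText = roundTripped === escapedJson;
const sameValue = JSON.parse(roundTripped) === parsed;

console.log(units, codePoint.toString(16), sameText, sameValue);
```

The escaped form is 14 characters of JSON text, while the stringified form is just the two quotation marks around the unencoded character.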