Full Unicode strings strawman
allen at wirfs-brock.com
Mon May 16 15:42:31 PDT 2011
On May 16, 2011, at 2:23 PM, Shawn Steele wrote:
> In UTF-8, individually encoded surrogates are illegal (and a security risk). E.g., you shouldn’t be able to encode D800/DC00 as two 3-byte sequences; they should be a single 4-byte sequence. Having not played with the JS encoding/decoding in quite some time, I’m not sure what it does in that case, but hopefully it doesn’t produce illegal UTF-8. (You also shouldn’t be able to have half a surrogate pair in UTF-16, but many things are pretty lax about that.)
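The UTF-8 question in the quoted text can be checked directly. The following is only a minimal sketch, assuming the built-in encodeURIComponent is an acceptable stand-in for "JS encoding"; utf8Hex is a hypothetical helper name introduced for illustration and is not part of any standard API.

```javascript
// Hypothetical helper: show the UTF-8 bytes that encodeURIComponent emits.
// Only meaningful for non-ASCII input, since unreserved ASCII is left unescaped.
function utf8Hex(str) {
  // encodeURIComponent percent-encodes each UTF-8 byte as %XX.
  return encodeURIComponent(str).replace(/%/g, " ").trim();
}

// A well-formed surrogate pair D800/DC00 is the single code point U+10000,
// which is encoded as one 4-byte UTF-8 sequence.
var pairBytes = utf8Hex("\uD800\uDC00");

// A lone surrogate has no valid UTF-8 form; the built-in throws instead of
// emitting an individually encoded surrogate.
var loneErrorName = null;
try {
  encodeURIComponent("\uD800");
} catch (e) {
  loneErrorName = e.name;
}

console.log(pairBytes);     // F0 90 80 80
console.log(loneErrorName); // URIError
```

So at least this one built-in refuses to produce individually encoded surrogates, while the string itself can still hold half a surrogate pair.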
There is a chicken-and-egg issue here. The DOM will never evolve to directly support non-UTF-16-encoded supplementary characters unless ECMAScript first provides such support. It may take 20 years to get there, but that clock won't even start until ECMAScript provides the necessary support.