Full Unicode strings strawman
Allen Wirfs-Brock
allen at wirfs-brock.com
Mon May 16 17:07:19 PDT 2011
On May 16, 2011, at 4:21 PM, Shawn Steele wrote:
> > Not in my proposal! "\ud800\udc00"=== "\u+010000" is false in my proposal.
>
> That’s exactly my problem. I think the engines (or at least the applications written in JavaScript) are still UTF-16-centric and that they’ll have d800, dc00 === 10000. For example, if they were different, then d800, dc00 should print �� instead of 𐀀; however, I’m reasonably sure that any implementation would end up rendering it as 𐀀.
I think you'll find that the actual JS engines are currently UCS-2 centric. The surrounding browser environments are doing the UTF-16 interpretation. That's why you see 𐀀 instead of �� in browser-generated display output.
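A minimal sketch of what "UCS-2 centric" means in practice, assuming an ECMAScript 5 engine. The variable names are illustrative only. The engine treats the string as two independent 16-bit code units and never combines them into a single code point. Any pairing happens later, in whatever renders the text.

```javascript
// Two 16-bit code units that a UTF-16 consumer would read as U+10000.
var pair = "\ud800\udc00";

// The engine counts code units, not characters.
var unitCount = pair.length;          // 2

// Each unit is individually addressable and is returned as-is.
var firstUnit = pair.charCodeAt(0);   // 0xD800
var secondUnit = pair.charCodeAt(1);  // 0xDC00

// Concatenating the two halves yields exactly the same string value,
// because no surrogate-pair interpretation takes place.
var sameAsConcat = pair === "\ud800" + "\udc00"; // true
```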
>
> In other words I don’t think you can get the engine to be completely UTF-32. At least not without declaring a page as being UTF-32.
>
I agree that application writers will, for the foreseeable future, continue to have to know whether or not they are dealing with UTF-16 encoded data and/or communicating with other subsystems that expect such data. However, core language support for UTF-32 is a prerequisite for ever moving beyond UTF-16 APIs and libraries and getting back to uniform-sized character processing.
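To illustrate the bookkeeping that UTF-16 encoded data pushes onto application code, here is a rough sketch of decoding code units into code points by hand. The helper name toCodePoints is hypothetical and is not part of the proposal or of any library. It assumes well-formed pairs should be combined and lone surrogates passed through unchanged.

```javascript
// Hypothetical helper: turn a string of 16-bit code units into an array
// of code points, combining valid surrogate pairs along the way.
function toCodePoints(str) {
  var points = [];
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    // A high surrogate followed by a low surrogate encodes one
    // supplementary-plane code point.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        points.push(((unit - 0xD800) << 10) + (next - 0xDC00) + 0x10000);
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    // BMP characters and unpaired surrogates are kept as-is.
    points.push(unit);
  }
  return points;
}

var decoded = toCodePoints("a\ud800\udc00"); // [0x61, 0x10000]
```

With uniform-sized characters, each element would already be a code point and this pairing step would not be needed.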
Allen