Full Unicode strings strawman

Allen Wirfs-Brock allen at wirfs-brock.com
Mon May 16 17:07:19 PDT 2011


On May 16, 2011, at 4:21 PM, Shawn Steele wrote:

> > Not in my proposal!  "\ud800\udc00"=== "\u+010000"  is false in my proposal.
>  
> That’s exactly my problem.  I think the engines (or at least the applications written in JavaScript) are still UTF-16-centric and that they’ll have d800, dc00 === 10000.  For example, if they were different, then d800, dc00 should print �� instead of 𐀀; however, I’m reasonably sure that any implementation would end up rendering it as 𐀀.

I think you'll find that the actual JS engines are currently UCS-2-centric. The surrounding browser environments are doing the UTF-16 interpretation.  That's why you see 𐀀 instead of �� in browser-generated display output.
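
To make the distinction concrete, here is a minimal sketch of what the engine itself sees for that string. The helper name decodeSurrogatePair is hypothetical, introduced only for illustration; it performs the standard UTF-16 pairing arithmetic that, in the situation described above, is left to the surrounding environment rather than done by the engine.

```javascript
// The string from the example above: the surrogate pair for U+10000.
const pair = "\ud800\udc00";

// The engine counts and indexes 16-bit code units, not code points.
const unitCount = pair.length; // 2
const units = [pair.charCodeAt(0), pair.charCodeAt(1)]; // 0xD800, 0xDC00

// Hypothetical helper (not part of the language): combine a lead and a
// trail surrogate into the single code point they encode in UTF-16.
function decodeSurrogatePair(lead, trail) {
  return (lead - 0xd800) * 0x400 + (trail - 0xdc00) + 0x10000;
}

const codePoint = decodeSurrogatePair(units[0], units[1]); // 0x10000
console.log(unitCount, codePoint.toString(16));
```

Nothing in the string operations themselves treats the two units as one character; the pairing only happens when some layer chooses to apply the UTF-16 interpretation.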

>  
> In other words I don’t think you can get the engine to be completely UTF-32.  At least not without declaring a page as being UTF-32.
>  

I agree that application writers will, for the foreseeable future, continue to have to know whether or not they are dealing with UTF-16 encoded data and/or communicating with other subsystems that expect such data.  However, core language support for UTF-32 is a prerequisite for ever moving beyond UTF-16 APIs and libraries and getting back to uniform-sized character processing.
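
As a rough illustration of what non-uniform character sizes cost today, the sketch below counts the code points in a string of UTF-16 code units. The function name countCodePoints is hypothetical and exists only for this example; the point is that the answer requires a scan, whereas with uniform-sized characters it would simply be the length.

```javascript
// Hypothetical helper: count code points in a string of UTF-16 code units.
function countCodePoints(s) {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    // A lead surrogate followed by a trail surrogate forms one code point,
    // so skip the trail unit. Unpaired surrogates are counted individually.
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) i++;
    }
    count++;
  }
  return count;
}

const text = "a\ud800\udc00b";
console.log(text.length, countCodePoints(text));
```

Here text.length reports four code units while the scan finds three code points, which is the kind of bookkeeping application code has to carry as long as strings are sequences of 16-bit units.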

Allen
