Re: Question about the “full Unicode in strings” strawman

Allen Wirfs-Brock allen at wirfs-brock.com
Wed Jan 25 11:33:47 PST 2012


On Jan 25, 2012, at 9:54 AM, John Tamplin wrote:

> On Wed, Jan 25, 2012 at 12:46 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
> Arbitrary 16-bit values can be placed in a String using either String.fromCharCode (15.5.3.2) or the \uxxxx notation in string literals.  Neither of these enforces a requirement that individual String elements are valid Unicode code units.
> 
> You can't really store arbitrary 16-bit values in strings, as they will get corrupted in some browsers.  Specifically, combining marks and unpaired surrogates are problematic, and some invalid code points get replaced with another character.  Even if it is only text, you can't rely on the strings not being mangled: GWT RPC quotes different ranges of characters on different browsers.
> 
> http://code.google.com/p/google-web-toolkit/source/browse/trunk/user/src/com/google/gwt/user/client/rpc/impl/ClientSerializationStreamWriter.java?spec=svn10146&r=10146#86
> 
> (the Android bug mentioned was fixed long ago, but I haven't done any research into how many of the broken browsers are still in use, so I don't know whether it is safe to remove).

It isn't clear from your source code what encoding issues you have actually identified.  I suspect that you are talking about what happens when an external resource (an application/javascript file), which may be in any of various UTF encodings, is normalized and passed to the JavaScript parser.  If so, that isn't what we are talking about here.  We are talking about what values can exist at runtime as the individual elements of a string value.
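To make the runtime point concrete, here is a minimal sketch (written in modern JavaScript, so it assumes `padStart` and `let`/`const` are available; `describe` is a hypothetical helper, not part of any spec). It builds strings from unpaired surrogates, a reversed low/high sequence, and a valid surrogate pair, then lists the 16-bit elements each string actually holds.

```javascript
// Hypothetical helper: report a string's length and its elements as hex code units.
function describe(s) {
  const units = [];
  for (let i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i).toString(16).padStart(4, "0"));
  }
  return { length: s.length, units };
}

const loneHigh = String.fromCharCode(0xd800);     // unpaired high surrogate
const loneLow = "\udc00";                         // unpaired low surrogate via \uxxxx
const reversed = loneLow + loneHigh;              // low before high: not a valid pair
const pair = String.fromCharCode(0xd83d, 0xde00); // valid pair encoding U+1F600

// Nothing rejects or rewrites the ill-formed sequences; each element is
// exactly the 16-bit value that was stored.
console.log(describe(loneHigh)); // { length: 1, units: [ 'd800' ] }
console.log(describe(reversed)); // { length: 2, units: [ 'dc00', 'd800' ] }
console.log(describe(pair));     // { length: 2, units: [ 'd83d', 'de00' ] }
```

Note that the well-formed supplementary character and the ill-formed reversed sequence look the same to `length`: both are simply two string elements.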

Any browser JavaScript implementation conforming to either the ES3 or the ES5.1 spec should display "passed" for the following test case:

var hc, s;
for (var c = 0; c <= 0xffff; c++) {
   // test String.fromCharCode creation and charCodeAt access
   if (String.fromCharCode(c).charCodeAt(0) !== c) alert("failed for: " + c);
   // test the "\uxxxx" escape: build a zero-padded 4-hex-digit literal and eval it
   hc = '"\\u' + (c < 16 ? '000' : c < 256 ? '00' : c < 4096 ? '0' : '') + c.toString(16) + '"';
   s = eval(hc);
   if (s.length !== 1) alert('failed: \\u bad length for ' + c);
   if (s.charCodeAt(0) !== c) alert('failed \\u for ' + c);
}
alert("passed");
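For running the same check outside a browser (for example under Node, where `alert` does not exist), one possible variant collects failures instead of alerting. This is a sketch, not part of the original test: `findCodeUnitFailures` is a hypothetical name, and `padStart` (ES2017) replaces the manual zero-padding.

```javascript
// Hypothetical helper: runs the same two checks over every 16-bit value and
// returns the failing cases rather than calling alert().
function findCodeUnitFailures() {
  const failures = [];
  for (let c = 0; c <= 0xffff; c++) {
    // fromCharCode / charCodeAt round trip
    if (String.fromCharCode(c).charCodeAt(0) !== c) failures.push(["fromCharCode", c]);
    // Build source text for a string literal such as "\u00e9" and evaluate it.
    const literal = '"\\u' + c.toString(16).padStart(4, "0") + '"';
    const s = eval(literal);
    if (s.length !== 1 || s.charCodeAt(0) !== c) failures.push(["\\u escape", c]);
  }
  return failures;
}

const failures = findCodeUnitFailures();
console.log(failures.length === 0 ? "passed" : "failed: " + failures.length + " case(s)");
```

Unlike the original, which always ends with "passed" after any failure alerts, this version prints "passed" only when no value failed.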


Allen




More information about the es-discuss mailing list