New full Unicode for ES6 idea

Brendan Eich brendan at mozilla.com
Mon Feb 20 08:20:07 PST 2012


Allen Wirfs-Brock wrote:
>> Last year we dispensed with the binary data hacking in strings use-case. I don't see the hardship. But rather than throw exceptions on concatenation I would simply eliminate the ability to spell code units with "\uXXXX" escapes. Who's with me?
>
> I think we need to be careful not to equate the syntax of ES string literals with the actual encoding space of string elements.

I agree, which is why I'm saying that with the BRS set, we should forbid 
"\uXXXX", since that spells a code unit, not a code point.
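For illustration, under the ES2015 semantics that eventually shipped (no BRS, so string elements remain UTF-16 code units), one "\u{...}" escape and a pair of "\uXXXX" escapes can spell the same string. That is why a "\uXXXX" escape names a code unit rather than a code point. A minimal sketch, runnable in any ES2015+ engine:

```javascript
// Standard ES2015+ behavior: strings are sequences of UTF-16 code units.
const viaCodePoint = "\u{1F600}";    // one code point escape
const viaCodeUnits = "\uD83D\uDE00"; // two code unit escapes (a surrogate pair)

console.log(viaCodePoint === viaCodeUnits);             // true
console.log(viaCodeUnits.length);                       // 2 (code units)
console.log(viaCodeUnits.codePointAt(0).toString(16));  // 1f600
console.log(Array.from(viaCodeUnits).length);           // 1 (string iteration is by code point)
```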

> Whether you say "\ud800" or "\u{00d800}", or call a function that does full-Unicode to UTF-16 encoding, or simply create a string from file contents, you may end up with string elements containing upper or lower half surrogates.

I don't agree in the case of "\u{00d800}". That's simply an illegal code 
point, not a code unit (upper or lower half). We can reject it statically.
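A sketch of the static check described above, assuming a BRS-mode lexer hands the hex digits of each "\u{...}" escape to a hypothetical checkCodePointEscape function. This is not standard behavior: ES2015 as specified only rejects values above 0x10FFFF, so "\u{D800}" is accepted there and yields a lone surrogate code unit.

```javascript
// Hypothetical BRS-mode lexer check for the hex digits of a "\u{...}" escape.
// NOT standard ES2015 behavior, which accepts "\u{D800}".
function checkCodePointEscape(hexDigits) {
  if (!/^[0-9A-Fa-f]+$/.test(hexDigits)) {
    throw new SyntaxError(`malformed escape \\u{${hexDigits}}`);
  }
  const cp = parseInt(hexDigits, 16);
  if (cp > 0x10ffff) {
    throw new SyntaxError(`\\u{${hexDigits}} exceeds U+10FFFF`);
  }
  if (cp >= 0xd800 && cp <= 0xdfff) {
    // Surrogate values are reserved for UTF-16 code units; they are not
    // Unicode scalar values, so they cannot denote a character.
    throw new SyntaxError(`\\u{${hexDigits}} is a surrogate, not a character code point`);
  }
  return cp; // the code point the escape denotes
}

console.log(checkCodePointEscape("1F600").toString(16)); // 1f600
try {
  checkCodePointEscape("00d800");
} catch (e) {
  console.log(e.message); // logs the surrogate error message
}
```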

> Eliminating the "\uXXXX" syntax really doesn't change anything regarding actual string processing.

True, but not my point!

> What it might do, however, is eliminate the ambiguity about the intended meaning of "\uD800\uDC00" in legacy code.

And the ambiguity arising from concatenations, which would otherwise cost 
us Gavin's distributive .length property: 
(a + b).length == a.length + b.length.
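To make the concatenation hazard concrete, here is a toy model, not any engine's actual representation: a BRS-mode string as an array of code points, with a hypothetical concatMerging function that pairs a trailing high surrogate with a leading low surrogate, as legacy code writing "\uD800" + "\uDC00" would expect. The result has one element while the operands have one each, so the distributive .length property fails.

```javascript
// Toy model (hypothetical): a full-Unicode string is an array of code points.
// concatMerging joins a trailing high surrogate with a leading low surrogate.
function concatMerging(a, b) {
  const last = a[a.length - 1];
  const first = b[0];
  if (last >= 0xd800 && last <= 0xdbff && first >= 0xdc00 && first <= 0xdfff) {
    // Standard UTF-16 surrogate pair decoding.
    const cp = 0x10000 + ((last - 0xd800) << 10) + (first - 0xdc00);
    return [...a.slice(0, -1), cp, ...b.slice(1)];
  }
  return [...a, ...b];
}

const hi = [0xd800]; // what "\uD800" would denote
const lo = [0xdc00]; // what "\uDC00" would denote
const joined = concatMerging(hi, lo);
console.log(joined.length, hi.length + lo.length); // 1 2
console.log(joined[0].toString(16));               // 10000
```

Dropping the merge keeps .length additive in this model, at the cost of legacy surrogate-pair code no longer producing the single character it intended.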

> If "full unicode string mode" only supported \u{} escapes then existing code that uses \uXXXX would have to be updated before it could be used in that mode.  That might be a good thing.

My point! ;-)
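Updating such code could be partly mechanical. Below is a naive sketch of a hypothetical migration helper, toCodePointEscapes, that rewrites legacy "\uXXXX" escapes in raw source text into "\u{...}" escapes, combining surrogate pairs into one code point and rejecting lone surrogates, which have no code point spelling under the proposal. It ignores complications such as an escaped backslash ("\\u0041").

```javascript
// Hypothetical migration helper: rewrite "\uXXXX" escapes in source text
// into "\u{...}" code point escapes. Naive: does not handle "\\u0041".
function toCodePointEscapes(src) {
  return src.replace(
    /\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})|\\u([0-9a-fA-F]{4})/g,
    (match, high, low, single) => {
      if (high !== undefined) {
        // A high/low surrogate pair becomes one astral code point.
        const cp = 0x10000 + ((parseInt(high, 16) - 0xd800) << 10) + (parseInt(low, 16) - 0xdc00);
        return `\\u{${cp.toString(16).toUpperCase()}}`;
      }
      const unit = parseInt(single, 16);
      if (unit >= 0xd800 && unit <= 0xdfff) {
        // A lone surrogate cannot be spelled as a code point escape.
        throw new SyntaxError(`lone surrogate escape ${match} has no code point equivalent`);
      }
      return `\\u{${single.toUpperCase()}}`;
    }
  );
}

console.log(toCodePointEscapes(String.raw`"\uD83D\uDE00 \u00e9"`)); // "\u{1F600} \u{00E9}"
```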

/be


More information about the es-discuss mailing list