New full Unicode for ES6 idea
Brendan Eich
brendan at mozilla.com
Mon Feb 20 08:20:07 PST 2012
Allen Wirfs-Brock wrote:
>> Last year we dispensed with the binary data hacking in strings use-case. I don't see the hardship. But rather than throw exceptions on concatenation I would simply eliminate the ability to spell code units with "\uXXXX" escapes. Who's with me?
>
> I think we need to be careful not to equate the syntax of ES string literals with the actual encoding space of string elements.
I agree, which is why I'm saying that with the BRS set, we should forbid
"\uXXXX", since that spells a code unit rather than a code point.
> Whether you say "\ud800" or "\u{00d800}", or call a function that does full-unicode to UTF-16 encoding, or simply create a string from file contents you may end up with string elements containing upper or lower half surrogates.
I don't agree in the case of "\u{00d800}". That's simply an illegal code
point, not a code unit (upper or lower half). We can reject it statically.
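A minimal sketch of what rejecting such an escape statically could look
like. The names isSurrogateCodePoint and checkBracedEscape are
hypothetical, made up for illustration; this sketches the rule proposed
here, not the behavior of any engine or spec text.

```javascript
// Hypothetical helper: true for the surrogate range U+D800..U+DFFF,
// which contains no valid Unicode scalar values.
function isSurrogateCodePoint(cp) {
  return cp >= 0xD800 && cp <= 0xDFFF;
}

// Hypothetical static check for the hex digits inside "\u{...}".
// Returns the code point, or throws SyntaxError as a parser might.
function checkBracedEscape(hex) {
  if (!/^[0-9a-fA-F]+$/.test(hex)) {
    throw new SyntaxError("malformed \\u{...} escape: " + hex);
  }
  const cp = parseInt(hex, 16);
  if (cp > 0x10FFFF) {
    throw new SyntaxError("beyond U+10FFFF: " + hex);
  }
  if (isSurrogateCodePoint(cp)) {
    // Assumption: full Unicode mode treats a surrogate value here as an
    // illegal code point, never as half of a pair.
    throw new SyntaxError("surrogate is not a code point: " + hex);
  }
  return cp;
}
```

Under these assumptions checkBracedEscape("00d800") throws, while
checkBracedEscape("10000") returns 0x10000.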
> Eliminating the "\uXXXX" syntax really doesn't change anything regarding actual string processing.
True, but not my point!
> What it might do, however, is eliminate the ambiguity about the intended meaning of "\uD800\uDc00" in legacy code.
And the same ambiguity arising from concatenations, which avoids losing
Gavin's distributive .length property.
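To make the concatenation hazard concrete, here is a small sketch.
It assumes the distributive property means (a + b).length equals
a.length + b.length, and it uses a hypothetical codePointLength helper
as a stand-in for a code-point-based length; no such built-in mode
exists in the language this runs on.

```javascript
// Hypothetical stand-in for a code-point-based length.
// Array.from iterates a string by code points; a lone surrogate
// counts as one element.
function codePointLength(s) {
  return Array.from(s).length;
}

const hi = "\uD800"; // lone high surrogate, spelled as a code unit
const lo = "\uDC00"; // lone low surrogate, spelled as a code unit

// Counting code units, length distributes over concatenation.
const unitsDistribute = (hi + lo).length === hi.length + lo.length;

// Counting code points, the two halves fuse into U+10000 and the
// property is lost.
const pointsDistribute =
  codePointLength(hi + lo) === codePointLength(hi) + codePointLength(lo);

console.log(unitsDistribute, pointsDistribute);
```

If lone halves can never be spelled or created in full Unicode mode,
the fusing case cannot arise in the first place.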
> If "full unicode string mode" only supported \u{} escapes then existing code that uses \uXXXX would have to be updated before it could be used in that mode. That might be a good thing.
My point! ;-)
/be