Full Unicode strings strawman

Shawn Steele Shawn.Steele at microsoft.com
Mon May 16 15:22:58 PDT 2011

The problem is that “\UD800\UDC00” === “\U+010000”.  And if the internal representation is UTF-32, then they’d have to continue to be the same.  And it’s really hard for them to have the same length if one’s 2 code points and the other’s 1 code point.
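To make the equality and length point concrete, here is a minimal sketch of how engines behave today, with strings as sequences of 16-bit code units. It uses `String.fromCodePoint` and `Array.from`, which were standardized in ES2015 and so post-date this thread; the variable names are illustrative only, and this is not the strawman's own notation.

```javascript
// Current behaviour: a string is a sequence of 16-bit code units.
const viaSurrogates = "\uD800\uDC00";                 // written as a surrogate pair
const viaCodePoint = String.fromCodePoint(0x10000);   // written as one code point (ES2015)

// Both spellings produce the same string value.
const sameString = viaSurrogates === viaCodePoint;

// length counts code units; iterating the string counts code points.
const lengthInCodeUnits = viaSurrogates.length;
const lengthInCodePoints = Array.from(viaSurrogates).length;

console.log(sameString, lengthInCodeUnits, lengthInCodePoints); // true 2 1
```

If the two spellings must stay equal under a 32-bit representation, they cannot report lengths of 2 and 1 respectively, which is the tension described above.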


From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Allen Wirfs-Brock
Sent: Monday, May 16, 2011 3:18 PM
To: Mark Davis ☕
Cc: Markus Scherer; es-discuss at mozilla.org
Subject: Re: Full Unicode strings strawman

On May 16, 2011, at 2:19 PM, Mark Davis ☕ wrote:

I'm quite sympathetic to the goal, but the proposal does represent a significant breaking change. The problem, as Shawn points out, is with indexing. Before, the strings were defined as UTF16.

Not by the ECMAScript specification.

Take a sample string "\ud800\udc00\u0061" = "\u{10000}\u{61}". Right now, the 'a' (the \u{61}) is at offset 2. If the proposal were accepted, the 'a' would be at offset 1.

If the string is written as "\ud800\udc00\u0061", the 'a' will be at offset 2, even in the new proposal.  It would only be at offset 1 if it was written as "\u+010000\u+000061" (using the literal notation from the proposal).

This will definitely cause breakage in existing code;

How does this break existing code?  Existing code cannot say "\u+010000\u+000061".  As I've pointed out elsewhere on this thread, existing libraries that do UTF-16 encoding/decoding must continue to do so even under this new proposal.

characters are in different positions than they were, even characters that are not supplemental ones. All it takes is one supplemental character before the current position and the offsets will be off for the rest of the string.
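The offset shift under discussion can be sketched in today's JavaScript by comparing code-unit offsets with code-point offsets. The helper `toCodePointOffset` below is hypothetical, written only for illustration; it assumes the given offset does not fall in the middle of a surrogate pair, and it relies on ES2015 string iteration, which did not exist when this thread was written.

```javascript
// The sample string from the thread: U+10000 followed by 'a'.
const sample = "\uD800\uDC00\u0061";

// Offset of 'a' counted in 16-bit code units (current behaviour).
const codeUnitOffset = sample.indexOf("a");

// Offset of 'a' counted in code points (what code-point indexing would report).
const codePointOffset = Array.from(sample).indexOf("a");

// Hypothetical helper: convert a code-unit offset to a code-point offset.
// Assumes the offset does not split a surrogate pair.
function toCodePointOffset(str, unitOffset) {
  return Array.from(str.slice(0, unitOffset)).length;
}

// One supplementary character early in the string shifts every later offset.
const longer = "a\uD800\uDC00bc";
const unitOffsetOfC = longer.indexOf("c");
const pointOffsetOfC = toCodePointOffset(longer, unitOffsetOfC);

console.log(codeUnitOffset, codePointOffset); // 2 1
console.log(unitOffsetOfC, pointOffsetOfC);   // 4 3
```

Characters before the supplementary character keep their offsets; every character after it differs by one between the two ways of counting.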

