Full Unicode strings strawman

Shawn Steele Shawn.Steele at microsoft.com
Mon May 16 11:34:32 PDT 2011


Thanks for making a strawman.

Unicode Escape Sequences
Is it possible for U+ to accept 4-, 5-, or 6-digit sequences? Typically when I encounter U+ notation the leading zero is omitted, and I see BMP characters quite often. Obviously BMP characters could use the U notation; however, it seems like it'd be annoying for the occasional user to have to know that U is used for some characters and U+ for others. It seems like it'd be easier for developers to remember that U+ is "the new way" and U is "the old way that doesn't always work".
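
To make the variable-length idea concrete, here is a minimal sketch of a reader that accepts U+ followed by 4, 5, or 6 hex digits. The helper name parseUPlus is hypothetical and is not taken from the strawman; it relies on String.fromCodePoint, which was standardized after this thread.

```javascript
// Hypothetical helper (not part of the strawman): reads "U+" followed by
// 4, 5, or 6 hex digits and returns the corresponding string.
function parseUPlus(text) {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(text);
  if (match === null) {
    throw new SyntaxError("expected U+ followed by 4 to 6 hex digits: " + text);
  }
  const codePoint = parseInt(match[1], 16);
  // Six hex digits can exceed the Unicode range, so check explicitly.
  if (codePoint > 0x10FFFF) {
    throw new RangeError("code point out of range: " + text);
  }
  return String.fromCodePoint(codePoint);
}

const bmp = parseUPlus("U+0041");          // a BMP character, 4 digits
const fiveDigits = parseUPlus("U+1F600");  // leading zero omitted
const sixDigits = parseUPlus("U+01F600");  // same character, 6 digits
```

With a reader like this, BMP and supplementary characters share one notation, and the 5- and 6-digit spellings of the same character produce the same string.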

String Position
It's unclear to me whether string indices can be "changed" from UTF-16 to UTF-32 positions. Although UTF-32 indices are clearly desirable, I think that many implementations currently allow the UTF-16 surrogate code units 0xD800 through 0xDFFF in strings. In other words, I can already have JavaScript strings with full-Unicode-range data in them. Existing applications would then have indices that point to UTF-16 positions, not UTF-32 positions. Changing the definition of the index to UTF-32 would break those applications, I think.
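
As a rough illustration of the compatibility concern, the sketch below shows how a string holding one supplementary character is indexed by UTF-16 code unit today, next to what a code-point index would be. The sample string is an assumption chosen for the example, and Array.from is used only as a convenient way to split by code point.

```javascript
// 'a', then U+1D504 written as a surrogate pair, then 'b'.
const s = "a\uD835\uDD04b";

// Indexing by UTF-16 code unit: the pair occupies two positions.
const unitLength = s.length;            // 4
const unitIndexOfB = s.indexOf("b");    // 3
const firstHalf = s.charCodeAt(1);      // 0xD835, the high surrogate

// Indexing by code point: the pair counts as one position.
const codePoints = Array.from(s);
const pointLength = codePoints.length;          // 3
const pointIndexOfB = codePoints.indexOf("b");  // 2
```

An application that stored the index 3 for "b" would point past the end of the string if indices were silently redefined as code-point positions.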

You also touch on that with charCodeAt/codepointAt, which resolves the problem with the output type but doesn't address the problem with the indexing. Similar to the way you differentiated charCode/codepoint, it may be necessary to differentiate charCode/codepoint indices. IMO .fromCharCode doesn't have this problem, since it used to fail but now works, which wouldn't be a breaking change. Unless we're concerned that it can now return a different UTF-16 length than before.
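
For reference, this is how the distinction looks in the String methods that were eventually standardized (codePointAt and fromCodePoint); it is not necessarily what the strawman specifies. The output type is a full code point, but the index argument is still a UTF-16 position, so an index can land in the middle of a pair.

```javascript
const s = "a\uD835\uDD04b"; // 'a', U+1D504, 'b'

// Same index, different output type.
const unit = s.charCodeAt(1);    // 0xD835, a single code unit
const whole = s.codePointAt(1);  // 0x1D504, the full code point

// The index is still a code unit position, so index 2 is mid-pair.
const half = s.codePointAt(2);   // 0xDD04, the lone low surrogate

// fromCharCode keeps only the low 16 bits of its argument;
// fromCodePoint produces a surrogate pair instead.
const truncated = String.fromCharCode(0x1D504); // "\uD504", length 1
const full = String.fromCodePoint(0x1D504);     // "\uD835\uDD04", length 2
```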

I don't like the "21" in the name decodeURI21. Also, the "trick", I think, is encoding to surrogate pairs (illegally, since UTF-8 doesn't allow that) vs. decoding to UTF-16. It seems like decoding can safely detect supplementary characters in the input and properly decode them, or is there something about encoding that makes that state undetectable?
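
A small sketch of the existing URI functions handling a supplementary character, using the standard encodeURIComponent and decodeURIComponent rather than the proposed decodeURI21 (whose behavior is not assumed here). A well-formed surrogate pair is encoded as one 4-byte UTF-8 sequence and decodes back to the same pair, while a lone surrogate is rejected at encoding time.

```javascript
// U+1D504 stored as a UTF-16 surrogate pair.
const supplementary = "\uD835\uDD04";

// The pair is combined and written as a single 4-byte UTF-8 sequence.
const encoded = encodeURIComponent(supplementary); // "%F0%9D%94%84"

// Decoding recognizes the 4-byte sequence and restores the pair.
const decoded = decodeURIComponent(encoded);

// A lone surrogate has no valid UTF-8 form, so encoding throws.
let loneSurrogateRejected = false;
try {
  encodeURIComponent("\uD835");
} catch (e) {
  loneSurrogateRejected = e instanceof URIError;
}
```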

-Shawn

From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Allen Wirfs-Brock
Sent: Monday, May 16, 2011 11:12 AM
To: es-discuss at mozilla.org
Subject: Full Unicode strings strawman

I tried to post a pointer to this strawman on this list a few weeks ago, but apparently it didn't reach the list for some reason.

Feedback would be appreciated:

http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings

Allen