Full Unicode strings strawman
Shawn.Steele at microsoft.com
Mon May 16 11:34:32 PDT 2011
Thanks for making a strawman
Unicode Escape Sequences
Is it possible for U+ to accept either 4, 5, or 6 digit sequences? Typically when I encounter U+ notation the leading zero is omitted, and I see BMP characters quite often. Obviously BMP characters could use the U notation; however, it seems like it'd be annoying for the occasional user to have to know that U is used for some characters and U+ for others. It seems like it'd be easier for developers to remember that U+ is "the new way" and U is "the old way that doesn't always work".
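To make the 4, 5, or 6 digit idea concrete, here is a minimal sketch of a parser for a standalone U+ token. The helper name `parseUPlus` is hypothetical and is not part of the strawman or of any ECMAScript API; it also relies on `String.fromCodePoint`, which shipped later in ECMAScript 2015.

```javascript
// Hypothetical helper, for illustration only.
// Accepts "U+" followed by 4, 5, or 6 hexadecimal digits.
function parseUPlus(text) {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(text);
  if (match === null) {
    throw new SyntaxError("expected U+ followed by 4 to 6 hex digits: " + text);
  }
  const codePoint = parseInt(match[1], 16);
  // Six hex digits can exceed the Unicode range, so check the upper bound.
  if (codePoint > 0x10FFFF) {
    throw new RangeError("code point out of range: " + text);
  }
  return String.fromCodePoint(codePoint);
}

const bmp = parseUPlus("U+0041");          // BMP character, 4 digits
const fiveDigits = parseUPlus("U+1F600");  // supplementary, leading zero omitted
const sixDigits = parseUPlus("U+01F600");  // same code point, 6 digits
```

This only handles a token on its own. Inside a longer string literal, a variable-length form would also need some rule for where the digits end, which the sketch does not attempt.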
You also touch on that with charCodeAt/codepointAt, which resolves the problem with the output type, but doesn't address the problem with the indexing. Similar to the way you differentiated charCode/codepoint, it may be necessary to differentiate charCode/codepoint indices. IMO .fromCharCode doesn't have this problem since it used to fail, but now works, which wouldn't be breaking. Unless we're concerned that now it can return a different UTF-16 length than before.
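The indexing concern can be illustrated with a short sketch. It assumes the APIs that later shipped in ECMAScript 2015 (`codePointAt`, `String.fromCodePoint`, the `\u{...}` escape), which do not necessarily match the semantics the strawman proposes; in particular, the shipped `codePointAt` still takes a code unit index.

```javascript
// U+1F600 is a supplementary character: one code point, two UTF-16 code units.
const s = "a\u{1F600}b";

const codeUnitLength = s.length;               // counts UTF-16 code units
const codePointLength = Array.from(s).length;  // string iteration walks code points

// The output type is resolved, but the index is still a code unit index.
const unitAt1 = s.charCodeAt(1);    // high surrogate
const unitAt2 = s.charCodeAt(2);    // low surrogate
const pointAt1 = s.codePointAt(1);  // whole code point
const pointAt2 = s.codePointAt(2);  // lands mid-pair: just the low surrogate

// "b" sits at code point index 2 but code unit index 3.
const bByCodePoint = Array.from(s)[2];
const bByCodeUnit = s[3];

// fromCharCode truncates its argument to 16 bits; fromCodePoint
// produces a surrogate pair, so the resulting lengths differ.
const truncated = String.fromCharCode(0x1F600);
const pair = String.fromCodePoint(0x1F600);
```

The last two lines show the length difference raised above: the same numeric input gives a one-unit string from one function and a two-unit string from the other.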
I don't like the "21" in the name of decodeURI21. Also, the "trick," I think, is encoding to surrogate pairs (illegally, since UTF-8 doesn't allow that) vs. decoding to UTF-16. It seems like decoding can safely detect supplementary characters in the input and properly decode them, or is there something about encoding that makes that state undetectable?
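As a rough sketch of why detection on the decoding side looks feasible: in well-formed UTF-8 a supplementary character is a four-byte sequence whose lead byte is in the range F0 to F4, so it is visible in the percent-escapes themselves. The helper `startsWithSupplementary` below is hypothetical and only inspects the first escape; the rest uses the existing `encodeURIComponent` and `decodeURIComponent`, not the proposed decodeURI21.

```javascript
const face = "\u{1F600}";                    // one supplementary character
const encoded = encodeURIComponent(face);    // four UTF-8 bytes, percent-escaped
const decoded = decodeURIComponent(encoded); // back to a surrogate pair in UTF-16

// Hypothetical helper: does the URI text begin with a four-byte UTF-8
// sequence, that is, a supplementary character?
function startsWithSupplementary(uri) {
  const match = /^%([0-9A-Fa-f]{2})/.exec(uri);
  if (match === null) {
    return false;
  }
  const lead = parseInt(match[1], 16);
  return lead >= 0xF0 && lead <= 0xF4;
}

// Today's encoder refuses an unpaired surrogate rather than emit
// an ill-formed three-byte sequence for it.
let loneSurrogateRejected = false;
try {
  encodeURIComponent("\uD83D");
} catch (e) {
  loneSurrogateRejected = e instanceof URIError;
}
```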
From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Allen Wirfs-Brock
Sent: Monday, May 16, 2011 11:12 AM
To: es-discuss at mozilla.org
Subject: Full Unicode strings strawman
I tried to post a pointer to this strawman on this list a few weeks ago, but apparently it didn't reach the list for some reason.
Feedback would be appreciated: