UTF-16 Strings not-strawman

Brendan Eich brendan at mozilla.com
Thu May 19 17:47:02 PDT 2011

On May 19, 2011, at 1:42 PM, Mike Samuel wrote:

> 2011/5/19 Shawn Steele <Shawn.Steele at microsoft.com>:
>> I don’t have time to make a real strawman, but what would people need if we
>> went the UTF-16 route (instead of full-Unicode)?  (This thread is to collect
>> requirements, which are somewhat getting lost in the merits of UTF-16 vs 32
>> bit thread).  Basically, just replace UCS-2 with UTF-16, allowing irregular
>> UTF-16 for compatibility.
>> Things that come to mind immediately are:
>> ·         Some sort of convenience notation for string literals and regular
>> expressions.

We could surely use better string and regexp literal support. I'm going to get the multiline regexps strawman (http://wiki.ecmascript.org/doku.php?id=strawman:multiline_regexps) done by next week or bust. Perhaps we should have a companion multiline string proposal (or I'll combine them :-P), where we stipulate that both new literal forms support UTF-16 if that is accepted.

>> ·         Extend String.fromCharCode() to allow generating UTF-16 pairs for
>> values 0x10000-0x10FFFF.
> +1

This is not a compatible change, because today each argument is converted with ToUint16, so values above 0xFFFF wrap around:

js> String.fromCharCode(0x10000)

The heap is shared in the same origin between old and new scripts, so this is borrowing some trouble. Not sure how much, but why borrow if we don't need to?
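To make the compatibility concern concrete, here is a minimal sketch of today's behavior next to an opt-in alternative. The helper name fromCodePointSketch is hypothetical, an assumption for illustration only and not something proposed in this thread; it builds the surrogate pair in a new function so that String.fromCharCode itself stays unchanged.

```javascript
// Existing behavior: each argument goes through ToUint16,
// so 0x10000 wraps around to 0x0000.
const wrapped = String.fromCharCode(0x10000);
console.log(wrapped.length);        // 1
console.log(wrapped.charCodeAt(0)); // 0

// Hypothetical helper (assumption, not part of any proposal here):
// encode one code point in the range 0..0x10FFFF as UTF-16 code units.
function fromCodePointSketch(cp) {
  if (cp < 0x10000) {
    return String.fromCharCode(cp);
  }
  const offset = cp - 0x10000;
  const high = 0xD800 + (offset >> 10);   // lead surrogate
  const low = 0xDC00 + (offset & 0x3FF);  // trail surrogate
  return String.fromCharCode(high, low);
}

const pair = fromCodePointSketch(0x10000);
console.log(pair.length); // 2
```

Old scripts that rely on the wrap-around keep working, because only code that calls the new function sees surrogate pairs.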

>> ·         A strict mode that disallows the irregular UTF-16?
> I think this can be best left to JSLint.

If static, yes. Runtime checking is going to cry wolf on all the data hacked into uint16 pieces in strings. That seems the common case, not escaped irregular UTF-16 in string literals. I don't see a need here that we can realistically meet.
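A rough sketch of why a runtime check would cry wolf: the function below (hasLoneSurrogate is a hypothetical name, an assumption for illustration) flags any string containing an unpaired surrogate. Arbitrary uint16 data packed into a string trips it easily, even though such strings are not meant as text at all.

```javascript
// Hypothetical runtime check (assumption): true if the string contains
// a surrogate code unit that is not part of a well-formed pair.
function hasLoneSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // Lead surrogate: must be followed by a trail surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the trail surrogate of a valid pair
        continue;
      }
      return true;
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return true; // trail surrogate with no lead before it
    }
  }
  return false;
}

// Binary data stored as uint16 pieces in a string.
const packed = String.fromCharCode(0xD800, 0x0041, 0xFFFF);
// Well-formed text: one supplementary character as a surrogate pair.
const text = "\uD835\uDC00";

console.log(hasLoneSurrogate(packed)); // true
console.log(hasLoneSurrogate(text));   // false
```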
