New full Unicode for ES6 idea

Brendan Eich brendan at
Mon Feb 20 08:47:28 PST 2012

Andrew Oakley wrote:
> Issues only arise in code that tries to treat a string as an array of
> 16-bit integers, and I don't think we should be particularly bothered by
> performance of code which misuses strings in this fashion (but clearly
> this should still work without opt-in to new string handling).

That describes all strings in JS and the DOM today.

That is, we have no measure of how much code treats strings as arrays
of uint16s, forges strings using "\uXXXX", and so on, but the ES and DOM
specs have allowed this for more than 14 years. Based on bitter
experience, it's likely that if we change by fiat from 16-bit code units
to 21-bit code points, some code on the Web will break.
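
As an illustration of the kind of code at risk, here is a minimal sketch
that uses a string as a container for arbitrary 16-bit values. The helper
names packUint16 and unpackUint16 are hypothetical, not taken from any
library, and the sample values are made up for the example.

```javascript
// Hypothetical helpers: store arbitrary uint16 values in a string,
// one value per 16-bit code unit, and read them back.
function packUint16(values) {
  var s = "";
  for (var i = 0; i < values.length; i++) {
    // String.fromCharCode takes 16-bit code units, so any value in
    // 0..0xFFFF is accepted, including unpaired surrogates.
    s += String.fromCharCode(values[i] & 0xFFFF);
  }
  return s;
}

function unpackUint16(s) {
  var out = [];
  for (var i = 0; i < s.length; i++) {
    out.push(s.charCodeAt(i));
  }
  return out;
}

// 0xD800 and 0xDC00 are lone surrogates: not valid Unicode text on
// their own, but legal contents for a JS string today.
var sample = [0x0041, 0xD800, 0xFFFF, 0xDC00];
var packed = packUint16(sample);
var unpacked = unpackUint16(packed);

// A string "forged" with a \uXXXX escape for an unpaired surrogate.
var forged = "\uD800";
```

Code like this depends on every index of the string being one uint16. If
strings became sequences of 21-bit code points, or if unpaired surrogates
were rejected, the round trip above could change behaviour.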

And as noted in the original post and in the thread based on Allen's
proposal last year, browser implementations definitely count on
representation via an array of 16-bit integers, with the length property
or method counting those same 16-bit units.
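
To make the code unit counting concrete, the sketch below shows what
length and charCodeAt report for a single character outside the Basic
Multilingual Plane (U+1F600 is chosen arbitrarily). The helper
countCodePoints is hypothetical and only illustrates what a code point
oriented count would return instead.

```javascript
// U+1F600 written as its UTF-16 surrogate pair.
var face = "\uD83D\uDE00";

var unitCount = face.length;        // counts 16-bit code units
var firstUnit = face.charCodeAt(0); // high surrogate, not the code point

// Hypothetical helper: count code points by treating a high surrogate
// followed by a low surrogate as one character.
function countCodePoints(str) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low half of a well-formed pair
      }
    }
    count++;
  }
  return count;
}

var pointCount = countCodePoints(face);
```

Here unitCount is 2 while pointCount is 1, which is the gap between
today's 16-bit view and a 21-bit code point view.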

Breaking the Web is off the table. Breaking implementations, less so. 
I'm not sure why you bring up UTF-8. It's good for encoding and decoding,
but for JS, unlike C, we want a string to be a high-level "full Unicode"
abstraction, not bytes with bits optionally set to indicate that more
bytes follow to spell a code point.
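
For readers unfamiliar with the byte layout referred to here, the
following is a rough sketch of standard UTF-8 encoding for one code
point. The function name utf8Encode is hypothetical, and the sketch does
no validation of surrogates or out-of-range input.

```javascript
// Hypothetical helper: encode one code point (0..0x10FFFF) as an array
// of UTF-8 bytes. Lead bytes carry a length prefix in their high bits;
// continuation bytes all have the form 10xxxxxx.
function utf8Encode(cp) {
  if (cp < 0x80) {
    return [cp];                        // 0xxxxxxx
  }
  if (cp < 0x800) {
    return [0xC0 | (cp >> 6),           // 110xxxxx
            0x80 | (cp & 0x3F)];        // 10xxxxxx
  }
  if (cp < 0x10000) {
    return [0xE0 | (cp >> 12),          // 1110xxxx
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F)];
  }
  return [0xF0 | (cp >> 18),            // 11110xxx
          0x80 | ((cp >> 12) & 0x3F),
          0x80 | ((cp >> 6) & 0x3F),
          0x80 | (cp & 0x3F)];
}

var euroBytes = utf8Encode(0x20AC);   // three bytes
var faceBytes = utf8Encode(0x1F600);  // four bytes
```

Indexing into such a byte sequence lands on lead or continuation bytes
rather than characters, which is the low-level view the paragraph above
argues against exposing in JS strings.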

> I think this is a nicer and more flexible model than string
> representations being dependent on which heap they came from - all
> issues related to encoding can be contained in the String object
> implementation.

You're ignoring the compatibility break here. Browser vendors can't 
afford to do that.

> While this is being discussed, for any new string handling I think we
> should make any invalid strings (according to the rules in Unicode)
> cause some kind of exception on creation.

This is future-hostile if done for all code points. If done only for the 
code points in [D800,DFFF] both for literals using "\u{...}" and for 
constructive methods such as String.fromCharCode, then I agree.
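
A minimal sketch of what such a check could look like for a constructive
method follows. The name strictFromCodePoint is hypothetical and is not a
proposed API. Rejecting values above 0x10FFFF is an added assumption of
the sketch. Unassigned code points are deliberately accepted, in line
with the future-hostility concern above.

```javascript
// Hypothetical strict constructor: builds a string from one code point
// and throws only for the surrogate range [D800, DFFF] (plus values
// that are not code points at all, which is an assumption here).
function strictFromCodePoint(cp) {
  if (typeof cp !== "number" || Math.floor(cp) !== cp ||
      cp < 0 || cp > 0x10FFFF) {
    throw new RangeError("not a code point: " + cp);
  }
  if (cp >= 0xD800 && cp <= 0xDFFF) {
    throw new RangeError("surrogate code point: " + cp.toString(16));
  }
  if (cp < 0x10000) {
    return String.fromCharCode(cp);
  }
  // Supplementary plane: emit a UTF-16 surrogate pair.
  var offset = cp - 0x10000;
  return String.fromCharCode(0xD800 + (offset >> 10),
                             0xDC00 + (offset & 0x3FF));
}

var okBmp = strictFromCodePoint(0x41);
var okAstral = strictFromCodePoint(0x1F600);

var rejected = false;
try {
  strictFromCodePoint(0xD800);
} catch (e) {
  rejected = e instanceof RangeError;
}
```

Note that existing String.fromCharCode keeps its current behaviour in
this sketch. Only the new, opt-in entry point is strict.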

