Full Unicode strings strawman

Brendan Eich brendan at mozilla.com
Tue May 17 10:47:50 PDT 2011

On May 17, 2011, at 10:43 AM, Boris Zbarsky wrote:

> On 5/17/11 1:40 PM, Brendan Eich wrote:
>> Where do you read "forcing"? Not in the words you cited.
> In the substance of having strings in different encodings around at the same time.  If that doesn't force developers to worry about encodings, what does, exactly?

Where in the strawman is anything of that kind observably (to JS authors) proposed?

>> Ok, full Unicode means non-BMP characters not being wrongly treated as two uint16 units and miscounted, separated or partly deleted by splicing and slicing, etc.
>> IOW, JS grows to treat strings as "full Unicode", not uint16 vectors. This is a big deal!
> OK, but still allows sticking non-Unicode gunk into the strings, right?  So they're still vectors of "something".  Whatever that something is.

Yes, old APIs for building strings, e.g. String.fromCharCode, still build "gunk strings", aka uint16 data hacked into strings. New APIs would build strings of characters. This has to apply to internal JS engine / DOM implementation APIs as needed, too.
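For readers new to the uint16 problem, here is a minimal TypeScript sketch of the behavior under discussion, assuming an ES2015+ target. It uses APIs standardized after this thread (the string iterator and String.fromCodePoint), not anything the strawman itself proposes, and the variable names are purely illustrative. It shows `length` and `slice` working in uint16 code units, and String.fromCharCode building a string with an unpaired surrogate.

```typescript
// U+1F600 GRINNING FACE lies outside the BMP, so in a uint16-based
// string it is stored as the surrogate pair 0xD83D 0xDE00.
const face: string = "\u{1F600}";

// length counts uint16 code units, not characters: 2, not 1.
const unitLength: number = face.length;

// Array.from uses the ES2015 string iterator, which walks code points,
// so the non-BMP character counts once.
const codePointLength: number = Array.from(face).length;

// Slicing by code units can cut the pair in half, leaving a lone
// high surrogate: the "separated or partly deleted" failure mode.
const halfSliced: string = face.slice(0, 1);
const isLoneHighSurrogate: boolean =
  halfSliced.length === 1 && halfSliced.charCodeAt(0) === 0xd83d;

// String.fromCharCode accepts arbitrary uint16 values, so it can build a
// "gunk" string: here an unpaired low surrogate followed by "A".
const gunk: string = String.fromCharCode(0xdc00, 0x0041);

// String.fromCodePoint (ES2015) takes whole code points instead.
const rebuilt: string = String.fromCodePoint(0x1f600);

console.log(unitLength, codePointLength, isLoneHighSurrogate, gunk.length, rebuilt === face);
```

Run under Node, this logs `2 1 true 2 true`.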

>> Hope this helps,
> Halfway.  The DOM interaction questions remain unanswered.  Seriously, I think we should try to make a list of the issues there, the pitfalls that would arise for web developers as a result, then go through and see how and whether to address them.  Then we'll have a good basis for considering the web compat impact....

Good idea.


