Full Unicode strings strawman

Brendan Eich brendan at mozilla.com
Tue May 17 10:40:29 PDT 2011


On May 17, 2011, at 10:37 AM, Boris Zbarsky wrote:

> On 5/17/11 1:27 PM, Brendan Eich wrote:
>> On May 17, 2011, at 10:22 AM, Boris Zbarsky wrote:
>> 
>>> Yes.  And right now that's how it works and actual JS authors typically don't have to worry about encoding issues.  I don't agree with Allen's claim that "in the long run JS in the browser is going to have to be able to deal with arbitrary encodings".  Having the _capability_ might be nice, but forcing all web authors to think about it seems like a non-starter.
>> 
>> Allen said "be able to", not "forcing". Big difference. I think we three at least are in agreement here.
> 
> I think we're in agreement on the sentiment, but perhaps not on where on the "able to" to "forcing" spectrum this strawman falls.

Where do you read "forcing"? Not in the words you cited.


>>> See, this is the part I don't follow.  What do you mean by "full Unicode" and how do you envision it flowing?
>> 
>> I mean UTF-16 flowing through, but as you say that happens now -- but (I reply) only if JS doesn't mess with things in a UCS-2 way (indexing 16-bits at a time, ignoring surrogates). And JS code does generally assume 16 bits are enough.
>> 
>> With Allen's proposal we'll finally have some new APIs for JS developers to use.
> 
> That doesn't answer my questions....

Ok, full Unicode means that non-BMP characters are not wrongly treated as two uint16 units, and so are not miscounted, separated, or partly deleted by splicing, slicing, etc.

IOW, JS grows to treat strings as "full Unicode", not uint16 vectors. This is a big deal!
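
To make the distinction concrete, here is a minimal sketch of the uint16-vector behavior described above, using one non-BMP character written as its surrogate pair. The helper name countCodePoints is hypothetical and only for illustration; it is not an API from Allen's strawman, just a hand-rolled walk over surrogate pairs.

```javascript
// U+1D11E MUSICAL SYMBOL G CLEF, written as its UTF-16 surrogate pair.
var clef = "\uD834\uDD1E";
var text = "a" + clef + "b";

// Miscounted: length counts 16-bit units, so the one clef counts as two.
var unitLength = text.length;

// Separated: slicing at a unit index cuts the pair in half and
// leaves a lone high surrogate at the end of the result.
var broken = text.slice(0, 2);
var lastUnit = broken.charCodeAt(broken.length - 1);

// Hypothetical helper (an assumption for illustration only):
// counts code points by treating a well-formed surrogate pair as one.
function countCodePoints(s) {
  var count = 0;
  for (var i = 0; i < s.length; i++) {
    var unit = s.charCodeAt(i);
    // A high surrogate followed by a low surrogate is a single character.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < s.length) {
      var next = s.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low surrogate
      }
    }
    count++;
  }
  return count;
}

var codePointLength = countCodePoints(text);
```

Here unitLength and codePointLength disagree for the same string, and broken ends in half of a character. Code that assumes 16 bits are enough hits exactly these cases.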

Hope this helps,

/be

