Full Unicode strings strawman
allen at wirfs-brock.com
Mon May 16 20:13:00 PDT 2011
On May 16, 2011, at 7:18 PM, Brendan Eich wrote:
> On May 16, 2011, at 5:18 PM, Allen Wirfs-Brock wrote:
>> On May 16, 2011, at 5:06 PM, Brendan Eich wrote:
>>> On May 16, 2011, at 2:07 PM, Boris Zbarsky wrote:
>>>> That said, defining JS strings and DOMString differently seems like a recipe for serious author confusion (e.g. actually using JS strings as the DOMString binding in ES might be lossy, assigning from JS strings to DOMString might be lossy, etc). It's a minefield.
>>> Plus, people stuff random data into JS strings, which so far have not been UTF-16 validated or indexed, and they could read back arbitrary uint16s in a row.
>>> Breaking this seems web-breaking to me, from what I remember. It's impossible to detect statically (early error).
>> I think I've addressed this in other responses, including in https://mail.mozilla.org/pipermail/es-discuss/2011-May/014307.html
>> See the part about passing a string with >16-bit chars to a parameter that requires a DOMString.
> I'm not sure this covers all the cases. Boris mentioned how JS takes strings from many sources, and it can concatenate them, in a data flow that crosses programs. Is it really safe to reason about this in a modular or "local" way?
I think it does. In another reply I also mentioned the possibility of tagging, in a JS-visible manner, strings that have gone through a known encoding process.
If the strings you are combining from different sources have not been canonicalized to a common encoding, then you had better be damn careful about how you combine them. The DOM seems to canonicalize to UTF-16 (with some slop WRT invalid encoding that Boris and others have pointed out). I don't know about other sources such as XMLHttpRequest or the file APIs. However, in the long run JS in the browser is going to have to be able to deal with arbitrary encodings. You can hide such things from many programmers but not all. After all, people actually have to implement transcoders.
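
To make the concern about combining strings from different sources concrete, here is a minimal sketch in JavaScript. It is only an illustration, not part of the strawman: the helper names isWellFormedUTF16 and codeUnits are hypothetical, and the check assumes the current model in which a string is a sequence of 16-bit code units.

```javascript
// Hypothetical helper (not from the strawman): returns true if every
// surrogate code unit in `s` belongs to a valid high/low surrogate pair.
function isWellFormedUTF16(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // A high surrogate must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xDC00 || next > 0xDFFF) return false;
      i++; // skip the low surrogate we just validated
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return false; // low surrogate with no preceding high surrogate
    }
  }
  return true;
}

// Hypothetical helper: read a string back as an array of raw 16-bit units.
function codeUnits(s) {
  return Array.from({ length: s.length }, (_, i) => s.charCodeAt(i));
}

// "Random data" stuffed into a string: arbitrary uint16 values survive intact.
const binary = String.fromCharCode(0xD800, 0x0041, 0xFFFF);

// Text that has been canonicalized to UTF-16: U+1F600 as a surrogate pair.
const text = "\uD83D\uDE00";

// Two fragments that are each ill-formed alone but well-formed once joined.
const joined = "\uD83D" + "\uDE00";

// Mixing canonical text with raw data yields a string that is not valid UTF-16.
const mixed = text + binary;
```

Under these assumptions, isWellFormedUTF16(text) and isWellFormedUTF16(joined) are true, while binary and mixed fail the check, and codeUnits(binary) returns the original three values unchanged. A check like this is local to one string; it says nothing about which encoding process a string went through, which is where tagging would come in.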