Full Unicode strings strawman
allen at wirfs-brock.com
Mon May 16 19:57:45 PDT 2011
On May 16, 2011, at 7:22 PM, Boris Zbarsky wrote:
> On 5/16/11 10:20 PM, Allen Wirfs-Brock wrote:
>>> That seems like it'll make it very easy to introduce strings that are a mix of the two via concatenation....
>> Some implementations already use tree structures to represent strings that are built via concatenation. It would be straightforward to have such a tree string representation where some segments have 16-bit cells and others 32-bit (or even 8-bit) cells. That is probably how I would represent any long string that contained only a few supplementary characters.
> I'm not talking about the implementation end. I can see how I'd implement this stuff, or make Luke implement it or something. What I don't see is how the JS program author can sanely work with the result.
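To make the quoted implementation idea concrete, here is a minimal sketch of a concatenation tree ("rope") whose leaves each choose the narrowest cell width (8, 16, or 32 bits) able to hold their elements, treating each element as a full code point as the strawman proposes. The class and function names (`Leaf`, `Concat`, `leafFromString`) are hypothetical illustrations, not part of the proposal or of any engine, and the sketch uses post-2011 JavaScript (spread, `codePointAt`, `String.fromCodePoint`) purely for brevity.

```javascript
// Hypothetical sketch: a concatenation tree ("rope") whose leaves pick the
// narrowest cell width that holds every code point stored in that leaf.
class Leaf {
  constructor(codePoints) {
    const max = codePoints.reduce((m, c) => Math.max(m, c), 0);
    const Cells = max <= 0xff ? Uint8Array : max <= 0xffff ? Uint16Array : Uint32Array;
    this.cells = Cells.from(codePoints);
    this.length = this.cells.length;
  }
  // Element (code point) at index i within this leaf.
  at(i) {
    return this.cells[i];
  }
  // Width of each storage cell in bits: 8, 16, or 32.
  get cellBits() {
    return this.cells.BYTES_PER_ELEMENT * 8;
  }
  toString() {
    return String.fromCodePoint(...this.cells);
  }
}

// Interior node produced by concatenation; neither operand is copied.
class Concat {
  constructor(left, right) {
    this.left = left;
    this.right = right;
    this.length = left.length + right.length;
  }
  at(i) {
    return i < this.left.length ? this.left.at(i) : this.right.at(i - this.left.length);
  }
  toString() {
    return this.left.toString() + this.right.toString();
  }
}

// Build a leaf from an ordinary JS string, one element per code point.
function leafFromString(str) {
  return new Leaf([...str].map((ch) => ch.codePointAt(0)));
}

const ascii = leafFromString("Hello, ");
const astral = leafFromString("\u{1F600}!");
const rope = new Concat(ascii, astral);
console.log(ascii.cellBits, astral.cellBits); // 8 32
console.log(rope.length, rope.at(7).toString(16)); // 9 1f600
```

The storage width is an internal detail of each leaf; callers only see a uniform sequence of elements, which is exactly why Boris's concern is about what the program author observes rather than about the implementation.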
In theory, the JS programmer already has to keep track manually of whether a string value is UTF-16 or UCS-2. As John Tamplin observed in https://mail.mozilla.org/pipermail/es-discuss/2011-May/014319.html, most JS programmers simply assume they are dealing with the BMP and trip up if they actually have to process a surrogate pair that was unexpectedly handed to them from the DOM.
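As an illustration of the BMP assumption tripping up, the sketch below contrasts a reversal that treats each 16-bit unit as a character with one that iterates by code point. The helper names (`naiveReverse`, `codePointReverse`, `isWellFormedUTF16`) are hypothetical, and the code-point-aware string iterator it relies on only arrived later, in ES2015.

```javascript
// One supplementary character (U+1F600) between two BMP characters, stored
// as a UTF-16 surrogate pair, as the DOM might hand it to a script.
const sample = "a\u{1F600}b";

// BMP-assuming code: each 16-bit unit is treated as a whole character.
function naiveReverse(str) {
  let out = "";
  for (let i = str.length - 1; i >= 0; i--) out += str[i];
  return out;
}

// Code-point-aware version: the string iterator keeps surrogate pairs together.
function codePointReverse(str) {
  return [...str].reverse().join("");
}

// True if every high surrogate is followed by a low surrogate and no low
// surrogate appears on its own.
function isWellFormedUTF16(str) {
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = str.charCodeAt(i + 1); // NaN past the end
      if (!(next >= 0xdc00 && next <= 0xdfff)) return false;
      i++; // skip the low surrogate just checked
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      return false;
    }
  }
  return true;
}

console.log(sample.length, [...sample].length); // 4 3
console.log(isWellFormedUTF16(naiveReverse(sample))); // false
console.log(isWellFormedUTF16(codePointReverse(sample))); // true
```

The naive version splits the pair and swaps its halves, producing a string that is no longer valid UTF-16, while nothing in the language warns the programmer that this happened.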
That said, it would be easy enough to expand the proposal with a JS-programmer-visible property on string values that says whether or not the string is known to be UTF-16 encoded, and similarly a flag for UTF-32 encoding. Presumably all values returned from the DOM as DOMStrings would have the UTF-16 property set. Strings produced by a UTF16Decode function, or explicitly constructed to contain supplementary characters, would get the UTF-32 flag. If you concatenated one of each, the result would get neither flag.
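The flag behavior described above can be modeled with a small wrapper type. This is a hypothetical sketch: `TaggedString`, `fromDOMString`, and `fromDecoded` are illustrative names, not proposed APIs, and it assumes (beyond what is stated) that concatenating two strings with the same flag keeps that flag, so a flag survives only when both operands carry it.

```javascript
// Hypothetical model of per-string encoding flags; names are illustrative only.
class TaggedString {
  constructor(value, knownUTF16, knownUTF32) {
    this.value = value;
    this.knownUTF16 = knownUTF16;
    this.knownUTF32 = knownUTF32;
  }
  // Assumption: a flag survives concatenation only if both operands carry it,
  // so concatenating a UTF-16 string with a UTF-32 string yields neither flag.
  concat(other) {
    return new TaggedString(
      this.value + other.value,
      this.knownUTF16 && other.knownUTF16,
      this.knownUTF32 && other.knownUTF32
    );
  }
}

// Values returned from the DOM as DOMStrings: known to be UTF-16.
function fromDOMString(s) {
  return new TaggedString(s, true, false);
}

// Results of a UTF16Decode-style function, or strings explicitly built from
// supplementary characters: known to be UTF-32.
function fromDecoded(s) {
  return new TaggedString(s, false, true);
}

const dom = fromDOMString("x");
const decoded = fromDecoded("y");
const mixed = dom.concat(decoded);
const domOnly = dom.concat(fromDOMString("z"));
console.log(mixed.knownUTF16, mixed.knownUTF32); // false false
console.log(domOnly.knownUTF16, domOnly.knownUTF32); // true false
```

Under this model, a program can check the flags before choosing an interpretation, but any string assembled from mixed sources ends up flagged as neither, which is the case Boris is worried about.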