Flexible String Representation - full Unicode for ES6?
rosuav at gmail.com
Fri Dec 21 21:34:03 PST 2012
On Sat, Dec 22, 2012 at 4:09 PM, Erik Arvidsson
<erik.arvidsson at gmail.com> wrote:
> On Fri, Dec 21, 2012 at 6:45 PM, Chris Angelico <rosuav at gmail.com> wrote:
>> There is an alternative. Python (as of version 3.3) has implemented a
>> new Flexible String Representation, aka PEP-393; the same has existed
>> in Pike for some time. A string is stored in memory with a fixed
>> number of bytes per character, based on the highest codepoint in that
>> string - if there are any non-BMP characters, 4 bytes; if any
>> U+0100-U+FFFF, 2 bytes; otherwise 1 byte. This depends on strings
>> being immutable (otherwise there'd be an annoying string-copy
>> operation when a too-large character gets put in), which is true of
>> ECMAScript. Effectively, all strings are stored in UCS-4/UTF-32, but
>> with the leading 0 bytes elided when they're not needed.
> This is how most VMs already work.
> I agree with you that it would be a better world if this were the case,
> but I don't hear you suggesting how we might be able to change this
> without breaking the web.
Why, if that's how it's already being done, can't there be an easy way
to expose it to the script that way? Just flip the Big Red Switch and
suddenly be fully Unicode-safe? Yes, it's backward-incompatible, but
if the script can have some kind of marker (like "use strict") to show
that it's compliant, or if the engine can simply be told "be
compliant", we could begin to move forward. Otherwise, we're stuck
where we are.
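
As a rough illustration of the width rule quoted above (1, 2, or 4 bytes
per character, chosen by the highest codepoint in the string), here is a
minimal Python sketch. The function names bytes_per_char and payload_size
are made up for this example, and the sketch only models the character
payload; it does not claim to describe the actual memory layout of any
particular engine.

```python
def bytes_per_char(s: str) -> int:
    """Pick the per-character width from the highest codepoint in s.

    Hypothetical helper: follows the rule as described in the message,
    not any engine's real internals.
    """
    highest = max(map(ord, s), default=0)  # empty string counts as width 1
    if highest > 0xFFFF:   # any non-BMP character
        return 4
    if highest > 0xFF:     # any character in U+0100-U+FFFF
        return 2
    return 1               # everything fits in U+0000-U+00FF


def payload_size(s: str) -> int:
    """Bytes needed for the characters alone, ignoring any object header."""
    return len(s) * bytes_per_char(s)
```

Under these assumptions, a pure-ASCII string costs one byte per character,
while a single non-BMP character anywhere in the string widens every
character in it to four bytes. Because each character has the same width,
indexing by codepoint stays a constant-time operation.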