New full Unicode for ES6 idea
brendan at mozilla.com
Sun Feb 19 13:34:05 PST 2012
Wes Garland wrote:
> Is there a proposal for interaction with JSON?
From http://www.ietf.org/rfc/rfc4627, 2.5:
To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a twelve-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
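
As a rough illustration of the RFC rule quoted above, here is a minimal sketch that derives the surrogate pair for U+1D11E and spells it as a JSON escape. The helper names `toSurrogatePair` and `jsonEscapeNonBMP` are made up for this example; they come from neither the RFC nor any engine.

```javascript
// Split a supplementary code point (0x10000..0x10FFFF) into its
// UTF-16 surrogate pair. Hypothetical helper, for illustration only.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

// Spell the pair as two six-character \uXXXX escapes (twelve characters).
function jsonEscapeNonBMP(codePoint) {
  return toSurrogatePair(codePoint)
    .map((unit) => "\\u" + unit.toString(16).padStart(4, "0"))
    .join("");
}

const escaped = jsonEscapeNonBMP(0x1D11E);       // G clef
const parsed = JSON.parse('"' + escaped + '"');  // round-trip through JSON

console.log(escaped.length);                     // 12
console.log(parsed.length);                      // 2 code units
console.log(parsed.charCodeAt(0).toString(16));  // d834
```

In today's engines the parsed string reports two code units, which is exactly the behavior a full-Unicode mode would change.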
> Also because inter-compartment traffic is (we conjecture)
> infrequent enough to tolerate the proxy/copy overhead.
> Not to mention that the only thing you'd have to do is to tweak
> [[get]], charCodeAt and .length when crossing boundaries; you can keep
> the same backing store.
String methods are not generally self-hosted, so internal C++ vector
access would need to change depending on the string's flag bit, in this
> You might not even need to do this if the engine keeps the same
> backing store for both kinds of strings.
Yes, sharing the uint16 vector is good. But string methods would have to
index and .length differently (if I can verb .length ;-).
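
To make the "same vector, different length and indexing" point concrete, here is a sketch over a plain `Uint16Array`. It assumes a string is just a code-unit vector plus a mode. The function names are invented for illustration, and the linear scan is only the simplest way to index by code point, not a claim about how an engine would do it.

```javascript
function isHigh(u) { return u >= 0xD800 && u <= 0xDBFF; }
function isLow(u) { return u >= 0xDC00 && u <= 0xDFFF; }

// UCS-2 style view: every uint16 counts as one element.
function codeUnitLength(units) {
  return units.length;
}

// Full-Unicode style view: a well-formed surrogate pair counts once.
function codePointLength(units) {
  let n = 0;
  for (let i = 0; i < units.length; i++) {
    if (isHigh(units[i]) && i + 1 < units.length && isLow(units[i + 1])) i++;
    n++;
  }
  return n;
}

// Index by code point; returns NaN when out of range, like charCodeAt.
function codePointAtIndex(units, index) {
  let n = 0;
  for (let i = 0; i < units.length; i++) {
    const paired =
      isHigh(units[i]) && i + 1 < units.length && isLow(units[i + 1]);
    if (n === index) {
      return paired
        ? 0x10000 + ((units[i] - 0xD800) << 10) + (units[i + 1] - 0xDC00)
        : units[i];
    }
    if (paired) i++;
    n++;
  }
  return NaN;
}

// One shared backing store: "h", G clef (U+1D11E), "i".
const store = Uint16Array.of(0x68, 0xD834, 0xDD1E, 0x69);

console.log(codeUnitLength(store));                   // 4
console.log(codePointLength(store));                  // 3
console.log(codePointAtIndex(store, 1).toString(16)); // 1d11e
```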
> This means a script intent on comparing strings from two globals
> with different BRS settings could indeed tell that one discloses
> non-BMP char/codes, e.g. charCodeAt return values >= 0x10000. This
> is the *small* new observable I claim we can live with, because
> someone opted into it at least in one of the related global objects.
> Funny question, if I have two strings, both "hello", from two globals
> with different BRS settings, are they ==? How about ===?
Of course, strings with the same characters are == and ===. Strings
appear to be values. If you think of them as immutable reference types
there's still an obligation to compare characters for strings because
computed strings are not intern'ed.
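
A small sketch of the value-comparison point, within a single global. No engine implements the BRS, so the cross-global case cannot be shown here. A computed string that is a distinct allocation still compares equal character by character.

```javascript
const literal = "hello";
// Built at runtime, so an engine need not share storage with the literal.
const computed = ["hel", "lo"].join("");

console.log(literal == computed);   // true
console.log(literal === computed);  // true
```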
> R1. To keep compatibility with DOM APIs, the DOM glue used to
> mediate calls from JS to (typically) C++ would have to proxy or
> copy any strings containing non-BMP characters. Strings with only
> BMP characters would work as today.
> Is that true if the "full unicode" backing store is 16-bit code units
> using UTF-16 encoding? (Anyway, it's an implementation detail.)
Yes, because DOMString has intrinsic length and indexing notions and
these must (pending any coordination with w3c) remain ignorant of the
BRS and livin' in the '90s (DOM too emerged in the UCS-2 era).
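
A sketch of what the glue check in R1 might look like. Everything here is an assumption for illustration: strings are modeled as `{ units, fullUnicode }` records, and `hasSurrogates` and `toDOMString` are invented names, not part of any DOM binding. Treating any surrogate code unit as a sign of non-BMP content is a simplification that also catches lone surrogates.

```javascript
// True if the vector holds any surrogate code unit (0xD800..0xDFFF).
function hasSurrogates(units) {
  for (const u of units) {
    if (u >= 0xD800 && u <= 0xDFFF) return true;
  }
  return false;
}

// Hypothetical glue: hand a JS string to a DOM API that expects
// code-unit length and indexing.
function toDOMString(jsString) {
  if (!jsString.fullUnicode || !hasSurrogates(jsString.units)) {
    // BMP-only (or already code-unit indexed): both views agree,
    // so the same backing store can be passed through untouched.
    return { units: jsString.units, fullUnicode: false, copied: false };
  }
  // Non-BMP content: give the DOM its own code-unit-indexed copy.
  return {
    units: Uint16Array.from(jsString.units),
    fullUnicode: false,
    copied: true,
  };
}

const bmpOnly = { units: Uint16Array.of(0x68, 0x69), fullUnicode: true };
const withClef = { units: Uint16Array.of(0xD834, 0xDD1E), fullUnicode: true };

console.log(toDOMString(bmpOnly).copied);   // false
console.log(toDOMString(withClef).copied);  // true
```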