Full Unicode strings strawman
mikesamuel at gmail.com
Mon May 16 15:33:18 PDT 2011
> 2011/5/16 Allen Wirfs-Brock <allen at wirfs-brock.com>:
> I think you have an extra 0 at a couple of places above...
Yep. Sorry. The 0x10000 really is supposed to be five digits though.
> A DOMString is defined by the DOM spec. to consist of 16-bit elements that
> are to be interpreted as a UTF-16 encoding of Unicode characters. It
> doesn't matter what implementation level representation is used for the
> string, the indexable positions within a DOMString are restricted to 16-bit
There is existing code out there that uses particular implementations
Should the cost of migrating existing implementations be taken into
account when considering this strawman?
> values. At the representation level each position could even be represented
> by a 32-bit cell and it doesn't matter. To be a valid DOMString element
> values must be in the range 0-0xffff.
> I think you are unnecessarily mixing up the string semantics defined by the
> language, encodings that might be used in implementing the semantics, and
> application level processing of those strings.
> To simplify things just think of an ES string as if it were an array, each
> element of which could contain an arbitrary integer value. If we have such
> an array like [0xd800, 0xdc00] at the language semantics level this is a two
> element array containing two well-specified values. At the language
> implementation level there are all sorts of representations that might be
> used, maybe the implementation Huffman encodes the elements... How the
> application processes that array is completely up to the application. It
> may treat the array simply as two integer values. It may treat each
> element as a 21-bit value encoding a Unicode codepoint and logically
> consider the array to be a Unicode string of length 2. It may consider each
> element to be a 16-bit value and that sequences of values are interpreted as
> UTF-16 string encodings. In that case, it could consider it to represent a
> string of logical length 1.
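To make the three readings above concrete, here is a rough sketch in plain ES. The function name decodeUtf16Elements and the variable names are made up for illustration; nothing here is from the strawman or any spec. It only assumes the standard surrogate ranges (lead 0xD800-0xDBFF, trail 0xDC00-0xDFFF).

```javascript
// The two-element array from the quoted example.
var elements = [0xd800, 0xdc00];

// Readings 1 and 2: treat every element as one value (an integer, or a
// 21-bit code point). Either way the logical length is the array length.
var lengthAsElements = elements.length;

// Reading 3: treat the elements as UTF-16 code units and combine each
// lead/trail surrogate pair into a single code point.
// (Hypothetical helper, for illustration only.)
function decodeUtf16Elements(units) {
  var codePoints = [];
  for (var i = 0; i < units.length; i++) {
    var hi = units[i];
    var lo = units[i + 1];
    var isLead = hi >= 0xd800 && hi <= 0xdbff;
    var isTrail = lo !== undefined && lo >= 0xdc00 && lo <= 0xdfff;
    if (isLead && isTrail) {
      codePoints.push(0x10000 + ((hi - 0xd800) << 10) + (lo - 0xdc00));
      i++; // skip the trail surrogate that was just consumed
    } else {
      codePoints.push(hi); // unpaired values pass through unchanged
    }
  }
  return codePoints;
}

var decoded = decodeUtf16Elements(elements);
var lengthAsUtf16 = decoded.length;
```

With this input, lengthAsElements is 2 while lengthAsUtf16 is 1, and the single decoded value is 0x10000. The same stored array gives different logical lengths depending only on how the application chooses to read it.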
I think we agree about the implementation/interface split.
If DOMString specifies the semantics of a result from
I'm not sure I understand the bit about how the semantics of DOMString
could affect ES programs.
Is it the case that
document.createTextNode('\u+010000').length === 2
'\u+010000'.length === 1
or are you saying that when DOMStrings are exposed to ES code, ES gets
to define the semantics of the "length" and "0" properties?
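
For comparison, this is how the same code point behaves under today's 16-bit string semantics, without the strawman. The '\u+010000' escape is the proposed syntax and will not parse in current engines, so the sketch spells U+10000 as an explicit surrogate pair instead. The variable names are only illustrative, and the DOM call is left out because it needs a browser.

```javascript
// U+10000 written as a surrogate pair, the only spelling current ES accepts.
var s = '\ud800\udc00';

// "length" counts 16-bit elements, not code points.
var currentLength = s.length;

// Indexed access likewise sees the two surrogates separately.
var firstUnit = s.charCodeAt(0);
var secondUnit = s.charCodeAt(1);
```

Here currentLength is 2, firstUnit is 0xd800 and secondUnit is 0xdc00. The question is whether the strawman changes these answers for strings that come back from the DOM.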