Full Unicode strings strawman

Mike Samuel mikesamuel at gmail.com
Mon May 16 15:33:18 PDT 2011


> 2011/5/16 Allen Wirfs-Brock <allen at wirfs-brock.com>:
> I think you have an extra 0 at a couple of  places above...

Yep.  Sorry.  The 0x10000 really is supposed to be five digits though.

> A DOMString is defined by the DOM spec to consist of 16-bit elements that
> are to be interpreted as a UTF-16 encoding of Unicode characters.  It
> doesn't matter what implementation-level representation is used for the
> string; the indexable positions within a DOMString are restricted to 16-bit

Really?

There is existing code out there that uses particular implementations
for strings.  Should the cost of migrating existing implementations be
taken into account when considering this strawman?

> values. At the representation level each position could even be represented
> by a 32-bit cell and it doesn't matter.  To be a valid DOMString element
> values must be in the range 0-0xffff.
> I think you are unnecessarily mixing up the string semantics defined by the
> language, encodings that might be used in implementing the semantics, and
> application level processing of those strings.

> To simplify things just think of an ES string as if it were an array, each
> element of which could contain an arbitrary integer value.  If we have such
> an array like [0xd800, 0xdc00] at the language semantics level this is a two
> element array containing two well-specified values.  At the language
> implementation level there are all sorts of representations that might be
> used, maybe the implementation Huffman encodes the elements...  How the
> application processes that array is completely up to the application.  It
> may treat the array simply as two integer values.  It may treat each
> element as a 21-bit value encoding a Unicode codepoint and logically
> consider the array to be a unicode string of length 2.  It may consider each
> element to be a 16-bit value and that sequences of values are interpreted as
> UTF-16 string encodings.  In that case, it could consider it to represent a
> string of logical length 1.
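
To make the interpretations in the quoted paragraph concrete, here is a
minimal sketch.  The helper names (lengthAsElements, lengthAsUtf16,
decodeSurrogatePair) are hypothetical and are not part of the strawman or
of any spec; they only illustrate how the same two-element array can be
given a logical length of 2 or 1 depending on what the application decides
the elements mean.

```javascript
// The two-element array from the quoted example.
const units = [0xd800, 0xdc00];

// Interpretation A: each element stands on its own (plain integers, or one
// code point per element), so the logical length is the element count.
function lengthAsElements(values) {
  return values.length;
}

// Interpretation B: elements are 16-bit UTF-16 code units, so a high
// surrogate immediately followed by a low surrogate counts as one character.
function lengthAsUtf16(values) {
  let count = 0;
  for (let i = 0; i < values.length; i++) {
    const isHigh = values[i] >= 0xd800 && values[i] <= 0xdbff;
    const next = values[i + 1]; // undefined past the end, so the test below fails
    const nextIsLow = next >= 0xdc00 && next <= 0xdfff;
    if (isHigh && nextIsLow) {
      i++; // skip the low half of the pair
    }
    count++;
  }
  return count;
}

// Standard UTF-16 arithmetic for combining a surrogate pair into a code point.
function decodeSurrogatePair(high, low) {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}
```

Under these assumptions lengthAsElements(units) is 2, lengthAsUtf16(units)
is 1, and decodeSurrogatePair(0xd800, 0xdc00) is 0x10000, the five-digit
value mentioned at the top of this message.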

I think we agree about the implementation/interface split.

If DOMString specifies the semantics of a result from

I'm not sure I understand the bit about how the semantics of DOMString
could affect ES programs.

Is it the case that

    document.createTextNode('\u+010000').length === 2
    '\u+010000'.length === 1

or are you saying that when DOMStrings are exposed to ES code, ES gets
to define the semantics of the "length" and "0" properties?
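
For comparison, here is a sketch of what ES code observes when strings are
sequences of 16-bit code units, with U+10000 spelled as a surrogate pair
because the '\u+010000' escape is only proposed syntax.  The DOM half of
the question is not exercised, since it needs a browser.  Array.from is
an assumption here: it is a later addition to the language, used only as
a convenient way to count code points.

```javascript
// U+10000 written as the surrogate pair that a 16-bit string holds.
const s = '\uD800\uDC00';

const codeUnitLength = s.length;              // counts 16-bit elements
const firstElement = s.charCodeAt(0);         // the value behind the "0" property
const codePointLength = Array.from(s).length; // string iteration walks code points

console.log(codeUnitLength, firstElement.toString(16), codePointLength);
// prints: 2 d800 1
```

The question above is which of these two lengths a full-Unicode ES string
would report for a value that came out of the DOM.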

