Full Unicode strings strawman

Mike Samuel mikesamuel at gmail.com
Mon May 16 16:37:48 PDT 2011


2011/5/16 Allen Wirfs-Brock <allen at wirfs-brock.com>:
> No. That would be a breaking change in the context of the browser.  Programs creating surrogate pairs that want to be updated to not use surrogate pairs are the only ones that need to retool.  More likely we are talking about new code that can be written without having to worry about surrogate pairs.  If somebody wants to grab a bunch of text from the DOM and manipulate it without encountering surrogate pairs, they will need to explicitly perform a decodeUTF16 transformation.

Without this strawman, devs willing to put in the effort can use one
mechanism to loop by codepoint.
Devs who don't put in the effort don't get easy/correct codepoint iteration.
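
For concreteness, the "one mechanism" available today looks roughly like
the sketch below.  The name forEachCodePoint is hypothetical and not
part of any proposal; it just walks the UTF-16 code units with
charCodeAt and combines surrogate pairs by hand.

```javascript
// Hypothetical helper (not from the strawman): calls f(codePoint, index)
// once per code point of a string that stores UTF-16 code units.
function forEachCodePoint(s, f) {
  for (var i = 0, n = s.length; i < n; ++i) {
    var start = i;
    var cu = s.charCodeAt(i);
    // A high surrogate followed by a low surrogate encodes one
    // supplementary code point.
    if (cu >= 0xD800 && cu <= 0xDBFF && i + 1 < n) {
      var next = s.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        cu = 0x10000 + ((cu - 0xD800) << 10) + (next - 0xDC00);
        ++i;  // consume the low surrogate as well
      }
    }
    // Unpaired surrogates are passed through unchanged.
    f(cu, start);
  }
}
```

Every loop that wants to be correct has to go through something like
this instead of indexing the string directly.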

With this strawman, devs who care about supplementary codepoints have
to call decodeUTF16 whenever they access a DOMString property.
Devs who don't put in the effort get easy/correct codepoint iteration.

So this strawman will not provide a single way to iterate correctly by
codepoint in apps that are not written with supplementary codepoints in
mind.

Is that correct?

Knowing where to put decodeUTF16 calls is tough in the presence of
reflective property access.
Consider a bulk property copy:

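    // Copies each named property from src to dest.
    function copyProperties(properties, src, dest) {
      for (var i = 0, n = properties.length; i < n; ++i) {
        var k = properties[i];
        // src[k] may or may not be a DOMString.
        dest[k] = src[k];
      }
    }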

A DOM object can have custom properties that are not DOMStrings and
regular properties that are.  How would an application that uses an
idiom like this make sure that all DOMStrings entering the program are
properly decoded?
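
To make the difficulty concrete, here is a sketch of what a defensive
version of that copy might look like.  Both names are assumptions made
for illustration: decodeUTF16 below is only a stand-in that returns an
array of code points (today's strings cannot hold the strawman's
full-Unicode values), and copyDecoded is hypothetical.  The only test
available inside the loop is typeof, which cannot tell a DOMString from
a custom string property or from a string that was already decoded.

```javascript
// Stand-in for the strawman's decodeUTF16 (an assumption, not the
// proposed API): returns an array of code points.
function decodeUTF16(s) {
  var out = [];
  for (var i = 0, n = s.length; i < n; ++i) {
    var cu = s.charCodeAt(i);
    if (cu >= 0xD800 && cu <= 0xDBFF && i + 1 < n) {
      var next = s.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        // Combine the surrogate pair into one supplementary code point.
        cu = 0x10000 + ((cu - 0xD800) << 10) + (next - 0xDC00);
        ++i;
      }
    }
    out.push(cu);
  }
  return out;
}

// Hypothetical defensive copy: decodes every string-valued property,
// because the loop has no way to know which ones are DOMStrings.
function copyDecoded(properties, src, dest) {
  for (var i = 0, n = properties.length; i < n; ++i) {
    var k = properties[i];
    var v = src[k];
    dest[k] = (typeof v === 'string') ? decodeUTF16(v) : v;
  }
}
```

A blanket check like this decodes strings that never came from the DOM,
and would decode a second time anything the caller had already decoded,
so it does not really answer the question.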

