UTF-16 vs UTF-32

Mike Samuel mikesamuel at gmail.com
Mon May 16 18:02:44 PDT 2011


2011/5/16 Shawn Steele <Shawn.Steele at microsoft.com>:
> It's clear why we want to support the full Unicode range, but it's less clear to me why UTF-32 would be desirable internally.  (Sure, it'd be nice for conversion types).

I don't think anyone says that UTF-32 is desirable *internally*.
We're talking about the API of the string type.

I have been operating under the assumption that developers would
benefit from a simple way to efficiently iterate by code point (that
is, by UTF-32 code unit).  An efficient version of the below, ideally
one that just works like the current
for (var i = 0, n = str.length; i < n; ++i) ...str[i]...

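    function iterateByCodeUnit(str, fn) {
      str += '';  // coerce to string
      // i indexes UTF-16 code units; index counts the values passed to fn.
      for (var i = 0, n = str.length, index = 0; i < n; ++i, ++index) {
        var unit = str.charCodeAt(i);
        // A leading surrogate followed by a trailing surrogate encodes
        // one supplementary code point.
        if (0xd800 <= unit && unit < 0xdc00 && i + 1 < n) {
          var next = str.charCodeAt(i + 1);
          if (0xdc00 <= next && next < 0xe000) {
            // Supplementary code points start at 0x10000.
            fn(0x10000 + (((unit & 0x3ff) << 10) | (next & 0x3ff)), i, index);
            ++i;
            continue;
          }
        }
        fn(unit, i, index);  // BMP code unit or unpaired surrogate
      }
    }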
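
To make the surrogate arithmetic concrete, here is a minimal sketch that
can be run on its own.  The helper names decodeSurrogatePair and
countCodePoints are hypothetical, chosen only for illustration; they are
not part of any proposed API.

```javascript
// Hypothetical helper: combine a leading (high) and trailing (low)
// surrogate into the supplementary code point they encode.
function decodeSurrogatePair(lead, trail) {
  return 0x10000 + (((lead & 0x3ff) << 10) | (trail & 0x3ff));
}

// Hypothetical helper: count code points by walking UTF-16 code units
// and treating each well-formed surrogate pair as a single code point.
function countCodePoints(str) {
  str += '';
  var count = 0;
  for (var i = 0, n = str.length; i < n; ++i, ++count) {
    var unit = str.charCodeAt(i);
    if (0xd800 <= unit && unit < 0xdc00 && i + 1 < n) {
      var next = str.charCodeAt(i + 1);
      if (0xdc00 <= next && next < 0xe000) { ++i; }  // skip the trail unit
    }
  }
  return count;
}

// U+1D11E MUSICAL SYMBOL G CLEF is stored as the pair D834 DD1E.
var clef = '\ud834\udd1e';
console.log(decodeSurrogatePair(0xd834, 0xdd1e).toString(16));  // 1d11e
console.log(clef.length);                                       // 2
console.log(countCodePoints('a' + clef + 'b'));                 // 3
```

The gap between str.length (code units) and the code point count is the
reason a plain index loop over str[i] is not enough for text outside the
BMP.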


More information about the es-discuss mailing list