UTF-16 vs UTF-32
Mike Samuel
mikesamuel at gmail.com
Mon May 16 18:02:44 PDT 2011
2011/5/16 Shawn Steele <Shawn.Steele at microsoft.com>:
> It's clear why we want to support the full Unicode range, but it's less clear to me why UTF-32 would be desirable internally. (Sure, it'd be nice for conversion types).
I don't think anyone says that UTF-32 is desirable *internally*.
We're talking about the API of the string type.
I have been operating under the assumption that developers would
benefit from a simple way to efficiently iterate by code point: an
efficient version of the function below, ideally one that just works
like the current idiom
for (var i = 0, n = str.length; i < n; ++i) ...str[i]...
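function iterateByCodeUnit(str, fn) {
  str += '';  // Coerce the argument to a string.
  // Despite the name, fn receives code points: i is the offset in UTF-16
  // code units and index is the offset in code points.
  for (var i = 0, n = str.length, index = 0; i < n; ++i, ++index) {
    var unit = str.charCodeAt(i);
    if (0xd800 <= unit && unit < 0xdc00 && i + 1 < n) {
      var next = str.charCodeAt(i + 1);
      if (0xdc00 <= next && next < 0xe000) {
        // Combine a surrogate pair into one supplementary code point.
        fn(0x10000 + (((unit & 0x3ff) << 10) | (next & 0x3ff)), i, index);
        ++i;  // Skip the trailing surrogate.
        continue;
      }
    }
    // BMP code unit or unpaired surrogate: pass it through unchanged.
    fn(unit, i, index);
  }
}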
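For comparison, here is a rough sketch of the same iteration written against the string iterator and codePointAt, both of which were standardized later in ECMAScript 2015 and so were not available when this was written. The name forEachCodePoint and the sample string are made up for illustration; the callback takes the same (code point, code unit offset, code point offset) arguments as the function above.

```javascript
// Hypothetical helper, not part of any standard library.
function forEachCodePoint(str, fn) {
  str = String(str);
  var unitIndex = 0;   // offset in UTF-16 code units
  var pointIndex = 0;  // offset in code points
  // The ES2015 string iterator yields one code point per step,
  // as a string of length 1 (BMP or lone surrogate) or 2 (surrogate pair).
  for (var ch of str) {
    fn(ch.codePointAt(0), unitIndex, pointIndex);
    unitIndex += ch.length;
    ++pointIndex;
  }
}

// Usage: 'a', U+1D400 (written as a surrogate pair), 'b'.
var seen = [];
forEachCodePoint('a\ud835\udc00b', function (codePoint, i, index) {
  seen.push([codePoint.toString(16), i, index]);
});
// seen is [['61', 0, 0], ['1d400', 1, 1], ['62', 3, 2]]
```

An unpaired surrogate is passed through as its own value here too, so the behavior should match the hand-written loop for ill-formed input.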