New full Unicode for ES6 idea

Mark Davis ☕ mark at
Sun Feb 19 16:25:31 PST 2012

First, it would be great to get full Unicode support in JS. I know that's
been a problem for us at Google.

Secondly, while I agree with Addison that the approach that Java took is
workable, it does cause problems. Ideally someone would be able to loop (a
very common construct) with:

for (codepoint cp : someString) {

In Java, you have to do:

int cp;
for (int i = 0; i < someString.length(); i += Character.charCount(cp)) {
  cp = someString.codePointAt(i);
  // ... use cp ...
}

Java had good reasons for doing what it did, mainly compatibility. But if
there is some way that JS can work around those, that'd be great.
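
To make the Java pattern concrete, here is a minimal, self-contained sketch. The class and method names (CodePointLoop, codePointsByIndex, codePointsByStream) are made up for illustration. The second method uses String.codePoints(), which only arrived in Java 8, so it postdates this message; it is shown as one way the loop can be shortened, not as what Java offered at the time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class CodePointLoop {
    // Hypothetical helper: walks the string by code point using the
    // index-based loop, advancing by 1 or 2 chars per code point.
    static List<Integer> codePointsByIndex(String s) {
        List<Integer> out = new ArrayList<>();
        int cp;
        for (int i = 0; i < s.length(); i += Character.charCount(cp)) {
            cp = s.codePointAt(i);
            out.add(cp);
        }
        return out;
    }

    // Hypothetical helper: same result via String.codePoints() (Java 8+).
    static List<Integer> codePointsByStream(String s) {
        return s.codePoints().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 written as a surrogate pair, then 'b'
        String s = "a\uD83D\uDE00b";
        for (int cp : codePointsByIndex(s)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

The sample string has four 16-bit code units but only three code points, which is exactly the mismatch that a plain char-by-char loop trips over.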

Thirdly, there's some confusion about the Unicode terminology. Here's a quick
summary:

code point: number from 0 to 0x10FFFF

character: a code point that is assigned. E.g., 0x61 represents 'a' and is a
character. 0x378 is a code point, but not (yet) a character.

code unit: an encoding 'chunk'.
UTF-8 represents a code point as 1-4 8-bit code units
UTF-16 represents a code point as 1 or 2 16-bit code units
UTF-32 represents a code point as 1 32-bit code unit.
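
The three terms can be checked against the JDK with a short sketch. The class UnicodeTerms and its methods are hypothetical names; the calls inside them (Character.isValidCodePoint, Character.isDefined, Character.charCount, Character.toChars) are standard. Note that Character.isDefined only approximates "assigned": its answer depends on the Unicode version the running JDK ships with, and it also reports surrogate and private-use code points as defined.

```java
import java.nio.charset.StandardCharsets;

final class UnicodeTerms {
    // Code point: any integer from 0 to 0x10FFFF.
    static boolean isCodePoint(int n) {
        return Character.isValidCodePoint(n);
    }

    // Character: a code point that is assigned, approximated here by
    // what the running JDK's Unicode tables consider defined.
    static boolean isAssigned(int cp) {
        return Character.isDefined(cp);
    }

    // Number of 8-bit code units in UTF-8.
    // Assumes cp is not a lone surrogate, which cannot be encoded.
    static int utf8Units(int cp) {
        String s = new String(Character.toChars(cp));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of 16-bit code units in UTF-16: 1, or 2 above U+FFFF.
    static int utf16Units(int cp) {
        return Character.charCount(cp);
    }

    // UTF-32 always uses exactly one 32-bit code unit.
    static int utf32Units(int cp) {
        return 1;
    }

    public static void main(String[] args) {
        int[] samples = {0x61, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X: UTF-8=%d UTF-16=%d UTF-32=%d%n",
                    cp, utf8Units(cp), utf16Units(cp), utf32Units(cp));
        }
    }
}
```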

Mark <>
*— Il meglio è l’inimico del bene —*

On Sun, Feb 19, 2012 at 16:00, Cameron McCormack <cam at> wrote:

> Brendan Eich:
> > To hope to make this sideshow beneficial to all the cc: list, what do
> > DOM specs use to talk about uint16 units vs. code points?
> I say "code unit" as a shorter way of saying "16 bit unsigned integer code
> unit" (which DOM4 also links to) and then just "code point" to refer to 21
> bit numbers that might correspond to a Unicode character, which you can see
> used in