How to count the number of symbols in a string?

Mathias Bynens mathias at qiwi.be
Fri Nov 30 12:33:48 PST 2012


ECMAScript 6 introduces some useful new features that make working with astral Unicode symbols easier.

One thing that still seems to be missing (AFAIK) is an easy way to count the number of symbols (code points) in a given string. As you know, we can’t rely on `String.prototype.length` here: it counts UTF-16 code units, so a string containing nothing but an astral symbol has a length of `2` instead of `1`:

> var poo = '\u{1F4A9}'; // U+1F4A9 PILE OF POO
> poo.length
2

Of course it’s possible to write some code yourself to loop over all the code units in the string, handle surrogate pairs, and increment a counter manually for each full code point, but that’s a pain.
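
For reference, a minimal sketch of that manual approach might look like the following. The function name `countSymbols` is made up for illustration, and counting a lone (unpaired) surrogate as one symbol is an assumption on my part, not something any spec prescribes:

```javascript
// Hypothetical helper: counts code points by walking the UTF-16 code units.
function countSymbols(string) {
  var count = 0;
  var index = 0;
  var length = string.length;
  while (index < length) {
    var unit = string.charCodeAt(index);
    // A high surrogate (0xD800-0xDBFF) immediately followed by a low
    // surrogate (0xDC00-0xDFFF) forms a single astral code point.
    if (unit >= 0xD800 && unit <= 0xDBFF && index + 1 < length) {
      var next = string.charCodeAt(index + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        index++; // skip the low surrogate; the pair counts once
      }
    }
    // Assumption: unpaired surrogates are counted as one symbol each.
    count++;
    index++;
  }
  return count;
}
```

With this, `countSymbols('\uD83D\uDCA9')` returns `1`, whereas `'\uD83D\uDCA9'.length` is `2`.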

It would be useful to have a new property on `String.prototype` that would return the number of Unicode symbols in the string. Something like `realLength` (of course, it needs a better name, but you get the idea):

> poo.realLength
1

Another possible solution is to add something like `String.prototype.codePoints` which would be an array of the numerical code point values in the string. That way, getting the length is only a matter of accessing the `length` property of the array:

> poo.codePoints
[ 0x1F4A9 ]
> poo.codePoints.length
1

Or perhaps this would be better suited as a method?

> poo.getCodePoints()
[ 0x1F4A9 ]
> poo.getCodePoints().length
1
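
Either variant could be approximated in user code with something like the sketch below. It assumes `String.prototype.codePointAt` from the ES6 draft is available; the standalone function `getCodePoints` only borrows its name from the example above and is not an existing API:

```javascript
// Hypothetical standalone version of the proposed getCodePoints().
// Assumes String.prototype.codePointAt (ES6) exists in the environment.
function getCodePoints(string) {
  var result = [];
  var index = 0;
  while (index < string.length) {
    var codePoint = string.codePointAt(index);
    result.push(codePoint);
    // Code points above U+FFFF take up two code units (a surrogate pair);
    // everything else, including lone surrogates, takes up one.
    index += codePoint > 0xFFFF ? 2 : 1;
  }
  return result;
}
```

Getting the symbol count is then `getCodePoints(poo).length`, at the cost of allocating an array just to read its length, which is part of why a built-in would be nicer.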

Has anything like this been considered/discussed here yet?

