How to count the number of symbols in a string?

Phillips, Addison addison at lab126.com
Fri Nov 30 13:06:44 PST 2012


One question is what you’d want that specific number for. The number of code points in a string is only marginally interesting in a script. It doesn’t, for example, tell you how many screen positions the text consumes (that’s the grapheme count).
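To illustrate the difference between code points and graphemes, here is a minimal sketch. It assumes a runtime that provides `Intl.Segmenter`, an API that postdates this thread and is not part of any proposal discussed here; `countGraphemes` is a hypothetical helper name.

```javascript
// Sketch only: assumes Intl.Segmenter is available in the runtime.
// countGraphemes is a hypothetical helper, not an existing String method.
function countGraphemes(str) {
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  let count = 0;
  // Each segment is one user-perceived character (grapheme cluster).
  for (const segment of segmenter.segment(str)) {
    count++;
  }
  return count;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one grapheme.
const eAcute = 'e\u0301';
console.log(eAcute.length);          // 2 code units
console.log(countGraphemes(eAcute)); // 1 grapheme
```

Grapheme segmentation is only an approximation of screen positions; actual rendering width also depends on the font and the display environment.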

Norbert’s proposal [1] includes an iterator over the code points, so counting them is straightforward, but the count is not exposed as a property of the string itself (or at least I don’t see it anywhere).
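A minimal sketch of counting via a code point iterator, assuming an engine where `for...of` over a string yields whole code points (as in ECMAScript 2015 and later). `countCodePoints` is a hypothetical helper name, not something the proposal defines.

```javascript
// Sketch only: countCodePoints is a hypothetical helper.
// Assumes the string iterator yields one code point per step.
function countCodePoints(str) {
  let count = 0;
  // A surrogate pair is visited once here, not twice.
  for (const symbol of str) {
    count++;
  }
  return count;
}

const pileOfPoo = '\u{1F4A9}'; // U+1F4A9 PILE OF POO
console.log(pileOfPoo.length);           // 2 (UTF-16 code units)
console.log(countCodePoints(pileOfPoo)); // 1 (code point)
```

The count is linear in the length of the string, since the iterator has to walk every code unit.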

Addison

[1] http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html

From: Andrea Giammarchi [mailto:andrea.giammarchi at gmail.com]
Sent: Friday, November 30, 2012 12:39 PM
To: Mathias Bynens
Cc: es-discuss
Subject: Re: How to count the number of symbols in a string?

already raised a while ago ...

https://jp.twitter.com/WebReflection/status/260479508912685056

No answer. If I remember correctly, Brendan said that .size() or .size is not a good name, but I also suggested .points and nobody came back on this.

On Fri, Nov 30, 2012 at 12:33 PM, Mathias Bynens <mathias at qiwi.be> wrote:
ECMAScript 6 introduces some useful new features that make working with astral Unicode symbols easier.

One thing that is still missing though (AFAIK) is an easy way to count the number of symbols / code points in a given string. As you know, we can’t rely on `String.prototype.length` here, as a string containing nothing but an astral symbol has a length of `2` instead of `1`:

> var poo = '\u{1F4A9}'; // U+1F4A9 PILE OF POO
> poo.length
2

Of course it’s possible to write some code yourself to loop over all the code units in the string, handle surrogate pairs, and increment a counter manually for each full code point, but that’s a pain.
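For reference, the manual approach described above might look roughly like this. It is a sketch, not a canonical implementation: `countCodePointsManually` is a hypothetical name, and treating each unpaired surrogate as one code point is an assumption made here.

```javascript
// Sketch only: countCodePointsManually is a hypothetical helper.
// Assumption: a lone (unpaired) surrogate counts as one code point.
function countCodePointsManually(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    // High surrogate: check whether a low surrogate follows.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // consume the low surrogate as part of the same code point
      }
    }
    count++;
  }
  return count;
}

console.log(countCodePointsManually('\uD83D\uDCA9')); // 1
```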

It would be useful to have a new property on `String.prototype` that would return the number of Unicode symbols in the string. Something like `realLength` (of course, it needs a better name, but you get the idea):

> poo.realLength
1

Another possible solution is to add something like `String.prototype.codePoints` which would be an array of the numerical code point values in the string. That way, getting the length is only a matter of accessing the `length` property of the array:

> poo.codePoints
[ 0x1F4A9 ]
> poo.codePoints.length
1

Or perhaps this would be better suited as a method?

> poo.getCodePoints()
[ 0x1F4A9 ]
> poo.getCodePoints().length
1
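As a rough illustration of the method form, here is a standalone function sketch. It assumes `String.prototype.codePointAt` and the code point string iterator are available (both arrived with ECMAScript 2015); `getCodePoints` is the hypothetical name from the example above, written as a plain function rather than added to `String.prototype`.

```javascript
// Sketch only: getCodePoints is hypothetical and is not a built-in.
// Assumes codePointAt and code point iteration exist in the engine.
function getCodePoints(str) {
  const codePoints = [];
  for (const symbol of str) {
    // symbol holds exactly one code point, so index 0 is always correct.
    codePoints.push(symbol.codePointAt(0));
  }
  return codePoints;
}

const points = getCodePoints('\u{1F4A9}');
console.log(points.length);                    // 1
console.log(points[0].toString(16));           // "1f4a9"
```

Building the whole array just to read its length allocates memory proportional to the string, which is one argument for a dedicated counting property or method.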

Has anything like this been considered/discussed here yet?
