How to count the number of symbols in a string?

Norbert Lindenberg ecmascript at norbertlindenberg.com
Fri Nov 30 13:50:27 PST 2012


There's nothing in the proposal yet because I intentionally kept it small. It's always possible to add functionality, but we need some evidence that it will be widely used. Pointing at Twitter doesn't quite help - it's possible that the number that Twitter shows reflects some limitation in their back-end systems.

Thanks for bringing the issue to this list, btw - tweets aren't as effective in getting TC39's attention.

Norbert


On Nov 30, 2012, at 13:06 , Phillips, Addison wrote:

> One question would be what you’d want that specific number for? The number of code points in a string is only marginally interesting in a script. It doesn’t, for example, tell you how many screen positions the text consumes (that’s the grapheme count).
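A minimal sketch of the distinction drawn here between code units, code points, and graphemes. It assumes a modern engine with `Intl.Segmenter`, which belongs to the ECMAScript Internationalization API (ECMA-402) and arrived long after this thread. The string `text` is an illustrative example: "e" followed by U+0301 COMBINING ACUTE ACCENT renders as a single "é" but is two code points, and U+1F4A9 occupies two UTF-16 code units. Grapheme clusters approximate user-perceived characters; actual display width can still depend on the font and rendering environment.

```javascript
// Illustrative string: "e" + U+0301 COMBINING ACUTE ACCENT, then U+1F4A9 PILE OF POO.
const text = 'e\u0301\u{1F4A9}';

// UTF-16 code units, which is what String.prototype.length reports: 1 + 1 + 2.
const codeUnits = text.length;

// Code points, using the ES2015 string iterator (one element per code point).
const codePoints = [...text].length;

// Extended grapheme clusters, using Intl.Segmenter (ECMA-402).
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = [...segmenter.segment(text)].length;

console.log(codeUnits, codePoints, graphemes); // 4 3 2
```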
>  
> Norbert’s proposal [1] includes an iterator over the code points (so counting the code points is straightforward, but not a property of the string itself, or at least I don’t see it anywhere).
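For comparison, with the code-point string iterator that eventually shipped in ECMAScript 2015, the count becomes a short loop. This sketch uses the standardized `String.prototype[Symbol.iterator]`, not necessarily the exact API in the proposal as it stood at the time; `codePointCount` is a hypothetical helper name.

```javascript
// Hypothetical helper: count code points by iterating the string.
// The ES2015 string iterator yields one element per code point,
// so a surrogate pair is counted once.
function codePointCount(str) {
  let count = 0;
  for (const _ of str) {
    count += 1;
  }
  return count;
}

const poo = '\u{1F4A9}'; // U+1F4A9 PILE OF POO
console.log(poo.length);          // 2 (UTF-16 code units)
console.log(codePointCount(poo)); // 1 (code points)
console.log([...poo].length);     // 1 (same count via spread)
```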
>  
> Addison
>  
> [1] http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html
>  
> From: Andrea Giammarchi [mailto:andrea.giammarchi at gmail.com] 
> Sent: Friday, November 30, 2012 12:39 PM
> To: Mathias Bynens
> Cc: es-discuss
> Subject: Re: How to count the number of symbols in a string?
>  
> Already raised a while ago:
>  
> https://jp.twitter.com/WebReflection/status/260479508912685056
>  
> No answer. If I remember correctly, Brendan said that .size() or .size is not a good name, but I have also suggested .points and nobody came back on this.
>  
> 
> On Fri, Nov 30, 2012 at 12:33 PM, Mathias Bynens <mathias at qiwi.be> wrote:
> ECMAScript 6 introduces some useful new features that make working with astral Unicode symbols easier.
> 
> One thing that is still missing though (AFAIK) is an easy way to count the number of symbols / code points in a given string. As you know, we can’t rely on `String.prototype.length` here, as a string containing nothing but an astral symbol has a length of `2` instead of `1`:
> 
> > var poo = '\u{1F4A9}'; // U+1F4A9 PILE OF POO
> > poo.length
> 2
> 
> Of course it’s possible to write some code yourself to loop over all the code units in the string, handle surrogate pairs, and increment a counter manually for each full code point, but that’s a pain.
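A minimal sketch of that manual approach, written in ES5 style so it does not depend on any ES6 feature. `countCodePoints` is a hypothetical name; following the behavior of the ES2015 string iterator, an unpaired surrogate is counted as one code point.

```javascript
// Hypothetical helper: count code points by walking UTF-16 code units.
// A high surrogate (0xD800-0xDBFF) immediately followed by a low surrogate
// (0xDC00-0xDFFF) encodes one supplementary code point; any other unit,
// including an unpaired surrogate, counts as one code point on its own.
function countCodePoints(str) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low surrogate of this pair
      }
    }
    count++;
  }
  return count;
}

console.log(countCodePoints('\uD83D\uDCA9'));   // 1 (U+1F4A9 as a surrogate pair)
console.log(countCodePoints('a\uD83D\uDCA9b')); // 3
console.log(countCodePoints('\uD83D'));         // 1 (lone high surrogate)
```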
> 
> It would be useful to have a new property on `String.prototype` that would return the number of Unicode symbols in the string. Something like `realLength` (of course, it needs a better name, but you get the idea):
> 
> > poo.realLength
> 1
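To illustrate the property-style API, here is a hypothetical `realLength` accessor defined on `String.prototype`, for experimentation only (extending built-in prototypes is generally discouraged in shared code). It assumes an ES2015 engine because it relies on the string iterator.

```javascript
// Hypothetical: a read-only `realLength` accessor on String.prototype.
Object.defineProperty(String.prototype, 'realLength', {
  configurable: true, // lets the experiment be undone with `delete`
  get() {
    // Convert the receiver to a primitive string, then spread it into
    // code points using the ES2015 string iterator.
    return [...String(this)].length;
  },
});

const poo = '\u{1F4A9}'; // U+1F4A9 PILE OF POO
console.log(poo.length);     // 2
console.log(poo.realLength); // 1
```

Unlike `length`, this getter walks the whole string on every access, so each read costs time proportional to the string's length.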
> 
> Another possible solution is to add something like `String.prototype.codePoints` which would be an array of the numerical code point values in the string. That way, getting the length is only a matter of accessing the `length` property of the array:
> 
> > poo.codePoints
> [ 0x1F4A9 ]
> > poo.codePoints.length
> 1
> 
> Or perhaps this would be better suited as a method?
> 
> > poo.getCodePoints()
> [ 0x1F4A9 ]
> > poo.getCodePoints().length
> 1
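A sketch of the array-returning variant as a standalone function rather than a `String.prototype` member; `getCodePoints` reuses the name suggested above but is hypothetical. It relies on ES2015 `String.prototype.codePointAt`, which returns the full code point when the index points at the first half of a surrogate pair.

```javascript
// Hypothetical helper: return the numeric code point values of a string.
function getCodePoints(str) {
  const result = [];
  for (let i = 0; i < str.length; ) {
    const cp = str.codePointAt(i);
    result.push(cp);
    // Supplementary code points (above 0xFFFF) occupy two code units.
    i += cp > 0xFFFF ? 2 : 1;
  }
  return result;
}

const poo = '\u{1F4A9}'; // U+1F4A9 PILE OF POO
console.log(getCodePoints(poo));        // [ 128169 ]
console.log(getCodePoints(poo).length); // 1
```

In ES2015 and later, `Array.from(str, (ch) => ch.codePointAt(0))` builds the same array via the string iterator.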
> 
> Has anything like this been considered/discussed here yet?
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
