How to count the number of symbols in a string?

Andrea Giammarchi andrea.giammarchi at gmail.com
Fri Nov 30 13:59:12 PST 2012


Sanitizing is, I would say, the very first use case: if str.length !=
str.points, something might require a fix.
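A minimal sketch of that check in today's JavaScript, assuming an ES2015+ runtime (where spreading a string iterates it by code point). `needsReview` is a hypothetical name used only for illustration, not a proposed API:

```javascript
// Hypothetical helper: true when the string's UTF-16 length differs from its
// code point count, i.e. when it contains at least one surrogate pair.
function needsReview(str) {
  // Spreading uses the ES2015 string iterator, which yields one element per
  // code point (a lone surrogate is yielded as its own element).
  return str.length !== [...str].length;
}

console.log(needsReview('abc'));        // false
console.log(needsReview('a\u{1F4A9}')); // true
```

Note that a string holding only lone surrogates is not flagged by this comparison, since each lone surrogate counts as one element either way.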

A UTF-8-friendly "number of allowed chars", as in the Twitter case, is
another example.
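A minimal sketch of such a limit, assuming it is counted in code points and an ES2015+ runtime; both helper names are hypothetical, and real services may apply their own counting rules:

```javascript
// Hypothetical helpers for a limit measured in code points rather than
// UTF-16 code units.
function countCodePoints(str) {
  // Spreading a string yields one element per code point.
  return [...str].length;
}

function truncateToCodePoints(str, max) {
  // Slicing the array of code points never cuts a surrogate pair in half.
  return [...str].slice(0, max).join('');
}

const tweet = 'ok \u{1F4A9}\u{1F4A9}';
console.log(tweet.length);           // 7 (code units)
console.log(countCodePoints(tweet)); // 5 (code points)

// Code-unit slicing leaves a lone high surrogate at the end...
console.log(tweet.slice(0, 4) === 'ok \uD83D');                 // true
// ...while code-point slicing keeps the symbol whole.
console.log(truncateToCodePoints(tweet, 4) === 'ok \u{1F4A9}'); // true
```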

A split able to return code points rather than chars would need the number
of points too. The fact that developers are already asking for a way to
obtain these code points should also indicate that the feature might be
needed.
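To illustrate the difference between a code-unit split and a code-point split, here is a small sketch assuming an ES2015+ runtime (for `Array.from` and the string iterator):

```javascript
const sample = 'a\u{1F4A9}b';

// split('') works on UTF-16 code units, so the astral symbol is broken into
// its two surrogate halves.
const codeUnits = sample.split('');
console.log(codeUnits.length); // 4

// Array.from uses the string iterator, so each element is a whole code point.
const codePointStrings = Array.from(sample);
console.log(codePointStrings.length);                // 3
console.log(codePointStrings[1] === '\u{1F4A9}');    // true
```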

Thoughts?


On Fri, Nov 30, 2012 at 1:06 PM, Phillips, Addison <addison at lab126.com> wrote:

> One question would be: what would you want that specific number for? The
> number of code points in a string is only marginally interesting in a
> script. It doesn’t, for example, tell you how many screen positions the
> text consumes (that’s the grapheme count).
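A small sketch of the code point vs. grapheme distinction, assuming a runtime with `Intl.Segmenter` (a much later addition to ECMA-402, not available when this thread was written):

```javascript
// 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as one accented e.
const word = 'e\u0301';

console.log(word.length);      // 2 (UTF-16 code units)
console.log([...word].length); // 2 (code points)

// Intl.Segmenter splits text into extended grapheme clusters.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
console.log([...segmenter.segment(word)].length); // 1 (grapheme)
```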
>
> Norbert’s proposal [1] includes an iterator over the code points (so
> counting the code points is straightforward, but it is not a property of
> the string itself, or at least I don’t see it anywhere).
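Counting with an iterator can be sketched as follows. This uses the string iterator that ES2015 eventually shipped, which is assumed here to behave like the proposal's iterator but is not necessarily identical to it; `countWithIterator` is a hypothetical name:

```javascript
// Counts code points by walking the string with its iterator; for...of over
// a string visits one code point at a time.
function countWithIterator(str) {
  let count = 0;
  for (const _ of str) {
    count++;
  }
  return count;
}

console.log(countWithIterator('\u{1F4A9}')); // 1
console.log('\u{1F4A9}'.length);            // 2
```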
>
> Addison
>
> [1] http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html
>
> *From:* Andrea Giammarchi [mailto:andrea.giammarchi at gmail.com]
> *Sent:* Friday, November 30, 2012 12:39 PM
> *To:* Mathias Bynens
> *Cc:* es-discuss
> *Subject:* Re: How to count the number of symbols in a string?
>
> Already raised a while ago ...
>
> https://jp.twitter.com/WebReflection/status/260479508912685056
>
> No answer. If I remember correctly, Brendan said that .size() or .size is
> not a good name, but I also suggested .points and nobody came back on
> this.
>
> On Fri, Nov 30, 2012 at 12:33 PM, Mathias Bynens <mathias at qiwi.be> wrote:
>
> ECMAScript 6 introduces some useful new features that make working with
> astral Unicode symbols easier.
>
> One thing that is still missing though (AFAIK) is an easy way to count the
> number of symbols / code points in a given string. As you know, we can’t
> rely on `String.prototype.length` here, as a string containing nothing but
> an astral symbol has a length of `2` instead of `1`:
>
> > var poo = '\u{1F4A9}'; // U+1F4A9 PILE OF POO
> > poo.length
> 2
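The length of `2` comes from UTF-16: an astral symbol is stored as a surrogate pair. A small sketch that inspects the pair, assuming an ES2015+ runtime for `codePointAt` and the `\u{...}` escape:

```javascript
const pileOfPoo = '\u{1F4A9}'; // U+1F4A9 PILE OF POO

// length counts UTF-16 code units; the astral symbol takes two of them.
console.log(pileOfPoo.length);                     // 2
console.log(pileOfPoo.charCodeAt(0).toString(16)); // d83d (high surrogate)
console.log(pileOfPoo.charCodeAt(1).toString(16)); // dca9 (low surrogate)

// codePointAt reads the whole pair as one code point.
console.log(pileOfPoo.codePointAt(0).toString(16)); // 1f4a9
```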
>
> Of course it’s possible to write some code yourself to loop over all the
> code units in the string, handle surrogate pairs, and increment a counter
> manually for each full code point, but that’s a pain.
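For reference, the manual loop described here might look like the following sketch. Apart from `let`/`const` and the `\u{...}` escapes used for brevity, it relies only on `charCodeAt`, so the logic would also work in pre-ES6 engines. `countCodePointsManually` is a hypothetical name:

```javascript
// Walk the UTF-16 code units and count a high surrogate followed by a low
// surrogate as a single code point. Lone surrogates count as one each.
function countCodePointsManually(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low surrogate of the pair
      }
    }
    count++;
  }
  return count;
}

console.log(countCodePointsManually('\u{1F4A9}'));   // 1
console.log(countCodePointsManually('a\u{1F4A9}b')); // 3
console.log(countCodePointsManually('\uD83D'));      // 1 (lone surrogate)
```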
>
> It would be useful to have a new property on `String.prototype` that would
> return the number of Unicode symbols in the string. Something like
> `realLength` (of course, it needs a better name, but you get the idea):
>
> > poo.realLength
> 1
>
> Another possible solution is to add something like
> `String.prototype.codePoints` which would be an array of the numerical code
> point values in the string. That way, getting the length is only a matter
> of accessing the `length` property of the array:
>
> > poo.codePoints
> [ 0x1F4A9 ]
> > poo.codePoints.length
> 1
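Something close to the suggested `codePoints` array can be built today, assuming ES2015's `Array.from` and `String.prototype.codePointAt`. This sketch is a standalone function rather than a `String.prototype` extension, since extending built-in prototypes is generally discouraged; `getCodePoints` here is hypothetical and only borrows the name from the message:

```javascript
// Hypothetical standalone function: returns the numeric code point values
// of a string.
function getCodePoints(str) {
  // Array.from iterates by code point; the map callback turns each
  // one-symbol string into its numeric value.
  return Array.from(str, (symbol) => symbol.codePointAt(0));
}

const values = getCodePoints('\u{1F4A9}');
console.log(values.length);         // 1
console.log(values[0] === 0x1F4A9); // true
```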
>
> Or perhaps this would be better suited as a method?
>
> > poo.getCodePoints()
> [ 0x1F4A9 ]
> > poo.getCodePoints().length
> 1
>
> Has anything like this been considered/discussed here yet?
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss