How to count the number of symbols in a string?
ecmascript at norbertlindenberg.com
Tue Dec 4 14:03:32 PST 2012
On Dec 4, 2012, at 11:43 , David Bruant wrote:
> Le 04/12/2012 20:25, Jason Orendorff a écrit :
>> On Sat, Dec 1, 2012 at 2:09 AM, Mathias Bynens <mathias at qiwi.be> wrote:
>>> My guess would be that in 99% of all cases where `String.prototype.length` is used the intention is to count the code points, not the UCS-2/UTF-16 code units.
>> I don't think this is right. My guess is that in most cases where it matters either way, the intention is to get a count that's consistent with .charAt(), .indexOf(), .slice(), RegExp match.index, and every other place where string indexes are used.
> I think Twitter has a bug as mentioned earlier in the thread and that's unrelated to consistency with the method you're mentioning.
One example isn't enough to support a "99% of all cases" claim. And I agree with Jason - many uses of String.length are related to some sort of iteration over the code units of the String, and then consistency with indices is critical. Showing the length of a string to the user is a rare (although important) case.
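To make the distinction concrete, here is a minimal sketch of the two counts. The helper name countCodePoints is hypothetical and not part of any proposal in this thread; it assumes that a well-formed surrogate pair counts as one symbol and that a lone surrogate counts as one symbol on its own.

```javascript
// Hypothetical helper (not from the thread): counts code points by
// treating each well-formed surrogate pair as a single symbol.
function countCodePoints(string) {
  var count = 0;
  for (var i = 0; i < string.length; i++) {
    var unit = string.charCodeAt(i);
    // A high surrogate (0xD800-0xDBFF) followed by a low surrogate
    // (0xDC00-0xDFFF) encodes one supplementary code point.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < string.length) {
      var next = string.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low surrogate; the pair is one code point
      }
    }
    count++;
  }
  return count;
}

var sample = "a\uD834\uDF06b";            // "a", U+1D306, "b"
var codeUnits = sample.length;            // 4 code units
var codePoints = countCodePoints(sample); // 3 code points
var indexOfB = sample.indexOf("b");       // 3, a code unit index
```

The code unit count is the one that agrees with the index returned by indexOf, which is why it is the right bound for loops over string indices. The code point count is the one a user would expect to see displayed.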
> I however agree that if something is added to get the actual length, a whole set of methods needs to be added too.
Which proposal are you referring to and agreeing with?
>> That said, of course this is a sensible feature to add; but calling it ".realLength" wouldn't help anyone understand the rather fine distinction at issue.
> Maybe the solution lies in finding the right prefix to define .*length, .*charAt(), .*indexOf(), etc. Maybe "CP" for "code points" .CPlength? .cpLength/cpCharAt/cpIndexOf... ?
"cp" to indicate code point indices? I think using two parallel index systems would only create confusion. Most string processing, including indexOf, works fine with supplementary characters without doing anything special for them. We need to provide a foundation that lets developers easily support supplementary characters in functionality that needs to be aware of them, but in many applications few changes will be required.
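As a rough illustration of that point, the sketch below searches, slices, and splits a string containing supplementary characters using only code unit indices. The sample string and variable names are made up for this example. It works because every index involved comes from indexOf or length, so it never lands in the middle of a surrogate pair.

```javascript
// Sample data invented for this sketch: two key=value fields whose
// values are supplementary characters (U+1D306 and U+1F4A9).
var text = "x=\uD834\uDF06;y=\uD83D\uDCA9";
var needle = "\uD83D\uDCA9";

// indexOf returns a code unit index; slice takes code unit indices.
// Used together they stay consistent with no special handling.
var start = text.indexOf(needle);
var extracted = text.slice(start, start + needle.length);

// Splitting on a BMP delimiter leaves the surrogate pairs intact.
var parts = text.split(";");
```

Here extracted is equal to needle, and each element of parts still ends with a complete surrogate pair. Problems only arise when code computes an index on its own, for example by truncating at a fixed length.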
> While you're talking about regexps, I think there is an issue with current RegExps. Mathias will know better. Could a new flag solve the issue?
RegExp does require major changes to support supplementary characters. The proposal accepted for ES6 (although not integrated into the spec yet) is at
Are you aware of issues not addressed there?
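For readers who want to see the RegExp limitation in action, here is a small sketch. It assumes an engine that supports the "u" flag as it was eventually standardized in ECMAScript 2015; the message above does not name the flag, so treat that as an assumption of this example and not a description of the proposal text.

```javascript
var symbol = "\uD834\uDF06"; // U+1D306, one code point, two code units

// Without the flag, "." matches a single code unit, so a pattern that
// expects exactly one character rejects the supplementary character.
var withoutFlag = /^.$/.test(symbol);

// With the "u" flag (assumed here, see above), "." matches a whole
// code point, so the same pattern accepts it.
var withFlag = /^.$/u.test(symbol);
```

In this sketch withoutFlag is false and withFlag is true.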