How to count the number of symbols in a string?

Mathias Bynens mathias at qiwi.be
Sat Dec 1 00:09:36 PST 2012


On 30 Nov 2012, at 22:50, Norbert Lindenberg <ecmascript at norbertlindenberg.com> wrote:

> There's nothing in the proposal yet because I intentionally kept it small. It's always possible to add functionality, but we need some evidence that it will be widely used.

My guess would be that in 99% of all cases where `String.prototype.length` is used, the intention is to count code points, not UCS-2/UTF-16 code units. As for evidence:
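To make the code unit vs. code point distinction concrete, here is a minimal sketch in plain JavaScript. `countCodePoints` is a hypothetical helper, not a proposed API. It treats each well-formed surrogate pair as one symbol and counts lone surrogates individually:

```javascript
// Count Unicode code points by treating each well-formed surrogate pair
// (high surrogate followed by low surrogate) as a single symbol.
// Lone surrogates are counted once each.
function countCodePoints(string) {
  let count = 0;
  for (let index = 0; index < string.length; index++) {
    const unit = string.charCodeAt(index);
    // High surrogate range: U+D800..U+DBFF
    if (unit >= 0xd800 && unit <= 0xdbff && index + 1 < string.length) {
      const next = string.charCodeAt(index + 1);
      // Low surrogate range: U+DC00..U+DFFF
      if (next >= 0xdc00 && next <= 0xdfff) {
        index++; // skip the low surrogate: the pair is one code point
      }
    }
    count++;
  }
  return count;
}

const astral = '\uD834\uDF06'; // U+1D306 TETRAGRAM FOR CENTRE
console.log(astral.length);           // 2 (code units)
console.log(countCodePoints(astral)); // 1 (code points)
```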

> Pointing at Twitter doesn't quite help - it's possible that the number that Twitter shows reflects some limitation in their back-end systems.

That’s the thing: the Twitter back-end does the right thing and doesn’t discriminate between BMP and astral symbols. Each symbol counts as a single “character” towards the 140-character limit. You can post a tweet consisting of 140 astral symbols just fine, as long as you use a Twitter client that supports it.

The behavior you’re seeing in the Twitter web client is a bug. They’re simply reading the `length` of the input string, which counts UTF-16 code units, instead of treating each surrogate pair as a single code point.
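A minimal sketch of the difference, assuming (as described above) that every code point counts as one character toward the 140-character limit. `symbolLength` and `LIMIT` are illustrative names, not Twitter’s actual code:

```javascript
// Replace every surrogate pair with a single placeholder code unit, so that
// .length then counts each astral symbol once. Lone surrogates do not match
// the pair pattern and are still counted once each.
const surrogatePair = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;

function symbolLength(text) {
  return text.replace(surrogatePair, '_').length;
}

const LIMIT = 140; // assumed per-tweet limit in symbols
const tweet = '\uD83D\uDCA9'.repeat(140); // 140 copies of U+1F4A9

console.log(tweet.length);                 // 280: a naive length check rejects it
console.log(symbolLength(tweet));          // 140
console.log(symbolLength(tweet) <= LIMIT); // true
```

The placeholder character is arbitrary: any single code unit works, because only the length of the result is used.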

I feel adding this functionality to ES6 would 1) help raise awareness of the issue, and 2) give developers an easy way to work around ECMAScript’s UCS-2/UTF-16-ish behavior.
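For comparison, ES2015 (the edition ES6 became) made strings iterable by code point, so `Array.from` or spread syntax yields one element per code point. A short sketch; `codePointCount` is a hypothetical helper name. Note that this counts code points, not user-perceived characters: “e” followed by U+0301 COMBINING ACUTE ACCENT still counts as 2.

```javascript
// ES2015 string iteration walks code points, not code units, so each
// surrogate pair becomes a single array element.
function codePointCount(text) {
  return Array.from(text).length;
}

const mixed = 'a\uD834\uDF06b'; // 'a', U+1D306, 'b'
console.log(mixed.length);          // 4
console.log(codePointCount(mixed)); // 3
console.log([...mixed].length);     // 3 (spread uses the same iterator)
```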


More information about the es-discuss mailing list