Proposal: `String.prototype.codePointCount`

Bob Myers rtm at
Thu Aug 8 16:09:44 UTC 2019

Consider a language such as Kannada, spoken in southern India, and the
25th most widely spoken language in the world, with 60M speakers.
"Characters" in the written language are represented in Unicode as elements
(sometimes called "letters") which are then composed at the rendering level
to produce what native speakers would consider "characters" (for clarity,
sometimes called "compound characters", or *ottakshara*). The portion of
the composition algorithm which figures out which sequences of elements
belong to the same "character" is found only in rendering engines, and is
itself so complicated that it has resulted in many bugs, including one (in
a different but related language) which caused Macs to crash. The actual
positional composition, which involves figuring out not only how to arrange
the elements but also how to adjust their size and other details, is even
more complicated.
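
To make the gap concrete, here is a rough sketch of what a code point count
reports for a single compound character. The helper name `countCodePoints`
is hypothetical and only stands in for the proposed method; the sample
string is the conjunct "ksha", written with escapes so the individual
elements are visible.

```javascript
// One ottakshara (the conjunct "ksha"), made of three Unicode elements:
// U+0C95 KANNADA LETTER KA, U+0CCD KANNADA SIGN VIRAMA, U+0CB7 KANNADA LETTER SSA
const ksha = "\u0C95\u0CCD\u0CB7";

// Hypothetical stand-in for the proposed String.prototype.codePointCount.
// Array.from iterates a string by code point, not by UTF-16 code unit.
function countCodePoints(str) {
  return Array.from(str).length;
}

// A character outside the BMP, where code units and code points differ.
const emoji = "\u{1F600}";

console.log(ksha.length);            // 3 UTF-16 code units
console.log(countCodePoints(ksha));  // 3 code points, yet one compound character to a reader
console.log(emoji.length);           // 2 UTF-16 code units (a surrogate pair)
console.log(countCodePoints(emoji)); // 1 code point
```

Counting code points fixes the surrogate pair case, but for the Kannada
string it still reports 3 where a native reader sees one character.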

In any case, for Kannada, what kind of characters do you want to count with
your new string prototype method? If you're interested in knowing this to
make sure that your user does not enter a string longer than will fit in
some fixed-length database field, are you going to tell the user that "Name
can contain no more than 25 'letters'", which will mean nothing to them? If
you want to make sure some name can fit in some space on the screen, you
are going to have to count compound characters (which are fixed width, for
all practical purposes), but how are you going to do that, without
including a huge library to analyze the Kannada strings--a library which is
not even publicly available?


On Thu, Aug 8, 2019 at 8:34 AM Mathias Bynens <mathias at> wrote:

