Proposal: `String.prototype.codePointCount`

Bob Myers rtm at gol.com
Thu Aug 8 16:09:44 UTC 2019


Consider a language such as Kannada, spoken in southern India and the
25th most widely spoken language in the world, with 60M speakers.
"Characters" in the written language are represented in Unicode as elements
(sometimes called "letters") which are then composed at the rendering level
to produce what native speakers would consider "characters" (for clarity,
sometimes called "compound characters", or *ottakshara*). The portion of
the composition algorithm which figures out which sequences of elements
belong to the same "character" is found only in rendering engines, and is
itself so complicated that it has resulted in many bugs, including one (in
a different but related language) which caused Macs to crash. (The actual
positional composition, which involves figuring out not only how to arrange
the elements but also how to adjust their size and other details, is even
more complicated.)

In any case, for Kannada, what kind of characters do you want to count with
your new string prototype method? If you're interested in knowing this to
make sure that your user does not enter a string longer than will fit in
some fixed-length database field, you're going to tell the user that "Name
can contain no more than 25 'letters'", which will mean nothing to them? If
you want to make sure some name can fit in some space on the screen, you
are going to have to count compound characters (which are fixed width, for
all practical purposes), but how are you going to do that, without
including a huge library to analyze the Kannada strings--a library which is
not even publicly available?
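
To make the distinction concrete, here is a minimal sketch of what a
code point count would report. The helper name `codePointCount` is
hypothetical (it mirrors the proposed method but is not a standard API),
and the sample string is an assumed example: KA + VIRAMA + KA, which is
typically rendered as a single compound character.

```javascript
// Hypothetical helper, not a standard API: counts Unicode code points
// by relying on the string iterator, which walks code points rather
// than UTF-16 code units.
function codePointCount(str) {
  let count = 0;
  for (const _ of str) {
    count++;
  }
  return count;
}

// Assumed sample: KANNADA LETTER KA + KANNADA SIGN VIRAMA + KANNADA LETTER KA.
const kannada = "\u0C95\u0CCD\u0C95";

// A single astral code point, stored as a surrogate pair.
const emoji = "\u{1F4A9}";

console.log(kannada.length, codePointCount(kannada)); // 3 3
console.log(emoji.length, codePointCount(emoji));     // 2 1
```

The count differs from `length` only for the astral symbol. For the
Kannada string both give 3, which is the number of "letters" and not the
number of compound characters a native reader would see.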

--
Bob

On Thu, Aug 8, 2019 at 8:34 AM Mathias Bynens <mathias at qiwi.be> wrote:

> Prior discussion from 7 years ago:
> https://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string
>
> [...string].length does what you want. But it's definitely not always what
> you need
> <https://mathiasbynens.be/notes/javascript-unicode#other-grapheme-clusters>
> .
>
> On Thu, Aug 8, 2019 at 4:37 AM fanerge <fanerge at qq.com> wrote:
>
>> I expect to be able to add an attribute to String.prototype that returns
>> the number of codePoints of the string to reflect the actual number of
>> characters instead of the code unit
>>
>