Working with grapheme clusters

Norbert Lindenberg ecmascript at lindenbergsoftware.com
Fri Oct 25 23:32:02 PDT 2013


On Oct 24, 2013, at 7:38, Anne van Kesteren <annevk at annevk.nl> wrote:

> On Thu, Oct 24, 2013 at 3:31 PM, Mathias Bynens <mathias at qiwi.be> wrote:
>> Imagine you’re writing a JavaScript library that escapes a given string as an HTML character reference, or as a CSS identifier, or anything else. In those cases, you don’t care about grapheme clusters, you care about code points, because those are the units you end up escaping individually.
> 
> Is that really a common operation? I would expect formatting,
> searching, etc. to dominate. E.g. whenever you do substr/substring you
> would want that to be grapheme-cluster aware.

There are cases where you don't care about grapheme clusters, e.g. if you want to replace any occurrence of "{" + varname + "}" in a string with the value of the variable named varname.
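
A minimal sketch of that substitution case, to show that nothing in it needs to know about grapheme clusters. The names fillTemplate and vars are hypothetical, chosen only for illustration, and the placeholder syntax is assumed to be exactly "{" + varname + "}":

```javascript
// Hypothetical helper: replace every "{varname}" in `template` with the
// value of the matching own property of `vars`. Unknown names are left as-is.
function fillTemplate(template, vars) {
  // The regex works on code units; the braces are ASCII, so a match can
  // never start or end in the middle of a surrogate pair.
  return template.replace(/\{([^{}]+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(vars, name)
      ? String(vars[name])
      : match
  );
}

// Usage: the inserted value may contain combining marks; it is copied whole.
const greeting = fillTemplate("Hello, {name}!", { name: "n\u0308" });
```

The replacement is done in a single pass so that a value containing "{...}" is not itself expanded; whether that is desirable depends on the application.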

In cases where you do care about grapheme clusters, it's usually more efficient to search based on code points or code units first, and then check whether the substring found begins and ends on grapheme cluster boundaries (e.g., if a search for "n" finds the first character of Claude's example "n̈", then you'll want to ignore that match).
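
The search-then-check approach could be sketched roughly as follows. This assumes an engine that provides Intl.Segmenter (an ECMA-402 API that postdates this thread); the function names isGraphemeBoundary and indexOfGraphemeAligned are hypothetical:

```javascript
// Assumption: Intl.Segmenter is available in the engine.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// True if code unit offset `pos` is a grapheme cluster boundary in `text`.
function isGraphemeBoundary(segments, text, pos) {
  if (pos === 0 || pos === text.length) return true;
  // containing(pos) returns the segment that covers `pos`; `pos` is a
  // boundary exactly when that segment starts at `pos`.
  return segments.containing(pos).index === pos;
}

// Hypothetical helper: like indexOf, but skips matches that do not begin
// and end on grapheme cluster boundaries. Returns -1 if there is none.
function indexOfGraphemeAligned(text, needle, from = 0) {
  const segments = segmenter.segment(text);
  // Step 1: cheap search on code units.
  let pos = text.indexOf(needle, from);
  while (pos !== -1) {
    // Step 2: check only the two ends of the candidate match.
    if (isGraphemeBoundary(segments, text, pos) &&
        isGraphemeBoundary(segments, text, pos + needle.length)) {
      return pos;
    }
    if (pos >= text.length) break;
    pos = text.indexOf(needle, pos + 1);
  }
  return -1;
}

// Usage: the "n" inside "n\u0308" is ignored, the "n" in "and" is found.
const found = indexOfGraphemeAligned("n\u0308 and n", "n");
```

Only candidate matches are checked against cluster boundaries, so the segmentation cost is paid where the code unit search has already found something.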

Norbert

