Working with grapheme clusters
jason.orendorff at gmail.com
Fri Oct 25 18:35:55 PDT 2013
On Thu, Oct 24, 2013 at 7:38 AM, Anne van Kesteren <annevk at annevk.nl> wrote:
> On Thu, Oct 24, 2013 at 3:31 PM, Mathias Bynens <mathias at qiwi.be> wrote:
> Is that really a common operation? I would expect formatting,
> searching, etc. to dominate. E.g. whenever you do substr/substring you
> would want that to be grapheme-cluster aware.
I think I disagree. Trying to take this apart:

If you're searching, you don't want to use the iterator anyway,
because finding character boundaries or grapheme boundaries is a waste
of time. UTF-16 is designed so that you can search based on code units
alone, without computing boundaries. RegExp searches fall in this
category, too.
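A minimal JavaScript sketch of the code-unit search point, assuming an ES2015+ engine; `findAll` and `isCodePointBoundary` are hypothetical helpers written for illustration, not standard API. Lead surrogates (0xD800 to 0xDBFF) and trail surrogates (0xDC00 to 0xDFFF) occupy disjoint ranges, so a well-formed needle found by plain `indexOf` can never start or end inside a surrogate pair:

```javascript
// Hypothetical helper: true if offset i does not fall between the two
// halves of a surrogate pair (i.e. it is a code point boundary).
function isCodePointBoundary(s, i) {
  if (i <= 0 || i >= s.length) return true;
  const prev = s.charCodeAt(i - 1);
  const cur = s.charCodeAt(i);
  const prevIsLead = prev >= 0xd800 && prev <= 0xdbff;
  const curIsTrail = cur >= 0xdc00 && cur <= 0xdfff;
  return !(prevIsLead && curIsTrail);
}

// Hypothetical helper: every offset where needle occurs, found by plain
// code-unit search. No character or grapheme boundaries are computed.
function findAll(haystack, needle) {
  if (needle === "") return []; // avoid an endless loop on the empty needle
  const hits = [];
  let i = haystack.indexOf(needle);
  while (i !== -1) {
    hits.push(i);
    i = haystack.indexOf(needle, i + 1);
  }
  return hits;
}

const text = "a\u{1F600}b\u{1F600}c"; // two U+1F600 emoji, 2 code units each
const hits = findAll(text, "\u{1F600}");
console.log(hits); // offsets 1 and 4
console.log(hits.every((i) => isCodePointBoundary(text, i))); // true
```

This only guarantees code point boundaries; a code-unit match can still land inside a grapheme cluster (for example, "e" matches within "e\u0301").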

IIUC, "formatting" mostly involves finding patterns to replace, so it's a
special case of searching, right?
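Formatting as search-and-replace, sketched in JavaScript: `format` is a hypothetical helper (not a standard function) that fills `{name}`-style placeholders with `String.prototype.replace` and a RegExp, without ever iterating over characters or graphemes.

```javascript
// Hypothetical helper: fill {placeholders} in a template by searching for a
// pattern and replacing each match. The non-Unicode RegExp works directly on
// UTF-16 code units; no String iteration is involved.
function format(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key)
      ? String(values[key])
      : match // leave unknown placeholders untouched
  );
}

const greeting = format("{name} ordered {count} \u{1F355}", {
  name: "Zo\u00EB",
  count: 2,
});
console.log(greeting); // Zoë ordered 2 🍕
```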

When you do substr/slice/substring, you should be using offsets that
are on grapheme boundaries, but obtaining offsets by using String
iteration and adding up the lengths of the items it yields will be very
rare, I think.
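A sketch of both ways of obtaining a slice offset, assuming a modern engine. `offsetAfterCodePoints` (hypothetical) adds up the lengths of the items String iteration yields, which are code points; `offsetAfterGraphemes` (also hypothetical) counts grapheme clusters with `Intl.Segmenter`, an ECMA-402 API added years after this thread. With a combining accent, the two offsets differ:

```javascript
// Hypothetical helper: code-unit offset just past the first n items produced
// by String iteration (code points), found by summing their lengths.
function offsetAfterCodePoints(s, n) {
  let offset = 0;
  let count = 0;
  for (const ch of s) {
    if (count === n) break;
    offset += ch.length; // 1 for a BMP character, 2 for a surrogate pair
    count++;
  }
  return offset;
}

// Hypothetical helper: code-unit offset just past the first n grapheme
// clusters, using Intl.Segmenter (a later ECMA-402 addition).
function offsetAfterGraphemes(s, n) {
  const seg = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let count = 0;
  for (const { index } of seg.segment(s)) {
    if (count === n) return index;
    count++;
  }
  return s.length;
}

const word = "cafe\u0301s"; // "e" followed by U+0301 COMBINING ACUTE ACCENT
console.log(word.slice(0, offsetAfterCodePoints(word, 4))); // cafe (accent lost)
console.log(word.slice(0, offsetAfterGraphemes(word, 4))); // café
```

Slicing at four code points drops the accent, while slicing at four grapheme clusters keeps "café" whole.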

So String iteration is kind of left looking around for a use case. I
can't think of any that compel me to prefer graphemes over characters
out of sheer practicality. Reversing strings, for example: I can't
bring myself to care about that. Anyone?
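To make the reversal example concrete, a JavaScript sketch of three ways to reverse a string. All three function names are hypothetical, and the grapheme version assumes `Intl.Segmenter`, a much later addition to ECMA-402:

```javascript
// "a", U+1F600 (a surrogate pair), then "e" + U+0301 COMBINING ACUTE ACCENT
const sample = "a\u{1F600}e\u0301";

// Hypothetical: reverse by UTF-16 code units; splits the surrogate pair.
function reverseByCodeUnits(s) {
  return s.split("").reverse().join("");
}

// Hypothetical: reverse by code points (String iteration); the emoji
// survives, but the accent ends up in front of its base letter.
function reverseByCodePoints(s) {
  return [...s].reverse().join("");
}

// Hypothetical: reverse by grapheme clusters via Intl.Segmenter.
function reverseByGraphemes(s) {
  const seg = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(seg.segment(s), (x) => x.segment).reverse().join("");
}
```

For this sample, the code-unit version produces lone surrogates, the code-point version moves the accent in front of its "e", and only the grapheme version keeps "é" intact.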