Unicode normalization problem

Jordan Harband ljharb at gmail.com
Wed Apr 1 23:22:03 UTC 2015


Unfortunately we don't have a String#codepoints or something similar that
would return the number of code points, as opposed to the number of UTF-16
code units (which is what "length" returns). Something like that, imo, would
greatly simplify explaining the differences to people.
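Until such a method exists, the ES2015 string iterator (used by `for...of`, spread, and `Array.from`) already walks a string one code point at a time, so it can stand in for one. A minimal sketch; `codePointLength` is a hypothetical helper, not a built-in or proposed API:

```js
// Hypothetical helper (not a built-in): counts Unicode code points
// rather than UTF-16 code units. for...of uses the string iterator,
// which yields a whole surrogate pair as a single step.
function codePointLength(str) {
  let count = 0;
  for (const _ of str) {
    count += 1;
  }
  return count;
}

console.log('💩'.length);              // 2 (UTF-16 code units)
console.log(codePointLength('💩'));    // 1 (code points)
console.log('a💩b'.length);           // 4
console.log(codePointLength('a💩b')); // 3
```

`Array.from(str).length` or `[...str].length` gives the same count in one expression, at the cost of allocating a temporary array.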

For the time being, I've been explaining that some characters are actually
made up of two code units, and the 💩 character (it's a fun example to use)
is an example of two code units (a surrogate pair) combining to make one
code point. It's not a quick or trivial thing to explain, but people do seem
to grasp it eventually.
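To make the surrogate-pair explanation concrete, here is a short sketch comparing what `charCodeAt` (UTF-16 code units) and `codePointAt` (code points) report for the same character. The surrogate values follow from the UTF-16 encoding of U+1F4A9; the variable names are illustrative only:

```js
// '💩' is U+1F4A9, outside the Basic Multilingual Plane, so UTF-16
// stores it as two code units: a high surrogate and a low surrogate.
const pile = '💩';

// length and charCodeAt operate on UTF-16 code units.
console.log(pile.length);                       // 2
console.log(pile.charCodeAt(0).toString(16));   // "d83d" (high surrogate)
console.log(pile.charCodeAt(1).toString(16));   // "dca9" (low surrogate)

// codePointAt(0) reads the whole surrogate pair and returns the code point.
console.log(pile.codePointAt(0).toString(16));  // "1f4a9"

// Recombining the surrogates by hand yields the same code point.
const high = pile.charCodeAt(0);
const low = pile.charCodeAt(1);
const combined = (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
console.log(combined === pile.codePointAt(0));  // true
```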

On Wed, Apr 1, 2015 at 4:09 PM, Andrea Giammarchi <
andrea.giammarchi at gmail.com> wrote:

> and now I'm also going to hope that `Array.from(foo).length // 2` wasn't
> written by accident instead of `bar` ...
>
> On Thu, Apr 2, 2015 at 1:07 AM, Andrea Giammarchi <
> andrea.giammarchi at gmail.com> wrote:
>
>> ```js
>> foo.length; // 2
>> Array.from(foo).length // 1
>>
>> bar.length; // 2
>> Array.from(bar).length // 2
>> ```
>>
>> I know already everything you wrote ... now, how to explain to JS users
>> out there and how to solve?
>>
>> On Thu, Apr 2, 2015 at 1:04 AM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
>>
>>> On 4/1/15 6:56 PM, Andrea Giammarchi wrote:
>>>
>>>> Why is that
>>>>
>>>
>>> Because those are different things.  The first is a single Unicode
>>> character that happens to be represented by 2 UTF-16 code units.  The
>>> second is a pair of Unicode characters that are each represented by one
>>> UTF-16 code unit, but also happen to form a single grapheme cluster
>>> (because one of them is a combining character).  To complicate things
>>> further, there is also a single Unicode character that represents that same
>>> grapheme cluster....
>>>
>>> String length shows the number of UTF-16 code units.
>>>
>>> Array.from works on Unicode characters.  That explains the foo.length
>>> and Array.from(foo).length results.
>>>
>>>> and how to solve?
>>>>
>>>
>>> Can you clearly explain what problem you are trying to solve?
>>>
>>> -Boris
>>>
>>> _______________________________________________
>>> es-discuss mailing list
>>> es-discuss at mozilla.org
>>> https://mail.mozilla.org/listinfo/es-discuss
>>>
>>
>>
>
>
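Boris's distinction between UTF-16 code units, code points, and grapheme clusters, together with the "how to solve?" question, can be sketched in one place. This is a minimal sketch under stated assumptions: the values of `foo` and `bar` are hypothetical stand-ins chosen only to reproduce the lengths in Andrea's snippet, `graphemeCount` is a hypothetical helper, and `Intl.Segmenter` assumes a modern engine (it was standardized in ECMA-402 well after this thread). `String.prototype.normalize` is ES2015:

```js
// Hypothetical stand-ins for the thread's `foo` and `bar` (their real
// values are not shown in this message):
//   foo: one code point outside the BMP, stored as two UTF-16 code units
//   bar: 'e' + U+0301 COMBINING ACUTE ACCENT (two code points that form
//        one grapheme cluster)
const foo = '💩';
const bar = 'e\u0301';
// The single precomposed character for the same grapheme cluster.
const precomposed = '\u00e9'; // 'é'

// length counts UTF-16 code units; Array.from counts code points.
console.log(foo.length, Array.from(foo).length); // 2 1
console.log(bar.length, Array.from(bar).length); // 2 2

// bar and precomposed look identical but compare unequal...
console.log(bar === precomposed);                  // false
// ...until both are normalized to the same form (NFC composes).
console.log(bar.normalize('NFC') === precomposed); // true
console.log(bar.normalize('NFC').length);          // 1
console.log(precomposed.normalize('NFD').length);  // 2

// Hypothetical helper: counts user-perceived characters (grapheme
// clusters) with Intl.Segmenter.
function graphemeCount(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(str)).length;
}
console.log(graphemeCount(foo), graphemeCount(bar)); // 1 1
```

Normalization only helps where a precomposed equivalent exists; for combinations with no single-character form, only a grapheme-level count matches what users see.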