Unicode normalization problem

Andrea Giammarchi andrea.giammarchi at gmail.com
Wed Apr 1 23:09:06 UTC 2015


And now I'm also going to hope that `Array.from(foo).length // 2` wasn't by
accident, instead of `bar` ...

On Thu, Apr 2, 2015 at 1:07 AM, Andrea Giammarchi <
andrea.giammarchi at gmail.com> wrote:

> ```js
> foo.length; // 2
> Array.from(foo).length // 1
>
> bar.length; // 2
> Array.from(bar).length // 2
> ```
>
> I already know everything you wrote ... now, how do we explain this to JS
> users out there, and how do we solve it?
>
> On Thu, Apr 2, 2015 at 1:04 AM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
>
>> On 4/1/15 6:56 PM, Andrea Giammarchi wrote:
>>
>>> Why is that
>>>
>>
>> Because those are different things.  The first is a single Unicode
>> character that happens to be represented by 2 UTF-16 code units.  The
>> second is a pair of Unicode characters that are each represented by one
>> UTF-16 code unit, but also happen to form a single grapheme cluster
>> (because one of them is a combining character).  To complicate things
>> further, there is also a single Unicode character that represents that same
>> grapheme cluster....
>>
>> String length shows the number of UTF-16 code units.
>>
>> Array.from works on Unicode characters.  That explains the foo.length and
>> Array.from(foo).length results.
>>
>>> and how to solve?
>>>
>>
>> Can you clearly explain what problem you are trying to solve?
>>
>> -Boris
>>
>> _______________________________________________
>> es-discuss mailing list
>> es-discuss at mozilla.org
>> https://mail.mozilla.org/listinfo/es-discuss
>>
>
>
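
For readers following along, the three counts Boris distinguishes (UTF-16 code units, code points, grapheme clusters) can be sketched as below. This is only an illustration: the thread does not show how `foo` and `bar` were defined, so the values here are assumptions chosen to reproduce the quoted results, and `summary` is a hypothetical helper. `Intl.Segmenter` postdates this thread and may be missing in older engines, so the sketch guards for it.

```js
// Assumed values: the thread does not show the original definitions.
const foo = '\u{1D306}'; // one astral code point, stored as two UTF-16 code units
const bar = 'n\u0303';   // 'n' + COMBINING TILDE: two code points, one grapheme cluster

// Hypothetical helper that reports each way of "measuring" a string.
function summary(s) {
  const hasSegmenter = typeof Intl !== 'undefined' &&
    typeof Intl.Segmenter === 'function';
  return {
    codeUnits: s.length,                     // what String.prototype.length counts
    codePoints: Array.from(s).length,        // string iteration is by code point
    nfcCodeUnits: s.normalize('NFC').length, // length after canonical composition
    graphemes: hasSegmenter
      ? [...new Intl.Segmenter(undefined, { granularity: 'grapheme' }).segment(s)].length
      : undefined,                           // not available in this engine
  };
}

console.log(summary(foo));
console.log(summary(bar));

// NFC composes 'n' + U+0303 into the single precomposed character U+00F1.
console.log(bar.normalize('NFC') === '\u00F1');
```

Under these assumptions, normalizing to NFC makes `bar` a single code unit, while `foo` stays at two code units because an astral code point always needs a surrogate pair in UTF-16.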