Unicode normalization problem

Andrea Giammarchi andrea.giammarchi at gmail.com
Wed Apr 1 23:39:31 UTC 2015


Jordan, the purpose of `Array.from` is to iterate over the string, and the
point of iterating instead of splitting is to get code points automagically.
That is, unless I've misunderstood Mathias' presentation (I might have).
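
To make the splitting-versus-iterating difference concrete, here is a minimal sketch. The sample string is my own assumption (the 💩 character mentioned further down the thread, written as an escape), not necessarily the `foo` from earlier messages:

```js
// Assumed sample: U+1F4A9, one code point stored as two UTF-16 code units.
const poo = '\u{1F4A9}';

// split('') cuts between UTF-16 code units, so the surrogate pair is torn apart.
const byCodeUnit = poo.split('');

// Array.from uses the string iterator, which walks code points.
const byCodePoint = Array.from(poo);

console.log(poo.length);         // 2
console.log(byCodeUnit.length);  // 2 (two lone surrogates)
console.log(byCodePoint.length); // 1
console.log([...poo].length);    // 1, spread goes through the same iterator
```

With a spec-compliant `Array.from`, the last two results should both be 1.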

So there is a different problem here: there are code points that have no
visual representation of their own ... or maybe the real problem is a
broken `Array.from` polyfill?

I wouldn't be surprised in that case ;-)
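
A rough sketch of both possibilities, under stated assumptions: the sample string ("n" followed by U+0303 COMBINING TILDE) and the helper name `arrayFromIsCodePointAware` are hypothetical, picked only for illustration. It shows a combining sequence counted as two code points, the same text after `normalize('NFC')`, and a quick sanity check for a polyfill that iterates by code unit:

```js
// Assumed sample: "n" + U+0303 COMBINING TILDE, rendered as a single glyph.
const decomposed = 'n\u0303';

// NFC composes the pair into the precomposed U+00F1 where one exists.
const composed = decomposed.normalize('NFC');

console.log(Array.from(decomposed).length); // 2 code points
console.log(Array.from(composed).length);   // 1 code point
console.log(decomposed === composed);       // false, different code point sequences

// Hypothetical helper: a polyfill that splits by code unit would return 2 here.
function arrayFromIsCodePointAware() {
  return Array.from('\u{1F4A9}').length === 1;
}

console.log(arrayFromIsCodePointAware());
```

Note that NFC only helps when a precomposed character exists; not every combining sequence has one, so this is not a general way to count what users see as characters.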

On Thu, Apr 2, 2015 at 1:22 AM, Jordan Harband <ljharb at gmail.com> wrote:

> Unfortunately we don't have a String#codepoints or something that would
> return the number of code points as opposed to the number of characters
> (that "length" returns) - something like that imo would greatly simplify
> explaining the differences to people.
>
> For the time being, I've been explaining that some characters are actually
> made up of two, and the 💩 character (it's a fun example to use)
> is an example of two characters combining to make one "code point". It's
> not a quick or trivial thing to explain but people do seem to grasp it
> eventually.
>
> On Wed, Apr 1, 2015 at 4:09 PM, Andrea Giammarchi <
> andrea.giammarchi at gmail.com> wrote:
>
>> and now I'm also gonna hope that `Array.from(foo).length // 2` wasn't by
>> accident, instead of `bar` ...
>>
>> On Thu, Apr 2, 2015 at 1:07 AM, Andrea Giammarchi <
>> andrea.giammarchi at gmail.com> wrote:
>>
>>> ```js
>>> foo.length; // 2
>>> Array.from(foo).length // 1
>>>
>>> bar.length; // 2
>>> Array.from(bar).length // 2
>>> ```
>>>
>>> I already know everything you wrote ... now, how do we explain it to JS
>>> users out there, and how do we solve it?
>>>
>>> On Thu, Apr 2, 2015 at 1:04 AM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
>>>
>>>> On 4/1/15 6:56 PM, Andrea Giammarchi wrote:
>>>>
>>>>> Why is that
>>>>>
>>>>
>>>> Because those are different things.  The first is a single Unicode
>>>> character that happens to be represented by 2 UTF-16 code units.  The
>>>> second is a pair of Unicode characters that are each represented by one
>>>> UTF-16 code unit, but also happen to form a single grapheme cluster
>>>> (because one of them is a combining character).  To complicate things
>>>> further, there is also a single Unicode character that represents that same
>>>> grapheme cluster....
>>>>
>>>> String length shows the number of UTF-16 code units.
>>>>
>>>> Array.from works on Unicode characters.  That explains the foo.length
>>>> and Array.from(foo).length results.
>>>>
>>>>> and how to solve?
>>>>
>>>>
>>>> Can you clearly explain what problem you are trying to solve?
>>>>
>>>> -Boris
>>>>
>>>> _______________________________________________
>>>> es-discuss mailing list
>>>> es-discuss at mozilla.org
>>>> https://mail.mozilla.org/listinfo/es-discuss
>>>>