Unicode normalization problem

Andrea Giammarchi andrea.giammarchi at gmail.com
Wed Apr 1 22:56:20 UTC 2015


I think the concern about how what people see on screen can be understood
from JS is more than valid ...

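```js
var foo = '𝐀';            // U+1D400, one astral code point (a surrogate pair in UTF-16)
var bar = '\u0418\u0306'; // 'Й' written as И (U+0418) + combining breve (U+0306)
foo.length; // 2
Array.from(foo).length // 1

bar.length; // 2
Array.from(bar).length // 2
```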

Why is that, and how can it be solved?
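
As a rough sketch of one possible answer, assuming the 'Й' above is the decomposed sequence И (U+0418) + combining breve (U+0306): the string iterator already keeps surrogate pairs together, and `String.prototype.normalize('NFC')` can compose a combining sequence when a precomposed character exists. The helper names `countCodePoints` and `countAfterNFC` are made up for illustration and do not come from this thread.

```js
// Illustrative only: both helper names are hypothetical.
var foo = '\uD835\uDC00'; // '𝐀' (U+1D400) spelled out as its surrogate pair
var bar = '\u0418\u0306'; // 'Й' as И + combining breve (assumed decomposed form)

// Counts code points; the string iterator treats a surrogate pair as one unit.
function countCodePoints(str) {
  return Array.from(str).length;
}

// Composes combining sequences with NFC first, then counts code points.
function countAfterNFC(str) {
  return countCodePoints(str.normalize('NFC'));
}

countCodePoints(foo); // 1
countCodePoints(bar); // 2
countAfterNFC(bar);   // 1
```

NFC only helps when a precomposed form exists (here U+0419); a sequence with no precomposed equivalent still counts as several code points after normalization.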


On Wed, Apr 1, 2015 at 10:32 PM, Mathias Bynens <mathias at qiwi.be> wrote:

> On Wed, Apr 1, 2015 at 10:30 PM, monolithed <monolithed at gmail.com> wrote:
> >> What you’re seeing there is not normalization, but rather the string
> >> iterator that automatically accounts for surrogate pairs (treating them
> >> as a single unit).
> >
> > ```js
> > var foo = '𝐀';
> > var bar = '\u0418\u0306'; // 'Й' as И (U+0418) + combining breve (U+0306)
> > foo.length; // 2
> > Array.from(foo).length // 1
> >
> > bar.length; // 2
> > Array.from(bar).length // 2
> > ```
> >
> > I think this is strange.
> > How can one work with strings safely?
>
> It depends on your use case. FWIW, I’ve outlined some examples here:
> https://mathiasbynens.be/notes/javascript-unicode