Full Unicode based on UTF-16 proposal

Norbert Lindenberg ecmascript at norbertlindenberg.com
Sat Mar 24 16:50:54 PDT 2012


On Mar 23, 2012, at 7:12, Lasse Reichstein wrote:

> On Fri, Mar 23, 2012 at 2:30 PM, Steven Levithan
> <steves_list at hotmail.com> wrote:
>> I've been wondering whether it might be best for the /u flag to do three
>> things at once, making it an all-around "support Unicode better" flag:
> 
> ...
> 
>> 3. [New proposal] Makes /i use Unicode casefolding rules.
> 
> Yey, I'm for it :)
> Especially if it means dropping the rather naïve canonicalize function
> that can't canonicalize an ASCII character with a non-ASCII character.
> 
>> /ΣΤΙΓΜΑΣ/iu.test("στιγμας") == true.
> 
> I think a compliant implementation should (read: ought to) already get
> that example, since "στιγμας".toUpperCase() == "ΣΤΙΓΜΑΣ".toUpperCase()
> in the browsers I have checked, and the ignore-case canonicalization
> is based on toUpperCase. Alas, most of the implementations miss it
> anyway.

According to the ES5 spec, /ΣΤΙΓΜΑΣ/i.test("στιγμας") must indeed be true. Chrome and Node (i.e., V8) and IE get this right; Safari, Firefox, and Opera don't.

Note that toUpperCase allows mappings from one code unit to multiple code units, while RegExp canonicalization in ES5 doesn't, so /SS/i.test("ß") === false even though "SS".toUpperCase() === "ß".toUpperCase().
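
To make the difference concrete, here is a rough sketch of the per-code-unit canonicalization as I read it in ES5 section 15.10.2.8. The function names canonicalize, canonicalizeUnits, and matchesIgnoreCase are made up for illustration; this only compares whole strings and is not how any engine actually implements /i.

```javascript
// Hypothetical helper: approximates the ES5 Canonicalize step for ONE code unit.
function canonicalize(ch) {
  var upper = ch.toUpperCase();
  // A mapping to more than one code unit is discarded ("ß" -> "SS").
  if (upper.length !== 1) return ch;
  // A non-ASCII code unit is never mapped to an ASCII one ("ı" -> "I").
  if (ch.charCodeAt(0) >= 128 && upper.charCodeAt(0) < 128) return ch;
  return upper;
}

// Hypothetical helper: canonicalizes a string code unit by code unit.
function canonicalizeUnits(s) {
  return s.split("").map(canonicalize).join("");
}

// Hypothetical helper: whole-string comparison only, no pattern syntax.
function matchesIgnoreCase(pattern, input) {
  return canonicalizeUnits(pattern) === canonicalizeUnits(input);
}

console.log(matchesIgnoreCase("ΣΤΙΓΜΑΣ", "στιγμας")); // true
console.log(matchesIgnoreCase("SS", "ß"));            // false
console.log("SS".toUpperCase() === "ß".toUpperCase()); // true
```

Both σ and the final ς map to the single code unit Σ, so the Greek example matches, while ß is left alone because its uppercase form is two code units long.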

Norbert



More information about the es-discuss mailing list