Locale sensitivity in toLowerCase/toUpperCase

Norbert Lindenberg ecmascript at norbertlindenberg.com
Mon Aug 22 18:14:12 PDT 2011


The specification of String.prototype.toLowerCase in ES 5.1 (which is also referenced in String.prototype.toUpperCase) refers to the Unicode character database for case mappings, explicitly including "not only the UnicodeData.txt file, but also the SpecialCasings.txt file that accompanies it in Unicode 2.1.8 and later".

The SpecialCasing.txt file (which the specification calls "SpecialCasings.txt") includes not only a large number of locale-insensitive mappings, but also a few locale-sensitive mappings. In particular, the Turkish mappings for the Latin letters "I" and "i" (in Turkish, "I" lowercases to "ı" (U+0131) and "i" uppercases to "İ" (U+0130)) have been in the file since Unicode 2.1.8, while additional locale-sensitive mappings were added later.
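As an illustration of the locale-insensitive side of that file, here is a minimal sketch. It assumes an engine whose toUpperCase applies the unconditional one-to-many mappings from SpecialCasing.txt; the function name upperCaseSamples is made up for this example.

```javascript
// Unconditional SpecialCasing.txt entries can map one character to several.
// These mappings do not depend on any locale.
function upperCaseSamples() {
  return {
    sharpS: "\u00DF".toUpperCase(),     // U+00DF LATIN SMALL LETTER SHARP S
    fiLigature: "\uFB01".toUpperCase(), // U+FB01 LATIN SMALL LIGATURE FI
  };
}

const upperSamples = upperCaseSamples();
console.log(upperSamples.sharpS);     // "SS"
console.log(upperSamples.fiLigature); // "FI"
```

Neither result can be derived from UnicodeData.txt alone, since that file only carries one-to-one case mappings.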

The specification of String.prototype.toLocaleLowerCase in ES 5.1, however, seems to imply that String.prototype.toLowerCase should not use the locale-sensitive mappings: "This function works exactly the same as toLowerCase except that its result is intended to yield the correct result for the host environment's current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings."
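The difference can be sketched as follows. Note the assumptions: ES 5.1 itself defines toLocaleLowerCase and toLocaleUpperCase without arguments and uses the host's current locale, so the explicit "tr" language tag below relies on an engine that accepts a locale argument (as later internationalization-aware engines do). The function name compareCaseMappings is hypothetical.

```javascript
// Compare locale-independent and locale-sensitive case mappings for "I"/"i".
// Assumption: the engine accepts a language tag argument such as "tr".
function compareCaseMappings(locale) {
  return {
    lowerI: "I".toLowerCase(),                   // locale-independent
    upperI: "i".toUpperCase(),                   // locale-independent
    localeLowerI: "I".toLocaleLowerCase(locale), // locale-sensitive
    localeUpperI: "i".toLocaleUpperCase(locale), // locale-sensitive
  };
}

const turkish = compareCaseMappings("tr");
console.log(turkish.lowerI, turkish.upperI);             // "i" "I"
console.log(turkish.localeLowerI, turkish.localeUpperI); // "ı" (U+0131) "İ" (U+0130)
```

If toLowerCase applied the Turkish entries of SpecialCasing.txt unconditionally, the first pair would come out the same as the second, which is what the toLocaleLowerCase wording appears to rule out.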

Shouldn't the specification of String.prototype.toLowerCase explicitly exclude the locale-sensitive mappings in SpecialCasing.txt?

SpecialCasing.txt in Unicode 2.1.8:
http://www.unicode.org/Public/2.1-Update3/SpecialCasing-1.txt

SpecialCasing.txt in Unicode 2.1.9, which corrected the Turkish mapping for "I":
http://www.unicode.org/Public/2.1-Update4/SpecialCasing-2.txt

SpecialCasing.txt in Unicode 6.0:
http://www.unicode.org/Public/6.0.0/ucd/SpecialCasing.txt

Thanks,
Norbert


