Locale sensitivity in toLowerCase/toUpperCase

Mike Samuel mikesamuel at gmail.com
Mon Aug 22 19:15:43 PDT 2011


2011/8/22 Norbert Lindenberg <ecmascript at norbertlindenberg.com>:
> The specification of String.prototype.toLowerCase in ES 5.1 (which is also referenced in String.prototype.toUpperCase) refers to the Unicode character database for case mappings, explicitly including "not only the UnicodeData.txt file, but also the SpecialCasings.txt file that accompanies it in Unicode 2.1.8 and later".
>
> The SpecialCasings.txt file includes not only a large number of locale-insensitive mappings, but also a few locale-sensitive mappings. In particular, the Turkish mappings for the Latin letters "I" and "i" (which map to "ı" (U+0131) and "İ" (U+0130) in Turkish) have been in the file since Unicode 2.1.8, while additional ones were added later.

jQuery and a lot of other JS code would break in pretty obvious ways
if any major browser applied those locale-sensitive mappings in
toUpperCase.  http://code.jquery.com/jquery-1.6.2.js contains

    rscript = /<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi,

and the "i" flag handling delegates to toUpperCase per section
15.10.2.8, so under the Turkish mappings the pattern would never
actually match "<SCRIPT"; it would match only the spelling with a
dotted upper-case I (U+0130).
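
To make the failure concrete, here is a rough sketch of the
Canonicalize step from 15.10.2.8 with the upper-casing function passed
in.  The names turkishToUpperCase, standardToUpperCase, canonicalize
and matchesIgnoringCase are mine, made up for illustration;
turkishToUpperCase is a hypothetical stand-in for an engine that
applied the Turkish mapping unconditionally, not something any engine
is known to ship.

```javascript
// Hypothetical: a toUpperCase that always applies the Turkish rule
// i -> U+0130 (dotted capital I) before ordinary upper-casing.
function turkishToUpperCase(s) {
  return s.replace(/i/g, "\u0130").toUpperCase();
}

// The locale-insensitive behaviour browsers actually implement.
function standardToUpperCase(s) {
  return s.toUpperCase();
}

// Sketch of Canonicalize (ES 5.1, 15.10.2.8) for the IgnoreCase case.
function canonicalize(ch, upper) {
  var cu = upper(ch);
  // Mappings that expand to several characters are ignored.
  if (cu.length !== 1) return ch;
  // A non-ASCII character may not canonicalize to an ASCII one.
  if (ch.charCodeAt(0) >= 128 && cu.charCodeAt(0) < 128) return ch;
  return cu;
}

// Character-by-character comparison, as the "i" flag would do for a
// literal pattern of the same length as the input.
function matchesIgnoringCase(pattern, input, upper) {
  if (pattern.length !== input.length) return false;
  for (var i = 0; i < pattern.length; i++) {
    var a = canonicalize(pattern.charAt(i), upper);
    var b = canonicalize(input.charAt(i), upper);
    if (a !== b) return false;
  }
  return true;
}
```

With standardToUpperCase, "script" and "SCRIPT" canonicalize to the
same characters.  With turkishToUpperCase the "i" in the pattern
becomes U+0130 while the "I" in the input stays "I", so the two no
longer compare equal.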

I seem to remember that some version of Rhino did this though,
delegating both toUpperCase and toLocaleUpperCase to
java.lang.String.toUpperCase() which uses the default locale.

