Case transformations in strings

Waldemar Horwat waldemar at google.com
Mon Mar 23 16:41:04 PDT 2009


Allen Wirfs-Brock wrote:
> Any input from our other Unicode experts would be appreciated...
> 
> Here's what I found (running on Windows Vista):
> IE, FF, Opera
> "\u00DF".toUpperCase()  returns "\u00DF"
> Safari, Chrome
> "\u00DF".toUpperCase()  returns "SS"
> 
> It would be interesting if somebody could try the above for FF and Opera on a non-Windows machine to check whether this is a byproduct of using the Windows provided conversion routines.
> 
> Question 1: Is the specified length invariant essential, or just noise in the ES3 spec? If it's not essential, we could eliminate that invariant and say that each character of S is replaced in the result by the corresponding character(s) from the Unicode case mappings.
> 
> Question 2: If the observed variance is indeed a result of using the Windows mapping, do we really want to require every implementation to provide its own internal mapping data and algorithms (as Safari and Chrome may be doing) if the underlying host is not fully Unicode compliant?
> 
> Question 3: Do we need to explicitly provide for some implementation variance here?  That appears to be the current reality of the web.  Do we want to try to stamp out the variance, or to acknowledge and allow it?
> 
> Question 4: Is Chrome correct with: 
> "\u03A3\u03A3".toLowerCase() == σς, not σσ
> And everybody else is wrong?  This sounds like a reasonable interpretation of the explicit mention of SpecialCasing.txt in the note (but the note is not normative). If so, there should be an explicit mention in step 3 that the translation must be appropriately context-sensitive.
> 
> Finally, is any of the above going to actually influence anything?  If not, maybe carrying the exact ES3 specification forward is ok.

The ES3 specification is written the way it is because converting one character to many during case conversion would be incompatible with regular expressions.  The regular expression algorithm refers to String.prototype.toUpperCase.
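
To illustrate the dependency, here is a rough sketch of the Canonicalize step that case-insensitive matching performs, as I read ES3 section 15.10.2.8. The function name canonicalize and the sample characters are only for illustration; this is not any engine's actual code.

```javascript
// Sketch of the ES3 Canonicalize step for /.../i matching (an approximation
// for illustration, not taken from any implementation).
function canonicalize(ch) {
  // Upper-case the one-character string, as the regexp algorithm specifies.
  var u = ch.toUpperCase();

  // A one-to-many mapping (e.g. U+00DF to "SS") cannot stand in for a single
  // pattern character, so the original character is kept.
  if (u.length !== 1) {
    return ch;
  }

  // A non-ASCII character is never allowed to canonicalize to an ASCII one.
  if (ch.charCodeAt(0) >= 128 && u.charCodeAt(0) < 128) {
    return ch;
  }

  return u;
}

// Two characters match under /i only if their canonical forms are equal.
function matchesIgnoringCase(a, b) {
  return canonicalize(a) === canonicalize(b);
}
```

Under this sketch, canonicalize("\u00DF") yields "\u00DF" whether toUpperCase returns "\u00DF" or "SS", so the regular expression side is insulated from the variance Allen observed, but only because the algorithm assumes it compares one character against one character.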

    Waldemar

