Case transformations in strings

James Graham jgraham at
Tue Mar 3 10:03:04 PST 2009

The March 2nd draft has this to say about String.prototype.toLowerCase:

"The following steps are taken:
1.  Call CheckObjectCoercible passing the this value as its argument.
2.  Let S be the result of calling ToString, giving it the this value as
    its argument.
3.  Let L be a string of the same length as S where each character of L
    is either the Unicode lowercase equivalent of the corresponding
    character of S or the actual corresponding character of S if no
    Unicode lowercase equivalent exists.
4.  Return L.
The result should be derived according to the case mappings in the
Unicode character database (this explicitly includes not only the
UnicodeData.txt file, but also the SpecialCasings.txt file that
accompanies it in Unicode 2.1.8 and later)."

The other algorithms, such as String.prototype.toUpperCase, then refer to 
this one. However, afaict, the statement that L is the same length as S 
is incorrect for many of the mappings listed in SpecialCasings.txt. An 
obvious example is the German lowercase sharp s (U+00DF) under 
toUpperCase:

"\u00DF".toUpperCase() == "SS"

If the intention is that characters whose case mapping changes the 
length of the string are to be mapped to themselves, then the note 
should say so explicitly. However, since several implementations 
already seem to return a string of a different length, it would be 
disappointing if this were the intent.
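
To make the two readings concrete, here is a rough sketch that compares 
whatever the host implementation does with a length-preserving reading of 
step 3. The helper name sameLengthUpper is made up for illustration and is 
not from the draft; it works on UTF-16 code units one at a time, which is 
an assumption about what "character" means in the quoted text.

```javascript
// Hypothetical helper illustrating the "map to themselves" reading:
// upper-case one UTF-16 code unit at a time and keep the original unit
// whenever its mapping would not fit in a single unit.
function sameLengthUpper(s) {
  var out = "";
  for (var i = 0; i < s.length; i++) {
    var unit = s.charAt(i);
    var mapped = unit.toUpperCase();
    out += (mapped.length === 1) ? mapped : unit;
  }
  return out;
}

var word = "stra\u00DFe";               // "straße"
var builtin = word.toUpperCase();       // whatever the host implementation does
var preserved = sameLengthUpper(word);  // always word.length code units long

console.log(JSON.stringify(builtin), builtin.length);
console.log(JSON.stringify(preserved), preserved.length);
```

If the two printed lengths differ, the implementation is already returning 
a string of a different length for this input.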

A further question concerns characters with context-sensitive case 
mappings. Are implementations expected to apply the context-sensitive 
case transformation, or to act as if each character appeared in 
isolation? For example, with the Greek capital letter sigma, 
SpecialCasings.txt suggests:

"\u03A3\u03A3".toLowerCase() == "σς", not "σσ"

V8 is the only implementation I tested that agreed with 
SpecialCasings.txt here. It would be useful if the spec were explicit 
about what should happen in these cases.
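
A quick way to tell the two behaviours apart in a given implementation is 
to lowercase the whole string and compare it with the result of 
lowercasing each character on its own. This is only a sketch; the helper 
name lowerInIsolation is hypothetical, and it treats each UTF-16 code 
unit as a character.

```javascript
// Hypothetical helper: lowercase each UTF-16 code unit in isolation,
// so no character can see its neighbours, then rejoin the results.
function lowerInIsolation(s) {
  var out = "";
  for (var i = 0; i < s.length; i++) {
    out += s.charAt(i).toLowerCase();
  }
  return out;
}

var input = "\u03A3\u03A3";              // two Greek capital sigmas
var whole = input.toLowerCase();         // may apply the final-sigma rule
var isolated = lowerInIsolation(input);  // no context available

console.log("whole string: ", JSON.stringify(whole));
console.log("per character:", JSON.stringify(isolated));
console.log("context-sensitive:", whole !== isolated);
```

An implementation that applies the context-sensitive mapping should report 
a difference between the two results; one that treats each character in 
isolation should not.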
