Case transformations in strings
jgraham at opera.com
Tue Mar 3 10:03:04 PST 2009
The March 2nd draft has this to say about String.prototype.toLowerCase:
"The following steps are taken:
1. Call CheckObjectCoercible passing the this value as its argument.
2. Let S be the result of calling ToString, giving it the this value as
its argument.
3. Let L be a string of the same length as S where each character of L
is either the Unicode lowercase equivalent of the corresponding
character of S or the actual corresponding character of S if no
Unicode lowercase equivalent exists.
4. Return L.
The result should be derived according to the case mappings in the
Unicode character database (this explicitly includes not only the
UnicodeData.txt file, but also the SpecialCasings.txt file that
accompanies it in Unicode 2.1.8 and later)."
The other algorithms, such as String.prototype.toUpperCase, then refer to
this one. However, afaict, the statement that L is the same length as S
is incorrect for many of the mappings listed in SpecialCasings.txt. An
obvious example is the German lowercase sharp s (U+00DF) under
toUpperCase:
"\u00DF".toUpperCase() == "SS"
If the intention is that characters whose mapping changes the length of
the string are to be mapped to themselves, then the note should say so
explicitly. However, since returning a string of a different length
already seems to be supported in several implementations, it would be
disappointing if this were the intent.
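A quick way to check what a given implementation does with
length-changing mappings is sketched below. The sample characters and
the names samples and report are my own choices for illustration, and
the results depend on whether the engine applies the SpecialCasings.txt
mappings at all.

```javascript
// Characters whose uppercase mapping in SpecialCasings.txt is longer
// than the original: sharp s (U+00DF) and the fi ligature (U+FB01).
const samples = ["\u00DF", "\uFB01"];

// For each sample, record the uppercase form and whether the length
// was preserved, which is what step 3 of the draft algorithm claims.
const report = samples.map((ch) => {
  const upper = ch.toUpperCase();
  return { input: ch, upper, sameLength: upper.length === ch.length };
});

for (const row of report) {
  console.log(JSON.stringify(row));
}
```

An implementation that follows the draft text literally would report
sameLength as true for every row; one that applies the full mappings
would report false for both.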
A further question concerns characters with context-sensitive case
mappings. Are implementations expected to apply the context-sensitive
case transformation or act as if each character appeared in isolation?
For example, with Greek capital letter sigma, SpecialCasings.txt suggests:
"\u03A3\u03A3".toLowerCase() == "σς", not "σσ"
V8 is the only implementation I tested that agreed with
SpecialCasings.txt here. It would be useful if the spec were explicit
about what should happen in these cases.
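The two readings of the sigma case can be told apart with a small
check like the one below. This is only a sketch: lowerInIsolation is a
hypothetical helper I made up to model the "each character in
isolation" interpretation, and the value of contextSensitive depends
entirely on the engine it runs in.

```javascript
// Hypothetical helper: lowercase each code point as if it appeared
// alone, so no surrounding context can influence the mapping.
function lowerInIsolation(s) {
  return Array.from(s, (ch) => ch.toLowerCase()).join("");
}

const input = "\u03A3\u03A3"; // two Greek capital sigmas

const wholeString = input.toLowerCase();
const perCharacter = lowerInIsolation(input);

// True only if the engine lowercases the last sigma to final sigma
// (U+03C2), as the context-sensitive rule in SpecialCasings.txt suggests.
const contextSensitive = wholeString === "\u03C3\u03C2";

console.log(wholeString, perCharacter, contextSensitive);
```

If wholeString and perCharacter are equal, the implementation is
treating each character in isolation; if they differ, it is applying
the context-sensitive mapping.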