Case transformations in strings
Allen.Wirfs-Brock at microsoft.com
Thu Mar 5 22:20:21 PST 2009
>From: es-discuss-bounces at mozilla.org [mailto:es-discuss-
>bounces at mozilla.org] On Behalf Of James Graham
>A further question concerns characters with context-sensitive case
>mappings. Are implementations expected to apply the context-sensitive
>case transformation or act as if each character appeared in isolation?
>For example with Greek capital letter sigma, SpecialCasings.txt
The NOTE following toUpperCase (15.5.4.18) says:
Because both toUpperCase and toLowerCase have context-sensitive behaviour, the functions are not symmetrical. In other words, s.toUpperCase().toLowerCase() is not necessarily equal to s.toLowerCase().
This text is a carryover from ES3 and would seem to imply that context-sensitive processing is expected.
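To make the asymmetry and the context sensitivity concrete, here is a small sketch. It assumes a current engine that applies the special casing rules from the Unicode data files; ES3-era implementations may well behave differently, and the variable names are only illustrative.

```javascript
// Sketch only: behaviour shown is what current engines are expected to do,
// not a statement about what ES3 or the ES3.1 draft requires.

// Asymmetry described by the NOTE: upper-casing and then lower-casing
// need not give the same result as lower-casing alone.
const sharpS = "\u00DF";                               // LATIN SMALL LETTER SHARP S
const roundTrip = sharpS.toUpperCase().toLowerCase();  // goes through "SS"
const direct = sharpS.toLowerCase();                   // sharp s is already lowercase

// Context sensitivity: GREEK CAPITAL LETTER SIGMA at the end of a word
// versus the same character lowercased in isolation.
const word = "\u039F\u0394\u039F\u03A3";               // four Greek capitals ending in sigma
const lastInContext = word.toLowerCase().slice(-1);
const isolated = "\u03A3".toLowerCase();

console.log(roundTrip === direct);
console.log(lastInContext.charCodeAt(0).toString(16),
            isolated.charCodeAt(0).toString(16));
```

If the engine applies the final-sigma rule, the two code units printed on the last line differ (U+03C2 in context, U+03C3 in isolation); an engine that treats each character in isolation would print the same value twice.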
On a related issue, I'm starting to worry that the current specification of both toUpperCase and toLowerCase is problematic given the Unicode-related changes in the ES3.1 spec, which essentially say that strings contain 16-bit Unicode code units (not "Unicode characters" or code points) and that any UTF-16 interpretation of such strings/code units is left to application code. The algorithm step:
3. Let L be a string of the same length as S where each character of L is either the Unicode lowercase equivalent of the corresponding character of S or the actual corresponding character of S if no Unicode lowercase equivalent exists.
seems inadequate in that context. Don't we need to say either that, for the purposes of this translation, the string elements are to be treated as 16-bit truncated code point values, or alternatively that, for the purposes of these operations, the string is to be interpreted assuming UTF-16 encoding? (For the first alternative, I'm guessing that there aren’t any toUpper/toLower Unicode transformations that require the 16-bit to/from >16-bit code point translations.)
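The difference between the two readings can be sketched as follows. The helper lowerPerCodeUnit is hypothetical and is only one possible interpretation of the "same length, character by character" wording of step 3; the built-in call is shown for comparison and reflects what current engines do, not what the draft text says.

```javascript
// Hypothetical helper: lowercase each 16-bit code unit in isolation,
// keeping the result the same length as the input (as step 3 words it).
function lowerPerCodeUnit(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const unit = s.charAt(i);
    const lower = unit.toLowerCase();
    // Keep the original unit if the mapping would change the length.
    out += lower.length === 1 ? lower : unit;
  }
  return out;
}

// U+10400 DESERET CAPITAL LETTER LONG I, stored as a surrogate pair.
const deseretCapital = "\uD801\uDC00";

// Per-code-unit reading: surrogates have no case mapping of their own.
const perUnit = lowerPerCodeUnit(deseretCapital);

// UTF-16 reading, as applied by the built-in in current engines.
const builtIn = deseretCapital.toLowerCase();

console.log(perUnit === deseretCapital);
console.log(builtIn === deseretCapital);
```

Under the per-code-unit reading the pair comes back unchanged, whereas an engine that decodes the pair as UTF-16 maps it to U+10428. The same helper also ignores context, so a word-final capital sigma always becomes U+03C3.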