String.prototype.normalize, case folding and sort keys

Nebojša Ćirić cira at google.com
Fri Oct 25 15:27:55 PDT 2013


Having sort keys in the collator would give users more flexibility in
comparing strings, but your* approach is good enough for now.

* toUpperCase spec as it stands
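
As a rough illustration of the use case quoted below (M input words checked
against N predefined words), here is a sketch that builds lookup keys from
String.prototype.normalize plus the locale-insensitive toUpperCase and
toLowerCase mappings. The helper names foldKey and matches and the word lists
are made up for this example, and the upper-then-lower round trip is only an
approximation of Unicode case folding, not the CF form discussed in the thread.

```javascript
// Hypothetical helper: approximate a locale-independent case fold.
// toUpperCase applies the full (SpecialCasing) mappings, e.g. "ß" -> "SS",
// and toLowerCase then brings everything to one common form.
function foldKey(word) {
  return word.normalize("NFC").toUpperCase().toLowerCase().normalize("NFC");
}

// N predefined words, hashed once by their folded key.
const predefined = ["Straße", "Open", "Ćirić"];
const keys = new Set(predefined.map(foldKey));

// M input words: one hash lookup each instead of N comparisons each.
function matches(inputWords) {
  return inputWords.filter((w) => keys.has(foldKey(w)));
}

const hits = matches(["STRASSE", "open", "ćIRIĆ", "close"]);
console.log(hits); // → ["STRASSE", "open", "ćIRIĆ"]
```

A key built this way ignores case everywhere and keeps accents significant;
it offers none of the per-locale control that a collator sort key would.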


2013/10/24 Mihai Niță <mnita at google.com>

> "Does this sufficiently cover the locale independent case folding use
> case?"
> I think it does.
> Mihai
>
>
> On Wed, Oct 23, 2013 at 4:19 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
>
>>
>> On Oct 23, 2013, at 3:09 PM, Nebojša Ćirić wrote:
>>
>> The String.prototype.normalize(form) spec is here:
>> http://people.mozilla.org/~jorendorff/es6-draft.html#sec-string.prototype.normalize
>> It offers all 4 forms of normalization.
>>
>> We did mention additional CF and CFNKFC forms for case folding, but they
>> were not added to the spec. They case-fold a string in a locale-independent
>> way (see http://www.unicode.org/faq/casemap_charprop.html#2).
>>
>> Should we:
>> 1. Add those two new forms to the spec of
>> String.prototype.normalize(form) method?
>> 2. Add a new String.prototype.toFoldCase(form) method?
>> 3. Add Intl.Collator.prototype.sortKey(string)->string method?
>>
>> We could do 1 and 3, or 2 and 3, or just 3.
>>
>> The use case would be: a user inputs M words, and we would like to see if
>> some of them match N predefined words (say, to trigger an action). With the
>> current Intl.Collator.prototype.compare() we need M×N comparisons. With
>> toFoldCase/sortKey we would need only O(M) lookups in a hash with N keys.
>>
>> Mihai and I lean towards 3, because it gives the user more control over
>> what to check. For example, it doesn't make sense to set ignoreCase for
>> locales that don't have a case distinction. Or the user may want to
>> preserve accents in the comparison...
>>
>> --
>> Nebojša Ćirić
>> _______________________________________________
>> es-discuss mailing list
>> es-discuss at mozilla.org
>> https://mail.mozilla.org/listinfo/es-discuss
>>
>>
>> Also see
>> http://people.mozilla.org/~jorendorff/es6-draft.html#sec-string.prototype.tolowercase
>>
>>
>> In my working draft, the paragraph that immediately follows the algorithm
>> has been modified to read:
>>
>> The result must be derived according to the *locale-insensitive* case
>> mappings in the Unicode Character Database (this explicitly includes not
>> only the UnicodeData.txt file, but also *all locale-insensitive mappings
>> in* the SpecialCasing.txt file that accompanies it).
>>
>>
>> This change is in response to
>> https://bugs.ecmascript.org/show_bug.cgi?id=206
>>
>> Does this sufficiently cover the locale independent case folding use case?
>>
>> Allen
>>
>
>
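
For comparison, a sketch of what is possible today with
Intl.Collator.prototype.compare: the sensitivity option decides whether case
and accents are significant, which is the kind of control a sort key would
carry over to hashed lookups. The function name findMatches and the word
lists are illustrative only. Intl.Collator.prototype.sortKey is only a
proposal in this thread and is not used here.

```javascript
// "accent" sensitivity: ignore case, but keep accent differences significant.
const collator = new Intl.Collator("en", { sensitivity: "accent" });

// Compare every input word against every predefined word: M×N comparisons
// in the worst case.
function findMatches(inputWords, predefinedWords) {
  const found = [];
  for (const input of inputWords) {
    for (const candidate of predefinedWords) {
      if (collator.compare(input, candidate) === 0) {
        found.push(input);
        break; // one match is enough for this input word
      }
    }
  }
  return found;
}

const found = findMatches(["RESUME", "Résumé", "cafe"], ["resume", "café"]);
console.log(found); // → ["RESUME"]
```

Switching sensitivity to "base" would also ignore accents, so all three
input words would match.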


-- 
Nebojša Ćirić