String.prototype.normalize, case folding and sort keys
cira at google.com
Fri Oct 25 15:27:55 PDT 2013
Having sort keys in the collator would allow users to be more flexible in
comparing strings, but your* approach is good enough for now.
* toUpperCase spec as it stands
2013/10/24 Mihai Niță <mnita at google.com>
> "Does this sufficiently cover the locale independent case folding use case?"
> I think it does.
> On Wed, Oct 23, 2013 at 4:19 PM, Allen Wirfs-Brock <allen at wirfs-brock.com> wrote:
>> On Oct 23, 2013, at 3:09 PM, Nebojša Ćirić wrote:
>> String.prototype.normalize(form) spec is here -
>> string.prototype.normalize. It offers all 4 forms of normalization.
>> We did mention additional CF and CFNKFC forms for case folding, but they
>> were not added to the spec. They case-fold strings in a locale-independent
>> way (see http://www.unicode.org/faq/casemap_charprop.html#2).
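For reference, a small sketch of what the four existing forms do and do not cover. The variable names are only illustrative; the point is that none of the four forms changes case, which is why a separate folding step is being discussed.

```javascript
const precomposed = '\u00E9'; // "é" as a single code point
const decomposed = 'e\u0301'; // "e" followed by a combining acute accent
const ligature = '\uFB01';    // the "fi" ligature

const forms = {
  // Canonical composition and decomposition
  NFC: decomposed.normalize('NFC') === precomposed,
  NFD: precomposed.normalize('NFD') === decomposed,
  // Compatibility forms also expand the ligature to plain "fi"
  NFKC: ligature.normalize('NFKC') === 'fi',
  NFKD: ligature.normalize('NFKD') === 'fi',
  // Normalization alone leaves case untouched
  caseKept: 'A'.normalize('NFKC') === 'A',
};
```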
>> Should we:
>> 1. Add those two new forms to the spec of
>> String.prototype.normalize(form) method?
>> 2. Add a new String.prototype.toFoldCase(form) method?
>> 3. Add Intl.Collator.prototype.sortKey(string)->string method?
>> We could do 1 and 3, or 2 and 3, or just 3.
>> The use case would be: the user inputs M words, and we would like to see if
>> some of them match N predefined words (say, to trigger an action). With the
>> current Intl.Collator.prototype.compare() we need MxN comparisons. With
>> toFoldCase/sortKey we would need only O(M) lookups in a hash with N keys.
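A minimal sketch of the hash lookup described above. Since neither toFoldCase nor sortKey exists, it assumes toUpperCase() followed by toLowerCase() as a rough stand-in for a fold key; foldKey, predefined and matches are hypothetical names, and this is not full Unicode case folding.

```javascript
// Hypothetical stand-in for a case-fold function (an assumption, not a
// real API). Upper-casing first and then lower-casing collapses some pairs
// that a single toLowerCase() would keep apart, such as "ß" and "SS".
function foldKey(s) {
  return s.normalize('NFC').toUpperCase().toLowerCase();
}

// Build the hash of the N predefined words once.
const predefined = new Set(['Start', 'STOP', 'Straße'].map(foldKey));

// Each of the M input words costs one hash lookup instead of N comparisons.
function matches(words) {
  return words.filter((w) => predefined.has(foldKey(w)));
}
```

With these sample words, matches(['start', 'STRASSE', 'go', 'Stop']) keeps every word except 'go'.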
>> Mihai and I lean towards 3. because it gives the user more control over
>> what to check. For example, it doesn't make sense to ignoreCase
>> for locales that don't have a case distinction. Or the user may want to
>> preserve accents in the comparison...
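To illustrate the kind of control meant here, a short sketch using the existing Intl.Collator sensitivity option. It only shows compare(); a sortKey method is still just a proposal at this point, and sameUnder is a hypothetical helper name.

```javascript
// 'accent' ignores case but keeps accent differences;
// 'base' ignores both case and accents.
const ignoreCaseOnly = new Intl.Collator('en', { sensitivity: 'accent' });
const ignoreCaseAndAccents = new Intl.Collator('en', { sensitivity: 'base' });

// compare() returns 0 when the strings are equal under the collator's options.
function sameUnder(collator, a, b) {
  return collator.compare(a, b) === 0;
}
```

Under ignoreCaseOnly, 'resume' and 'RESUME' compare equal while 'resume' and 'résumé' do not; under ignoreCaseAndAccents all three compare equal.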
>> Nebojša Ćirić
>> Also see
>> In my working draft, the paragraph that immediately follows the algorithm
>> has been modified to read:
>> The result must be derived according to the *locale-insensitive* case
>> mappings in the Unicode Character Database (this explicitly includes not
>> only the UnicodeData.txt file, but also *all locale-insensitive mappings
>> in* the SpecialCasings.txt file that accompanies it).
>> This change is in response to
>> Does this sufficiently cover the locale independent case folding use case?