String.prototype.normalize, case folding and sort keys

Allen Wirfs-Brock allen at wirfs-brock.com
Wed Oct 23 16:19:27 PDT 2013


On Oct 23, 2013, at 3:09 PM, Nebojša Ćirić wrote:

> The spec for String.prototype.normalize(form) is at http://people.mozilla.org/~jorendorff/es6-draft.html#sec-string.prototype.normalize and offers all four normalization forms (NFC, NFD, NFKC, NFKD).
> 
> We did mention additional CF and CFNKFC forms for case folding, but they were not added to the spec. They case-fold a string in a locale-independent way (see http://www.unicode.org/faq/casemap_charprop.html#2).
> 
> Should we:
> 1. Add those two new forms to the spec of String.prototype.normalize(form) method?
> 2. Add a new String.prototype.toFoldCase(form) method?
> 3. Add Intl.Collator.prototype.sortKey(string)->string method?
> 
> We could do 1 and 3, or 2 and 3, or just 3.
> 
> The use case would be: a user inputs M words, and we would like to see whether any of them match N predefined words (say, to trigger an action). With the current Intl.Collator.prototype.compare() we need MxN comparisons. With toFoldCase/sortKey we would need only O(M) lookups in a hash holding the N keys.
> 
> Mihai and I lean towards 3, because it gives the user more control over what to check. For example, it doesn't make sense to ignoreCase for locales that don't have a case distinction. Or the user may want to preserve accents in the comparison...
> 
> -- 
> Nebojša Ćirić
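
The use case quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: `predefined`, `input`, `matchByCompare`, `matchByKey` and `toKey` are hypothetical names, and because the proposed sortKey/toFoldCase methods do not exist, `toKey` stands in with a simple normalize-plus-toLowerCase key, which is not equivalent to a real collation sort key.

```javascript
// Hypothetical data: N predefined trigger words and M words of user input.
const predefined = ["Start", "Stop", "Pause"];
const input = ["please", "STOP", "now"];

// Approach 1: pairwise Intl.Collator comparison, up to M x N compare() calls.
// sensitivity "accent" ignores case differences but keeps accent differences.
const collator = new Intl.Collator("en", { sensitivity: "accent" });
function matchByCompare(words, targets) {
  return words.filter((w) => targets.some((t) => collator.compare(w, t) === 0));
}

// Approach 2: derive one key per string, then do M lookups in a Set of N keys.
// toKey is an assumed stand-in for the proposed methods, not an existing API.
function toKey(s) {
  return s.normalize("NFC").toLowerCase();
}
function matchByKey(words, targets) {
  const keys = new Set(targets.map(toKey));
  return words.filter((w) => keys.has(toKey(w)));
}

console.log(matchByCompare(input, predefined));
console.log(matchByKey(input, predefined));
```

The second approach builds the Set once and then does a single lookup per input word, which is where the O(M) figure comes from. What it cannot do with a fixed key function is follow per-locale collator options, which is the extra control that option 3 is after.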

Also see http://people.mozilla.org/~jorendorff/es6-draft.html#sec-string.prototype.tolowercase 

In my working draft, the paragraph that immediately follows the algorithm has been modified to read:

The result must be derived according to the *locale-insensitive* case mappings in the Unicode Character Database (this explicitly includes not only the UnicodeData.txt file, but also *all locale-insensitive mappings in* the SpecialCasings.txt file that accompanies it).

This change is in response to https://bugs.ecmascript.org/show_bug.cgi?id=206 

Does this sufficiently cover the locale-independent case folding use case?
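
To make the question concrete, here is a minimal sketch of using toLowerCase as a locale-independent matching key. The helper name `lowerKey` is hypothetical, and the choice of NFKC is an assumption made for illustration. One thing to keep in mind when evaluating it: lowercasing and Unicode case folding are distinct operations that differ for a few characters. For example, full case folding maps "ß" to "ss", while toLowerCase leaves "ß" unchanged.

```javascript
// Hypothetical helper: a locale-insensitive key built only from existing methods.
function lowerKey(s) {
  return s.normalize("NFKC").toLowerCase();
}

// The result does not depend on the host's default locale
// (unlike toLocaleLowerCase).
console.log(lowerKey("HELLO") === lowerKey("hello")); // true
console.log("I".toLowerCase()); // "i"

// A pair that full case folding would treat as equal but lowercasing does not.
console.log(lowerKey("STRASSE") === lowerKey("straße")); // false
```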

Allen 