ECMAScript collation question
ecmascript at norbertlindenberg.com
Thu Aug 30 18:30:04 PDT 2012
I changed the subject because this question also affects the ECMAScript Language Specification.
Section 15.5.4.9, String.prototype.localeCompare (that), has said since ES3: "the function is required ... and to return 0 when comparing two strings that are considered canonically equivalent by the Unicode standard."
I assume this requirement goes back to Unicode Technical Standard #10, Unicode Collation Algorithm, whose conformance clause C1 says (and has said since 1999): "Given a well-formed Unicode Collation Element Table, a conformant implementation shall replicate the same comparisons of strings as those produced by Section 4, Main Algorithm. In particular, a conformant implementation must be able to compare any two canonical-equivalent strings as being equal, for all Unicode characters supported by that implementation."
How can the default behavior of ICU be reconciled with this conformance clause?
I brought up the issue of collation and normalization before, but didn't get much feedback:
On Aug 30, 2012, at 15:17 , Nebojša Ćirić wrote:
> my implementation fails this collation test:
> for this pair (a+umlaut+underdot):
> "ä\u0323", "a\u0323\u0308"
> If I turn normalization on, the test passes.
> Mandatory normalization introduces a higher processing cost (up to 30% slower in ICU). The ICU team decided to avoid normalization for some locales where they don't expect problematic characters to occur.
> My question is: do we want to normalize all strings by default in the compare() method, or not? I think we said no to default normalization at one of the i18n meetings, but I am not 100% sure.
> Nebojša Ćirić
> es-discuss mailing list
> es-discuss at mozilla.org