ECMAScript collation question

Norbert Lindenberg ecmascript at
Thu Aug 30 18:30:04 PDT 2012

I changed the subject because this question also affects the ECMAScript Language Specification.

The section on String.prototype.localeCompare (that) has said since ES3: "the function is required ... and to return 0 when comparing two strings that are considered canonically equivalent by the Unicode standard."

I assume this requirement goes back to Unicode Technical Standard #10, Unicode Collation Algorithm, whose conformance clause C1 says (and has said since 1999): "Given a well-formed Unicode Collation Element Table, a conformant implementation shall replicate the same comparisons of strings as those produced by Section 4, Main Algorithm. In particular, a conformant implementation must be able to compare any two canonical-equivalent strings as being equal, for all Unicode characters supported by that implementation."

How can the default behavior of ICU be reconciled with this conformance clause?

I brought up the issue of collation and normalization before, but didn't get much feedback:


On Aug 30, 2012, at 15:17 , Nebojša Ćirić wrote:

> Hi,
>  my implementation fails this collation test:
> intl402/ch10/10.3/10.3.2_CE.js
> for this pair (a+umlaut+underdot):
> "ä\u0323", "a\u0323\u0308"
> If I turn normalization on then test passes.
> Mandatory normalization introduces a higher processing cost (up to 30% slower in ICU). The ICU team decided to avoid normalization for some locales where they don't expect problematic characters to occur.
> My question is: do we want to normalize all strings by default in the compare() method or not? I think we said no to default normalization at one of the i18n meetings, but I am not 100% sure.
> -- 
> Nebojša Ćirić
> _______________________________________________
> es-discuss mailing list
> es-discuss at
