Internationalization: Normalization and canonical equivalence in string comparison
ecmascript at norbertlindenberg.com
Mon Jun 18 22:36:25 PDT 2012
The ECMAScript Internationalization API Specification currently has normalization as an optional feature in collation. However, it requires that the compare function "return 0 when comparing Strings that are considered canonically equivalent by the Unicode standard". Canonical equivalence, I thought, is usually implemented through normalization. Does it make sense to keep normalization as a separate and optional feature then? Is anybody planning to implement canonical equivalence through other mechanisms, such that the lack of normalization would be visible in the comparison of non-equivalent strings?
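To make the question concrete, here is a minimal sketch of canonical equivalence implemented through normalization. It assumes String.prototype.normalize, which was only added to the language in ES2015 and so was not available when this message was written. The names canonicallyEquivalent and compareNormalized are hypothetical helpers for illustration, not part of any specification, and the fallback ordering is plain code-unit order rather than real collation.

```javascript
// Assumption: String.prototype.normalize (ES2015 and later) is available.
// Both function names below are hypothetical, chosen for illustration only.

// Two strings are canonically equivalent exactly when their NFD forms
// (canonical decomposition plus canonical reordering) are identical.
function canonicallyEquivalent(a, b) {
  return a.normalize("NFD") === b.normalize("NFD");
}

// A toy comparator that returns 0 for canonically equivalent strings.
// For non-equivalent strings it falls back to code-unit order of the
// NFD forms, which is NOT locale-sensitive collation.
function compareNormalized(a, b) {
  const na = a.normalize("NFD");
  const nb = b.normalize("NFD");
  if (na === nb) return 0;
  return na < nb ? -1 : 1;
}
```

A comparator built this way cannot distinguish "normalization on" from "canonical equivalence respected", which is the reason for asking whether the two should remain separate features.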
BTW, the requirement that canonically equivalent strings compare as equal has been part of the specification of String.prototype.localeCompare since ES3. When testing with a handful of string pairs pulled from chapter 3 of the Unicode Standard and from UTS 10, however, I found that only Opera on the Mac detects their equivalence correctly. Firefox on the Mac and the V8 systems (Chrome, Node) fail to detect any equivalence; Safari, Internet Explorer, and the Windows versions of Opera and Firefox detect some and miss others. Obviously people haven't been paying much attention to localeCompare...
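A check along these lines can be sketched as follows. The pairs here are illustrative examples of canonically equivalent sequences and are not necessarily the ones used in the test described above; samplePairs, toCodePoints and checkLocaleCompare are hypothetical names. Results depend on the engine and its collation data, so no expected output is given.

```javascript
// Illustrative canonically equivalent pairs (an assumption; not necessarily
// the pairs used in the original test).
const samplePairs = [
  ["\u00C5", "A\u030A"],             // A WITH RING ABOVE vs A + COMBINING RING ABOVE
  ["\u212B", "\u00C5"],              // ANGSTROM SIGN vs A WITH RING ABOVE
  ["s\u0307\u0323", "s\u0323\u0307"], // same combining marks in different order
  ["\u1E69", "s\u0323\u0307"],       // precomposed vs fully decomposed
];

// Render a string as a list of code points so the report stays readable.
function toCodePoints(s) {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

// Run localeCompare on each pair and record whether it returned 0.
function checkLocaleCompare(pairs) {
  return pairs.map(([a, b]) => {
    const result = a.localeCompare(b);
    return {
      a: toCodePoints(a),
      b: toCodePoints(b),
      result,
      detected: result === 0,
    };
  });
}

for (const row of checkLocaleCompare(samplePairs)) {
  console.log(`${row.a}  vs  ${row.b}  ->  ${row.result}` +
    (row.detected ? " (equivalence detected)" : " (missed)"));
}
```

Running the same script in each browser or runtime gives a quick picture of which implementations honor the requirement.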