Internationalization API issues and updates

Phillips, Addison addison at lab126.com
Tue Apr 17 12:20:30 PDT 2012


A few comments follow.

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.



On Mar 26, 2012 4:59 PM, "Norbert Lindenberg" <ecmascript at norbertlindenberg.com> wrote:
While everybody is reviewing the draft specification of the ECMAScript Internationalization API [1] in preparation for this week's TC 39 meeting, here are a few issues that have come up, with proposed resolutions:


Issue 1, IsWellFormedLanguageTag (6.2.2), raised by Allen:

The specification referenced here, RFC 5646 section 2.1, says nothing about the case of duplicate extension subtags (example: the duplicate -u- in "de-u-nu-latn-u-ca-gregory"), although RFC 5646 section 2.2.9 and section 3.7 say that duplicate extension subtags are invalid. The ResolveLocale abstract operation (Globalization API, 9.2.1) will only consider the first extension subtag sequence it sees, and ignore others, without giving applications any hint as to what's going on.

AP> Probably a reasonable way to handle that. I assume that you actually mean, btw, “ResolveLocale … will only consider the first extension subtag sequence it recognizes”. E.g. it would ignore the newly-added “-t-” extension and find the “-u-” in a tag like “en-us-t-something-u-ca-gregory”.
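
To illustrate the "first sequence it recognizes" reading, here is a minimal sketch. The function name firstUnicodeExtension is hypothetical and is not taken from the draft; the sketch also assumes the tag is not a grandfathered tag and does no well-formedness checking.

```javascript
// Hypothetical helper, not part of the draft specification.
// Returns the first "-u-" extension subtag sequence of a language tag
// (without the leading hyphen), or null if there is none.
function firstUnicodeExtension(tag) {
  const subtags = tag.toLowerCase().split("-");
  let start = -1;
  // A one-character subtag after the first position is a singleton.
  for (let i = 1; i < subtags.length; i++) {
    if (subtags[i] === "x") break; // everything after "x" is private use
    if (subtags[i] === "u") {
      start = i;
      break;
    }
  }
  if (start === -1) return null;
  // The sequence runs until the next singleton or the end of the tag.
  let end = start + 1;
  while (end < subtags.length && subtags[end].length > 1) end++;
  return subtags.slice(start, end).join("-");
}

console.log(firstUnicodeExtension("de-u-nu-latn-u-ca-gregory"));
console.log(firstUnicodeExtension("en-us-t-something-u-ca-gregory"));
```

With the duplicate "-u-" example, only "u-nu-latn" is returned and the second sequence is silently dropped, which is the behavior the issue describes. With the "-t-" example, the unrecognized extension is skipped and "u-ca-gregory" is found.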


Should IsWellFormedLanguageTag be enhanced to check for duplicate extension subtags? And then probably also duplicate variant subtags?

AP> Strictly speaking, according to BCP 47, the term “well-formed” encompasses *only* the check against the ABNF, which does not include these checks.


The one thing I'm sure we don't want is validation against the IANA Language Subtag Registry.

AP> That’s “valid”, not “well-formed” in BCP 47 parlance. I agree with this.


My proposed resolution: Add checking for duplicate extension subtags and duplicate variant subtags, and throw an exception if they exist.

AP> … and actually I agree with this. Structural validity checking seems like a reasonable and useful addition.
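
A sketch of what such a structural check could look like, on top of the ABNF check. The names hasDuplicateSubtags and isVariant are hypothetical. The sketch assumes the tag already matches the RFC 5646 ABNF and is not a grandfathered tag; which exception to throw is left to the caller.

```javascript
// Hypothetical helpers, not part of the draft specification.
// Assumption: the tag already passed the ABNF (well-formedness) check.

// variant = 5*8alphanum / (DIGIT 3alphanum), per RFC 5646 section 2.1
function isVariant(subtag) {
  return /^([a-z0-9]{5,8}|[0-9][a-z0-9]{3})$/.test(subtag);
}

// Returns true if the tag repeats a variant subtag or an extension singleton.
function hasDuplicateSubtags(tag) {
  const subtags = tag.toLowerCase().split("-");
  const seenSingletons = new Set();
  const seenVariants = new Set();
  let inExtension = false;
  for (let i = 1; i < subtags.length; i++) {
    const subtag = subtags[i];
    if (subtag === "x") break; // private use: nothing further to check
    if (subtag.length === 1) {
      if (seenSingletons.has(subtag)) return true;
      seenSingletons.add(subtag);
      inExtension = true;
    } else if (!inExtension && isVariant(subtag)) {
      if (seenVariants.has(subtag)) return true;
      seenVariants.add(subtag);
    }
  }
  return false;
}

console.log(hasDuplicateSubtags("de-u-nu-latn-u-ca-gregory")); // duplicate -u-
console.log(hasDuplicateSubtags("sl-rozaj-rozaj"));            // duplicate variant
console.log(hasDuplicateSubtags("sl-rozaj-biske"));            // no duplicates
```

Note that nothing here consults the IANA Language Subtag Registry, so this stays on the "well-formed plus structure" side and does not become validation.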



Issue 2, CanonicalizeLanguageTag (6.2.3), raised by Allen:

The spec used to say (before February 23): "Implementations are allowed, but not required, to also canonicalize each extension subtag sequence within the tag according to the canonicalization specified by the standard registering the extension, such as RFC 6067 section 2.1.1."

Allen points out that the result is visible to ECMAScript code, and that this is the sort of situation where TC39 prefers to mandate a consistent result across all implementations.

Counterarguments to requiring extension subtag sequence canonicalization:
1. New extensions are being defined that implementations may not know about (and have no need to know about).
2. For the extension that this API cares about, the -u- extension, a comparison of language tags as complete strings isn't very useful because different functionality cares about different extension keys - Collator about -co- and a few others, NumberFormat about -nu-, and DateTimeFormat about -ca-. ResolveLocale picks out the extension keys that are relevant for its caller.
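
Counterargument 2 can be made concrete with a small sketch: each service looks only at its own keys, so the rest of the -u- sequence never affects it. The table RELEVANT_KEYS and the function relevantExtensionKeys are hypothetical names. The key lists are limited to the keys named above (the "few others" for Collator are not filled in), and the sketch assumes the first "u" singleton is not inside a private use sequence.

```javascript
// Hypothetical sketch, not the ResolveLocale algorithm from the draft.
// Only the keys named in the message are listed here.
const RELEVANT_KEYS = {
  Collator: ["co"],
  NumberFormat: ["nu"],
  DateTimeFormat: ["ca"],
};

// Returns the key/type pairs of the first -u- sequence that matter to `service`.
function relevantExtensionKeys(tag, service) {
  const wanted = RELEVANT_KEYS[service];
  const subtags = tag.toLowerCase().split("-");
  const result = {};
  const start = subtags.indexOf("u", 1);
  if (start === -1) return result;
  let key = null;
  // Walk the sequence until the next singleton or the end of the tag.
  for (let i = start + 1; i < subtags.length && subtags[i].length > 1; i++) {
    const subtag = subtags[i];
    if (subtag.length === 2) {
      // Two-character subtags are keys; longer ones are type subtags.
      key = wanted.includes(subtag) ? subtag : null;
      if (key !== null) result[key] = "";
    } else if (key !== null) {
      result[key] = result[key] === "" ? subtag : result[key] + "-" + subtag;
    }
  }
  return result;
}

const tag = "de-u-co-phonebk-nu-latn-ca-gregory";
console.log(relevantExtensionKeys(tag, "Collator"));
console.log(relevantExtensionKeys(tag, "NumberFormat"));
console.log(relevantExtensionKeys(tag, "DateTimeFormat"));
```

Two tags that differ only in keys a service ignores yield the same result for that service, even though the tags differ as complete strings.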

Note that canonicalization according to BCP 47 is mandatory; only the additional rules created by extension specifications are currently optional.

AP> Good.


My proposed resolution: Clarify that the quoted statement is only about canonicalization rules that go beyond those of BCP 47; don't change the behavior. The new wording in the February 23 draft is:
"The specifications for extensions to BCP 47 language tags, such as RFC 6067, may include canonicalization rules for the extension subtag sequences they define that go beyond the canonicalization rules of RFC 5646 section 4.5. Implementations are allowed, but not required, to apply these additional rules."

AP> Good. Note that the BCP 47 rules cannot be in conflict with the rules in extensions. For example, the canonical form is always lowercase for extensions.


Issue 3, InitializeDateTimeFormat and ToDateTimeOptions (13.1.1), raised by Nebojša:

The spec doesn't allow the creation of a formatter with time elements only. InitializeDateTimeFormat calls ToDateTimeOptions with arguments true for date and false for time. Specifying time elements (hour, minute, second) won't set needDefault = false, since time = false (step 6), so we go to the section that defines date properties (step 7). You end up having date elements you didn't ask for in addition to any time elements you did ask for.

My proposed resolution: Replace the date and time parameters of ToDateTimeOptions with:
- required: which component groups are required, values "date", "time", "any"
- defaults: which component groups should be filled in if required components aren't there, values "date", "time", "all".
Update the ToDateTimeOptions algorithm as well as the calls to it from InitializeDateTimeFormat and Date.prototype.toLocale(|Date|Time)String accordingly.

AP> Excellent!
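
One possible reading of the proposed ToDateTimeOptions, sketched in ECMAScript. The parameter values come from the proposal above; the exact component lists (weekday, year, month, day for "date"; hour, minute, second for "time") and the use of "numeric" as the default value are assumptions, and copying the options object is a simplification of whatever the updated algorithm ends up doing.

```javascript
// Sketch only; component lists and the "numeric" default are assumptions.
function toDateTimeOptions(options, required, defaults) {
  const result = Object.assign({}, options); // shallow copy; undefined yields {}
  let needDefaults = true;

  // Any requested component from a required group means no defaults are needed.
  if (required === "date" || required === "any") {
    for (const prop of ["weekday", "year", "month", "day"]) {
      if (result[prop] !== undefined) needDefaults = false;
    }
  }
  if (required === "time" || required === "any") {
    for (const prop of ["hour", "minute", "second"]) {
      if (result[prop] !== undefined) needDefaults = false;
    }
  }

  // Fill in the default component groups only if nothing required was given.
  if (needDefaults && (defaults === "date" || defaults === "all")) {
    for (const prop of ["year", "month", "day"]) result[prop] = "numeric";
  }
  if (needDefaults && (defaults === "time" || defaults === "all")) {
    for (const prop of ["hour", "minute", "second"]) result[prop] = "numeric";
  }
  return result;
}

// Time-only request: with required "any", no date fields are added.
console.log(toDateTimeOptions({ hour: "numeric" }, "any", "date"));
// Old behavior, roughly: only date components count, so date defaults are added.
console.log(toDateTimeOptions({ hour: "numeric" }, "date", "date"));
```

Under this reading, a constructor call would pass required "any" so that a time-only formatter becomes possible, while the toLocale(|Date|Time)String callers would each pass the combination that suits them.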


Issue 4, InitializeDateTimeFormat (13.1.1), raised by me:

The algorithm looks for a property "formatMatch" in the options argument. The name agreed on in the November 15 meeting of the internationalization team was "formatMatcher", and there is a parallel property "localeMatcher".

My proposed resolution: Rename the property to "formatMatcher". Rename the *Match abstract operations in parallel.
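
For reference, a usage sketch with the renamed property next to its parallel. It assumes an engine that exposes the API under the global Intl object with "basic" and "lookup" as accepted values, which is how the API later shipped; none of that is stated in this message.

```javascript
// Assumption: the API is available as the global Intl object.
const formatter = new Intl.DateTimeFormat("en", {
  localeMatcher: "lookup",  // parallel property, selects the locale matching algorithm
  formatMatcher: "basic",   // the renamed property from issue 4
  hour: "numeric",
  minute: "numeric",
});

const resolved = formatter.resolvedOptions();
console.log(resolved.locale);
console.log(formatter.format(new Date(2012, 3, 17, 12, 20)));
```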


Regards,
Norbert

[1] http://wiki.ecmascript.org/doku.php?id=globalization:specification_drafts
