Comments on internationalization API

Mark Davis ☕ mark at macchiato.com
Wed Jul 20 09:46:44 PDT 2011


I have comments on some of these.

Mark
*Il meglio è l’inimico del bene* (the best is the enemy of the good)


On Tue, Jul 19, 2011 at 01:29, Norbert Lindenberg <
ecmascript at norbertlindenberg.com> wrote:

> Hi all,
>
> I'm sorry for not having been able to contribute to the
> internationalization API earlier. I finally have reviewed the straw man [1],
> and am pleased to see that it contains a good subset of internationalization
> functionality to start with. Number and date formatting and collation are
> issues that most applications have to deal with. Collation especially, but
> also date formatting with support for multiple time zones and calendars are
> hard to implement as downloadable libraries.
>
> I have some comments on the details though:
>
> 1. In the background section, it might be useful to add that with Node.js
> server-side JavaScript is seeing a rebound, and applications don't really
> want to have to call out to a non-JavaScript server in order to handle basic
> internationalization.
>
> 2. In the goals section, I'd qualify the "reuse of objects" goal as a reuse
> of implementation data structures, or even better replace it with measurable
> performance goals. Reuse of objects that are visible to applications has
> security and privacy implications, especially when loading third party code
> (apps or ads) onto pages [2]. I'd recommend letting applications freely
> construct Collator, NumberFormat, and DateTimeFormat objects, but have these
> objects share implementation objects (such as ICU objects) as much as
> possible. If the API does return shared objects, the security issues need to
> be dealt with, e.g., by specifying that the shared objects are immutable.
>

I think it is reasonable to rephrase this as "implementation data
structures".
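The idea of letting applications construct their own objects while sharing the implementation data behind them can be sketched as follows. This is a minimal illustration, not the straw man's API: `SketchNumberFormat`, `loadLocaleData`, and the cached data shape are all hypothetical stand-ins for what an engine would load (for example ICU data).

```javascript
// Hypothetical sketch: every application-visible formatter is a fresh object,
// but the per-locale data behind it is loaded once, frozen, and shared.
const sharedData = new Map();

function loadLocaleData(locale) {
  // Placeholder data; a real engine would load symbols, patterns, etc.
  return Object.freeze({
    locale,
    decimalSeparator: locale.startsWith("de") ? "," : ".",
  });
}

function getSharedData(locale) {
  if (!sharedData.has(locale)) {
    sharedData.set(locale, loadLocaleData(locale));
  }
  return sharedData.get(locale);
}

class SketchNumberFormat {
  #data; // private field: third-party code cannot reach or replace it

  constructor(locale) {
    this.#data = getSharedData(locale);
  }

  format(n) {
    return String(n).replace(".", this.#data.decimalSeparator);
  }

  sharesDataWith(other) {
    return this.#data === other.#data;
  }
}
```

Because the shared data is frozen and only reachable through a private field, one script cannot tamper with the formatter state another script relies on.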


> 3. I'm very uncomfortable with the LocaleInfo class. It seems to pretend to
> be the central source of all locale-related information, but can't live
> up to that claim because its design is limited to number and date formatting
> and collation. Developers will need to create other functionality such as
> text segmentation, spelling checking, message lookup, shoe size conversion,
> etc. LocaleInfo appears to perform some magic to derive regions, currencies,
> and possibly time zones, but doesn't specify it, and makes none of it
> available to other internationalization classes. It also does duty as a
> namespace, which looks odd in an EcmaScript standard that otherwise doesn't
> know namespaces.
>

I don't think it is ideal; I share some of your qualms about it. However, it
is what we were able to agree on as a compromise. Because the LocaleInfo
class does the resolution, and the resolved information is available after
creation, other services can use that information. And since this is ES,
people could also hang services off their own LocaleInfo class.


>
> Other internationalization libraries have a core that anybody can build on
> to create internationalization functionality. In Java, for example, the
> Locale and Currency classes handle a variety of identifier mappings, while
> the ResourceBundle class handles loading of localized data with fallbacks
> [3]. In the Yahoo User Interface library, the Intl module does language
> negotiation and collaborates with the YUI loader in loading localized data
> [4]. I'd suggest separating similar functionality in LocaleInfo from the
> formatting and collation functionality and making it available to all. I
> suspect though that some of the current magic will turn out to be misguided
> when looked at in the clear light of a specification and will need to be
> discarded.
>
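A core service of the kind described here, separate from any formatter, might compute a fallback chain for a language tag and load localized data along it, similar in spirit to ResourceBundle fallback. The function names, the "root" terminal entry, and the bundle contents below are illustrative assumptions, not part of any proposal.

```javascript
// Hypothetical core service: compute the fallback chain for a BCP 47 tag
// and merge localized data along it, independent of any formatter.
function fallbackChain(tag) {
  const subtags = tag.split("-");
  const chain = [];
  while (subtags.length > 0) {
    chain.push(subtags.join("-"));
    subtags.pop();
    // Do not leave a dangling singleton (such as "x" or "u") at the end.
    if (subtags.length > 0 && subtags[subtags.length - 1].length === 1) {
      subtags.pop();
    }
  }
  chain.push("root"); // assumed name for the last-resort bundle
  return chain;
}

function loadMessages(bundles, tag) {
  // Merge from least to most specific so more specific entries win.
  const merged = {};
  for (const id of fallbackChain(tag).reverse()) {
    Object.assign(merged, bundles[id] || {});
  }
  return merged;
}

// Usage with made-up bundles:
const bundles = {
  root: { greeting: "Hello", bye: "Bye" },
  de: { greeting: "Hallo", bye: "Tschüss" },
  "de-CH": { greeting: "Grüezi" },
};
console.log(loadMessages(bundles, "de-CH"));
```

Any other internationalization class (segmentation, message lookup, and so on) could reuse the same chain rather than reimplementing it.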
> 4. Language IDs in the library should be those of BCP 47, not of Unicode
> LDML. The two are similar, but there are subtle differences, as described in
> the LDML spec: LDML excludes some BCP 47 tags and subtags, adds a separator
> and the root locale, and changes the semantics of some tags [5]. Since BCP
> 47 is the dominant standard for language identification, internationalized
> applications have to support it. If an implementation of the
> internationalization API is based on LDML, it should handle the mapping
> from/to BCP 47 itself rather than burdening applications with it.
>

Every LDML language ID is also a BCP 47 language tag. LDML eliminates some
of the deadwood in BCP47 (the old irregular forms) but has the same
expressive power and somewhat more. There are some codes that are not
defined in BCP47 that turn out to be very important for implementations,
like the Unknown region.

I'm quite familiar with both, being an author of each.
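The purely syntactic part of the mapping discussed above is small: LDML also allows "_" as a subtag separator, and its "root" locale corresponds to the BCP 47 tag "und". The helper below is a hypothetical sketch that covers only those two cases and ignores the semantic differences.

```javascript
// Hypothetical helper: convert an LDML locale ID to BCP 47 syntax.
// Handles only the "_" separator and the "root" locale.
function ldmlToBcp47(id) {
  if (id.toLowerCase() === "root") {
    return "und";
  }
  return id.replace(/_/g, "-");
}

console.log(ldmlToBcp47("zh_Hant_TW")); // "zh-Hant-TW"
```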


> 5. The specification mentions that a few Unicode extensions in BCP 47
> (-u-ca-, -u-co-) can be used for specific purposes, but is silent on whether
> other extensions are encouraged/allowed/ignored/illegal. This should be
> clarified.
>

Agreed. It should add one line saying that the handling of any other BCP47
extensions is implementation-dependent.
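To make the extension question concrete, an implementation first has to pull the Unicode extension keywords out of a tag; it can then act on the keys it knows (such as "ca" and "co") and ignore or reject the rest. The function below is a hypothetical sketch of that extraction step; it skips attributes and maps a key with no type to "true", following the UTS #35 convention.

```javascript
// Hypothetical sketch: extract -u- extension keywords from a BCP 47 tag,
// e.g. "de-DE-u-co-phonebk-ca-gregory" -> { co: "phonebk", ca: "gregory" }.
function parseUnicodeExtension(tag) {
  const subtags = tag.toLowerCase().split("-");
  const types = {};
  let inU = false;
  let key = null;
  for (const s of subtags) {
    if (s.length === 1) {
      if (s === "x") break; // private use: everything after it is opaque
      inU = s === "u";      // any other singleton starts or ends an extension
      key = null;
      continue;
    }
    if (!inU) continue;
    if (s.length === 2) {
      key = s;              // two-character subtags are keys
      types[key] = [];
    } else if (key !== null) {
      types[key].push(s);   // longer subtags are the key's type
    }                       // (subtags before the first key are attributes)
  }
  const result = {};
  for (const [k, v] of Object.entries(types)) {
    result[k] = v.length > 0 ? v.join("-") : "true";
  }
  return result;
}
```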


>
> 6. Region IDs should be those of ISO 3166. The straw man references "LDML
> region subtags" instead; I haven't been able to find a definition of this
> term.


No. ISO 3166 IDs are notoriously badly managed; they cavalierly reuse codes
for different countries over time. That is one of the reasons why BCP47 had
to put in place a registry and mechanism for dealing with the instabilities
introduced by ISO. The LDML region subtags should be more properly phrased
as "unicode_region_subtag". They are based on BCP47 but add (at the time of
this writing) two codes.


> If "ZZ" is really necessary for the API, then it should be called out
> directly in the API spec. But what information does "ZZ" convey that
> EcmaScript's "undefined" doesn't?
>

You can't write "de-undefined" as a valid language tag, whereas "de-ZZ" is
well-formed.


> 7. The priority list matching algorithm is not well specified. It doesn't
> seem to match the BCP 47 Lookup algorithm however [6], and I'd expect that
> algorithm to be available at least as a baseline (enhancements might be
> offered as well).
>

That algorithm is not particularly good. It could be mentioned as one of the
possible algorithms, however.
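For reference, the RFC 4647 Lookup algorithm [6] can be sketched in a few lines: for each range in the priority list, progressively truncate subtags from the end (also dropping a singleton left dangling at the end) until an available tag matches, falling back to a default. The function name and signature are illustrative; wildcards other than a bare "*" range are not handled.

```javascript
// Sketch of RFC 4647 section 3.4 Lookup. Matching is case-insensitive;
// the tag is returned as spelled in the available list.
function lookup(priorityList, available, defaultTag) {
  const avail = new Map(available.map((t) => [t.toLowerCase(), t]));
  for (const range of priorityList) {
    if (range === "*") continue; // a bare wildcard is ignored by Lookup
    const subtags = range.toLowerCase().split("-");
    while (subtags.length > 0) {
      const candidate = subtags.join("-");
      if (avail.has(candidate)) return avail.get(candidate);
      subtags.pop();
      // If the new last subtag is a singleton, remove it as well.
      if (subtags.length > 0 && subtags[subtags.length - 1].length === 1) {
        subtags.pop();
      }
    }
  }
  return defaultTag;
}

console.log(lookup(["fr-CA", "de"], ["de", "fr"], "en")); // "fr"
```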


>
> 8. The specifications of NumberFormat and DateTimeFormat list several
> optional features: Support for scientific notation in NumberFormat; support
> for various styles and skeletons in DateTimeFormat. How can applications
> find out which of these optional features are supported by an actual
> implementation?
>

I don't think there is a mechanism currently. It is a 'best effort'.
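One possible mechanism, which the Intl API later standardized as ECMA-402 adopted, is to request an option and then inspect the formatter's `resolvedOptions()` to see what was actually honored. The probe below is a sketch using the `notation: "scientific"` option from a later ECMA-402 edition; `supportsScientificNotation` is a hypothetical helper name.

```javascript
// Sketch: feature detection by asking for an option and checking whether
// the formatter reports it back. Engines that do not know the option omit it.
function supportsScientificNotation(locale = "en") {
  try {
    const nf = new Intl.NumberFormat(locale, { notation: "scientific" });
    return nf.resolvedOptions().notation === "scientific";
  } catch (e) {
    return false; // e.g. a RangeError for a value the engine rejects
  }
}
```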


>
> 9. Currency formatting should require applications to explicitly specify
> the currency, using an ISO 4217 currency code, when constructing a currency
> number format. Currencies are really part of the value; they're not a
> presentation preference. Imagine a European e-commerce site calculating its
> prices in euro, but then displaying the values with the Korean won symbol
> just because the user configured his browser to send "Accept-Language:
> de-DE-u-cu-KRW" or "Accept-Language: de-KR"... [7].
>

No argument there. However, applications also want to be able to access the
default currency for a given country. We tossed around different ideas for
doing that, and came up with the current mechanism.
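Both needs can be met separately: formatting takes an explicit ISO 4217 code, and a distinct helper answers "what is this country's default currency?". This is a hypothetical sketch on top of ECMA-402's `Intl.NumberFormat` (which itself ended up throwing a TypeError when style is "currency" and no currency is given); `formatPrice`, `defaultCurrencyForRegion`, and the tiny table are illustrative only.

```javascript
// Hypothetical wrapper: the currency is part of the value, so it is required.
// The locale only controls presentation (symbol placement, separators).
const DEFAULT_CURRENCY = { DE: "EUR", FR: "EUR", KR: "KRW", JP: "JPY", US: "USD" }; // small excerpt

function defaultCurrencyForRegion(region) {
  return DEFAULT_CURRENCY[region.toUpperCase()];
}

function formatPrice(amount, currency, locale) {
  if (typeof currency !== "string" || !/^[A-Z]{3}$/.test(currency)) {
    throw new TypeError('an ISO 4217 currency code such as "EUR" is required');
  }
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}
```

With this split, a site pricing in euros still shows euros to a "de-KR" user, while code that genuinely wants a regional default has to ask for it explicitly.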


>
> 10. Are the limits described for the NumberFormat parameters defaults or
> hard limits? It doesn't seem to make sense to impose hard limits such as
> "max 3 fraction digits, min 0".
>

That should be clarified. These are defaults, not hard limits.
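For comparison, the Intl API later standardized as ECMA-402 treats these values exactly as defaults: plain decimal formatting uses a minimum of 0 and a maximum of 3 fraction digits unless the caller asks for something else.

```javascript
// ECMA-402 Intl.NumberFormat: 0 and 3 are default fraction-digit bounds,
// and callers can override either one.
const byDefault = new Intl.NumberFormat("en-US");
const moreDigits = new Intl.NumberFormat("en-US", { maximumFractionDigits: 5 });
const atLeastTwo = new Intl.NumberFormat("en-US", { minimumFractionDigits: 2 });

console.log(byDefault.format(1.234567));  // "1.235"
console.log(moreDigits.format(1.234567)); // "1.23457"
console.log(atLeastTwo.format(1));        // "1.00"
```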


>
> 11. The description of the DateTimeFormat constructors refers to
> "LocaleInfo.prototype.numberFormat".
>
> 12. DateTimeFormat needs to provide a way for applications to specify the
> time zone, identified by a tz database identifier [8]. Browser-side code may
> need this capability to enforce a site-dependent time zone (e.g., a US
> financial site has to display quotes in New York City time), while
> server-side code may have to use the user's time zone. While it's possible
> to encode the time zone as part of a language ID (e.g., "en-AU-u-tz-auldh"
> to add Australia/Lord_Howe to Australian English), languages and time zones
> are really orthogonal concepts that should be kept separate, and the tz
> database identifiers are the most widely used identifiers for time zones.
>

I firmly agree. However, the committee was split on how to do this, and
decided to address it in a follow-up.
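Keeping language and time zone orthogonal looks like this in the Intl API later standardized as ECMA-402, where the tz database identifier is its own `timeZone` option rather than a language-tag extension. `hourIn` is a hypothetical helper for illustration.

```javascript
// Sketch: the locale and the tz database time zone are separate inputs.
function hourIn(timeZone, date, locale = "en-US") {
  const parts = new Intl.DateTimeFormat(locale, {
    timeZone, // e.g. "America/New_York", a tz database identifier
    hour: "numeric",
    hour12: false,
  }).formatToParts(date);
  return parts.find((p) => p.type === "hour").value;
}

const quoteTime = new Date(Date.UTC(2011, 6, 20, 14, 0)); // 14:00 UTC, 20 July 2011
console.log(hourIn("America/New_York", quoteTime)); // "10" (EDT is UTC-4)
```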


>
> 13. DateTimeFormat also needs to let applications specify whether and how
> to include a time zone display name in the output. In CLDR, that's typically
> tied to the time style - long and full have the time zone, while short and
> medium don't. In reality, applications need to indicate the time zone to
> users if (and only if) it's not obvious from the context, and that's
> orthogonal to whether they want seconds.
>

Ditto.
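In ECMA-402, which came later, the time zone display name is likewise an independent option, so it can be combined with hours and minutes without also pulling in seconds. The example is illustrative, not the straw man's API.

```javascript
// ECMA-402: timeZoneName is chosen independently of the other time fields.
const withZone = new Intl.DateTimeFormat("en-US", {
  timeZone: "America/New_York",
  hour: "numeric",
  minute: "2-digit",
  timeZoneName: "short",
});
const parts = withZone.formatToParts(new Date(Date.UTC(2011, 6, 20, 14, 0)));
const types = parts.map((p) => p.type); // includes "timeZoneName", not "second"
```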


>
> 14. There are a few additional DateTimeFormat skeletons that I think would
> be commonly used in applications:
> - MMMdEEE, MMMMdEEEE: month, day, weekday in either abbreviated or full
> width; intended for dates in the current year.
> - jmm: hour and minute, in 12-hour or 24-hour format as appropriate for the
> locale.
> - jjjmmm: hour and minute, and if necessary am/pm, but with the appropriate
> characters for hour and minute rather than a colon in languages where that's
> commonly used, such as Chinese/Japanese/Korean: 오후 11시 5분. Falls back to jmm
> in other languages.
> - z, zzzz: time zone names.
> Other notes:
> - yyyyMMMMd, "era only if necessary": should explain what that means, e.g.,
> "era only for those calendars that need eras in order to uniquely identify
> all years after 1900".
> - It must be possible to combine skeletons for date, time, and time zone
> (at most one each).
>

Agreed, but we were only able to agree on a core set. Others could be
supplied, but support for them would be best-effort, depending on the
implementation.
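Combining at most one date, one time, and one time zone skeleton can be modeled as merging option bags. The table below is a hypothetical mapping of a few of the skeletons listed above onto ECMA-402 `Intl.DateTimeFormat` options; "j" is approximated by leaving the 12/24-hour choice to the locale. `SKELETONS` and `formatterFor` are illustrative names, and "jjjmmm" is deliberately absent.

```javascript
// Hypothetical skeleton table mapped onto ECMA-402 option bags.
const SKELETONS = {
  date: {
    MMMdEEE: { month: "short", day: "numeric", weekday: "short" },
    MMMMdEEEE: { month: "long", day: "numeric", weekday: "long" },
  },
  time: {
    jmm: { hour: "numeric", minute: "2-digit" }, // locale picks 12h or 24h
  },
  zone: {
    z: { timeZoneName: "short" },
    zzzz: { timeZoneName: "long" },
  },
};

// Merge at most one skeleton of each kind into a single formatter.
function formatterFor(locale, { date, time, zone }, timeZone) {
  const options = { timeZone };
  for (const [kind, name] of [["date", date], ["time", time], ["zone", zone]]) {
    if (name === undefined) continue;
    const bag = SKELETONS[kind][name];
    if (!bag) throw new RangeError(`unsupported ${kind} skeleton: ${name}`);
    Object.assign(options, bag);
  }
  return new Intl.DateTimeFormat(locale, options);
}
```

An unsupported skeleton raises an error here; a best-effort implementation could instead fall back to a nearby supported one.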


>
> 15. It seems that the correct handling of missing dateStyle or timeStyle
> parameters would be to omit the date or time from the formatted output.
>

I agree, I think we should fix that.
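The later ECMA-402 `dateStyle`/`timeStyle` options behave the way Norbert proposes: supplying only one of them formats only that half of the value. Illustrative example:

```javascript
// ECMA-402: giving only dateStyle omits the time, and vice versa.
const when = new Date(Date.UTC(2011, 6, 20, 16, 46, 44));
const dateOnly = new Intl.DateTimeFormat("en-US", { dateStyle: "medium", timeZone: "UTC" });
const timeOnly = new Intl.DateTimeFormat("en-US", { timeStyle: "short", timeZone: "UTC" });
const partTypes = (fmt) => fmt.formatToParts(when).map((p) => p.type);
```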


>
> 16. DateTimeFormat.prototype.getAmPm is described as "array of eras".
> Beyond that typo, is this function really useful, given that many locales
> don't have am/pm strings, and LDML has deprecated the corresponding element?
>

am/pm is still used in LDML; there is just an alternate element that is
preferred (dayPeriods). However, I think the result should be a map, e.g.:
    var am = x.getAmPm()["am"];
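A map-returning variant along those lines could be derived from ECMA-402's `formatToParts`, by formatting a morning and an evening time in a 12-hour cycle and picking out the "dayPeriod" part. `getAmPm` here is a hypothetical standalone function, not the straw man's method.

```javascript
// Hypothetical: build { am, pm } from the "dayPeriod" parts of formatted times.
function getAmPm(locale) {
  const fmt = new Intl.DateTimeFormat(locale, {
    hour: "numeric",
    hour12: true,
    timeZone: "UTC",
  });
  const marker = (hourUtc) => {
    const part = fmt
      .formatToParts(new Date(Date.UTC(2011, 6, 20, hourUtc)))
      .find((p) => p.type === "dayPeriod");
    return part ? part.value : undefined;
  };
  return { am: marker(9), pm: marker(21) };
}

const am = getAmPm("en-US")["am"];
```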



> 17. Error handling needs to be specified in detail. I assume this will be
> done once the functionality is settled, so I won't go into much detail now.
> However, contrary to the current statement "invalid language ids or
> non-string elements should be ignored" (in priority lists), I think the
> library should throw errors for erroneous input. Language tags should at
> least be String objects and well-formed according to BCP 47 [9]. Similarly,
> an exception should be thrown if some value other than a Date object is
> passed into DateTimeFormat.prototype.format. Note that exceptions in
> EcmaScript do not oblige the direct caller to use try/catch - they're like
> unchecked exceptions in Java.
>

The group debated how to handle exceptions; there are pluses and minuses to
using a 'best-effort' approach vs throwing an exception. The feeling I got
was that people are generally less in favor of exceptions if there can be a
graceful recovery.
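The two policies can be contrasted in code. This sketch uses ECMA-402's `Intl.getCanonicalLocales` as the well-formedness check (it throws a TypeError for non-string list entries and a RangeError for malformed tags); `strictPriorityList` and `lenientPriorityList` are hypothetical names.

```javascript
// Strict policy: reject the whole priority list on the first bad entry.
function strictPriorityList(list) {
  return Intl.getCanonicalLocales(list);
}

// Best-effort policy: silently drop entries that are not well-formed tags.
function lenientPriorityList(list) {
  const result = [];
  for (const entry of list) {
    try {
      result.push(...Intl.getCanonicalLocales(entry));
    } catch (e) {
      // skip malformed entries and keep going
    }
  }
  return result;
}

console.log(lenientPriorityList(["en-us", "en_US", "fr-CA"])); // [ 'en-US', 'fr-CA' ]
```

The strict version surfaces a typo like "en_US" immediately; the lenient one degrades gracefully but hides it.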


>
> 18. I know there has been a proposal for and discussion of MessageFormat
> functionality - is there a record of why it got removed from the strawman?
>

Again, there was not agreement, and so we postponed it.

>
>
> References:
>
> [1] http://wiki.ecmascript.org/doku.php?id=strawman:i18n_api, version 2011-07-01
> [2] http://code.google.com/p/google-caja/wiki/GlobalObjectPoisoning
> [3] http://download.oracle.com/javase/6/docs/technotes/guides/intl/overview.html#locale
> [4] http://developer.yahoo.com/yui/3/intl/
> [5] http://unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers
> [6] http://tools.ietf.org/html/rfc4647#section-3.4
> [7] http://finance.yahoo.com/currency-converter/?amt=1&from=EUR&to=KRW
> [8] http://www.twinsun.com/tz/tz-link.htm
> [9] http://tools.ietf.org/html/rfc5646#section-2.2.9
>
> Best regards,
> Norbert
>
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>

