Globalization API Feedback - moar!

Shawn Steele Shawn.Steele at microsoft.com
Mon Nov 28 14:10:00 PST 2011


Re: “Changes” I’d like to also include “Changes between users”.  Some users may prefer variants that aren’t normal for their language for whatever reason.  So a system that is nominally based on ICU or another consistent behavior could still show variations if users are allowed to select their own preferences.

-Shawn

From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Mark Davis
Sent: Monday, November 28, 2011 1:13 PM
To: Nebojša Ćirić
Cc: es-discuss at mozilla.org
Subject: Re: Globalization API Feedback - moar!

Some feedback on the API. This is a bit of stream-of-consciousness response, but figured it would be better to get it out than to delay & clean it up.

The internationalization issues that people may not be used to are:

  *   Big data requirements. A collation sequence for Chinese, for example, is quite large.
  *   Changes over time. There are improvements all the time. In CLDR, for example, there is an increase in data by typically 30-50% each year. These can be additions of data for the less-well supported languages (say, Uzbek), or fixes in data.
  *   Changes between platforms. The collation for German on an iPhone (which uses ICU) may differ from one for German on Windows 8, yet both can be completely satisfactory for German users. That may be because the characters that differ (say punctuation) have no fixed user expectations among Germans, or it may be that there are well known acceptable alternatives (phonebook vs dictionary sorting).
  *   Variants. Implementations typically support a main language (such as French or Uzbek), with deltas for some set of variants (Canadian French, Belgian French, ... Cyrillic Uzbek, Arabic Uzbek, ...). But the exact set depends on the implementation (and version). Especially in the case of variants, one service (e.g., collation) might have no difference between a variant and the principal language, while another service (date formatting) might have a significant difference.
  *   Best Fit. For the majority of implementations, it is far better to return a "best match" than the wrong language. So if the request is for French (Canadian) collation, and the best available is French (Belgian), then it is best to return that (rather than some system default, like Japanese). However, the caller may need to know exactly what the fallback was, in case some actions do need to be taken.
  *   Initialization overhead. For many of the i18n services, but especially collation, there is a need for individual comparisons to be as fast as possible. The actual mechanics of how to do this across languages are far more complicated than most people realize, so typically you build a service object that allows you to do the fastest job for the given set of options. When no more operations will be done with the service object, it can be tossed. That way the caller can determine the appropriate time to jettison the object. Think of it, if you will, like how a file system works. Typically you do something like

   file = open(name);
   while (true) {handle(file.readByte());}
   file.close();

An alternative would be not having open/close calls, and depending on the OS caching which files are open or not.

   while (true) {handle(readByte(name));}

But that would be very cumbersome to support in practice.
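
The open/use/close analogy maps onto a collation service object roughly as follows. This is only a sketch: it uses Intl.Collator, from the ECMAScript Internationalization API that shipped after this thread, as a stand-in for the service object under discussion, and the word list is made up.

```javascript
// "Open": build the service object once; locale data and options are resolved here.
const collator = new Intl.Collator('de', { usage: 'sort' });

// "Use": every comparison reuses the prepared object instead of re-resolving the locale.
const words = ['Zebra', 'Banane', 'Apfel'];
const sorted = words.slice().sort(collator.compare);
console.log(sorted);

// "Close": there is no explicit call; dropping the last reference to the
// collator lets the engine reclaim it whenever the caller is done.
```

The caller, not the engine, decides how long the prepared object lives, which is the point of the file-handle comparison.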
   Locale.isLocaleSupported(code)

For these reasons, isLocaleSupported doesn't really work right. There can be different levels of support for different services, and even for the same service I can get back a result which isn't precisely what I asked for, but is sufficient for my web application. For example, if I were to create a given service (a collator for example) for a given locale (say German for Austria), with a given set of parameters (such as phonebook order), I might not get an exact match for what I requested: I could get a collator for German for Germany with phonebook order, or a collator for German for Austria with dictionary order. That's why the current API returns not only the service, but also the set of parameters for the best fit match.

It would have to be something like

resultingOptions = Locale.getCollationSupport(options)
or
resultingOptions = Locale.getSupport("collation", options)

That would have to go through the logic for figuring out the best match for the options that you would use when creating the service, but then just not create the actual service.
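
A toy version of that "resolve but do not create" step might look like the following. Everything here is hypothetical: the AVAILABLE table, the option names `locale` and `order`, and the fallback rule (strip subtags from the right, then pick the first available ordering) are assumptions made for illustration, not part of any proposal.

```javascript
// Hypothetical data: which collation orderings an implementation ships per locale.
const AVAILABLE = {
  'de': ['dictionary', 'phonebook'],
  'de-AT': ['dictionary'],
};

// Strip subtags from the right until a locale with data is found.
function bestLocale(requested) {
  let tag = requested;
  while (tag) {
    if (Object.prototype.hasOwnProperty.call(AVAILABLE, tag)) return tag;
    const cut = tag.lastIndexOf('-');
    tag = cut === -1 ? '' : tag.slice(0, cut);
  }
  return null;
}

// Returns the options that a collator WOULD get, without building the collator.
function getCollationSupport(options) {
  const locale = bestLocale(options.locale);
  if (locale === null) return null;
  const orders = AVAILABLE[locale];
  const order = orders.includes(options.order) ? options.order : orders[0];
  return { locale: locale, order: order };
}

console.log(getCollationSupport({ locale: 'de-AT', order: 'phonebook' }));
// { locale: 'de-AT', order: 'dictionary' }
console.log(getCollationSupport({ locale: 'de-CH', order: 'phonebook' }));
// { locale: 'de', order: 'phonebook' }
```

This sketch prefers keeping the region over keeping the ordering; a real implementation could just as well prefer the other way around, which is exactly why the caller needs to see the resulting options.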

   locale.compare("foo", "bar");

This can be done, but means that for performance, internally there will be a service object for collation that needs to be cached and managed.  Or maybe performance isn't a concern in an EcmaScript context.
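
To make the caching concern concrete, here is a minimal sketch of a hypothetical Locale wrapper that keeps one collator per distinct options object. It assumes Intl.Collator (which postdates this thread) as the underlying service; the `_collators` cache and the use of JSON.stringify as a cache key are illustrative choices only.

```javascript
class Locale {
  constructor(code) {
    this.code = code;
    this._collators = new Map(); // cache key -> Intl.Collator
  }

  compare(a, b, options = {}) {
    // Note: the key depends on property order, so {a, b} and {b, a} are cached separately.
    const key = JSON.stringify(options);
    let collator = this._collators.get(key);
    if (collator === undefined) {
      collator = new Intl.Collator(this.code, options); // built only on first use
      this._collators.set(key, collator);
    }
    return collator.compare(a, b);
  }
}

const locale = new Locale('en');
console.log(locale.compare('foo', 'bar') > 0); // true: "foo" sorts after "bar"
console.log(locale.compare('bar', 'foo') < 0); // true
console.log(locale._collators.size);           // 1: both calls shared one collator
```

The cache never shrinks in this sketch, which illustrates the management burden: the implementation, rather than the caller, now has to decide when a collator can be discarded.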

Mark
— Il meglio è l’inimico del bene —

[https://plus.google.com/114199149796022210033]


On Mon, Nov 28, 2011 at 17:15, Nebojša Ćirić <cira at google.com> wrote:
It's my fault - I read "discussed with Norbert" as if you and Norbert agreed on this approach.

We are essentially choosing between two approaches:

1. Create an object, query its properties, use object methods (original approach)
2. Create Locale object, call various methods like isSupportedXXX, if service is supported use formatYYY or compare

They are very similar in what the user has to do (query capabilities, call formatters) to get the final result, so I don't think we would lose any functionality by picking either of them.

With 2. we might end up having lots of isSupportedDate, isSupportedNumber... after a couple of iterations of adding new features (like segmentation, calendars, spell check...), but if we follow the same naming style I don't see a problem with that.

As for the namespace issue, I don't see much difference between Locale and Globalization. We discussed Modules on the other thread and came up with:

Object.system.load('@g11n', function (g11n) { /* use g11n here */ });

or synchronous call

var global = Object.system.load('@g11n');  // returns the Globalization object

This would become module global import '@g11n' in the future. This approach eliminates the need to find a proper name for the namespace (and possibly for Locale()?).

Thank you for helping out. Your proposal is exactly what we need at this point - to help us refine our work and make it palatable to TC39 members :).

On November 24, 2011 at 16:47, Nicholas C. Zakas <standards at nczconsulting.com> wrote:

Again, my apologies - I didn't mean to imply that Norbert agreed with any of this, just that a few ideas have been more solidified in my mind after speaking with him.

As a web developer who has built large-scale web sites that have been internationalized to dozens of countries, my main purpose in contributing to this discussion is to provide feedback on what I would have liked to see in such an API to make it useful to me.

The current proposal doesn't feel very JavaScript-like, and so I've been trying to offer alternatives that make it more JavaScript-like and, therefore, more likely to be used by more developers. I'm a bit concerned that design decisions seem to have been guided by considering the most complex use cases instead of the most common.

It is my opinion (and I can only speak for myself) that a single object to encompass everything would represent a better API for JavaScript than adding a namespace, which hasn't been done to this point, and several new types, all of which just do one thing. That's a very Java-like approach, and I think JavaScript deserves better.

As I told Norbert, I'm very happy to lend my experience and insights to this process. I realize I may end up bringing things up that you all have discussed before - but considering that you did have a single Locale object at one point, I'd like to claim "great minds" think alike and continue discussing it. :)

Happy Thanksgiving.

-Nicholas



On 11/24/2011 2:44 PM, Norbert Lindenberg wrote:
I didn't agree with this approach, and Nicholas didn't claim that I did :-)

I'm very glad though that Nicholas is taking the time to provide feedback, come up with his own ideas, and discuss them with us. In the end, the Globalization API can only be successful if people like him are comfortable using the API in their projects, and explain it to others so that they're comfortable doing so. Right now, the feedback from him, Rick, several TC39 members, and others indicates significant discomfort, so we have some work to do. Some of that work may be changes to the API, but some may also be better explanation of how to use the API, directly from applications or in higher-level libraries.

More after Thanksgiving.

Norbert


On Nov 23, 2011, at 15:15 , Nebojša Ćirić wrote:

On November 23, 2011 at 14:32, Nicholas C. Zakas <standards at nczconsulting.com> wrote:
On 11/23/2011 12:57 PM, Nebojša Ćirić wrote:
A similar approach was proposed (with locale as a top object, others under it) and I have nothing against it, but there are some issues with your approach:

(code == localeID)
Sorry for being unclear - I didn't intend for this to be a complete alternate proposal, just a starting point. There are definitely still issues that would have to be resolved.

I just feel we are going in circles sometimes :). I am surprised Norbert agreed with this approach - I think he was against top level Locale object.
1. An implementation may support NumberFormat for localeID x, but not support DateFormat for x (it would need to fall back to a less specific one, or to the default). That's why we have a supportedLocalesOf method on each object.
So what you're saying is that there needs to be some way to feature detect support for number and date formats separately. That could be handled in any number of ways. One that pops to mind would be isDateFormatSupported()/isNumberFormatSupported() as an instance method.

That would probably work. We could add more methods in the future - say, one that tells you the closest locale to the current one that does support the service in question.
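
Per-service feature detection of this kind could be sketched as below. The function name isServiceSupported and the service names are hypothetical; the sketch leans on the supportedLocalesOf methods of the Intl constructors, which shipped after this thread, and treats "supported" as "the lookup matcher finds data for this tag or one of its parents".

```javascript
// Map of hypothetical service names to the Intl constructors that back them.
const services = {
  dateFormat: Intl.DateTimeFormat,
  numberFormat: Intl.NumberFormat,
  collator: Intl.Collator,
};

// True if the named service has data for the locale (or a parent of it).
function isServiceSupported(serviceName, localeCode) {
  const service = services[serviceName];
  if (service === undefined) return false; // unknown service
  const supported = service.supportedLocalesOf([localeCode], { localeMatcher: 'lookup' });
  return supported.length > 0;
}

console.log(isServiceSupported('collator', 'en'));
console.log(isServiceSupported('dateFormat', 'en'));
console.log(isServiceSupported('spellCheck', 'en')); // false: no such service in this sketch
```

Because each service is queried separately, the answer for number formatting can differ from the answer for date formatting for the same locale.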
2. How do you convey the status of option/locale resolution to the developer? Which options were resolved and to what value (say I ask for the 'islamic' calendar, but we only have 'gregory' for a given locale)? In our current proposal we expose a resolvedOptions accessor on, e.g., a DateTimeFormat object instance that has a 'calendar' property, so a developer can decide what to do.
Thanks, I was having trouble understanding what resolvedOptions was used for. Could the use case be handled by having a similar object on a Locale instance? It seems like you could include options for available calendars and anything else that developers could query against, such as:


    var locale = new Locale();
    if (locale.supportedOptions.islamicCalendar){
        //foo
    }

You could also go a more traditional direction (at least in terms of DOM objects), by doing something like:

    Locale.CALENDAR_ISLAMIC = 1;
    Locale.CALENDAR_GREGORIAN = 2;


    var locale = new Locale();
    locale.isSupported(Locale.CALENDAR_ISLAMIC);

I think feature detection is an easily solved problem if everything else is in place.
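
For comparison, the resolvedOptions idea raised earlier in this exchange can be sketched with the Intl.DateTimeFormat API that eventually shipped (it postdates this thread). The helper name resolveCalendar is hypothetical, and the sketch assumes the locale code passed in carries no Unicode extension of its own.

```javascript
// Ask for a calendar, then report what the implementation actually resolved to.
function resolveCalendar(localeCode, calendar) {
  // Request the calendar through the "-u-ca-" Unicode extension of the locale tag.
  const formatter = new Intl.DateTimeFormat(localeCode + '-u-ca-' + calendar);
  const resolved = formatter.resolvedOptions().calendar;
  return { requested: calendar, resolved: resolved, honored: resolved === calendar };
}

const result = resolveCalendar('en-US', 'islamic');
if (result.honored) {
  console.log('Formatting with the requested calendar:', result.resolved);
} else {
  // The developer decides what to do: accept the fallback, warn, or pick another locale.
  console.log('Requested', result.requested, 'but got', result.resolved);
}
```

Which branch runs depends on the data the implementation ships, which is the situation the accessor is meant to expose.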

Sometimes options can influence each other. For example:

1. Ask for th locale (Thai)
2. There are two calendars available - buddhist and gregory
3. There are two numbering systems available - thai and latin

but only buddhist + thai and gregory + latin combinations are supported.

If you ask locale.isSupported('calendar', 'buddhist') you'll get true. If you ask locale.isSupported('numberingSystem', 'latin') you'll get true again. If you try to format a date using that combination (buddhist + latin) you'll get something you didn't expect.

I would propose a slightly different isSupported method:

locale.returnSupported(serviceName, options), where serviceName is one of 'dateFormat', 'numberFormat', 'collator', options object contains requested settings (calendar, numbering system, collation options...) and method returns the object with supported features for a given service.
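
A toy sketch of such a combination-aware query, using the Thai example above. The SUPPORTED table, the standalone function signature (locale code passed as the first argument), and the scoring rule (pick the combination matching the most requested options, first one wins on a tie) are all assumptions for illustration.

```javascript
// Hypothetical data: only these calendar/numbering-system pairs exist for 'th'.
const SUPPORTED = {
  th: {
    dateFormat: [
      { calendar: 'buddhist', numberingSystem: 'thai' },
      { calendar: 'gregory', numberingSystem: 'latin' },
    ],
  },
};

// Return the supported combination closest to the request, or null if none exists.
function returnSupported(localeCode, serviceName, options) {
  const combos = (SUPPORTED[localeCode] || {})[serviceName] || [];
  let best = null;
  let bestScore = -1;
  for (const combo of combos) {
    let score = 0;
    for (const key of Object.keys(options)) {
      if (combo[key] === options[key]) score += 1;
    }
    if (score > bestScore) { // strict ">" keeps the earliest combo on a tie
      best = combo;
      bestScore = score;
    }
  }
  return best === null ? null : Object.assign({}, best);
}

console.log(returnSupported('th', 'dateFormat', { calendar: 'buddhist', numberingSystem: 'latin' }));
// { calendar: 'buddhist', numberingSystem: 'thai' }
```

Unlike two independent isSupported calls, the returned object shows that the requested numbering system was not honored together with the requested calendar.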
3. This approach would require internal caching of collator/dateformatter/numberformatter objects.
That's an implementation detail. I'm more interested in defining a usable and relatively intuitive API before worrying about optimization.

I agree, it's an implementation detail, but you will need to pass the format parameter to each call of the .format() method. I don't think that's a big problem. Also, I would move the format and option parameters to the last position (where they can be optional), so that the user can specify only the value and rely on defaults.
On November 23, 2011 at 12:09, Nicholas C. Zakas <standards at nczconsulting.com> wrote:
After meeting with Norbert to discuss the use cases and design decision rationale, I've come to a different understanding of the goals of the globalization API. Some things I learned:

1. Augmenting native types with some default locale support may be dangerous. Consider the case where a single web page displays two modules with different locales. Which one wins? Therefore, "default" locale behavior for native types is impractical.
2. Locale information is most frequently used for formatting numbers and dates as well as comparing strings. The locale information doesn't permeate the entire execution context.
3. Developers are likely to want to define locale information once and then reuse that multiple times through a script.
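
The first point can be illustrated with two page modules that each carry their own locale. This is a sketch only, using Intl.NumberFormat (which postdates this thread) and made-up module names; the German output assumes German locale data is available in the runtime.

```javascript
// Two hypothetical modules on the same page, each with its own explicit locale.
const usModule = { format: new Intl.NumberFormat('en-US') };
const deModule = { format: new Intl.NumberFormat('de-DE') };

const value = 1234.5;
console.log(usModule.format.format(value)); // "1,234.5"
console.log(deModule.format.format(value)); // "1.234,5" when German data is present

// Neither module touches Number.prototype or a page-wide default, so neither "wins".
```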

Given this, I'd like to propose an alternate approach to the one currently taken in the API and also different from my initial email. It goes like this:

Have a single, top-level type called Locale defined as:

   function Locale(code){

       //whatever has to happen to process the code

       this.code = code;
   }

   /*
    * Determine if a locale is supported.
    * @param code The code to check.
    * @return True if supported, false if not.
    */
   Locale.isLocaleSupported = function(code){
       ...
   };

   /*
    * Replaces supportedLocalesOf
    * @param code The code to check.
    * @return Array of supported locales.
    */
   Locale.getSupportedLocales = function(code){
       ...
   };

   /*
    * Replaces Globalization.Collator
    * @param a The first item.
    * @param b The second item.
    * @param options (Optional) The options to use when comparing.
    * @return -1 if a comes before b, 0 if they're equal, 1 otherwise
    */
   Locale.prototype.compare = function(a, b, options){
       ...
   };

   /*
    * Replaces Globalization.NumberFormat
    * @param format A pattern format string for outputting the number.
    * @param value The value to format.
    * @return The number formatted as a string.
    */
   Locale.prototype.formatNumber = function(format, value){
       ...
   };

   /*
    * Replaces Globalization.DateFormat
    * @param format A pattern format string for outputting the date.
    * @param value The date to format.
    * @return The date formatted as a string.
    */
   Locale.prototype.formatDate = function(format, value){
       ...
   };

You would then be able to create a single Locale instance and have that be used in your script. If the constructor is used without an argument, then default locale information is used:

   var locale = new Locale();

If you provide a code, then that is used:

   var locale = new Locale("en-us");

If you provide multiple codes, then the first supported one is used:

   var locale = new Locale(["en-us", "en-gb"]);

Then, you can use that locale information for the other operations you want to do:

   locale.formatDate("DMYs-short", new Date());
   locale.formatNumber("##.##", 55);
   locale.compare("foo", "bar");

By the way, not saying this is the format pattern string that should be used, it's just for discussion.
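
One way the proposed type could be prototyped, purely as a sketch: it is layered on the Intl constructors that shipped after this thread, it swaps the pattern strings for options objects placed last (so they can be omitted), and it uses collation support as a stand-in for "supported", which glosses over the per-service differences discussed elsewhere in the thread.

```javascript
function Locale(codes) {
  // Accept nothing, a single code, or an array of codes.
  const requested = codes === undefined ? [] : [].concat(codes);
  // Keep the first requested code with collation data; otherwise use the default locale.
  const supported = Intl.Collator.supportedLocalesOf(requested);
  this.code = supported.length > 0
    ? supported[0]
    : new Intl.Collator().resolvedOptions().locale;
}

Locale.prototype.compare = function (a, b, options) {
  return new Intl.Collator(this.code, options).compare(a, b);
};

Locale.prototype.formatNumber = function (value, options) {
  return new Intl.NumberFormat(this.code, options).format(value);
};

Locale.prototype.formatDate = function (value, options) {
  return new Intl.DateTimeFormat(this.code, options).format(value);
};

const locale = new Locale(['en-us', 'en-gb']);
console.log(locale.code);                    // canonicalized form of the first supported code
console.log(locale.formatNumber(55));
console.log(locale.compare('foo', 'bar'));
console.log(locale.formatDate(new Date()));
```

Each call here builds a fresh service object, so this version trades the performance concerns raised in the thread for a smaller surface.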

I like having a single object to deal with instead of multiple for everything the API is trying to do. It seems a lot more intuitive than needing to manage a LocaleList that is passed into new instances of NumberFormat and DateFormat all the time (that's a bunch of housekeeping for developers).

Thoughts?

-Nicholas




On 11/21/2011 11:12 AM, Nicholas C. Zakas wrote:
As promised, more verbose feedback for the Globalization API. My general feeling is that the API is overly verbose for what it's doing. I'll state my bias up front: I'm not a fan of introducing a bunch of new types to handle formatting. I'd much rather have additional methods that perform formatting on existing objects. My feedback is mostly about eliminating the new constructors - which has an added bonus of eliminating the Globalization namespace because there would be only one constructor left: Collator.

1. LocaleList

I'm not sure why this type is necessary. I don't believe that locale resolution is an expensive operation, and even if it is, I'd expect the implementation to cache the results of such resolution for later use. I'd just leave this as an internal construct and instruct developers to use arrays all the time.

2. supportedLocalesOf

I find this method name strange - I've read it several times and am still not sure I fully understand what it does. Perhaps "getSupportedLocales()" is a better name for this method? (I always prefer methods begin with verbs.)

3. NumberFormat

Number formatting seems simple enough that it could just be added as a series of methods on Number.prototype. The three types of formatting (currency, decimal, percent) could each have their own method. Currency formatting has relatively few options to specify, so its method can be:

   /*
    * Formats the number as if it were currency
    * @param code Currency code, e.g., "EUR"
    * @param type (Optional) The way to format the currency code, "code", "symbol" (default),
    * @param locales - (Optional) Array of locales to use.
    */
   Number.prototype.toCurrencyString = function(code, type, locales) {
       ...
   };

   var num = 500;
   console.log(num.toCurrencyString("EUR", "code"));    //"EUR 500.00"
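
A rough sketch of what a body for toCurrencyString might do, written as a standalone function so that Number.prototype is left untouched. It assumes Intl.NumberFormat (which postdates this thread) and maps the `type` argument directly onto its currencyDisplay option; that mapping is an assumption, not something the proposal specifies.

```javascript
// value: number to format; code: ISO 4217 currency code;
// type: "code" or "symbol" (default); locales: optional array of locale codes.
function toCurrencyString(value, code, type = 'symbol', locales) {
  const formatter = new Intl.NumberFormat(locales, {
    style: 'currency',
    currency: code,
    currencyDisplay: type,
  });
  return formatter.format(value);
}

console.log(toCurrencyString(500, 'EUR', 'code', ['en-US']));
console.log(toCurrencyString(500, 'EUR', 'symbol', ['en-US']));
```

The exact spacing between the currency code and the amount is chosen by the locale data, so callers should not depend on a particular separator character.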


Decimal and percent formatting options are slightly different in that they include significant digits options. For that, I prefer to use a formatting string rather than the multitude of optional properties as currently defined (see http://www.exampledepot.com/egs/java.text/FormatNum.html). The formatting string indicates must-have digits as 0 and optional digits as #, allowing you to very succinctly specify how you want your number to be output. For example:

   /*
    * Formats the number as a decimal string.
    * @param format Format string indicating max/min significant digits
    * @param locales (Optional) Array of locales to use.
    */
   Number.prototype.toDecimalString = function(format, locales){
       ...
   };

   /*
    * Formats the number as a percent string.
    * @param format Format string indicating max/min significant digits
    * @param locales (Optional) Array of locales to use.
    */
   Number.prototype.toPercentString = function(format, locales){
       ...
   };

   var num = 1234.567;
   console.log(num.toDecimalString("000##.##"));    //"01234.57"
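
As a sketch of how such a pattern could be interpreted, the helper below translates a simplified pattern into Intl.NumberFormat options (an API that postdates this thread). The reading of the pattern is an assumption: each 0 before the decimal point is a required integer digit, each 0 after it a required fraction digit, and each # after it an optional fraction digit. It is not guaranteed to reproduce the example output above.

```javascript
// Count occurrences of a single character in a string.
function countChar(text, ch) {
  return text.split(ch).length - 1;
}

// Translate a simplified "0"/"#" pattern into Intl.NumberFormat options.
function patternToOptions(pattern) {
  const parts = pattern.split('.');
  const intPart = parts[0];
  const fracPart = parts.length > 1 ? parts[1] : '';
  const minFraction = countChar(fracPart, '0');
  return {
    minimumIntegerDigits: Math.max(1, countChar(intPart, '0')),
    minimumFractionDigits: minFraction,
    maximumFractionDigits: minFraction + countChar(fracPart, '#'),
    useGrouping: false,
  };
}

function toDecimalString(value, pattern, locales) {
  return new Intl.NumberFormat(locales, patternToOptions(pattern)).format(value);
}

console.log(toDecimalString(1234.5, '0.00#', ['en-US']));     // "1234.50"
console.log(toDecimalString(1234.56789, '0.00#', ['en-US'])); // "1234.568"
console.log(toDecimalString(7, '000.#', ['en-US']));          // "007"
```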

4. DateTimeFormat

As with NumberFormat, it seems like this could more succinctly be implemented as a method on Date.prototype. At its easiest:

   /*
    * Format a date
    * @param options The already-defined options for DateTimeFormat
    * @param locales (Optional) Array of locales to use.
    */
   Date.prototype.toFormatString = function(options, locales){
       ...
   };

In an ideal world, I'd like to see options overloaded so it can be an options object as specified now or a formatting string. I understand that there was a sentiment against formatting strings due to their limitations and edge case errors. However, I'd like to point out that any internationalized web application is highly likely to already be using formatting strings for dates, since this is pretty much how every other language handles date formatting. That means supporting format strings in JavaScript would allow application developers to reuse the settings they already have. As it stands now, you'd need to create two different ways of formatting dates for a web app: one for your server-side language and one for your client-side language (until the day everything is running on Node.js, of course). I'd prefer my client-side code to reuse settings and configuration that the server-side code uses, otherwise I end up with two very different pieces of code doing the exact same thing, and there be dragons.
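
The overloading described above could be sketched like this. It is written as a standalone function over Intl.DateTimeFormat (which postdates this thread), and it uses a small table of hypothetical named presets in place of real format-pattern parsing, which is out of scope for a short example.

```javascript
// Hypothetical presets standing in for the format strings an application already has.
const DATE_PRESETS = {
  short: { year: 'numeric', month: '2-digit', day: '2-digit', timeZone: 'UTC' },
  long: { year: 'numeric', month: 'long', day: 'numeric', timeZone: 'UTC' },
};

// options may be an options object or the name of a preset.
function toFormatString(date, options, locales) {
  let resolved = options;
  if (typeof options === 'string') {
    resolved = DATE_PRESETS[options];
    if (resolved === undefined) {
      throw new RangeError('Unknown date preset: ' + options);
    }
  }
  return new Intl.DateTimeFormat(locales, resolved).format(date);
}

const day = new Date(Date.UTC(2011, 10, 28, 12)); // 28 November 2011, noon UTC
console.log(toFormatString(day, 'short', ['en-US']));                              // "11/28/2011"
console.log(toFormatString(day, { month: 'long', timeZone: 'UTC' }, ['en-US']));   // "November"
```

Sharing the preset table between server and client would address the duplication concern, whatever form the shared settings actually take.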

-Nicholas

_______________________________________________
es-discuss mailing list
es-discuss at mozilla.org
https://mail.mozilla.org/listinfo/es-discuss



--
Nebojša Ćirić


