[Go Faster] follow up from our l10n requirement discussion...

Axel Hecht axel at mozilla.com
Thu Sep 3 16:48:19 UTC 2015

Let me start with a thing that I don't think we've said so far.

      Localization of Firefox needs to change!

Put this in a blink tag, animated CSS gradients, even a marquee tag is OK.

The question is not if, but where we want to get to.

We do have an extensive list of good parts about our current ecosystem.
But we also have an extensive list of bad parts. The go-faster
conversations have got me thinking, and some ideas are slowly
condensing. I'm going to pick Mossop and rnewman as my architects of
choice. I've actually revamped a few assertions about our l10n life over
the past few days, and I'll hit up those architects, as well as a few key
localizers, in the first rounds of feedback. I'll keep this group posted.

On 8/31/15 11:13 PM, Toby Elliott wrote:
> Sorry, I was on vacation and mostly away from emails for a few days. It's also why I haven't had a chance to update the document; will be doing so in the next few days and will let you know when the next revisions are done.
> Some inline notes:
> On Aug 28, 2015, at 11:00 AM, Chris Hofmann<chofmann at mozilla.com>  wrote:
>> Just a couple of points I think we went over, and I think we agreed were
>> important as part of those requirements and probably should be included
>> in the doc somewhere.
>> 1) the localization editing tools should be as agnostic as possible of
>> backend storage repository.  That makes changing out repositories as
>> painless as possible
> Agree and I need to update to reflect this. I think there's actually an interesting model where Pontoon maintains a core database and exports agnostically, but that should be invisible for all intents and purposes and not relevant to this question. Certainly, the goal is to be as flexible as possible, as we're likely to be driven by Dave Townsend's needs as he tackles this challenge in Firefox.
A core database doesn't help. We have years of experience with those. If
someone, even I, is interested in data that's in our database, the
"easiest" way to get it is to ssh into the production server, run

    ./manage.py shell

and then query things. Of course, that only works on systems that I 
wrote myself, as I know what's where, and also only on the boxes that I 
have access to.

The databases in pontoon and pootle today carry golden nuggets. And I 
can't discover them, because I don't have access, and I don't know the 

Our source code is in the open, and we need to get the gold out of 
columns in databases into public repos.

This requires making exports timely, and making them in a way that
preserves attribution. "Timely" here means that we need to enable localizers
to get translations done and tested, fixed and verified within the
timeframe that go-faster sets.
Also, taking contributions from external sources through VCS is
important. Depending on timing, this includes figuring out a merge
>> 2) Localizers will always want to choose from a variety of editing tools
>> that match the way they want to work.  Forcing standardization on only
>> one tool would be similar to forcing code developers to work using vi,
>> or forcing them to work using Emacs.  Design choices that allow either
>> are better.
> Ironically, I think this goal is somewhat in contradiction to the last goal. Vi and Emacs can coexist because their fundamental output - a text file - is the same. The advantage of a single tool is that it allows for simpler backend abstraction, as you only have to update one tool to completely revamp the underlying infrastructure. This analogy seems more akin to saying that we don't want to force developers to only use python, or perl, or whatever language. Sure, you can process logs in all of them, and there are definite advantages in terms of developer comfort, but it comes at the cost of shared tooling infrastructure and reuse, new-coder onboarding, and a bigger firedrill when the log format changes.
There's no tool that does all the things. Hammers are hammers, 
screwdrivers are drinks.

We regularly come across developers doing things nobody anticipated.
You can't change your database schema at that point; the schema just
gets in the way.

Also note that things like l20n don't have any answers in terms of
schemas yet. It's not clear whether SQL will ever fit what localization
needs in order to excel in quality.

It's great to make tools great at what they're great at. Restricting us 
to just one thing is in the way of progress.

Not that I don't understand where this is coming from. It's actually
obvious, as each of our existing tools restricts anyone who
wants to contribute to that one tool. This is what we call
interoperability, or the lack thereof.

There's opportunity to get more tools to be interoperable.

Then we'll get localizers to use the tool that's good at the job they're 
interested in right now.

If we're just focusing on one tool, they're using a tool looking for a job.

> That being said, it's unlikely that we're going to take steps to preclude someone from doing it their own way.
>> 3) In fact, most of the larger and top 20 fast-moving locales (like IT,
>> FR, RU...) that localize any new string change on M-C every night are
>> using direct access to HG and UTF-8-capable text editors.
> OK. Is an hg account a prerequisite for suggesting string translations?
Mercurial and git are distributed version control systems; nobody needs
an account anywhere to commit.

The question here is one of discoverability, and learning and mentoring. 
And one of taking contributions, much more than one of contributing.
>> Given all three of these points, and probably a few more, the following
>> requirements become critical:
>> 1) The localization system should mimic and plug into all the current
>> ways that localization happens to enable the greatest participation
>> that includes
>> a) pluggable to dashboards like https://l10n.mozilla.org/teams/fr
>> b) connection to translation memories for the locale and other locales
>> c) pootle installation for mozilla products
>> http://mozilla.locamotion.org/ - and http://mozilla.locamotion.org/es_MX/
> I am confused. Why are there two dashboards for es_MX (https://l10n.mozilla.org/teams/es-MX and http://mozilla.locamotion.org/es_MX/), and why don't they agree?
They don't agree because they're not showing the same kind of data, and 
they're not showing the same projects.

There's a fundamental Heisenberg problem with stats in localization. One
reality reports on words to translate, the other reality reports on
localizable entities to translate.

These two are not reconcilable, as they're both true, but different. No 
infrastructure reports on both, though, so you commonly get a bit of a 
lie and a bit of the truth.
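To make the two realities concrete, here's a minimal Python sketch under
made-up assumptions: the entity IDs, strings, and the names SOURCE,
TRANSLATED, entity_completion and word_completion are all hypothetical,
and splitting on whitespace is a crude stand-in for real word counting.
The same state of one locale reads as 75% done by entities and 29% done
by words, because the one untranslated entity happens to be the long one.

```python
"""Hypothetical illustration: one localization state, reported two ways."""

# Hypothetical en-US source strings keyed by entity ID (like a .properties file).
SOURCE = {
    "menu.file": "File",
    "menu.edit": "Edit",
    "dialog.close.title": "Close all tabs?",
    "dialog.close.body": "You are about to close several tabs. Do you want to continue?",
}

# Hypothetical set of entity IDs this locale has already translated.
TRANSLATED = {"menu.file", "menu.edit", "dialog.close.title"}


def entity_completion(source, translated):
    """Fraction of localizable entities that have a translation."""
    done = sum(1 for key in source if key in translated)
    return done / len(source)


def word_completion(source, translated):
    """Fraction of source words (naive whitespace split) in translated entities."""
    total = sum(len(text.split()) for text in source.values())
    done = sum(len(text.split()) for key, text in source.items() if key in translated)
    return done / total


if __name__ == "__main__":
    print(f"entities: {entity_completion(SOURCE, TRANSLATED):.0%}")  # entities: 75%
    print(f"words:    {word_completion(SOURCE, TRANSLATED):.0%}")    # words:    29%
```

Both numbers are true; a dashboard that shows only one of them tells only
half the story.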

They also differ in which projects they expose.
http://mozilla.locamotion.org/ generally exposes aurora, as that's our
key l10n repo, and we're only exposing one to reduce confusion. elmo
reports on all of them, as that informs us about the things we ship.

This is one example of where elmo as a decision making infrastructure 
differs from the localization editing tool.
> It looks to me like l10n.moz is watching hg. If you assume that's the fundamental un-abstracted endpoint, then dashboards should continue to work as they do currently.
>> d) direct access to addon string repos to enable text editing.
>> ... more to be provided later
>> 2) The definition of "Storage Database" on your diagram should be HG,
>> and it should consist of two parts.   One part is a collection of files
>> that contain en-US strings, and the other part is a collection of files
>> that contain translated strings for all the possible locales that we
>> might ship.
> "Storage database" is the agnostic representation of hg. We can call out hg explicitly if you prefer.
I find "agnostic representation" to be a bold claim. I can go into any 
particular detail of hg, and make that a requirement.

Our version control and the text files within that are the one true data 
source. It's what we're using to build products.

They're not a tangential artifact.
>> Let me know if you want to make these changes to DLC localization, or if
>> you want to integrate and Axel and I can review before it gets wide
>> circulation.
> I will get them in as soon as I can.
>> We probably also need to get clear with Ben on Releng requirements that
>> System Addons will contain strings and the build system needs to
>> accommodate the creation and distribution of translated material, and
>> that system addons with or without strings need to find their way to
>> installations of all of our supported locales.  This came up in the recent
>> thread on the gofaster mailing list, but I'm not sure if that's a topic for
>> your doc or some other.  I cc'ed Ben for ideas on that.
> I think a lot of this will come from Dave's efforts in looking at how system addons are localized. He's planning on beginning that work in Q4.
The idea that we're doing "system addons" isn't helpful. We don't have 
the resources to do a bunch of things, let alone doing things that may 
or may not work out. Or things that are contradicting each other.

We need to do things that work.
