Now creating L10n nightlies a whole new way!

Since last Wed (01oct2008), we’ve been producing *two* sets of L10n nightly builds every night on the FF3.0 line, built in slightly different ways.

The builds produced the “new” way are at:

The builds produced the “old” way are at:

If you notice any problem with the new l10n nightly builds, please first check to see if the “old” nightly build has the same problem. If you find a problem in the new l10n nightlies that is not a problem in the old l10n nightlies, please file a bug, and we’ll get right on it.

Nothing has come up so far, whether in our weeks of running on staging, in testing, or even in Friday’s testday focus on these builds. If all continues to go well, we’re planning to stop producing the 1.9 nightlies the “old” way this coming Wednesday (08oct2008). We’ll probably wait a few more days before mothballing all the various custom machines that were being used for the “old” way. After that, we’ll begin similar changes on the mozilla-central / FF3.1 systems.

This is a really really huge deal, and very exciting for RelEng and for the l10n community. Details are below, but trust me, this is big. 🙂

Some background for the curious:

(Some details covered during Monday’s Weekly Update call.)
(Some details in Armen’s blog and Seth’s blog.)
(Lots more details are in this presentation from the Mozilla2008 summit.)

Every night, we do a full compile and link of the en-US version of Firefox. We then take that en-US version, open it up, replace the locale strings with the strings for a different locale, and rebundle everything back together. This is called a “nightly l10n repack”. We then repeat the process for all the other locales, on all o.s. If there are string changes during the day, we have a slightly different system that does repacks during daytime. For official releases we have yet another slightly different system that does repacks for releases.
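For anyone who prefers code to prose, here is a minimal sketch of the repack idea described above. It is not our actual build code: a "build" is modelled as a plain dict, and the names `build_en_us`, `repack`, `nightly_repacks`, the sample strings, and the platform name "mac" are all made up for illustration.

```python
from itertools import product

PLATFORMS = ["linux", "mac", "win32"]  # "mac" is an assumed name for the third o.s.

# Hypothetical sample strings; a real locale carries far more entries.
LOCALE_STRINGS = {
    "de": {"menu.file": "Datei", "menu.quit": "Beenden"},
    "fr": {"menu.file": "Fichier", "menu.quit": "Quitter"},
}


def build_en_us(platform):
    """Stand-in for the nightly compile and link of en-US on one platform."""
    return {
        "platform": platform,
        "locale": "en-US",
        "binary": f"firefox-{platform}",
        "strings": {"menu.file": "File", "menu.quit": "Quit"},
    }


def repack(en_us_build, locale, strings):
    """Copy the en-US build, swap in one locale's strings, and rebundle."""
    repacked = dict(en_us_build)         # open up a copy; the compiled binary is reused
    repacked["locale"] = locale
    repacked["strings"] = dict(strings)  # replace the en-US strings
    return repacked


def nightly_repacks(platforms, locale_strings):
    """Repack every locale on every platform from one en-US build per platform."""
    en_us = {p: build_en_us(p) for p in platforms}
    return [
        repack(en_us[p], locale, locale_strings[locale])
        for p, locale in product(platforms, sorted(locale_strings))
    ]


builds = nightly_repacks(PLATFORMS, LOCALE_STRINGS)
print(len(builds), "repacks")
```

The point of the sketch is that the expensive step (the compile) happens once per platform, while each repack only swaps strings, which is why a single repack is quick.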

The change we are rolling out is important for several reasons:

– Doing repacks is really slow. Each individual repack is quick… approx 1 minute. However, we now have 60+ locales (and counting). Multiply that by 3 different o.s. and it adds up to 180+ repacks. One minute per repack adds up to 3 hours for the entire set.

– The “old” system treats all 180+ repacks as one giant “block”, starting alphabetically with linux/af and ending with win32/zh-TW. With the “old” system, a locale arriving after code freeze would force us to throw out all repacks and start the entire set of 180+ repacks again, or else keep that set and repack the late-arriving locales manually, which is error-prone and not that much faster. The “new” system treats all repacks as independent jobs which can be done in parallel. The consequences of this are huge:
— We can now handle a locale arriving after code freeze during a release as just another job-in-the-queue, without disrupting the existing set of locales in progress, and without needing to do the late arriving repack manually.
— We can share the 180+ repacks across the pool of available build machines, which will give us huge time gains. Instead of trying to improve overall time by trying to shave a few seconds from each repack, we’re breaking this into discrete pieces that can be tackled in parallel.

– Now, all L10n repacks (whether for releases, nightlies or incremental during the day) will be created the same way, running on the same pool of machines that are also producing en-US release/nightly/incremental builds. This eliminates bugs caused when a nightly l10n machine is configured slightly differently than a release machine. (During the FF2.0.0.7 release, for example, we had to scramble to fix a surprise CR-LF problem on win32 l10n builds caused by exactly this.)

– Moving from dedicated specialized machines to a pool of machines is important for reliability and uptime. Until now, if a specific l10n machine crashed, there was no failover: the crash closed the tree and required a late-night page to IT and RelEng to fix it before the tree could be reopened. Now, if a machine in the pool dies, repacks will continue on another available machine in the pool.

– This brings us closer to the ultimate goal here… producing updates for people who are using l10n nightlies. We currently produce nightly updates for users of en-US nightlies, but we’ve never produced updates for l10n nightlies… yet! Stay tuned….

(Oh, and there’s always the good hygiene of removing lots of clunky/obsolete code. We’ve got so many legacy/complex systems making it hard to figure out what’s safe to change that any cleanup/streamlining really helps simplify other work being done by others in the group. This recurring payback is great!)
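To put rough numbers on the list above, here is a small back-of-the-envelope sketch. It assumes the figures quoted earlier (60 locales, 3 o.s., about 1 minute per repack); the pool size of 10 machines, the machine names, and the functions `serial_minutes`, `pooled_minutes` and `run_pool` are hypothetical, and the scheduling is a deliberately naive simulation rather than how the real system hands out jobs.

```python
import math
from collections import deque

REPACK_MINUTES = 1  # approximate per-repack time quoted above


def serial_minutes(n_locales, n_platforms):
    """Wall-clock time when all repacks run one after another as one block."""
    return n_locales * n_platforms * REPACK_MINUTES


def pooled_minutes(n_locales, n_platforms, n_machines):
    """Wall-clock time when independent repacks are spread over a pool."""
    jobs = n_locales * n_platforms
    return math.ceil(jobs / n_machines) * REPACK_MINUTES


def run_pool(jobs, machines, dead=frozenset()):
    """Simulate the pool: each live machine takes one queued job per round."""
    live = [m for m in machines if m not in dead]
    if not live:
        raise RuntimeError("no machines left in the pool")
    queue = deque(jobs)
    done = []
    minutes = 0
    while queue:
        minutes += REPACK_MINUTES
        for machine in live:
            if not queue:
                break
            done.append((machine, queue.popleft()))
    return minutes, done


platforms = ["linux", "mac", "win32"]           # "mac" is an assumed name
locales = [f"locale{i:02d}" for i in range(60)]  # placeholder locale names
jobs = [(p, loc) for p in platforms for loc in locales]
machines = [f"machine{i}" for i in range(10)]    # assumed pool size

print("serial:", serial_minutes(60, 3), "minutes")
print("pool of 10:", run_pool(jobs, machines)[0], "minutes")
print("one machine dead:", run_pool(jobs, machines, dead={"machine3"})[0], "minutes")

# A locale arriving late is just three more jobs at the end of the queue.
late = [(p, "late-locale") for p in platforms]
print("with late locale:", run_pool(jobs + late, machines)[0], "minutes")
```

In this toy model, losing a machine or gaining a late locale only stretches the run by a round or two, instead of forcing the whole block to restart or closing the tree.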

There’s a ton of cleanup work behind the scenes here that made this possible. I have to point out that Armen has been working on this, and only this, non-stop since late April. It’s been amazing having him patiently pick through the various tangled knots to figure out how to make all this happen. Also, many thanks to axel, bhearsum, coop, nthomas and seth for their help untangling folklore and various historically important systems and code weirdnesses to get to this point.

(If I’ve missed anyone over the last 5 months of this project, sorry, poke me and I’ll correct!)

2 thoughts on “Now creating L10n nightlies a whole new way!”

  1. Thanks a lot for this post, John, and of course for making this new process possible. I’m a localizer myself, but I only had a vague understanding of how repackaging worked. I was just frustrated that sometimes I had to stay up late to see if my checkin had broken anything. Thanks everybody for making life better for us 🙂