Infrastructure load for August 2009

Summary:

  • There were 1,221 code changes to our Mercurial-based repos. Over the month, these triggered:
    • 13,653 build/unittest jobs, or ~18 jobs per hour.
    • 6,484 talos jobs, or ~8.7 talos jobs per hour.
  • TryServer data from before 19th Aug was lost because the TryServer hg repo had to be reset. Even without those 19 days of TryServer data, our totals are almost the same as last month's, and last month was a record month. This was definitely a busy month.
  • The mozilla-1.9.2 project branch was added during August and is now being tracked.
  • It’s interesting that more checkins are happening during the PST day; it’s the first time I’ve seen such a night-and-day difference in traffic (pun intended). I don’t know how much the developer work week held in Mountain View skewed the numbers, but it’s worth noting.
  • We are still not tracking any l10n repacks, nightly builds, release builds or any “idle-timer” builds.

Details:

Here’s how the math works out:

The builds/unittest/talos jobs triggered by each individual push are:

  • mozilla-central: 13 jobs per push (L/M/W opt, L/M/W leaktest, L/M/W unittest, linux64 opt, linux-arm, WinCE, WinMo) and 6 talos jobs
  • mozilla-1.9.1: 12 jobs per push (L/M/W opt, L/M/W leaktest, L/M/W unittest, linux64 opt, linux-arm, WinMo) and 5 talos jobs
  • mozilla-1.9.2: 13 jobs per push (L/M/W opt, L/M/W leaktest, L/M/W unittest, linux64 opt, linux-arm, WinCE, WinMo) and 5 talos jobs
  • electrolysis: 12 jobs per push (L/M/W opt, L/M/W leaktest, L/M/W unittest, linux64 opt, linux-arm, WinMo) and no talos.
  • mobile-browser: 5 jobs per push (WinMo m-c, linux-arm m-c, Fennec linux desktop, linux-arm tracemonkey, WinMo electrolysis) and 2 talos jobs.
  • places: 12 jobs per push (L/M/W opt, L/M/W leaktest, L/M/W unittest, linux64 opt, linux-arm) and 6 talos jobs.
  • tracemonkey: 10 jobs per push (L/M/W opt, L/M/W leaktest, L/M/W unittest, linux-arm) and 6 talos jobs.
  • try: 9 jobs per push (L/M/W opt, L/M/W unittest, linux-arm, WinCE, WinMo) and 6 talos jobs.

UPDATE: fixed math for # of build/unittest jobs, after nthomas pointed out that WinCE was missing. Thanks Nick!! joduinn 02oct2009
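The per-hour figures in the summary are the monthly totals divided by the hours in August (31 × 24 = 744). The Python sketch below shows that division, and also how a monthly total would be assembled from per-branch push counts and the jobs-per-push numbers listed above. The per-branch push counts are not published in this post, so the `pushes` values in the usage example are hypothetical placeholders, and the helper names are illustrative only.

```python
"""Sketch: estimating monthly build/test load from pushes per branch.

Jobs-per-push figures are copied from the list above; push counts are
caller-supplied inputs (hypothetical here, not published in the post).
"""

HOURS_IN_AUGUST = 31 * 24  # 744 wall-clock hours

# branch -> (build/unittest jobs per push, talos jobs per push)
JOBS_PER_PUSH = {
    "mozilla-central": (13, 6),
    "mozilla-1.9.1": (12, 5),
    "mozilla-1.9.2": (13, 5),
    "electrolysis": (12, 0),
    "mobile-browser": (5, 2),
    "places": (12, 6),
    "tracemonkey": (10, 6),
    "try": (9, 6),
}


def monthly_jobs(pushes: dict[str, int]) -> tuple[int, int]:
    """Return (build/unittest jobs, talos jobs) triggered by the given pushes."""
    builds = sum(n * JOBS_PER_PUSH[branch][0] for branch, n in pushes.items())
    talos = sum(n * JOBS_PER_PUSH[branch][1] for branch, n in pushes.items())
    return builds, talos


def jobs_per_hour(total_jobs: int, hours: int = HOURS_IN_AUGUST) -> float:
    """Average number of jobs per wall-clock hour over the period."""
    return total_jobs / hours


if __name__ == "__main__":
    # Monthly totals from the summary above.
    print(f"{jobs_per_hour(13_653):.1f} build/unittest jobs per hour")  # 18.4
    print(f"{jobs_per_hour(6_484):.1f} talos jobs per hour")  # 8.7
    # Hypothetical push counts, for illustration only.
    print(monthly_jobs({"mozilla-central": 10, "try": 4}))  # (166, 84)
```

Averaging over every hour of the month hides the day/night difference mentioned in the summary; peak-hour load would be noticeably higher than these averages.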

Firefox 3.5.3 by the (wall-clock) numbers

Mozilla released Firefox3.5.3 on Wednesday 09-sep-2009, at 16:11PST. From “Dev says go” to “release is now available to public” was approx 16 days (16d 18h 35m) wall-clock time, of which Release Engineering took ~1.5 days (1d 13h 11m).

21:37 23aug: Dev says “go” for FF3.5.3
08:45 24aug: FF3.5.3 builds started
11:50 24aug: FF3.5.3 linux, mac builds handed to QA
13:38 24aug: FF3.5.3 signed-win32 builds handed to QA
05:41 25aug: FF3.5.3 update snippets available on test update channel
13:14 01sep: Dev & QA says “go” for Beta
13:23 01sep: FF3.5.3 update snippets available on live Beta channel
01:53 09sep: Dev & QA says “go” for Release; Build already completed final signing, bouncer entries
05:34 09sep: mirror replication started
06:42 09sep: mirror absorption good enough for testing
14:49 09sep: website changes finalized and visible. Build given “go” to make update snippets live.
14:58 09sep: update snippets available on live update channel
16:11 09sep: release announced

Notes:

1) A significant amount of RelEng time was spent idle after the “go” was issued, just waiting. Specifically, if Dev says “go” late at night PDT, nothing happens until the RelEng person wakes up in his timezone. If we exclude these long waiting times, the time for Build&Release drops to under a day (0d 22h 45m). I believe that is our fastest yet.
2) Our blow-by-blow scribbles are public, so the curious can read about it, warts and all, here. Those Build Notes also link to our tracking bug#511469.
3) The FF3.5.3 release was done at the same time as the FF3.0.14 release without delaying either one, and both were still super-fast releases. Nice to see! 🙂

take care
John.

Firefox 3.5rc3 by the (wall-clock) numbers

Mozilla released Firefox3.5rc3 on Tuesday 30-jun-2009, at 08:00PST. This was the formal Firefox 3.5.0 release. From “Dev says go” to “release is now available to public” was approx 6 days (6d 16h 05m) wall-clock time, of which Build&Release took just over 2 days (2d 8h 45m).

15:55 23june: Dev says “go” for FF3.5rc3build1
16:30 23june: FF3.5rc3build1 builds started
21:48 23june: FF3.5rc3build1 linux, mac builds handed to QA
22:50 23june: Respin declared
01:05 24june: Dev says “go” for FF3.5rc3build2
01:10 24june: FF3.5rc3build2 builds started
04:00 24june: FF3.5rc3build2 linux, mac builds handed to QA
13:15 24june: FF3.5rc3build2 signed-win32 builds handed to QA
??:?? ??june: FF3.5rc3build2 update snippets available on test update channel
22:35 25june: FF3.0.11->FF3.5.0 major update snippets available on test update channel
21:30 29jun: Dev & QA says “go” for Release; Build already completed final signing, bouncer entries
22:05 29jun: mirror replication started
03:10 30jun: mirror absorption good for release
08:00 30jun: website changes finalized and visible. Build given “go” to make update snippets live.
08:00 30jun: update snippets available on live update channel
08:00 30jun: release announced
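The headline wall-clock figure is simply the time between the first timeline entry (“Dev says go”) and the last one (“release announced”). A minimal Python sketch of that subtraction, assuming every timestamp is in the same Pacific timezone and filling in the year 2009, which the timeline omits; `wall_clock` is a hypothetical helper name:

```python
from datetime import datetime, timedelta


def wall_clock(start: datetime, end: datetime) -> str:
    """Format elapsed time as 'Xd Yh ZZm' (assumes end is not before start)."""
    delta: timedelta = end - start
    # timedelta stores whole days separately from the remaining seconds.
    hours, remainder = divmod(delta.seconds, 3600)
    minutes = remainder // 60
    return f"{delta.days}d {hours}h {minutes:02d}m"


# First and last entries of the FF3.5rc3 timeline above (same timezone assumed).
dev_says_go = datetime(2009, 6, 23, 15, 55)
release_announced = datetime(2009, 6, 30, 8, 0)

print(wall_clock(dev_says_go, release_announced))  # 6d 16h 05m
```

Because both values are full datetimes, the subtraction handles month boundaries and spans across midnight without any special casing; e.g. 13:30 one day to 01:00 the next comes out as 0d 11h 30m.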

Notes:

1) Firefox3.5 was the first time we did a major update at the same time as the release. This caused additional RelEng and QA work and is included in the times above.
2) Our blow-by-blow scribbles are public, so the curious can read about it, warts and all, here. Those Build Notes also link to our tracking bug#499687.
3) The extra-long wait before the end of the release was to gather feedback on possible issues from beta users of 3.5rc3, as well as for technical and media preparations specific to this large-scale release event.
take care
John.

Firefox 3.0rc2 by the (wall-clock) numbers

Mozilla released Firefox3.0rc2 on Wednesday 04-jun-2008, at 16:25PST. From “Dev says go” to “release is now available to public” was just under 7 days (6d 22h 25m) wall-clock time, of which Build&Release took almost 2.5 days (2d 11h 06m).

18:50 28may: Dev says “go” for FF3.0rc2build1
02:16 29may: FF3.0rc2build1 builds started
06:08 29may: FF3.0rc2build1 mac builds handed to QA
14:59 29may: FF3.0rc2build1 linux builds handed to QA
17:11 29may: FF3.0rc2build1 signed-win32 builds handed to QA
??:?? ??may: respin declared for mac only.
??:?? ??may: FF3.0rc2build2 mac-only builds started
11:18 30may: FF3.0rc2build2 mac builds handed to QA
11:54 30may: FF3.0rc2build2 update snippets available on betatest update channel
07:33 04jun: Dev & QA says “go” for Release; Build already completed final signing, bouncer entries
07:33 04jun: mirror replication started
11:00 04jun: mirror absorption good for testing to start on releasetest channel
13:13 04jun: QA completes testing releasetest.
14:57 04jun: website changes finalized and visible. Build given “go” to make update snippets live.
15:43 04jun: update snippets available on live update channel
16:25 04jun: release announced

Notes:

1) Firefox3.0rc2 was the last full-build-across-all-platforms release we did in the run-up to the release of FF3.0.0. The formal FF3.0 release was rc3. However, as rc3 required a rebuild of mac only, with no new linux/win32 builds, it didn’t make any sense to measure wall-clock times on just that!

2) Our blow-by-blow scribbles are public, so the curious can read about it, warts and all, here. Those Build Notes also link to our tracking bug#426307.

3) I couldn’t find the times at which the mac-only respin was declared, or at which the mac-only respin builds started, so I calculated the times as worst-case for RelEng. If anyone has those times, could you please let me know?
4) There were some complications during this release caused by NetApp hardware failures. For the curious, the details are in bug#435134. This required RelEng and IT to do some nice cross-group coordinated juggling to avoid delaying the release.
take care
John.

Early Review of Acer Aspire One (11.5″) netbook

All the activity around netbooks which I saw while I was in Japan earlier this year made me curious. While they might be good for casual/student use, would they be sufficient for someone who works at his computer most of the day? Even though I’ve been very happy with my 17-inch MacBook Pro for the last few years, I decided to keep an eye on these new netbooks, and be willing to experiment if something suitable came along.

The big thing for me was finding something with a usable keyboard. One good contender was the HP2140; I really liked the keyboard and the very solid case, but the awkward touchpad/buttons and slightly-too-small display kept me away. Then I saw the 11.5-inch Acer Aspire One, which had:

  • full-size keyboard; I liked it even more than the HP2140 keyboard
  • 11.5″ screen (1366×768)
  • 2GB ram
  • 250GB disk
  • 6cell battery
  • 1.3GHz Intel Atom cpu
  • model# AO751H-1893. (Interesting note: the Acer Aspire One is actually a series of different netbooks. They all look the same when shopping online, so model#s are important. The first part of the model# is about the screen / physical size, so AO751H-xxxx is for the 11.5″ display, AOD150-xxxx is for the 10″ display and AOA150-xxxx is for the 8.9″ display. The last 4 digits of the model# are about the RAM/disk/O.S. and 3- or 6-cell battery configuration.)

I’m still getting used to the Acer, installing software etc, but here’s some first impressions:

Pro:

  • keyboard: this is great; I liked it even more than the HP2140 keyboard. Somehow Acer managed to get a full-sized keyboard onto what is basically a very small, light device. I’ve now used it for a few multi-screen-long emails/blogs, and still really like it.
  • screen: while this is obviously smaller than my 17″ MBP, it is big enough that I find it surprisingly workable without too much scrolling around.
  • size&weight: I knew the Acer Aspire One would be smaller and lighter than the 17″ MacBook Pro. However, as a fanatical one-carry-on-bag-only traveler, I was amazed at how much extra room it left when packing my travel bag. Carrying it between meetings is now so trivial that I worry about putting it down someplace and forgetting where I left it!
  • 6-cell battery equals *long* life. On a typical day of a few hours’ use between meetings, one charge lasts across two days. Let’s see how much that degrades as the battery gets older, but the difference from my MBP is life-changing. I used to carry an extra battery, and continuously seek out power outlets in meeting rooms and airports. Now, I just use one battery, and if I sometimes forget to bring the charger with me to the office, it’s OK.

Con:

  • Vista came pre-installed, so I reimaged with the Windows7 Release Candidate, which works better than Vista afaict, except for an annoying need to manually refresh the desktop when resuming from suspend/hibernate. I’ll try Windows7 for a while, and see how it goes, before I seriously consider installing Linux.
  • slow CPU, little RAM: I’ve had to change my normal habits of running with 200+ tabs/windows and lots of concurrently running applications. Now, I make sure to focus on one thing at a time, and close all tabs/windows/applications as soon as I’m done. I’ve also noticed that with the MBP I’d typically wait until I had a desk to work at, but with this netbook, I’m far more likely to whip out the Acer and do a quick item on-the-spot, regardless of location. It feels far less intrusive to use when others are around. Whether this is a good or bad thing is still unclear, but it is a change in my behavior triggered by the design of this new machine, so I wanted to point it out.
  • thin case: this makes me wonder how rugged it is, and how long this machine would survive the abuse of being tossed into my bag, cycling to work, long-distance travel, etc. So far so good, but the MacBook Pro, and even the HP2140, feel a lot more solid.
  • screen: A nit. The glossy screen has a reflective surface, so backlighting can be annoying. In use so far, it’s been easy to reposition to avoid this, but I’d definitely prefer a matte screen.
  • keyboard: A nit. For dim lighting situations, I miss the keyboard backlighting on the MBP.
  • lack of Bluetooth seems silly, especially as some of the Bluetooth-specific control keys are present. It seems to be an available option in some locales, but why not everywhere? update: I’ve since discovered a little switch on the front left underside which toggles Bluetooth on/off. This is in addition to the keyboard fn-F3 setting. Not the most intuitive design, but it does work. joduinn 22oct2009

I’m using this Acer to write this blog post, and while traveling this week, and so far I really like it. Let’s see how it goes over the coming weeks!
As an aside, I found it interesting how few applications I had to install on this new machine. The complete list of what I installed is:

  • Firefox
  • Thunderbird
  • Skype
  • Pidgin
  • OpenVPN
  • putty (for ssh)
  • OpenOffice (haven’t used this yet, but it seemed useful)
  • (The only thing I miss is “Things”, which is a Mac-only to-do list application)

It’s early days still, so if I find myself installing other applications over time, I’ll update this list. However, so far it’s a very short list, and mostly geared around accessing information/programs/data on other remote machines. After all the talk about “cloud computing”, “hosted services” and “moving to the web”, I was actually kinda surprised to see how close to reality this seems to be… with the obvious endorsement of the whole netbook concept.

I’m curious – what other applications (apart from the ones listed above) do other people in Mozilla use?

Rush of Releases

Between 27July and 27August, we shipped the following releases:

  • Funnelcake 9
  • Firefox 3.6a1
  • Firefox 3.0.13 (with refreshed MU for manual CheckForUpdate)
  • Firefox 3.5.2
  • enabled FF 3.0.13->3.5.2 Major Update for idle-background notification
  • Martin 1.0
  • Thunderbird 2.0.0.23
  • Fennec 1.0b3

…and in progress this week are:

  • Firefox 3.0.14 (with refreshed MU for manual CheckForUpdate)
  • Firefox 3.5.3
  • Fennec 1.0a3

That is quite an impressive list of releases from Mozilla in such a short time, and it’s great to see how all the different groups worked together to make it all happen. However, put those releases in the context of everything else that happened in RelEng this month: setting up new branches for places, electrolysis, mozilla-192… and recovering from a colo outage in record time… and enabling l10n nightly updates… and solving an imaging problem and then spinning up a bunch of mobile devices… all while some folks were on vacation… *and* while doing all those releases! Wow.

I am immensely proud of how the RelEng group handled the last month. It’s been quite a rush, and throughout it all, everything was handled calmly, smoothly and professionally.

Making life better for localizers

We now produce l10n nightly updates. It’s already been announced elsewhere [1] [2], but I thought people might be interested in some context and additional details.

At night, we make “nightly builds” – complete builds of source code, containing all changes made during the day. Developers and testers use that nightly build to see if a new feature/bugfix works. Which is great – and also part of the problem. Every night we make a new nightly build. If someone wants to keep using the latest code, they have to keep manually downloading and installing a new Firefox build every day. This quickly gets annoying, and after a while, only the most dedicated will continue doing this manually. To make it easier for people to stay on the latest nightly, we generate nightly updates. This means you install a Firefox nightly build once, then every morning you will get updated to the newest nightly build.

Works great. And is an important part of the development process at Mozilla.
We make nightly builds for en-US, and for each of the 75+ locales, on each OS. However, we only made nightly updates for en-US; we never made nightly updates for any localized builds.
This means that people working on en-US nightly builds get updated each morning, but localizers who wanted to stay on the latest nightly build have to keep manually installing a new build every morning. If we generate nightly updates for en-US, why wasn’t this already done for l10n? …and if we’ve never done it reliably before, how hard could it be to start doing this?

Well…

Turns out there was a *lot* of systems refactoring needed to make this possible. Those of you who were at the Mozilla Summit in Whistler might remember this presentation summarizing what was known about the project back then. Refactoring existing l10n code and integrating it with the rest of the release infrastructure. Migrating/refactoring l10n nightly repack code from various dedicated l10n systems into the production pool-of-slaves. Reconciling toolchain differences. Solving edge cases of l10n nightlies missing newly added strings. Learning how nightly updates are generated and how that is different from the way release updates are generated! The list goes on and on and on… For gory details, have a look at the interlinked bugs starting with this one. (There might be other disconnected bugs; it’s been quite a project!)

To be fair, we’re still wrapping up some loose ends. For example, creating updates for these l10n nightlies takes time. We’ve gone from generating 3 nightly updates to generating 225 nightly updates (3 OS x 75 locales). Per branch. Per night. As of last week, there are l10n nightly updates available on mozilla-central, and mozilla-1.9.2, and as you can imagine, generating these 450 nightly updates, serially, takes time. Anyone used to getting a new en-US nightly update first thing in the morning is now seeing a delay of a few hours. This is temporary, please bear with us. Once we finish the transition from a dedicated nightly update machine to concurrent jobs on the production-pool-of-slaves, we should have fixed this delay, and also be able to start producing l10n nightly updates on other branches as requested.
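The update counts above are a straight multiplication, and the benefit of moving from one dedicated machine to concurrent jobs on the pool is a makespan calculation. A rough Python sketch, assuming every update takes the same time to generate; the 1-minute-per-update figure and the 20-slave pool size are hypothetical values chosen only for illustration (real figures are not given here), as are the helper names:

```python
import math


def nightly_update_count(platforms: int, locales: int, branches: int) -> int:
    """Number of nightly updates to generate per night."""
    return platforms * locales * branches


def makespan_minutes(jobs: int, minutes_per_job: float, workers: int = 1) -> float:
    """Wall-clock time to finish equal-length jobs spread across `workers` machines."""
    # Each machine works through its share of the jobs one after another,
    # so the busiest machine handles ceil(jobs / workers) of them.
    return math.ceil(jobs / workers) * minutes_per_job


per_branch = nightly_update_count(platforms=3, locales=75, branches=1)  # 225
total = nightly_update_count(platforms=3, locales=75, branches=2)  # 450

# Hypothetical inputs: 1 minute per update; 1 dedicated machine vs. 20 pool slaves.
serial = makespan_minutes(total, minutes_per_job=1)  # 450 minutes (7.5 hours)
pooled = makespan_minutes(total, minutes_per_job=1, workers=20)  # 23 minutes
print(per_branch, total, serial, pooled)
```

In practice jobs are not all the same length and slaves are shared with other work, so real timings will differ; the point is only that serial time grows linearly with every branch and locale added, while pooled time is divided by the number of available slaves.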

(One remaining question is: which branches are being used by localisers and testers? Some prefer to work on mozilla-191, some on mozilla-192, and some on mozilla-central. If you are working on localization work with Mozilla, what branch would it be most helpful for you?)

De-tangling all this was a huge project, but crucial to Mozilla’s global localization efforts.

In terms of manhours, this is the 2nd biggest project the group has taken on in the last few years. This is now the end of August2009; Armen has been working on this since May2008, and Coop since Nov2008.

This is all super cool stuff; well-deserved hugs, kudos and beer deliveries should be sent to Armen, Coop, Axel and Seth.

take care

John.

Major update to Firefox 3.5 (after 59 days)

Two weeks ago, we prompted users with a Major Update offer to upgrade from FF3.0.x->FF3.5.x. Now that it’s been out for two weeks, I took a quick look at how many users upgraded, and how that compares with the previous major update release.
FF3.0.x -> FF3.5 major update:

  • 45 days from release to 1st prompted MU
  • measure after 14 days of prompted MU
  • 24.7% of users on latest branch before prompted MU
  • (~65% of these users upgraded by doing pave-over download-and-install; ~35% upgraded by manually doing CheckForUpdates)
  • 37.3% of users on latest branch two weeks after prompted MU

FF2.0->FF3.0.x major update:

  • 70 days from release to 1st prompted MU
  • measure after 14 days of prompted MU
  • 35.4% of users on latest branch before prompted MU
  • (100% of these users upgraded by doing pave-over install)
  • 61.4% of users on latest branch two weeks after prompted MU

We’re still working out the math here, so bear with us if these numbers get tweaked; it’s not as easy to figure out as you might hope. However, if these numbers are accurate, it looks like:

  • we made the major update offer sooner after the release
  • people are doing major upgrades at a slower, but consistent, rate
  • people have been using CheckForUpdates at about the same rate through each of the dot-releases, not just at the initial release. This confirms the value of doing this, so we will continue to always have unprompted Major updates available for people who want to do manual CheckForUpdates.
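One way to make “slower rate” concrete is to compare the percentage-point gain in the share of users on the latest branch over each 14-day window of prompted MU, using the figures listed above (with the same caveat that those numbers may still be tweaked). A minimal Python sketch; `uptake` is a hypothetical helper name:

```python
def uptake(before_pct: float, after_pct: float, days: int) -> tuple[float, float]:
    """Return (percentage-point gain, average gain per day) over a window."""
    gain = after_pct - before_pct
    return gain, gain / days


# Share of users on the latest branch before and 14 days after prompted MU.
ff35_gain, ff35_daily = uptake(24.7, 37.3, days=14)
ff30_gain, ff30_daily = uptake(35.4, 61.4, days=14)

print(f"FF3.0.x->FF3.5: +{ff35_gain:.1f} points, {ff35_daily:.2f} per day")  # +12.6 points, 0.90 per day
print(f"FF2.0->FF3.0.x: +{ff30_gain:.1f} points, {ff30_daily:.2f} per day")  # +26.0 points, 1.86 per day
```

By this measure the FF3.5 major update gained roughly half as many points per day as the FF3.0 one, although, as listed below, the two scenarios differ in many ways.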

Its worth noting that, while these two scenarios are the closest I could find for comparison, there are lots of differences between them:

  • The FF3.0 release had more outreach and publicity (e.g. Download Day) than the FF3.5 release.
  • FF2.0->3.0 had more visible improvements than FF3.0->FF3.5, hence more incentive to upgrade (or perversely, more resistance to upgrade?).
  • After FF2.0, it took 18 months to ship FF3.0. After FF3.0, it took 12 months to ship FF3.5. As we continue to speed up the release cycle, is this a factor?
  • The number 3.5 sounds like a smaller upgrade. Would people have upgraded if the same exact code was called 4.0? Would fewer people have upgraded if it had been called 3.1?
  • Anything else people think might be a factor?

Thoughts on the recent colo outage

On the afternoon of Sunday 09Aug2009, our colo overheated and shut down. The gory details are here, but basically when the air conditioners failed, the room quickly overheated to unsafe levels, and machines took themselves offline before they were physically damaged. All our build/unittest/talos infrastructure, along with large portions of the rest of Mozilla infrastructure, came to an abrupt halt.

Matthew (mrz) phoned me soon after the colo went offline, just to give me a heads up, so I was able to forewarn others in the group. The rough timeline was:

  • 13:30 PDT Sunday afternoon: colo offline
  • 21:30 PDT Sunday evening: Mozilla back online
  • 01:00 PDT Monday morning: RelEng declares build infrastructure back online

While it’s bad for a colo provider to have failures like this, it was impressive to watch how the RelEng and IT groups pitched in together to get things going again so quickly; reviving ~420 RelEng machines in under 12 hours was quite a feat.