1.4 million Firefox MajorUpdates in 6 hours?!?

It’s been exactly 6 hours since we made the FF2.0.0.16 -> FF3.0.1 Major Update live.

In those 6 hours, we’ve now served major updates to 1,416,982 users.

That’s impressive when you consider that we pushed the updates live exactly 6 hours ago (7.15pm PST), and that we initially had throttling enabled. Now, with throttling off for the last 3 hours, we’re picking up the pace, doing 7095 major updates *per minute*. That’s 118 major updates *per second*. This means the total for the next 6 hours should be higher again.
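For anyone who wants to check the arithmetic, here is a quick sketch using only the figures quoted above. The projection for the next 6 hours is an assumption (it pretends the current rate stays flat), not a measurement.

```python
# Figures quoted in the post.
total_served = 1_416_982        # major updates served in the first 6 hours
updates_per_minute = 7095       # current rate, with throttling off

# Convert the per-minute rate to a per-second rate.
updates_per_second = updates_per_minute / 60

# Average rate over the first 6 hours (throttled for part of that time).
average_per_minute_so_far = total_served / (6 * 60)

# Assumption: the current rate holds steady for the next 6 hours.
projected_next_six_hours = updates_per_minute * 60 * 6

print(f"{updates_per_second:.2f} updates per second")
print(f"{average_per_minute_so_far:.0f} updates per minute, averaged over the first 6 hours")
print(f"{projected_next_six_hours:,} updates projected for the next 6 hours")
```

The current rate is well above the six-hour average, which is why the next 6 hours should come in higher than the first 6.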

After the Guinness Book of Records event, it’s easy to take numbers like that as “ho hum”… but that’s a *lot* of people upgrading. Very exciting times!

This Major Update release feels like it went really smoothly, especially going from start to finish in only 10 days. To be fair, we have been working on this since Dec2007, when it was on the RelEng and QA goals for Q4 2007. At the time, we wanted to make sure that, as much as possible, any major-upgrade-blockers were found and fixed *before* we released Firefox 3.0. This was our 5th complete end-to-end “test run”, and it worked great. It also forced us to cross-check partner builds, and all sorts of interesting upgrade scenarios, long before we released, …not after! See bug#394046 for some details. All this behind-the-scenes homework was already done, so this rapid turnaround was totally possible.

Big tip-o-the-hat to nthomas (after months of work, he was on leave when the actual release happened!), bhearsum (for stepping in at the last minute), Tomcat, juanb and last-but-not-least to rhelmer (who came back to help with moral support and sanity checking!).

Friday 22nd August 2008 was a big day because…

1) …it was schrep’s last day at Mozilla. Karen organized, amongst other things, a poster we all signed. As I was leaving my scribbles on the poster, I couldn’t help noticing the number of other people who used the word “privilege” when talking about working with schrep. Very true. Very very true.

2) …Armen and Lukas both finished their internships here at Mozilla. One of the cool (and sometimes scary!) things about doing internships at Mozilla is that we don’t have people come in and do mundane silly work – we put them on real projects, on the front lines, just like any other full time member of the team. Armen took on some really knotty problems around l10n build infrastructure, which made him quite the popular guy at the Summit, with his well-attended and lively presentation session. Lukas took on owning *all* of our unittest infrastructure, and started streamlining underlying infrastructure at the same time as working with many different folks to help stabilize test results.
This was our first time ever having interns here in Release Engineering, so there was a lot riding on this “experiment”. They both rocked (sometimes quite literally, thanks to the new Wii Rock Band setup downstairs in Building K). So we’re very grateful to be able to say “Goodbye Armen and Lukas as amazing interns” and “Hello Armen and Lukas as part time contractors”!

3) …after months of behind-the-scenes preparation, we finally rolled out Major Update from FF2.0.0.16 -> FF 3.0.1. There are always lots of details to fret about when doing major updates, but it feels a lot more organized than last time. We’ve been building and testing this since December, so we were totally ready for this when we decided to bring Major Update forward on the schedule. (Trivia: in fact, we’d already handed the major update snippets to QA the day *before* the schedule was re-juggled!)

4) …after work, later that night, I finally started packing for Burning Man. 🙂

Sessions at Mozilla Summit 2008

There have been lots of changes to our Release infrastructure over the last year, so it was really great to be able to give a few different presentations at the Summit. We wanted to explain further details behind some changes already live, as well as see what ideas people had about some of the new still-in-progress work that would be showing up soon.

The discussions after these sessions, and also in various corridors / bars / meals afterwards, were really helpful. Really wonderful. And all in that casual, spontaneous, “hey I was thinking about what you said and…” Mozilla way. Yes, it’s doable on irc/email, but having us all there “trapped by bears and rockslides” in Whistler really made this much more valuable, imho.

The slides will be a little hard to follow without the voice-over and hand waving, but hopefully, they still make sense. Let me know if you have any questions/comments, ok?

Dodging the VMware “12 August” bullet

On the way to work this morning, I saw an email from bhearsum.

Turns out the latest update from VMware (ESX 3.5 update2) contained a licensing bug. Starting at 00:01 on 12th August (yes, today!), the licensing software would incorrectly think that VMware licenses had expired. Any running VMs would stay running. However, any VMs being powered up, restarted or moved would refuse to start; instead they would throw an “Internal Error” and remain down.
More details from VMware, and selected press coverage, are here:

Thankfully, we had not installed that latest update from VMware, so we dodged that bullet by sheer luck.

(Today could have been a *lot* more exciting!)

Release Automation already works on cvs, so how hard is it to do on hg?

Short version:

The switch from cvs to mercurial involves a lot more than just changing all occurrences of “exec cvs” to “exec hg”.

Long version:

1) Removing bootstrap and tinderbox from the release automation.
For Firefox3.0, we have BuildBot+Bootstrap+Tinderbox on cvs. Modifying the existing BuildBot+Bootstrap+Tinderbox code to work on hg would be much more work than just removing the bootstrap+tinderbox layers completely – which is something we want to do anyway. Rather than doing slower, throwaway work, we’re instead removing bootstrap+tinderbox as part of this transition from cvs to hg. This should *significantly* simplify life afterwards, and simplify future development.

2) The way our code is organized in CVS and Mercurial is very different, and worth noting as it forces us to rethink what used to work before.

Background: In cvs, the en-US code is in one repo, and all l10n locales are in one other repo (total of 2 repos). This means that all shipping bits could be uniquely identified by two timestamps. However, in hg, the en-US code is spread across 3 hg repos (moz-central, comm-central and mobile-browser), and all l10n locales are spread across 59 repos and growing (today’s total is 62 repos).

This means:
a) we have to redo automation to handle additional repos, and do this in a way that can easily support additional repos, as new locales are added. There have already been 11 new locales in the 8 weeks since the FF3.0 release and we expect this rate of growth to continue.
b) because hg uses changesets, not timestamps, we need to keep track of the 62 different changesets. We used to rely on an email from release-drivers with two timestamps. However, this doesn’t scale well when dealing with 62 or more unique changesets, and risks typos / cut-paste errors. (See thread in dev.planning about “How do we sign-off on Mercurial based releases’ code+locales?”)
c) we need to figure out the branch / tag situation in hg. Whether we branch&tag in the same repo, or create a new “release” cloned repo, or do something completely different, is still being discussed in newsgroups. (See thread in dev.planning about “Decision Time – Re: Branching for Firefox releases in Mercurial”).
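To make point (b) a bit more concrete, here is a rough sketch of what a machine-checkable sign-off could look like instead of an email. This is purely illustrative and not what we have decided to do: the one-line-per-repo manifest format, the function names, the `l10n/de` repo name and all the changeset values are made up. The only real-world fact it relies on is that a Mercurial changeset ID is 40 hex characters, commonly abbreviated to 12.

```python
import re

# A Mercurial changeset ID is a 40-character hex string, often shortened to 12.
CHANGESET_RE = re.compile(r"[0-9a-f]{12}(?:[0-9a-f]{28})?")


def parse_manifest(text):
    """Parse 'repo changeset' lines into a dict, rejecting anything suspicious."""
    manifest = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'repo changeset'")
        repo, changeset = parts
        if not CHANGESET_RE.fullmatch(changeset):
            raise ValueError(f"line {lineno}: bad changeset id {changeset!r}")
        if repo in manifest:
            raise ValueError(f"line {lineno}: repo {repo!r} listed twice")
        manifest[repo] = changeset
    return manifest


def missing_repos(manifest, expected):
    """Return the expected repos that the manifest forgot to pin."""
    return sorted(set(expected) - set(manifest))


# Hypothetical manifest; every changeset value below is invented.
example = """
# repo          changeset
moz-central     0123456789ab
comm-central    ba9876543210
l10n/de         0123456789abcdef0123456789abcdef01234567
"""

pinned = parse_manifest(example)
print(missing_repos(pinned, ["moz-central", "comm-central", "mobile-browser"]))
```

The point is that a typo or a forgotten repo gets caught by a script before the release starts, rather than by a human squinting at 62 lines of hex.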

I’ve explained this verbally or on whiteboards a few times before and during the summit, so I thought it would be useful to blog about it here.

Hopefully that answers more questions than it raises!?

De-tangling timestamps: part3 – the new BuildID!

Today we’ve changed the BuildID from being YYYYMMDDHH to YYYYMMDDHHMMSS.

Extending the BuildID so it’s not just accurate to the hour, but now accurate to the second, might sound like a small simple change, but it’s quite significant and something we’ve been working on for several months.

What’s the big deal? Why bother putting all this effort into something so small? Why is it so important?

First, some background: the BuildID was designed to identify a build to within an hour. Having year/month/day/hour information in YYYYMMDDHH format is perfectly sufficient, so long as we never start more than one build in the same hour. That *used* to be true. In fact when builds normally took over an hour, people could safely assume the BuildID was a unique identifier – after all, it almost always was!

However, in recent years, this became more of a problem because:

  • the linux and mac builds take well under an hour, and it’s easily possible to accidentally create two builds with the same BuildID. This promptly causes weird breakages, and leaves some users unable to update. Win32 builds take much longer, so this problem is less likely to happen on win32.
  • the AUS updates are sent out to users based on their BuildID. Having two different builds with the same BuildID means that AUS will possibly send you updates for the “other” build. See bug#431866 and bug#432014 for recent examples.
  • the Tinderbox, Builds, Unittest, Talos and GraphServer systems all needed ways to uniquely identify builds, so each system invented their own (different!) way of doing this… all because the old BuildID was not good enough. Matching up these different systems – how to post a talos result on tinderbox for example – is quite a headache. We’ve already done some cleanup (here and here), but this new BuildID should let us simplify *much* more of that cross-system integration.
  • the graph server is used as a way to track performance, and identify which changes need to be backed out. With the old BuildID, 9.01am looked identical to 9.59am, so the Sheriff had to pad out the entire hour on each side of the regression. Having more accurate ranges reduces the number of changes to consider backing out.

The new BuildID contains year/month/day/hour/minutes/seconds in YYYYMMDDHHMMSS format. We simply added 4 digits to the end. It’s the least intrusive change that we could do, and still addresses all the problems above. Hope all that makes sense – of course, please ping me if you have any questions/comments.
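Here is a small sketch of the collision problem and how the extra 4 digits fix it. It only illustrates the two formats; it is not the code from the actual build system, and the two start times are invented (borrowing the 9.01am / 9.59am example from above).

```python
from datetime import datetime

OLD_FORMAT = "%Y%m%d%H"         # YYYYMMDDHH
NEW_FORMAT = "%Y%m%d%H%M%S"     # YYYYMMDDHHMMSS


def build_id(start_time, fmt):
    """Format a build's start time as a BuildID string."""
    return start_time.strftime(fmt)


def parse_build_id(value):
    """Turn a new-style BuildID back into a datetime."""
    return datetime.strptime(value, NEW_FORMAT)


# Two hypothetical builds started within the same hour.
first = datetime(2008, 8, 7, 9, 1, 0)
second = datetime(2008, 8, 7, 9, 59, 0)

old_ids = (build_id(first, OLD_FORMAT), build_id(second, OLD_FORMAT))
new_ids = (build_id(first, NEW_FORMAT), build_id(second, NEW_FORMAT))

print(old_ids)   # the two old-style IDs are identical
print(new_ids)   # the two new-style IDs differ
print(parse_build_id(new_ids[1]) - parse_build_id(new_ids[0]))
```

Because the new format only appends digits, the first 10 characters of a new BuildID are still the old BuildID, and plain string comparison still sorts builds in time order.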

Many many thanks to Armen, Bhearsum, Coop, Nthomas and Ted/luser for all their help.

take care


ps: We’ve been working on this for a long while now, in bug#431270. Hopefully, we’ve caught all the edge-cases, but if you see any related weirdness/fallout, please file a bug in mozilla.org:ReleaseEngineering. Thanks 🙂

[update 08aug08: fixed typo in first line, where YYMM should have been YYYYMM. Thanks to Stéphane for catching that.]

Mozilla Summit 2008

The Mozilla Summit 2008 is over.

I’ll be doing other blogs on the many specific topics that came up during the Summit, but here are some quick impressions now that I’m back in SFO.

  • Smooth efficient organization getting us there, (and back!); nice hotel, nice town, nice scenery.
  • Bears: Gary, from Hong Kong, couldn’t sleep because of jet lag so went outside to catch some fresh air at dawn… and discovered bears trying to get into the hotel kitchen doors. [link] “Coincidentally”, the early morning joggers started buddy-signup lists 🙂
  • Rockslide closed the main road from Vancouver to Whistler [link], damaging a bus [link], but no-one was hurt.
  • Power outage for a few hours when a laundry truck hit a power transformer. [link, link]
  • Improvised presentations given in dimly lit conf rooms with emergency lighting powered by the hotel generators. Some had people gathered around a borrowed laptop (Thanks Asa!), others were done from memory at a flipchart.
  • Truck rollover blocked the remaining road (called a “paved goat trail” by a local taxi driver) for a few hours.
  • Dinner on the mountaintop was spectacular; the ghostly trees and pylons gliding silently past the gondolas in the fog; the glowing robots at the window, with snow falling in the background outside; getting the local band to rickroll us.
  • schrep’s going away speech was very personal and moving. It caught all of us off guard, and there were quite a few bashful engineers around me discreetly dabbing eyes and pretending not to be crying. I remain very honored to have worked with schrep, and wish him the best in his new role.
  • Meeting localizers and developers from around the world was absolutely priceless. Everywhere I looked, there were impromptu breakout groups; there were even some around RelEng and l10n, and one *very* late Wed night breakout session with Nick dialed in from New Zealand.
  • A few people stayed at Whistler after the general group left. After the hectic times of the last few weeks, I was hoping for a quiet few days. Imagine our surprise to discover the hotel bar packed with a rowdy “Art Walk” crowd; some were the usual art gallery types, but quite a few were also Burners. Suffice to say, the antics and dress-code were surreal.
  • Fog ruled out the floatplane option for most of Fri and Sat. Thankfully, I woke up Sunday morning to crisp, clear skies and just about perfect flying weather. Made it into Vancouver, and across town in plenty of time for my connection to SFO.

I was really impressed by how everyone handled this week, and rolled with the punches. It’s easy to be a gentleman when everything is going perfectly; a true measure of a gentleman is how he behaves when things are not going perfectly. Everyone happily chipped in whenever needed. Before coming to the summit, I was already impressed by the sheer number of people who instinctively “do the right thing”; so now I’m even *more* impressed by everyone at Mozilla. Makes me proud! I don’t care what ZDNet says about the summit; I was there, and it was fan-tas-tic.

The Mozilla Summit 2008 is over. The impact of Mozilla Summit 2008 is *not* over.

[updated 03aug08 to fix some typos, formatting, and link to bug#448604]

How to search Thunderbird emails with Spotlight on a MacBookPro (OSX 10.4.11)

Ever since I started using Thunderbird on a Mac (over a year ago), it’s been annoying that Spotlight searches other files on my laptop, but not my emails. I finally had time to put aside an evening to try this, and got it working in just a few minutes.

Here’s what I did:

  • Shut down Thunderbird. The more cautious should back up their files – I was feeling cavalier so didn’t bother.
  • Go to https://bugzilla.mozilla.org/show_bug.cgi?id=290057#c110 and download the attached Thunderbird.mdimporter.zip zipfile.
  • Open the zipfile on your desktop, and extract the file Thunderbird.mdimporter.
  • Move Thunderbird.mdimporter to either “~/Library/Spotlight/” (which I did) or “/Library/Spotlight/” (as suggested in some other posts)
  • Start Thunderbird, and go to Thunderbird->Preferences. In the Preferences dialog, go to the Advanced tab, and then at the bottom of the General sub-tab, click on the “Config Editor…” button. Search for mail.spotlight.enable, and double-click it in the search results to change the value from “false” to “true”.
  • Close Thunderbird.
  • Open a terminal shell, and run "/usr/bin/mdimport -L" to verify that the new Thunderbird importer is correctly found. If Thunderbird.mdimporter is not listed, go back and verify the steps above.
  • Some posts commented that you needed to restart your machine, but I can’t remember if I needed to do this.
  • After waiting a few minutes, use Spotlight to search for an email. Try searching for something obvious – like “@” or your email address – the point is to see if any of the indexing has started. If the indexes are still being built, you might find very few results, but should at least get something. Allow time for indexes to get built on all emails.
  • Observe that Spotlight lists email messages in “Mail Messages” section of search results. Observe that clicking on an email message in search results will open a new Thunderbird window of that actual email.
  • That’s it – enjoy! 🙂
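If you would rather script the "is the importer listed?" check than eyeball the terminal output, something like the sketch below should do. It assumes a Python 3.7+ interpreter is available, which is not part of the steps above, and the helper names are made up for illustration. It does not assume anything about the exact layout of the mdimport output; it just looks for the importer's file name in whatever comes back.

```python
import subprocess

IMPORTER_NAME = "Thunderbird.mdimporter"


def importer_listed(mdimport_output, name=IMPORTER_NAME):
    """Return True if any line of the given output mentions the importer."""
    return any(name in line for line in mdimport_output.splitlines())


def list_importers():
    """Run '/usr/bin/mdimport -L' and return everything it printed."""
    result = subprocess.run(
        ["/usr/bin/mdimport", "-L"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Combine both streams so it does not matter which one the list goes to.
    return result.stdout + result.stderr


def check():
    """Print a one-line verdict; only meaningful when run on the Mac itself."""
    if importer_listed(list_importers()):
        print("Thunderbird importer found")
    else:
        print("Thunderbird importer NOT found - recheck the steps above")
```

Call check() from a Python prompt after moving Thunderbird.mdimporter into place.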

Tip of the hat to razal.de, dennis.ca, rosshollman.com, macosxhints.com for pointing the way; I ended up doing a subset/combination of parts of each of their instructions, so hope folks find the steps I followed useful.

For the record, I was using the following:

  • MacBookPro running OSX 10.4.11
  • Thunderbird