Major update to Firefox 3.5

We’re doing some quite new things with updates as part of the FF3.5 release: something that, as far as I know, has never been done before in Mozilla, and which is really, really cool.

  1. On the day of the Firefox 3.5 release, existing Firefox 3.0 users will be able to upgrade to FF3.5 simply by doing “Help->CheckForUpdate”.
  2. The release of FF3.5 starts a 6-month End-of-Life period for FF3.0.x. For those 6 months, we’ll have major update offers available all the time to those FF3.0.x users.

Sounds boring, and kinda simple. To understand just how massive this improvement is, we need to compare this with what happened for FF3.0 and FF2.

                                                  2.0.0.x     3.0.x     3.5.x
Days between release and initial MU offer:            248        65         0
Percentage of EOL period that MU was available:        0%       21%      100%

This means people should be able to migrate from FF3.0->FF3.5 faster than we have historically seen people migrate from FF1.5->FF2.0 or FF2.0->FF3.0. How much faster will people migrate? We don’t know yet; we’ve never done this before.

Obviously, we need to make a product that is compelling and which people *want* to migrate to. But at least now, with these infrastructure changes in place, any user who wants to upgrade will always be able to!

Let us know what you think over the coming months.

For the curious, here’s details on how we did it:

1) Background data on previous dates

Firefox1.5 -> Firefox2:
=======================
24Oct2006: FF2.0.0.0 released, start of FF1.5 End-of-Life.
30May2007: end of FF1.5 End-Of-Life.
In those 219 days, users were never able to major update. Our first major update available for FF1.5 users was 29June2007, a month *after* the formal End-Of-Life.

Or, put another way: major updates were available for 0% of the End-of-Life period.

Firefox2.0 -> Firefox3.0:
=========================
17Jun2008: FF3.0.0 released, start of FF2 End-Of-Life.
17Dec2008: end of FF2 End-Of-Life.
In those 183 days, users could major update on only 39 days. None of those 39 days were during the initial peak of public attention around release day.

Major updates were available for 21% of the End-of-Life period.

Firefox3.0 -> Firefox3.5:
=========================
30Jun2009: FF3.5.0 released, start of FF3.0 End-Of-Life.
31Dec2009: (approx) end of FF3.0 End-Of-Life.
In those 184 days, we expect major update to always be available, including during initial public attention around release day.

Major updates should be available 100% of the End-of-Life period.
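For anyone who wants to check the arithmetic, here is a minimal sketch in Python that recomputes some of the figures from the dates listed above. The function names are just illustrative, and the "39 days available" figure is taken as given from the text rather than derived.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Number of days from start to end (end date not counted)."""
    return (end - start).days


def percent_available(days_available: int, eol_days: int) -> int:
    """Share of the End-of-Life period with a major update offer, rounded."""
    return round(100 * days_available / eol_days)


# FF1.5 -> FF2: FF2.0.0.0 release to the first major update offer.
days_to_first_offer = days_between(date(2006, 10, 24), date(2007, 6, 29))

# FF2 -> FF3.0: length of the FF2 End-of-Life period.
ff2_eol_days = days_between(date(2008, 6, 17), date(2008, 12, 17))

# FF3.0 -> FF3.5: length of the (approximate) FF3.0 End-of-Life period.
ff30_eol_days = days_between(date(2009, 6, 30), date(2009, 12, 31))

print(days_to_first_offer)                              # 248
print(ff2_eol_days, percent_available(39, ff2_eol_days))  # 183 21
print(ff30_eol_days, percent_available(ff30_eol_days, ff30_eol_days))  # 184 100
```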

2) WebDev made a small, but important, change to the update infrastructure. This change means that manual CheckForUpdates major updates can now be throttled differently from “background-idle” major updates.

This means we can issue, and re-issue, major updates as often as we like to users who manually CheckForUpdates… without having to worry that we are annoying “background-idle” users by re-prompting them again and again with a major update dialog box each time.

Users who passively wait for major updates will now only be shown a major update dialog box when Beltzner/Sam ask for the “background-idle” major update to be unthrottled. They can make that decision based on what they think is best for the product, the user experience, and their user-update-fatigue discussions.

Furthermore, as most of the RelEng and QA work was already done earlier, as part of the CheckForUpdates work, this means that Beltzner/Sam can make those “background-idle” decisions without worrying about causing much extra work for RelEng or QA.
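To make the idea concrete, here is a rough sketch of what "throttling the two kinds of check differently" could look like. This is not the actual update server code: the function name, the `manual_check` flag and the `background_throttle` fraction are all hypothetical, chosen only to illustrate the decision described above.

```python
import random


def should_offer_major_update(manual_check: bool,
                              background_throttle: float,
                              rng: random.Random) -> bool:
    """Decide whether one update check gets the major update offer.

    manual_check: True when the user explicitly chose CheckForUpdates
        (hypothetical flag for this sketch).
    background_throttle: fraction (0.0 to 1.0) of background-idle checks
        that should see the offer; 0.0 means fully throttled.
    """
    if manual_check:
        # Manual checks are never throttled in this sketch.
        return True
    # Background-idle checks only see the offer for the unthrottled fraction.
    return rng.random() < background_throttle


rng = random.Random(0)
# Fully throttled background checks see nothing; manual checks still do.
print(should_offer_major_update(False, 0.0, rng))  # False
print(should_offer_major_update(True, 0.0, rng))   # True
# Once the background offer is fully unthrottled, everyone sees it.
print(should_offer_major_update(False, 1.0, rng))  # True
```

The point of keeping the two paths separate is that the background fraction can stay at zero, or be raised gradually, without affecting anyone who asks for the update by hand.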

(For details on race conditions where people don’t see the major update dialog box and on the “update fatigue” debate, see: here, here, here, here, and finally here.)

3) Nick Thomas led a bunch of significant cleanup in RelEng infrastructure, so we can now create major updates quite easily and reliably.

We used this improved infrastructure to create the FF2.0.x-> FF3.0.x major update offers.

We also used this to create “fake” FF3.0.x -> FF3.5beta/rc major update offers several times over the last 6 months in advance of the FF3.5 release. QA were able to test each of these, and file blockers in FF3.5 as needed. By the time it comes to release day, QA have already tested major update several times, including on the latest FF3.5rc3.

We will also be using this to create a new major update offer from FF3.0.x -> FF3.5.x every time we ship a new FF3.0.x security release.

There are a few different scenarios we had to handle for that (see the photo of the whiteboard before we moved office, for red lines in scenarios A, B, C, D!) but they’re all covered.

This change is important because it fixes a problem described here where users could see a major update offer only until we shipped the next security release. The new security release blocked access to the pre-existing major update. Now, by re-issuing a new major update at the same time as the new security release, users will *always* be able to see a major update offer.
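As a simplified illustration of that last point, the sketch below rebuilds the full set of major update offers whenever a new security release ships, so the newest FF3.0.x release is never left without an offer. The function names and the version strings in the example are hypothetical, and the real tooling handles more scenarios than this.

```python
def major_update_offers(old_branch_releases, latest_new_release):
    """Map every release on the old branch to the latest new-branch release."""
    return {old: latest_new_release for old in old_branch_releases}


def ship_security_release(old_branch_releases, new_security_release,
                          latest_new_release):
    """Add a security release and re-issue major update offers in one step."""
    releases = list(old_branch_releases) + [new_security_release]
    return releases, major_update_offers(releases, latest_new_release)


# Illustrative version numbers only.
releases, offers = ship_security_release(["3.0.10"], "3.0.11", "3.5")
print(offers)  # {'3.0.10': '3.5', '3.0.11': '3.5'}
```

Under the old approach, the equivalent of `offers` would not have been regenerated, so users who had just taken the newest security release would have had no entry and therefore no major update offer.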

That’s it.

Hopefully all that made sense. I know it’s an obscure corner of the infrastructure, but I hope this post explains why all this is strategically important to Mozilla and to Firefox.

No more missing entities in the l10n nightlies

While most attention is focused on FF3.5, I wanted to echo what Coop said recently about a boring-sounding, yet really important, change to how we produce l10n nightly builds.

Every l10n nightly is now guaranteed to not be missing any entities.

What does that mean? Why is that so important?

  1. All the nightly builds (whether they are en-US or any other locale) have the same actual code functionality. OK, so what?
  2. When running the en-US version of Firefox, the code displays the en-US version of a string. When running the es-ES version of Firefox, the code gets the es-ES version of the string. Yeah, OK, so what?

There is an interesting race condition problem here though.

Between the time when a new string is added in en-US and when a localizer gets to land the equivalent localized string in their locale, we are still producing nightly builds. This means that for some days or weeks, the generated l10n nightly build has problems when Firefox goes to load the localized string to display, and finds the string missing for that locale. The only symptom the nightly l10n user will see is an internal error message, blocked functionality, a refuse-to-start bustage, or a crash with a stack dump.

This has been a problem with l10n nightlies since before I got here, and as far as I know, has always been a problem since the Mozilla project started. This problem also has some significant consequences, detailed below.

What have we changed?

When a new string is added to an en-US build, the en-US nightly has that new string (as you would expect). The l10n nightlies will now also have that new en-US string (as you might not expect) but only until the localized string is created by the localizer. (It’s also tied into Axel’s L10n dashboard so he can track those non-localized strings, and make sure we don’t accidentally ship with them!) Once a localized version of the string is added, the new l10n string will be used.
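The fallback behaviour can be sketched in a few lines of Python. This is only an illustration of the idea, not the real l10n build code: the function name and the entity names in the example are made up, and real entities live in locale files rather than dictionaries.

```python
def merge_entities(en_us: dict, localized: dict):
    """Build the string table for an l10n nightly.

    Every en-US entity is present in the result. The localized value is
    used when it exists; otherwise the en-US value is used as a fallback.
    Localized entities that no longer exist in en-US are dropped.
    Also returns the sorted list of entities still waiting for translation.
    """
    merged = {key: localized.get(key, en_value)
              for key, en_value in en_us.items()}
    untranslated = sorted(key for key in en_us if key not in localized)
    return merged, untranslated


# Hypothetical entities: one already translated, one newly added in en-US.
en_us = {"menu.file": "File", "menu.private": "Private Browsing"}
es_es = {"menu.file": "Archivo"}

merged, untranslated = merge_entities(en_us, es_es)
print(merged)        # {'menu.file': 'Archivo', 'menu.private': 'Private Browsing'}
print(untranslated)  # ['menu.private']
```

The `untranslated` list is the kind of information a dashboard would need in order to make sure nothing ships with an en-US fallback still in place.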

There are 4 important consequences of this change:

  1. Localizers can now safely download the latest nightly without having to first manually read through the checkin logs and the l10n dashboard to figure out whether an l10n nightly is safe to use, or whether that new l10n nightly will crash out with missing strings.
  2. Localizers can now see the exact location and usage of what they are translating, in the exact context. Much better than looking at a list of strings in a text file, and having to install en-US to see where the new string is being used. This is a really big deal when figuring out language subtleties.
  3. Fixing this was the last big pre-req before we can start producing nightly updates for l10n builds. Recall that >50% of users are on non-en-US Firefox. However, only en-US nightlies have nightly updates. Until now, localizers had to manually download a new nightly after they had manually figured out if it was safe to install. Now that we know each nightly has all entities, it is safe to offer automatic nightly updates for l10n, just like we do for en-US. And now that other infrastructure cleanup has been done, this is finally possible. Here’s even a French nightly update that Armen has running in staging right now. 🙂 The curious can follow along in bug#449828.
  4. This *might* help simplify how localizers get changes into place during the release cycle before a string freeze is announced for a release. Until now, some localizers preferred to wait until all new strings were in place and string freeze was declared before doing all their translations as quickly as possible. Now, with each l10n nightly being as safe to use as the en-US nightly, localizers might start using l10n nightlies as their default browser. That means translated strings could be added when the localizer has time, the automatic update to the next nightly would show that translated string in use, and the localizer could adjust if needed for screen size, font bugs, etc.

All in all, it’s awesome stuff by Armen, Coop and Axel.

Better: A Surgeon’s Notes on Performance by Atul Gawande

Short summary: I loved this book. I’ve read it end-to-end twice, and dipped back into specific sections a few times since. While the book is written by a surgeon, and naturally focuses on medical situations, he does so in a style that is totally readable by everyone, and on topics that I think are important to everyone, not just other surgeons. To give you a flavor of the book, here are two quick points that really resonated with me personally.

The opening chapter gives a great description of all his non-medical preparations for a specific surgery case of a patient of his; describing all his dealing with hospital administrators, the patient, nurses, other doctors in the hours required to plan the surgery, and the unforeseen circumstances which change those plans without warning. All this is before he finally gets to do what he trained to do – pick up a knife to start cutting.

This struck a chord with me, because I find it interesting how much it also applies to other fields. Software development is often portrayed as people sitting at desks typing source code, and that’s it. Much like surgery is portrayed as a solitary surgeon with a knife. Or a live concert by U2 is portrayed as the four band members who get on stage. The amount of behind-the-scenes work in each of these fields is colossal, and coordinating all those people is a seriously complex task in itself. Throughout the book he outlines work that people are doing to improve existing complex large-group processes. People who were, literally, “making things better”. I found it all very inspirational.

Later in the book, he describes a logistical situation in Karnataka, India, reacting to a confirmed case of Polio. To stop this one case becoming a Polio epidemic, the World Health Organization ran a vaccination program in the area. Sounds boring and routine. Aid agencies have been doing vaccination programs for years, so it should be routine, right? The numbers quoted from Brian Wheeler, Chief Operations Officer for WHO’s polio program, just blew my mind.

They had to vaccinate every child under 5 years of age in an area of 50,000 square miles centered around that single Polio case. Anything less than 90% coverage of the target population (the percentage needed to shut down transmission enough to stop the spread) would be a failure. To do this, they needed to hire and train 37,000 vaccinators and 4000 health care supervisors, rent 2000 vehicles, supply 18,000+ insulated vaccine carriers, get everyone to the actual location in rural India and have the workers go door to door to vaccinate 4.2 million children.

In three days.

And they didn’t have much advance notice either: from the first confirmed report of Polio to people on the ground starting the vaccination program was only 32 days.

How do you make all that more efficient for future outbreaks? Everything from rapid escalation processes, so WHO gets involved sooner, to dealing with cultural/social/educational issues. And they’re still figuring it out.

Try to read a few pages; I suspect you just won’t be able to put it down. That’s what happened to me with both of his books so far. His previous book “Complications” was great, and this new book was as good, or maybe even… better?

Nostalgia and excitement

…seems the best description of the last few days.

Friday was a big day for several reasons. We:

  • started building FF3.5beta99 (build#1)
  • aborted FF3.5beta99 (build#1) after a blocker was found, and started FF3.5beta99 (build#2)
  • pushed FF3.0.11build#2 to beta users
  • started building TB2.0.0.22
  • oh, and moved office.

The first four would have counted as a busy day. A really busy day. Add to that the contingency planning to make sure that we could still be ready whenever we finally got the “go” to start FF3.5rc1, regardless of when the physical building move really happened. Both the FF3.5rc1 date and the building move date changed quite a bit, so we just made plans for the worst case: doing it all on the same day. There were some last-minute changes to the contingency plans when we added FF3.5beta99 to the schedule late last week.

While I don’t normally like respins, in this one case, I was happy for the FF3.5beta99 respin, as it suddenly gave us ~3 hours before we would need the signing machine again. So, Aki, John Ford and I quickly moved the mobile devices from Aki’s desk, and the signing machine keymaster out of the server room, into two cars, and drove over in a careful, slow convoy to the new building. Thankfully Aki thought to put all the mobile devices into a portable guitar base pedal case, which made it “easy” to carry.

(Healthy paranoia caused *me* to carry keymaster, and the signing keys, so I would deal with the consequences if it got dropped in the move.)

…although keymaster looked too unsecured in the back seat like that, so Aki sat in the back, and physically held it for the drive.

Earlier today, I went back to Building K, tracking down some loose ends. It was weird and nostalgic walking around the ghost of the empty building all by myself. It’s been my home-from-home for the last two years, and it was surreal to see it all empty like this.

ps: if anyone knows who this crutch belongs to, could they let us know?

Big tip of the hat to Rhian, Chris Beard, Karen, Erica and IT for a phenomenal job on all this. I’ve done moves like this in previous companies, and there’s always a million-and-one loose details. But they seemed to have everything all calmly taken care of. Quite amazing!

Taking Mobile Unittest and Talos offline

Quick note repeating what I mentioned in the Monday Foundation call and the Tuesday developer meeting, and what Aki blogged about here.

We’re powering off all the mobile linux-arm Unittest and Talos machines tomorrow (Friday 5th June 2009) to box them up and move them to their new home. With any luck they’ll be back online late Friday, but it might take until Monday 8th, depending on a bunch of stuff beyond our control right now. They’ll be in a server room in the new building, and Aki can finally get some desk space! 🙂

Please be gentle with the mobile linux-arm builds while these devices are offline!

TryServer now supports linux-arm and WinCE mobile builds

Late last week, John Ford enabled mobile builds on TryServer. This is a huge boost to the mobile developers working on Fennec because:

  1. Fennec developers on linux-arm or WinCE can now confirm that their patches work on both mobile operating systems before they land, without needing to maintain two environments on their desk, or swap patches back and forth with other mobile developers.
  2. Firefox developers could never tell if their patch was going to break mobile builds. If a patch did break mobile builds, the developer would have to guess at a bustage fix, quickly land it, watch to see if that fixed the entire problem and maybe repeat. Meanwhile mobile developers were stuck with a broken build for the duration.

Obviously, doing these extra builds per TryServer adds extra load to our TryServer pool-of-slaves, but we’ve just added 20 extra slaves there to offset the extra load. We’ll continue to watch wait times, but if you see any problems, please file a bug.

Way to go, John – and all done by week#4 of your internship. Exciting start to the summer!!