Firefox 3.0beta2 by the (wall-clock) numbers

Mozilla released Firefox 3.0 beta2 on Tuesday 18-dec-2007, at 7:40pm PST.

From “code freeze” to “release is now available to public” was 10 days 20 hours wall-clock time, of which Build&Release took 3 days 16 hours.

23:59 04dec: Dev closes the tree, allowing only blockers from now on
23:59 07dec: Dev declares code freeze for 3.0beta2
11:15 10dec: Dev says “go” to Build
12:05 10dec: rc1 builds started
16:20 10dec: mac builds handed to QA
17:05 10dec: linux builds handed to QA
02:50 12dec: signed win32 builds handed to QA
18:30 12dec: linux & mac updates available on betatest; win32 updates still being debugged
08:10 13dec: linux & mac & win32 updates available on betatest
17:30 17dec: Dev & QA say “go” for Release; Build starts final signing, bouncer entries
10:00 18dec: final signing, bouncer entries done; mirror replication started
12:10 18dec: mirror replication completed
19:40 18dec: website updates finished, announced
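
For anyone who wants to check the headline numbers against the timeline, here is a minimal sketch of the arithmetic. It assumes all times are in the same local timezone, and that “Build&Release time” means the two stretches where Build was actively working (Dev’s “go” until updates were on betatest, then “go” for Release until mirror replication completed). That split is my reading of the timeline, not an official definition; the helper names `at` and `fmt` are made up for illustration.

```python
from datetime import datetime, timedelta

YEAR = 2007
MONTHS = {"nov": 11, "dec": 12}  # only the months this timeline needs

def at(stamp: str) -> datetime:
    """Parse a timeline stamp such as '23:59 07dec' into a naive datetime."""
    clock, day = stamp.split()
    hour, minute = map(int, clock.split(":"))
    return datetime(YEAR, MONTHS[day[2:]], int(day[:2]), hour, minute)

def fmt(delta: timedelta) -> str:
    """Render a timedelta as days, hours and minutes."""
    hours, rem = divmod(delta.seconds, 3600)
    return f"{delta.days}d {hours}h {rem // 60}m"

# Code freeze -> release announced
total = at("19:40 18dec") - at("23:59 07dec")

# Assumed Build&Release time: the two spans where Build was on the hook
build = (at("08:10 13dec") - at("11:15 10dec")) + (at("12:10 18dec") - at("17:30 17dec"))

print(fmt(total))  # 10d 19h 41m
print(fmt(build))  # 3d 15h 35m
```

Both results round to the figures quoted above (10 days 20 hours, and 3 days 16 hours).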

1) This was our first attempt at using Build automation on trunk, so some teething problems kept us on our toes. The curious can look through bugs 407077, 404340, 405042, 405153, 406601, 406613, 406640, 406602, 407177, 407275, 407333, 407351, 407670, 407825, 407962, 407988, 408058, 408610, 408928, 408934.
2) One of our automation VMs failed out with a NetApp I/O error during the builds. This was after tagging+branching completed, so we were able to reopen trunk while debugging this. The problem recurred after a reboot, but went away after a 3rd reboot. We’ve seen this intermittently in the past with a few VMs, but so far it’s still unresolved. The current theory is that we need a kernel update on our VM, which is being applied now during the downtime after the release. See details in bug#407796.
3) The tree closure work by Dev in the previous week really helped with stability. For Beta1, we had to create rc1, rc2, rc3. However, for Beta2, we only had to create rc1. Nice! Thank you!!
4) From the time Dev said “go” at 11:15 until Build reopened trunk at 14:10 was approx 3 hours. This speedy reopening of the tree meant that Dev didn’t have a huge backlog of pending changes that needed to land in a carefully coordinated manner, like what happened after the week-long closure during the Beta1 release. This was good news for Dev, and a direct consequence of using automation.
5) Having the builds done early meant that QA had time to test before we went into TestDay. It was a nice bonus to go into a TestDay with bits that QA had already tried out, and to get a full TestDay of testing during this release cycle!
6) As part of beta2, we generated updates to bring forward beta1 users to beta2. As automation was going ok, we decided to also bring forward users from every previous alpha and beta (gecko1.9a1, 1.9a2, 1.9a3…1.9a8, 3.0b1) up to 3.0b2. Testing found problems when updating from 1.9a2, 1.9a3, 1.9a4. Updating from all other alphas, and from beta1, was confirmed to be ok, so this was deemed not a showstopper. We deleted the generated updates for 1.9a2, 1.9a3, 1.9a4, and pushed out updates for all the other versions. After some debugging, it turned out to be a problem in the Firefox 3.0 Places code, introduced between alpha1-2 and fixed between alpha4-5. We expect this to be fixed in time for Beta3; for details see bug#408443.
7) Kubla, our new website Content Management System, was used for the first time to get firstrun, release notes and announcements visible on our websites. We had a couple of teething problems with this, but overall this looks like a nice step forward for us. Wil posted some details about Kubla here; it’s well worth a read.
8) An interesting side effect of speeding the release up this much was having to wait for ReleaseNotes and website changes to catch up! A good problem to have! 🙂 Seriously, the real fix here is to make sure that schedule changes on the release are widely advertised across all people working on the release; somehow only certain subsets of people were informed of the earlier ship date. This meant some people who were ahead of schedule for the Friday release date suddenly had to play rush-catch-up for the Tuesday release date. It was one of the topics covered in our postmortem.
9) Because this was a Beta release, we did not do any “beta period” before releasing the beta! 🙂
10) Mirror absorption took just over 2 hours to reach the > 60% threshold, faster than usual.

Overall, the Build team had a hectic week chasing down various automation teething problems & bugs, but the automation worked quite nicely. While shipping Beta2 still took longer than setting up the China Data Center (!), Beta2 was much smoother and quicker than Beta1!
take care
John.

When to use the Beta Update channel vs the Release Update channel?

Here’s something I posted in bug#405584 today which others might find interesting.
“Can you let us know a few days before shipping, when a new FF is coming, so we can test it and confirm it doesn’t break with our site – *before* you ship FF”?

Well, actually, we already do this. Let me explain with some background…

We have 3 different channels for sending out updates to users. These channels are currently called: nightly channel, beta channel and release channel. The nightly channel keeps you updated with new nightly builds as they are produced – the “bleeding edge” of browser development, so to speak, and typically of most interest to FF developers and testers. It’s also the least stable. However, I’d like to talk more about the “beta” and “release” channels.

Firefox 2.0.0.11 by the (wall-clock) numbers

Mozilla released Firefox 2.0.0.11 on Friday 30-nov-2007, at 1:30pm PST.

From “do we need a release” to “release is now available to public” was almost 3 days (71.5 hours) wall-clock time, of which Build&Release took 36 hours.

13:50 27nov: decide bug#405584 regression in FF2.0.0.10 justifies producing a quick FF2.0.0.11 to address
16:00 27nov: Dev says “go”
17:55 27nov: 2.0.0.11rc1 builds started
19:45 27nov: linux builds handed to QA
21:45 27nov: mac builds handed to QA
12:50 28nov: win32 signed builds handed to QA
16:00 28nov: update snippets on betatest update channel
10:45 29nov: QA says “go” for Beta
14:30 29nov: update snippets on beta update channel
17:00 29nov: Dev & QA say “go” for Release; Build starts final signing, bouncer entries
19:00 29nov: final signing, bouncer entries done
07:00 30nov: mirror replication started
13:30 30nov: update snippets on live update channel; website changes finalized and visible; release announced
Notes:
1) This was a really fast release!! Despite the fast turnaround, it felt like things were still running in a controlled calm manner, we still covered everything we usually do, and even improved on the process a little. All great things to see. The Build Automation used in FF2.0.0.11 was identical to what we used for FF2.0.0.10, so this was still not yet a “100% human free” release.
2) bug#405643 was reported as another regression in FF2.0.0.10. However, we confirmed it was actually a feature of fixing security bug#369814, and proposed a workaround for LotusDomino servers.
3) For this one fix, we decided not to wait for a beta period, as it was a one line fix already landed on trunk back on 11th Oct. However, we still wanted to move people who were using FF2.0.0.10beta forward to FF2.0.0.11beta. This meant we still needed to push update snippets out on the beta channel and test appropriately.
4) During 2.0.0.10, we had to hold the release a few hours, waiting for some website changes to be finished. In a process change for FF2.0.0.11, we started the website and release note work much earlier, starting when QA says “go” for beta. This change helped, and we plan to do this for future releases.
5) We waited until morning to start pushing to mirrors. This was done so mirror absorption completed as QA were arriving in the office to start testing update channels. We did this because we wanted to reduce the time files were on the mirrors untested; in the past, overly excited people have posted the locations of the files as “released” on public forums, even though the files had not yet been through the last of the sanity checks. Coordinating the mirror push like this reduced that likelihood just a bit.
6) Mirror absorption took 3 hours to reach all values >= 60%.
take care
John.

Firefox 2.0.0.10 by the (wall-clock) numbers

Mozilla released Firefox 2.0.0.10 on Monday 26-nov-2007, at 6:30pm PST.

From “do we need a release” to “release is now available to public” was 14 days 2 hours wall-clock time, of which the Beta period took 6.75 days, and Build&Release took 34 hours.

16:25 12nov: decide regressions introduced in FF2.0.0.9 justify producing a quick FF2.0.0.10 to address
20:20 14nov: Dev says “go”
03:40 15nov: 2.0.0.10rc1 builds started
07:05 15nov: linux builds handed to QA
11:05 15nov: linux, mac and win32 signed builds handed to QA
07:00 16nov: update snippets on betatest update channel
14:00 19nov: QA says “go” for Beta
15:00 19nov: update snippets on beta update channel
09:15 26nov: Dev & QA say “go” for Release; Build starts final signing, bouncer entries
11:00 26nov: final signing, bouncer entries done; mirror replication started
18:30 26nov: update snippets on live update channel; website changes finalized and visible; release announced

While Build Automation in FF2.0.0.10 was much smoother than FF2.0.0.9, this was still not yet a “human free” release:
1) We still manually do signing, adding bouncer entries, starting mirror replication and monitoring mirror replication, pushing snippets to beta channel, pushing snippets to release channel. Combined, these took 5.5 hours of the Build time, and are not yet automated.
2) We had to hold the release, waiting for some website changes to be made and then published. This delay was caused by an internal human communication snafu within Mozilla – some folks had not been notified we were releasing that day, so some website changes were not ready. We eventually raised them on cellphones after they landed off a plane, and made the website changes, but this delay cost us approx 3 hours. We’re tweaking the human processes to try to avoid this in future.
3) Mirror absorption took 3 hours to reach all values >= 60%. We’ve been experimenting with the last few releases, to see what absorption value is “good enough” without hammering individual mirrors. So far, lower limits of 70%, 65% and 60% have been tried. Without any real evidence, I just feel nervous about trying any lower percentage, as fewer mirrors would be sharing the overall load, maybe burning through those mirrors’ bandwidth. Open to persuasion though, if people have suggestions?!!
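
To make the threshold tradeoff concrete, here is a rough sketch of the kind of check involved. It is not how our mirror tooling actually computes absorption; it assumes each mirror carries a traffic weight and that absorption is the weighted percentage of mirrors that already have the files. The `Mirror` class, the weights and the mirror names are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Mirror:
    name: str
    weight: int       # hypothetical share of download traffic
    has_files: bool   # True once the release files have replicated

def absorption(mirrors):
    """Weighted percentage of the mirror pool that already has the files."""
    total = sum(m.weight for m in mirrors)
    ready = sum(m.weight for m in mirrors if m.has_files)
    return 100.0 * ready / total if total else 0.0

def load_share(mirrors):
    """Fraction of all downloads each ready mirror would serve if we released now."""
    ready = sum(m.weight for m in mirrors if m.has_files)
    return {m.name: m.weight / ready for m in mirrors if m.has_files}

# Made-up pool: two mirrors are ready, one is still replicating
pool = [
    Mirror("mirror-a", 50, True),
    Mirror("mirror-b", 30, False),
    Mirror("mirror-c", 20, True),
]

print(absorption(pool) >= 60.0)  # True: 70% of the weighted pool is ready
print(load_share(pool))          # mirror-a alone would carry 50/70 of the traffic
```

The second number is the one that makes me nervous: the lower the threshold, the bigger the slice of traffic each ready mirror has to absorb until the rest catch up.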

take care
John.

Firefox 3.0beta1 by the (wall-clock) numbers

Mozilla released Firefox 3.0 beta1 on Monday 19-nov-2007, at 11:10pm PST.

From “code freeze” to “release is now available to public” was approx 19 days wall-clock time, of which Build&Release took 9 days and 1 hour.

23:59 31oct: code freeze for 3.0beta1
15:00 06nov: Dev says “go” to Build
18:25 06nov: rc1 builds started
20:30 06nov: win32 builds failed out (bug#346214)
22:30 06nov: win32 builds restarted after bug#346214 fixed on release branch
14:30 07nov: linux, mac and unsigned win32 builds handed to QA
17:25 07nov: rc1 abandoned (see details below)
17:25 07nov: rc2 builds started
17:20 08nov: rc2 builds abandoned (bug#402999)
19:05 08nov: rc3 builds started after bug#402999 fixed on release branch
17:30 09nov: linux & mac builds handed to QA
14:55 12nov: win32 signed builds handed to QA
18:15 16nov: Dev & QA say “go” for Release; Build starts final signing, bouncer entries
14:45 19nov: final signing, bouncer entries done; mirror replication started
23:10 19nov: announced

1) There is no Build automation running on trunk, so this release was done manually.
2) The rc1 builds were abandoned because of a manual error in how cvs was tagged. Two Build engineers were working in parallel to speed things up: one engineer typed the PDT timezone into one machine, while the other engineer typed PST into another machine, so the two machines had an hour’s difference in what source timestamp to use for the builds. That one hour difference meant the generated builds missed one important last minute showstopper bugfix. This was totally a manual snafu within the Build team, and would have been avoided if automation had been in place on trunk. (Ironically, daylight saving time had only changed that same week; a week earlier this same snafu would have passed blissfully unnoticed!)
3) During rc1, there was a 4h20m delay while the Build team investigated a regxpcom test error at the end of the win32 build. Turns out the build was actually fine. The error was caused by a collision between the hourly build and production build processes running on the same machine at the same instant. Killing the hourly build and rerunning production worked fine.
4) By prior agreement, we did not create update snippets for this beta. Any users on earlier Alpha releases would not get updates refreshing them forward to beta1; instead Alpha users would have to manually download and install beta1. We do plan on creating update snippets for beta2 and beta3.
5) Because this was a Beta release, we did not do any “beta period” before releasing the beta! 🙂
6) Mirror absorption took 8 hours to reach the 70% threshold, not the usual 3 hours. In a random coincidence, there was a problem with one of the central rsync hubs in the mirror farm, and also one of the major mirrors, further compounded by problems when switching to backup servers. Dave Miller has all the drama details of late night pagers, and various mirror owners jumping to help (shout out to Shane!).
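
The PDT/PST mixup in note 2 is easy to reproduce. The sketch below uses a made-up cutoff time (the real timestamp isn’t recorded here) and fixed UTC offsets for the two zone names, and shows how the same typed clock time becomes two cutoffs an hour apart, with a hypothetical late checkin falling between them.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")

# Hypothetical cutoff as typed on both machines; only the zone differed
typed = datetime(2007, 11, 6, 18, 0)

cutoff_pst = typed.replace(tzinfo=PST)  # 02:00 UTC on 7 Nov
cutoff_pdt = typed.replace(tzinfo=PDT)  # 01:00 UTC on 7 Nov

print(cutoff_pst - cutoff_pdt)  # 1:00:00

# Hypothetical last-minute checkin landing inside that one-hour gap
checkin = datetime(2007, 11, 7, 1, 30, tzinfo=timezone.utc)

print(checkin <= cutoff_pst)  # True:  the PST machine picks up the fix
print(checkin <= cutoff_pdt)  # False: the PDT machine misses it

def utc_stamp(local: datetime, tz: timezone) -> str:
    """Normalise a typed local time to one unambiguous UTC string."""
    return local.replace(tzinfo=tz).astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")

print(utc_stamp(typed, PST))  # 2007-11-07 02:00:00 UTC
```

Computing the cutoff once and handing the same UTC string to every machine, which is effectively what automation does, removes the chance for two people to type two different zones.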

take care
John.