Firefox 3.0beta2 by the (wall-clock) numbers

Mozilla released Firefox 3.0 beta2 on Tuesday 18-dec-2007, at 7:40pm PST.

From “code freeze” to “release is now available to public” was 10 days 20 hours wall-clock time, of which Build&Release took 3 days 16 hours.

23:59 04dec: Dev closes the tree, allowing only blockers from now on
23:59 07dec: Dev declares code freeze for 3.0beta2
11:15 10dec: Dev says “go” to Build
12:05 10dec: rc1 builds started
16:20 10dec: mac builds handed to QA
17:05 10dec: linux builds handed to QA
02:50 12dec: signed win32 builds handed to QA
18:30 12dec: linux & mac updates available on betatest; win32 updates still being debugged
08:10 13dec: linux & mac & win32 updates available on betatest
17:30 17dec: Dev & QA say “go” for Release; Build starts final signing, bouncer entries
10:00 18dec: final signing, bouncer entries done; mirror replication started
12:10 18dec: mirror replication completed
19:40 18dec: website updates finished, announced
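The headline durations above are plain wall-clock subtraction between two timeline entries. Here is a minimal Python sketch, assuming every entry is Pacific time in December 2007. No daylight-saving change falls inside this window, so naive datetimes are enough. `parse_entry` and `wall_clock` are hypothetical helpers for the "HH:MM DDmon" stamps used in these timelines.

```python
from datetime import datetime, timedelta

def parse_entry(stamp: str, year: int = 2007) -> datetime:
    """Parse a timeline stamp like '23:59 07dec' (hypothetical helper)."""
    # strptime matches month abbreviations case-insensitively, so 'dec' works.
    return datetime.strptime(f"{stamp} {year}", "%H:%M %d%b %Y")

def wall_clock(start: str, end: str) -> timedelta:
    """Elapsed wall-clock time between two timeline stamps."""
    return parse_entry(end) - parse_entry(start)

freeze_to_release = wall_clock("23:59 07dec", "19:40 18dec")
print(freeze_to_release)  # 10 days, 19:41:00, i.e. roughly 10 days 20 hours
```

A window that spans the early-November switch back to PST would need timezone-aware datetimes to count the extra hour.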

1) This was our first attempt using Build automation on trunk, so we had some teething problems that kept us on our toes. The curious can look through: 407077, 404340, 405042, 405153, 406601, 406613, 406640, 406602, 407177, 407275, 407333, 407351, 407670, 407825, 407962, 407988, 408058, 408610, 408928, 408934.
2) One of our automation VMs failed with a NetApp I/O error during the builds. This happened after tagging and branching had completed, so we were able to reopen trunk while debugging it. The problem recurred after a reboot, but went away after a third reboot. We’ve seen this intermittently in the past with a few VMs, but so far it’s still unresolved. The current theory is that our VMs need a kernel update, which is being applied now during this downtime after the release. See details in bug#407796.
3) The tree closure work by Dev in the previous week really helped with stability. For Beta1, we had to create rc1, rc2, rc3. However, for Beta2, we only had to create rc1. Nice! Thank you!!
4) From the time Dev said “go” at 11:15 until Build reopened trunk at 14:10 was approx 3 hours. This speedy reopening of the tree meant that Dev didn’t have a huge backlog of pending changes that needed to land in a carefully coordinated manner, like what happened after the week-long closure during the Beta1 release. This was good news for Dev, and a direct consequence of using automation.
5) Having the builds done early meant that QA had time to test before we went into TestDay. It was a nice bonus to go into a TestDay with bits that QA had already tried out, and to get a full TestDay of testing during this release cycle!
6) As part of beta2, we generated updates to bring beta1 users forward to beta2. As automation was going ok, we decided to also bring forward users from every previous alpha and beta (gecko1.9a1, 1.9a2, 1.9a3…1.9a8, 3.0b1) to 3.0b2. Testing found problems when updating from 1.9a2, 1.9a3 and 1.9a4. Updating from all other alphas, and from beta1, was confirmed to be ok, so this was deemed not a showstopper. We deleted the generated updates for 1.9a2, 1.9a3 and 1.9a4, and pushed out updates for all the other versions. After some debugging, it turned out to be a problem in the Firefox 3.0 Places code, introduced between alpha1 and alpha2 and fixed between alpha4 and alpha5. We expect this to be fixed in time for Beta3; for details see bug#408443.
7) Kubla, our new website Content Management System, was used for the first time to get firstrun, release notes and announcements visible on our websites. We had a couple of teething problems with this, but overall this looks like a nice step forward for us. Wil posted some details about Kubla here; it’s well worth a read.
8) An interesting side effect of speeding the release up this much was having to wait for ReleaseNotes and website changes to catch up! A good problem to have! 🙂 Seriously, the real fix here is to make sure that schedule changes on the release are widely advertised across all people working on the release; somehow only certain subsets of people were informed of the earlier ship date. This meant some people who were ahead of schedule for the Friday release date suddenly had to play rush-catch-up for the Tuesday release date. It was one of the topics covered in our postmortem.
9) Because this was a Beta release, we did not do any “beta period” before releasing the beta! 🙂
10) Mirror absorption took just over 2 hours to reach the 60% threshold, faster than usual.
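Note 6 boils down to generating one update path per earlier release and then withholding the paths that failed testing. The sketch below shows that bookkeeping. The version names come from the note. `SOURCES`, `KNOWN_BAD` and `publishable_paths` are hypothetical illustrations, not the actual release tooling.

```python
# Illustrative subset of the source versions named in note 6; the remaining
# alphas would be listed the same way.
SOURCES = ["1.9a1", "1.9a2", "1.9a3", "1.9a4", "1.9a8", "3.0b1"]
TARGET = "3.0b2"

# Sources whose updates failed testing (bug#408443), so their updates are withheld.
KNOWN_BAD = {"1.9a2", "1.9a3", "1.9a4"}

def publishable_paths(sources, target, known_bad):
    """Return (from, to) update paths to push, skipping sources that failed testing."""
    return [(src, target) for src in sources if src not in known_bad]

paths = publishable_paths(SOURCES, TARGET, KNOWN_BAD)
for src, dst in paths:
    print(f"{src} -> {dst}")
```

Keeping the bad sources in a separate set means one failed path does not hold up the others, which matches the decision to ship everything except 1.9a2 through 1.9a4.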

Overall, the Build team had a hectic week chasing down various automation teething problems & bugs, but the automation worked quite nicely. While shipping Beta2 still took longer than setting up the China Data Center (!), Beta2 was much smoother and quicker than Beta1!
take care
John.

When to use the Beta Update channel vs the Release Update channel?

Here’s something I posted in bug#405584 today which others might find interesting.
“Can you let us know a few days before shipping, when a new FF is coming, so we can test it and confirm it doesn’t break with our site – *before* you ship FF”?

Well, actually, we already do this. Let me explain with some background…

We have 3 different channels for sending out updates to users. These channels are currently called: nightly channel, beta channel and release channel. The nightly channel keeps you updated with new nightly builds as they are produced – the “bleeding edge” of browser development, so to speak, and typically of most interest to FF developers and testers. It’s also the most unstable. However, I’d like to talk more about the “beta” and “release” channels.
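The release timelines in these posts show the same staged order for every update. Snippets go first to a test channel (betatest), then move to the beta channel once QA says “go” for Beta, and reach the release channel only after the final “go”. In the Firefox 2.0.0.10 timeline, for example, snippets went to betatest on 16nov, beta on 19nov and the live channel on 26nov. Anyone running on the beta channel therefore had that update a week before the general release. The sketch below models that ordering. `STAGES` and `promote` are hypothetical, and the one-step-at-a-time rule is a simplification, not the real update server's configuration.

```python
# Hypothetical model of the channel order seen in the 2.0.0.x timelines.
STAGES = ["betatest", "beta", "release"]

def promote(current: str) -> str:
    """Return the next channel for an update's snippets, one stage at a time."""
    i = STAGES.index(current)  # ValueError for an unknown channel name
    if i == len(STAGES) - 1:
        raise ValueError(f"{current!r} is already the last channel")
    return STAGES[i + 1]

channel = "betatest"        # QA verify the update on the test channel first
channel = promote(channel)  # QA say "go" for Beta: beta users are offered it
channel = promote(channel)  # Dev & QA say "go" for Release: everyone else
print(channel)  # release
```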

Firefox 2.0.0.11 by the (wall-clock) numbers

Mozilla released Firefox 2.0.0.11 on Friday 30-nov-2007, at 1:30pm PST.

From “do we need a release” to “release is now available to public” was almost 3 days (71.5 hours) wall-clock time, of which Build&Release took 36 hours.

13:50 27nov: decide bug#405584 regression in FF2.0.0.10 justifies producing a quick FF2.0.0.11 to address
16:00 27nov: Dev says “go”
17:55 27nov: 2.0.0.11rc1 builds started
19:45 27nov: linux builds handed to QA
21:45 27nov: mac builds handed to QA
12:50 28nov: win32 signed builds handed to QA
16:00 28nov: update snippets on betatest update channel
10:45 29nov: QA says “go” for Beta
14:30 29nov: update snippets on beta update channel
17:00 29nov: Dev & QA say “go” for Release; Build starts final signing, bouncer entries
19:00 29nov: final signing, bouncer entries done
07:00 30nov: mirror replication started
13:30 30nov: update snippets on live update channel; website changes finalized and visible; release announced
Notes:
1) This was a really fast release!! Despite the fast turnaround, things still felt like they were running in a controlled, calm manner; we still covered everything we usually do, and even improved the process a little. All great things to see. The Build Automation used in FF2.0.0.11 was identical to what we used for FF2.0.0.10, so this was still not yet a “100% human free” release.
2) bug#405643 was reported as another regression in FF2.0.0.10. However, we confirmed it was actually an intended side effect of the fix for security bug#369814, and proposed a workaround for Lotus Domino servers.
3) For this one fix, we decided not to wait for a beta period, as it was a one-line fix that had already landed on trunk back on 11th Oct. However, we still wanted to move people who were using FF2.0.0.10beta forward to FF2.0.0.11beta. This meant we still needed to push update snippets out on the beta channel and test appropriately.
4) During 2.0.0.10, we had to hold the release a few hours, waiting for some website changes to be finished. In a process change for FF2.0.0.11, we started the website and release note work much earlier, starting when QA says “go” for beta. This change helped, and we plan to do this for future releases.
5) We waited until morning to start pushing to mirrors, so that mirror absorption completed as QA were arriving in the office to start testing update channels. We did this to reduce the time files were on the mirrors untested; in the past, overly excited people have posted the locations of the files as “released” on public forums, even though we had not yet finished the last of the sanity checks. Coordinating the mirror push like this reduced that likelihood just a bit.
6) Mirror absorption took 3 hours to reach all values >= 60%.
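Notes 5 and 6 describe a small scheduling calculation: start the push one absorption period before QA arrive. The sketch below compares that morning push with pushing right after final signing (19:00 29nov in the timeline). The 3-hour absorption comes from note 6 and the 07:00 start from the timeline. The 10:00 QA arrival time and both helper functions are hypothetical.

```python
from datetime import datetime, timedelta

def latest_push_start(qa_start: datetime, expected_absorption: timedelta) -> datetime:
    """Latest time to start the mirror push so absorption is done when QA start."""
    return qa_start - expected_absorption

def untested_exposure(push_start: datetime, qa_start: datetime) -> timedelta:
    """How long files are on mirrors before QA begin their final checks."""
    return max(timedelta(0), qa_start - push_start)

absorption = timedelta(hours=3)            # from note 6
qa_start = datetime(2007, 11, 30, 10, 0)   # hypothetical QA arrival time

morning_push = latest_push_start(qa_start, absorption)   # 07:00, as in the timeline
evening_push = datetime(2007, 11, 29, 19, 0)             # right after final signing

print(untested_exposure(morning_push, qa_start))  # 3:00:00
print(untested_exposure(evening_push, qa_start))  # 15:00:00
```

The exposure can never drop below the absorption time itself. Timing absorption to finish just as QA arrive is therefore the best this approach can do.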
take care
John.

Firefox 2.0.0.10 by the (wall-clock) numbers

Mozilla released Firefox 2.0.0.10 on Monday 26-nov-2007, at 6:30pm PST.

From “do we need a release” to “release is now available to public” was 14 days 2 hours wall-clock time, of which the Beta period took 6.75 days, and Build&Release took 34 hours.

16:25 12nov: decide regressions introduced in FF2.0.0.9 justify producing a quick FF2.0.0.10 to address
20:20 14nov: Dev says “go”
03:40 15nov: 2.0.0.10rc1 builds started
07:05 15nov: linux builds handed to QA
11:05 15nov: linux, mac and win32 signed builds handed to QA
07:00 16nov: update snippets on betatest update channel
14:00 19nov: QA says “go” for Beta
15:00 19nov: update snippets on beta update channel
09:15 26nov: Dev & QA say “go” for Release; Build starts final signing, bouncer entries
11:00 26nov: final signing, bouncer entries done; mirror replication started
18:30 26nov: update snippets on live update channel; website changes finalized and visible; release announced

While Build Automation in FF2.0.0.10 was much smoother than FF2.0.0.9, this was still not yet a “human free” release:
1) We still manually handle signing, adding bouncer entries, starting and monitoring mirror replication, and pushing snippets to the beta and release channels. Combined, these took 5.5 hours of the Build time, and are not yet automated.
2) We had to hold the release while waiting for some website changes to be made and then published. This delay was caused by an internal human communication snafu within Mozilla: some folks had not been notified we were releasing that day, so some website changes were not ready. We eventually reached them by cellphone after their plane landed, and made the website changes, but this delay cost us approx 3 hours. We’re tweaking the human processes to try to avoid this in future.
3) Mirror absorption took 3 hours to reach all values >= 60%. We’ve been experimenting over the last few releases to see what absorption value is “good enough” without hammering individual mirrors. So far, lower limits of 70%, 65% and 60% have been tried. Without any real evidence, I just feel nervous about trying any lower percentage, as fewer mirrors would be sharing the overall load, maybe burning through each mirror’s bandwidth. Open to persuasion though, if people have suggestions?!!
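“Time to reach all values >= 60%” is a scan over monitoring samples until every value clears the threshold. This sketch assumes each value is the absorption percentage for one file. The worry about lower thresholds follows from simple arithmetic. If downloads were split evenly across the mirrors that already have the files, each of those mirrors would carry 100/p times its full-absorption share at absorption p%. That is about 1.43x at 70% and 1.67x at 60%. The even split is an assumption that ignores any per-mirror weighting. The sample data below is made up for illustration.

```python
def time_to_threshold(samples, threshold):
    """
    samples: time-ordered list of (minutes_since_push, [absorption % per file]).
    Returns the first sample time at which every value is >= threshold,
    or None if the threshold is never reached.
    """
    for minutes, values in samples:
        if all(v >= threshold for v in values):
            return minutes
    return None

def relative_load(absorption_pct):
    """Load on each serving mirror relative to 100% absorption,
    assuming downloads are spread evenly over mirrors that have the files."""
    return 100.0 / absorption_pct

# Hypothetical monitoring samples, for illustration only.
samples = [
    (60, [35.0, 41.0, 38.0]),
    (120, [52.0, 61.0, 58.0]),
    (180, [63.0, 66.0, 60.0]),
]
print(time_to_threshold(samples, 60.0))  # 180
for pct in (70, 65, 60):
    print(pct, round(relative_load(pct), 2))
```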

take care
John.

Firefox 3.0beta1 by the (wall-clock) numbers

Mozilla released Firefox 3.0 beta1 on Monday 19-nov-2007, at 11:10pm PST.

From “code freeze” to “release is now available to public” was 19 days 23 hours wall-clock time, of which Build&Release took 9 days and 1 hour.

23:59 31oct: code freeze for 3.0beta1
15:00 06nov: Dev says “go” to Build
18:25 06nov: rc1 builds started
20:30 06nov: win32 builds failed out (bug#346214)
22:30 06nov: win32 builds restarted after bug#346214 fixed on release branch
14:30 07nov: linux, mac and unsigned win32 builds handed to QA
17:25 07nov: rc1 abandoned (see details below)
17:25 07nov: rc2 builds started
17:20 08nov: rc2 builds abandoned (bug#402999)
19:05 08nov: rc3 builds started after bug#402999 fixed on release branch
17:30 09nov: linux & mac builds handed to QA
14:55 12nov: win32 signed builds handed to QA
18:15 16nov: Dev & QA say “go” for Release; Build starts final signing, bouncer entries
14:45 19nov: final signing, bouncer entries done; mirror replication started
23:10 19nov: announced

1) There is no Build automation running on trunk, so this release was done manually.
2) The rc1 builds were abandoned because of a manual error in how cvs was tagged. Two Build engineers were working in parallel to speed things up: one engineer typed the PDT timezone into one machine, while the other engineer typed PST into another computer, so the two machines were an hour apart in what source timestamp to use for the builds. That one-hour difference meant the generated builds missed one important last-minute showstopper bugfix. This was totally a manual snafu within the Build team, and would have been avoided if automation had been in place on trunk. (Ironically, daylight saving time only changed this same week; a week earlier this same snafu would have passed blissfully unnoticed!)
3) During rc1, there was a 4h20m delay while the Build team investigated a regxpcom test error at the end of the win32 build. Turns out the build was actually fine. The error was caused by a collision between the hourly build and production build processes running on the same machine at the same instant. Killing the hourly build and rerunning production worked fine.
4) By prior agreement, we did not create update snippets for this beta. Any users on earlier Alpha releases would not get updates refreshing them forward to beta1; instead Alpha users would have to manually download and install beta1. We do plan on creating update snippets for beta2 and beta3.
5) Because this was a Beta release, we did not do any “beta period” before releasing the beta! 🙂
6) Mirror absorption took 8 hours to reach the 70% threshold, not the usual 3 hours. In a random coincidence, there was a problem with one of the central rsync hubs in the mirror farm, and also one of the major mirrors, further compounded by problems when switching to backup servers. Dave Miller has all the drama details of late night pagers, and various mirror owners jumping to help (shout out to Shane!).
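Note 3 describes an hourly build and a production build colliding on one machine. One common guard is an exclusive lock that every build job must hold before it starts. The second job then bails out instead of interleaving with the first. The sketch below uses `fcntl.flock`, so it is Unix only. The lock path and the `MachineBusy` exception are hypothetical, and this is not a description of how the Mozilla build machines were actually set up.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

class MachineBusy(RuntimeError):
    """Raised when another build already holds this machine's lock."""

@contextmanager
def exclusive_build_lock(path):
    """Hold an exclusive, non-blocking flock on `path` for the duration of a build."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Fail immediately instead of queueing behind the other build.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise MachineBusy(f"another build holds {path}") from None
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock, even on error

# Demonstration: a production build holds the lock, so an hourly build bails out.
lock_path = os.path.join(tempfile.gettempdir(), "build-machine.lock")  # hypothetical path
with exclusive_build_lock(lock_path):          # production build running
    try:
        with exclusive_build_lock(lock_path):  # hourly build tries to start
            print("hourly build running")
    except MachineBusy as err:
        print("hourly build skipped:", err)
```

A kernel lock is released automatically if the holding process dies, so a crashed build cannot leave a stale lock file blocking the machine forever.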

take care
John.

Interesting commuter driving on Golden Gate Bridge

This morning (06:50am 28-nov-2007), a commuter lost consciousness while driving her sport utility vehicle on the Golden Gate Bridge. With its sole occupant unconscious behind the wheel, the vehicle swerved out of its lane and towards the oncoming traffic on the other side of the bridge. Another commuter reacted quickly and used his pickup truck to force the still-moving sport utility vehicle away from oncoming traffic and over to the side of the road.

[Link to full story on sfgate.com]

Quick thinking and very nice driving.

Thunderbird 2.0.0.9 by the (wall-clock) numbers

Mozilla released Thunderbird 2.0.0.9 on Wednesday 14-nov-2007, at 5:10pm PST.

From “Dev says code ready to release” to “release is now available to public” was 15 days 22.5 hours wall-clock time, of which the Beta period took 6 days 8 hours, and Build&Release took just over 2.5 days (62.5 hours).

17:30 30oct: Dev says “go”
09:40 31oct: mac builds handed to QA
10:00 31oct: linux builds handed to QA
17:55 31oct: win32 signed builds handed to QA
06:50 02nov: update snippets available on betatest update channel
14:30 06nov: QA says “go” for Beta
16:10 06nov: update snippets available on beta update channel
00:30 13nov: Dev & QA say “go” for Release; Build starts final signing, bouncer entries
08:25 13nov: final signing, bouncer entries done; mirror replication started
09:40 13nov: Build announced enough mirror coverage for QA to use releasetest channel
12:40 13nov: win32 installer bug#403670 discovered
14:00 13nov: declare bug#403670 as showstopper, put TB2.0.0.9 on hold.
18:20 13nov: root cause and fix of bug#403670 found.
05:05 14nov: one rebuilt win32 installer handed to QA to verify bugfix
05:40 14nov: QA confirmed new win32 installer is ok.
08:30 14nov: all rebuilt win32 installers handed to QA
10:10 14nov: QA signoff on rebuilt win32 installers, mirror replication started
15:00 14nov: mirror replication confirmed complete on new win32 installers
16:00 14nov: update snippets available on release update channel (for end users)
17:10 14nov: release announced

1) This was not a “human free” release. The automation work done for FF2.0.0.9 has not been tested for TB2.0.0.9. In theory it should work just fine, but we just haven’t had time to test it, so we chose to play safe and do this release manually. Hence this took more time for Build to produce, and all of that time was manually intensive Build work.
2) bug#403670 was caused by a combination of factors. One factor was human error: I incorrectly set up a work area on a signing machine (the same incorrect setup works fine for Firefox releases); the signing doc has now been updated. The other factor was a long-standing-but-previously-unknown error-handling problem in one of our signing scripts; how to fix this is being debated within the Build team. Note: this problem was with the windows installer only, not with any Thunderbird code, and not with the linux/mac installers. Overall, this delayed the release by approx 22 hours.
3) Mirror absorption times were messed up by the stop-and-restart caused by bug#403670.
4) The daylight savings PST change happened during this release, giving us an extra hour. That is counted in the overall times above.
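Note 2 mentions an error-handling gap in a signing script, but the actual script and its eventual fix are not described here. As a generic illustration only, one defensive pattern is to run every step through a wrapper that checks the exit status and stops at the first failure. That way a broken step cannot quietly lead to a shipped installer. `run_steps` and `StepFailed` are hypothetical, and the demo steps are stand-in Python one-liners rather than real signing commands.

```python
import subprocess
import sys

class StepFailed(RuntimeError):
    """Raised when a step exits non-zero; later steps are never run."""

def run_steps(steps):
    """Run (name, argv) steps in order, stopping at the first failure."""
    completed = []
    for name, argv in steps:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            raise StepFailed(
                f"step {name!r} exited with status {result.returncode} "
                f"after {completed} succeeded"
            )
        completed.append(name)
    return completed

# Stand-in commands for illustration; real signing steps would go here.
demo = [
    ("prepare", [sys.executable, "-c", "print('prepared')"]),
    ("sign", [sys.executable, "-c", "import sys; sys.exit(3)"]),
    ("repackage", [sys.executable, "-c", "print('never runs')"]),
]
try:
    run_steps(demo)
except StepFailed as err:
    print("aborted:", err)
```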

take care
John

Keeping perspective: 34hours vs 37hours

It took 34 hours to produce Firefox3.0beta1 rc1.

Those 34 hours were frantic. Two people, tag teaming day & night, working with the nervous tension of knowing that a single one-character typo could invalidate the entire build and force us to start all over again. Those 34 hours only got us as far as producing unsigned builds on each platform – roughly 1/3 of the overall Build work needed to do a release – before we hit a problem. A typo. At the beginning of it all, one person typed PDT into one computer, while the other person typed PST into another computer. That typo meant rc1 did not include an important last-minute bugfix. So, we scrapped rc1 and started all over again, building rc2. (I note that the D and S are even next to each other on the keyboard [sigh!]. And if it wasn’t for the timezone change last week, it would not have mattered either [sigh! sigh!])
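The PDT/PST mix-up is easy to reproduce. The same wall-clock cutoff read as PDT (UTC-7) and as PST (UTC-8) names two instants an hour apart. Any checkin that lands inside that hour is in one build and missing from the other. The sketch below shows this with fixed UTC offsets. The cutoff and bugfix times are hypothetical, and `cutoff_utc` and `included` are illustrative helpers.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")

def cutoff_utc(wall_clock: str, tz: timezone) -> datetime:
    """Interpret a 'YYYY-MM-DD HH:MM' cutoff in the given zone and return it in UTC."""
    local = datetime.strptime(wall_clock, "%Y-%m-%d %H:%M").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)

def included(checkin: datetime, cutoff: datetime) -> bool:
    """A checkin is in the build if it landed at or before the cutoff."""
    return checkin <= cutoff

wall = "2007-11-06 15:00"          # hypothetical cutoff typed on both machines
machine_a = cutoff_utc(wall, PDT)  # 22:00 UTC
machine_b = cutoff_utc(wall, PST)  # 23:00 UTC
fix = datetime(2007, 11, 6, 22, 30, tzinfo=timezone.utc)  # hypothetical late bugfix

print(machine_b - machine_a)                               # 1:00:00
print(included(fix, machine_a), included(fix, machine_b))  # False True
```

Computing the cutoff once, in UTC, and handing that single value to every machine removes the chance of two people typing different zone names. That is exactly the kind of step automation takes over.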

To put that 34 hours in perspective, Build took 37 hours to do everything needed for the complete FF2.0.0.9 release… and most of that was actually just watching the automation chugging along. Active human work was down to a handful of hours for signing, bouncer/mirror updates, and a little nervous manual rechecking of the automated checks, just to be sure, to be sure.

Why the night and day difference?

We’ve been focusing on automation for the FF2.0.0.x branch over the last few months, shipping FF2.0.0.7, FF2.0.0.8 and FF2.0.0.9 each time with automation improved from the previous release. Sadly, none of this automation work is live on trunk yet. All the trunk releases, like the alphas, and now this FF3.0beta1, are done the old fashioned way. By hand. One command at a time.

This week was a stark reminder of what things used to be like, and gave perspective on how much we’ve accomplished so far this year.

Free Software 2.0.0.9 builds now also available…

… at ftp://ftp.mozilla.org/pub/firefox/releases/2.0.0.9/contrib/free-software/.

This special build of Firefox2.0.0.9 uses the exact same code cutoff time and cvs branch as the regular Firefox2.0.0.9 release, but was compiled with branding, logos and talkback removed.

As an aside, I didn’t know much about this special build until recently, hence there was no plan to include it in our build automation work. However, looking back on ftp.mozilla.org, I see quite a few of them, and from asking around, I learned they were done manually once the dust settled on a given Firefox release. We are now tracking automating these FreeSoftware builds in bug#385783, with some related cleanup in bug#402582.