So, how exactly do all the automated build and test systems connect together?

Trying to describe how our various build, unittest and talos systems connect together is tricky. The Release Engineering group spent a week all together recently, with lots of diagrams on whiteboards, just to explain it to each other.

Trying to describe it *without* a whiteboard is even more tricky… and there’s always lots of hand waving.

Trying to describe it in a clear, concise article is…wow. Ben Hearsum and John Resig did a really nice overview here. Well worth a read, in case you missed it.

“Software Update Channel” != “Software Distribution Channel”

Recent blog posts by John, Asa and Matt happened as my home WinXP computer offered to “update” Safari… something I have never installed!?!

Most comments on their blogs can be paraphrased as “you’re only complaining because it’s a competing browser”… or “you’re only complaining because it somehow costs Mozilla money”.

That’s missing the point completely.

Here’s a quick non-browser example.

Suppose Microsoft Windows Automatic Updates (which delivers O.S. security fixes) suddenly also offered to download and install Microsoft’s GearsOfWar game? And defaulted to “yes”. Even if you never owned that game before. If you have your preferences set to “ask me”, then you get a chance to uncheck the checkbox, *if* you notice. But if your preferences are set to “apply automatically”, which is the default, you’ll just get GearsOfWar installed automatically.

The very first time this happens to me, I’d assume that the vendor considers “software update channel” to be the same as “software distribution channel”, and they want to sell me their other products. So, I’d turn off updates. Which, by the way, means I no longer get O.S. security fixes. If I was really annoyed, I might turn off updates for other vendors while I’m at it, so I no longer get Norton Anti-Virus updates either.

Agreeing to receive updates is agreeing to let a trusted other person quickly fix problems on my computer, before I even know it’s a problem. Sometimes an update fixes bugs in software, so users don’t keep hitting problems that were fixed last year; anyone remember downloading patches for Win31? (heck, anyone remember ftp-ing downloads pre-1995?) Sometimes, the speed at which the fix is distributed is critical to protect users; anti-virus updates, browser security fixes, and O.S. security fixes are great examples of this.

If people stop trusting updates, because a few vendors abuse that trust, it’s bad for the software industry and it’s bad for users.

It’s that simple.

Firefox 2.0.0.13 by the (wall-clock) numbers

Mozilla released Firefox 2.0.0.13 on Tuesday 25-mar-2008, at 16:30 PST. From “Dev says go” to “release is now available to public” was 15.25 days (15d 5h 55m) wall-clock time, of which Build&Release took just over 2.33 days (2d 8h 10m).

10:35 10mar: Dev says “go” for rc1
14:50 11mar: FF2.0.0.13 builds started
16:55 11mar: FF2.0.0.13 linux builds handed to QA
19:00 11mar: FF2.0.0.13 mac builds handed to QA
07:10 12mar: FF2.0.0.13 signed-win32 builds handed to QA
14:40 12mar: FF2.0.0.13 update snippets available on betatest update channel
11:30 18mar: Dev & QA says “go” for Beta
12:25 18mar: update snippets on beta update channel
09:10 25mar: Dev & QA says “go” for Release; Build already completed final signing, bouncer entries
10:25 25mar: mirror replication started
11:20 25mar: mirror absorption good for testing to start on releasetest channel
14:20 25mar: QA completes testing releasetest.
15:00 25mar: website changes finalized and visible. Build given “go” to make updates snippets live.
16:00 25mar: update snippets available on live update channel
16:30 25mar: release announced
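
The headline wall-clock figure can be re-derived from the first and last entries of the timeline above. This is only a minimal sketch: the helper names `parse_stamp` and `format_delta` are made up for illustration, the year 2008 is taken from the release date, and the timestamps are treated as naive local times (no time zone or daylight-saving handling), which is what "wall-clock" means here.

```python
from datetime import datetime

# Only March appears in this timeline; extend the map for other months.
MONTHS = {"mar": 3}

def parse_stamp(stamp, year=2008):
    """Turn a timeline stamp such as '10:35 10mar' into a naive datetime."""
    clock, day_month = stamp.split()
    hour, minute = (int(part) for part in clock.split(":"))
    day, month = int(day_month[:-3]), MONTHS[day_month[-3:]]
    return datetime(year, month, day, hour, minute)

def format_delta(delta):
    """Render a timedelta in the 'Nd Nh NNm' style used in these posts."""
    total_minutes = int(delta.total_seconds()) // 60
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return f"{days}d {hours}h {minutes:02d}m"

dev_go = parse_stamp("10:35 10mar")     # Dev says "go" for rc1
announced = parse_stamp("16:30 25mar")  # release announced
elapsed = announced - dev_go

print(format_delta(elapsed))                       # 15d 5h 55m
print(round(elapsed.total_seconds() / 86400, 2))   # 15.25
```

The same two helpers work for any pair of entries, for example to see how long QA had the builds before the "go" for Beta.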

Notes:

1) This was Ben Hearsum’s first time doing a release. He works in the Release group, and he’s smart, but he’s never done a release for Mozilla. Ever. The fact that he jumped into doing this release with absolutely no advance notice, and was able to use our existing automation without needing to ask any questions at all says lots about both Ben and how things are improving.

2) From Build’s point of view, this was a fast release. We took 2 days 8 hours, which is one of our fastest releases ever. Note: between “Dev says go” and “build started” was a delay of 1 day 4 hours where Build did nothing. This delay was because we were busy with 3.0beta4 and also trying to balance out some other workloads across the group. I counted this delay as part of our 2 days 8 hours, but I have to point out that if we had been ready, our total time for FF2.0.0.13 would actually have been halved; we would have needed only a totally screaming fast 1 day 4 hours.

3) For better or worse, we are making all our blow-by-blow scribbles public, so the curious can read about it, warts and all, here. Those Build Notes also link to our tracking bug#422122.

4) Like before, we waited until morning to start pushing to mirrors. This was done so mirror absorption completed as QA were arriving in the office to start testing update channels. We did this because we wanted to reduce the time files were on the mirrors untested; in the past, overly excited people have posted the locations of the files as “released” on public forums, even though we had not finished the last of the sanity checks. Coordinating the mirror push like this reduced that likelihood just a bit.

5) Mirror absorption took 1 hour to reach all values >= 50%, slightly faster and slightly lower than our usual threshold.
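
The "halved" claim in note 2 can be checked with a little arithmetic. A rough sketch, using the two timeline stamps and the 2d 8h 10m Build&Release total quoted in the post; the variable names are just for illustration, and the year 2008 is assumed from the release date.

```python
from datetime import datetime, timedelta

dev_go = datetime(2008, 3, 10, 10, 35)        # 10:35 10mar: Dev says "go"
build_started = datetime(2008, 3, 11, 14, 50)  # 14:50 11mar: builds started

# Time during which Build had the "go" but had not started yet.
idle = build_started - dev_go

# Build&Release total as quoted in the post.
build_total = timedelta(days=2, hours=8, minutes=10)

# What the total would have been with no startup delay.
active = build_total - idle
ratio = active / build_total

print(idle)             # 1 day, 4:15:00
print(active)           # 1 day, 3:55:00
print(round(ratio, 2))  # 0.5
```

So the idle period accounts for roughly half of the Build&Release time, which is where the "1 day 4 hours" estimate comes from.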

take care

John.

Firefox 3.0beta4 by the (wall-clock) numbers

Mozilla released Firefox 3.0beta4 on Monday 10-mar-2008, at 17:25 PST. From “Dev says go” to “release is now available to public” was just over 7 days (7d 6h 10m) wall-clock time, of which Build&Release took just over 3 days (3d 2h 05m).

11:15 03mar: Dev says “go” for rc1
16:10 03mar: 3.0b4rc1 builds started
23:15 03mar: 3.0b4rc1 mac builds handed to QA
00:05 04mar: 3.0b4rc1 linux builds handed to QA
06:00 04mar: 3.0b4rc1 signed-win32 builds handed to QA
11:15 04mar: 3.0b4rc1 three missing linux locales were resolved and handed to QA. See bug#419771 and bug#407796 for details.
15:35 04mar: 3.0b4rc1 update snippets available on betatest update channel
08:35 07mar: 3.0b4rc1 showstopper: discovered win32 was compiled without PGO. Need to respin win32 builds. Mac and linux confirmed ok.
11:50 07mar: 3.0b4rc2 win32 builds started
00:05 08mar: 3.0b4rc2 signed-pgo-win32 builds handed to QA
14:00 08mar: 3.0b4rc2 update snippets available on betatest update channel
20:00 09mar: Dev & QA says “go” for Beta; Build already completed final signing, bouncer entries
07:00 10mar: mirror replication started
09:15 10mar: mirror absorption good for testing to start on releasetest channel
13:15 10mar: QA completes testing releasetest.
14:45 10mar: website changes finalized and visible. Build given “go” to make updates snippets live.
15:50 10mar: update snippets available on live beta update channel
17:25 10mar: QA completes testing beta channel. Release announced
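
To see where the wall-clock time went, the timeline above can be turned into a list of gaps between consecutive entries. This is a sketch under a few assumptions: the short labels are my own abbreviations of the entries, the helper names are hypothetical, the year is 2008, and the stamps are treated as naive local times with no daylight-saving adjustment.

```python
from datetime import datetime

MONTHS = {"mar": 3}  # only March appears in this timeline

def parse_stamp(stamp, year=2008):
    """Turn a stamp such as '11:15 03mar' into a naive datetime."""
    clock, day_month = stamp.split()
    hour, minute = (int(part) for part in clock.split(":"))
    return datetime(year, MONTHS[day_month[-3:]], int(day_month[:-3]), hour, minute)

def format_delta(delta):
    total_minutes = int(delta.total_seconds()) // 60
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return f"{days}d {hours}h {minutes:02d}m"

timeline = [
    ("11:15 03mar", "dev go for rc1"),
    ("16:10 03mar", "rc1 builds started"),
    ("23:15 03mar", "rc1 mac to QA"),
    ("00:05 04mar", "rc1 linux to QA"),
    ("06:00 04mar", "rc1 signed win32 to QA"),
    ("11:15 04mar", "rc1 missing linux locales to QA"),
    ("15:35 04mar", "rc1 snippets on betatest"),
    ("08:35 07mar", "showstopper: win32 without PGO"),
    ("11:50 07mar", "rc2 win32 builds started"),
    ("00:05 08mar", "rc2 signed win32 to QA"),
    ("14:00 08mar", "rc2 snippets on betatest"),
    ("20:00 09mar", "dev and QA go for beta"),
    ("07:00 10mar", "mirror replication started"),
    ("09:15 10mar", "mirror absorption good"),
    ("13:15 10mar", "QA done with releasetest"),
    ("14:45 10mar", "website changes visible"),
    ("15:50 10mar", "snippets on live beta channel"),
    ("17:25 10mar", "release announced"),
]

events = [(parse_stamp(stamp), label) for stamp, label in timeline]

# Gap between each entry and the one before it.
gaps = [
    (later[0] - earlier[0], earlier[1], later[1])
    for earlier, later in zip(events, events[1:])
]

total = events[-1][0] - events[0][0]
longest = max(gaps, key=lambda gap: gap[0])

print("total:", format_delta(total))  # total: 7d 6h 10m
print("longest gap:", format_delta(longest[0]), "between", longest[1], "and", longest[2])
```

Sorting `gaps` instead of taking the maximum gives a quick ranking of where the calendar time was spent.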

Notes:

1) The Build Automation used in FF3.0b4 included a bunch of fixes landed after FF3.0b3, which helped make things smoother. Despite the respin, yet again, all the housekeeping of the last few weeks paid off.

2) For better or worse, we are making all our blow-by-blow scribbles public, so the curious can read about it, warts and all, here. Those Build Notes also link to our tracking bug#418926.

3) It took us much longer than usual to start the builds. We had been distracted on other projects during the prior week, and had not done *any* of the prerequisite setup work in advance of this release.

4) We hit bug#419771 and bug#407796 as fallout from the recent kernel updates on this machine, which delayed announcing linux builds by a few hours.

5) In 3.0b4rc1, the win32 builds were confirmed to be compiled *without* the PGO compiler optimizer. This was a problem caused by how the new PGO compiler was being enabled in tinderbox, and was completely a Build snafu. The same changes were required to two copies of an identical config file, but we only updated one, and forgot about the other. We had to completely rebuild the win32 builds from the beginning, and verified the bits as they were being produced. Note that mac and linux builds did not have to be rebuilt, but to avoid confusion, we symlinked linux-rc1 -> linux-rc2 and mac-rc1 -> mac-rc2.

6) Like before, we waited until morning to start pushing to mirrors. This was done so mirror absorption completed as QA were arriving in the office to start testing update channels. We did this because we wanted to reduce the time files were on the mirrors untested; in the past, overly excited people have posted the locations of the files as “released” on public forums, even though we had not finished the last of the sanity checks. Coordinating the mirror push like this reduced that likelihood just a bit.

7) Mirror absorption took 2 hours 15mins to reach all values >= 60%, slightly faster than our usual threshold.
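
The PGO snafu in note 5 came from two copies of a config file drifting apart. One possible guard, sketched here purely as an illustration and not something the post says was actually deployed, is a pre-build check that all copies of a file are byte-for-byte identical. The function names and the example paths in the comment are hypothetical.

```python
import hashlib
from pathlib import Path

def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def configs_in_sync(paths):
    """True if every file in `paths` has identical contents."""
    digests = {file_digest(path) for path in paths}
    return len(digests) == 1

def check_or_fail(paths):
    """Raise before the build starts if the copies have diverged."""
    if not configs_in_sync(paths):
        listing = ", ".join(str(path) for path in paths)
        raise RuntimeError(f"config copies differ: {listing}")

# Hypothetical usage; these paths are placeholders, not real tinderbox paths:
# check_or_fail(["copy-a/build.cfg", "copy-b/build.cfg"])
```

Comparing digests rather than modification times means the check still passes when both copies were edited at different moments but ended up with the same contents.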

take care

John.