Firefox 3.6.12 and Firefox 3.5.15 by the (wall-clock) numbers

Firefox 3.6.12 was released on Wednesday 27-oct-2010, at 16:48 PST. This was yet another release shipped inside 24 hours, and it set a new speed record for RelEng.

From “Dev says go” to “release is now available to the public” was 21h 32m wall-clock time. The Release Engineering portion of that was 10h 25m. This was faster than our previous fastest-ever release, FF3.6.6, and well inside 24 hours from start to finish. For FF3.6.12, the wall-clock times were:

19:20 26oct: Dev says “go” for FF3.6.12
19:48 26oct: FF3.6.12 builds started
21:55 26oct: FF3.6.12 linux, mac, unsigned-win32 builds handed to QA
00:05 27oct: FF3.6.12 signed-win32 builds handed to QA
03:35 27oct: FF3.6.12 update snippets available on test update channel
09:05 27oct: Dev & QA say “go” for Release; ok to start mirror absorption
10:50 27oct: mirror absorption started
10:55 27oct: mirror absorption good enough for testing
16:00 27oct: website changes finalized and visible. Build given “go” to make update snippets live.
16:21 27oct: update snippets available on live update channel
16:48 27oct: release announced
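
For anyone who wants to recompute the gaps between milestones, here is a minimal sketch in Python. The timestamps are copied from the FF3.6.12 list above; the year, the short milestone labels, and the helper names (`fmt`, `step_durations`, `total_elapsed`) are my own assumptions for illustration. The listed times are approximate, so a total computed this way can differ by a few minutes from the headline figure quoted earlier.

```python
from datetime import datetime

# Milestones copied from the FF3.6.12 timeline (Pacific local time).
# The labels are shortened for illustration only.
MILESTONES = [
    ("dev says go",              datetime(2010, 10, 26, 19, 20)),
    ("builds started",           datetime(2010, 10, 26, 19, 48)),
    ("unsigned builds to QA",    datetime(2010, 10, 26, 21, 55)),
    ("signed win32 to QA",       datetime(2010, 10, 27, 0, 5)),
    ("test snippets available",  datetime(2010, 10, 27, 3, 35)),
    ("go for release",           datetime(2010, 10, 27, 9, 5)),
    ("mirror absorption start",  datetime(2010, 10, 27, 10, 50)),
    ("mirrors good enough",      datetime(2010, 10, 27, 10, 55)),
    ("website changes final",    datetime(2010, 10, 27, 16, 0)),
    ("live snippets available",  datetime(2010, 10, 27, 16, 21)),
    ("release announced",        datetime(2010, 10, 27, 16, 48)),
]


def fmt(delta):
    """Format a timedelta as 'Hh MMm'."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h {minutes % 60:02d}m"


def step_durations(milestones):
    """Return (label, time since previous milestone) for each step after the first."""
    return [(later[0], later[1] - earlier[1])
            for earlier, later in zip(milestones, milestones[1:])]


def total_elapsed(milestones):
    """Wall-clock time from the first milestone to the last."""
    return milestones[-1][1] - milestones[0][1]


if __name__ == "__main__":
    for label, delta in step_durations(MILESTONES):
        print(f"{fmt(delta):>8}  until {label}")
    print("total:", fmt(total_elapsed(MILESTONES)))
```

Printing the per-step gaps makes it easy to see where the wall-clock time goes, for example the long wait between “go for release” and the website changes being finalized.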

Alongside FF3.6.12, we simultaneously shipped FF3.5.15 in the same super-fast way. For FF3.5.15, the wall-clock times were:

19:20 26oct: Dev says “go” for FF3.5.15
19:46 26oct: FF3.5.15 builds started
21:55 26oct: FF3.5.15 linux, mac, unsigned-win32 builds handed to QA
11:05 26oct: FF3.5.15 signed-win32 builds handed to QA
04:35 27oct: FF3.5.15 update snippets available on test update channel
09:05 27oct: Dev & QA say “go” for Release; ok to start mirror absorption
10:10 27oct: mirror absorption started
10:50 27oct: mirror absorption good enough for testing
16:00 27oct: website changes finalized and visible. Build given “go” to make update snippets live.
16:21 27oct: update snippets available on live update channel
16:48 27oct: release announced

Obviously, we don’t want to ship this many releases this quickly all the time, but it’s nice to know that we can if we have to. Really, really nice. And stay tuned – we’ve got more work in progress to make this even faster! 🙂

Notes:

1) While there’s been some discussion about how this release was done in “2 f*ing days”, I prefer to measure the time from “dev says go” to “release is available”. Explicitly, I do *not* measure from “fix is reported” to “release is available”. This is because I do not want to put any further stress on a developer trying to fix a problem under pressure. It feels much better to me to work a little longer to get the fix right than to add even more time pressure looking for a quick fix, and then have to do another emergency release a few days later to fix the “quick fix”.

2) The super fast mirror uptake was due to some great work by mrz and justdave. Not that we would do that for every release, but it was great to have the assist when time really mattered!

3) As usual, our blow-by-blow scribbles are public, so you can read all the details here or in tracking bug#607228. For FF3.5.15, the build notes are here or in tracking bug#607240.

Thank you
John.

Infrastructure load for June 2010

Summary:

June 2010 logged 1,892 pushes – just short of our previous record of 1,971 in January. Note that this number for June is *under*reporting TryServer usage, as we accidentally lost the TryServer usage logs for 01-10 June. We assert, without proof, that we would have easily set a new record if we had the missing 10 days of data for TryServer, our busiest branch. Even with 10 of 30 days missing, TryServer was still the busiest branch of the entire infrastructure in June, compared with full-month data for the other branches.

The numbers for this month are:

  • 1,892 code changes to our Mercurial-based repos, which triggered 234,387 jobs:
  • 35,308 build jobs, or ~49 jobs per hour.
  • 111,513 unittest jobs, or ~154 jobs per hour.
  • 87,566 talos jobs, or ~121 talos jobs per hour.
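
As a quick sanity check on the arithmetic, here is a minimal sketch in Python. It assumes the per-hour rates are simply each job count divided by the 720 hours in a 30-day June and rounded down, which matches the figures above; the names `JOBS`, `HOURS_IN_JUNE`, and `jobs_per_hour` are made up for this illustration.

```python
# Job counts for June 2010, copied from the list above.
JOBS = {
    "build": 35_308,
    "unittest": 111_513,
    "talos": 87_566,
}

PUSHES = 1_892
HOURS_IN_JUNE = 30 * 24  # assumption: rates are averaged over the whole month


def jobs_per_hour(count, hours=HOURS_IN_JUNE):
    """Average number of jobs per hour over the month."""
    return count / hours


total_jobs = sum(JOBS.values())
jobs_per_push = total_jobs / PUSHES

if __name__ == "__main__":
    print(f"total jobs: {total_jobs:,}")
    for kind, count in JOBS.items():
        # int() truncates, which is how the rates quoted above appear to be rounded
        print(f"{kind:>8}: ~{int(jobs_per_hour(count))} jobs per hour")
    print(f"average jobs triggered per push: {jobs_per_push:.1f}")
```

The three job types do sum to the 234,387 total, and dividing that total by the number of pushes gives a rough average of how many jobs a single push triggers.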

Details:

  • Losing logs for a third of the month for our busiest branch means we are underreporting for June. Hopefully the work catlee/nthomas/anamarias are doing to automate reports will be live soon, to prevent this happening again.
  • Our Unittest and Talos load remains high, like last month, and we expect this to jump further as more OSes are still being added to Talos.
  • We’re still double-running unittests for some OSes, running both unittest-on-builder and unittest-on-tester while developers and QA work through the issues. Whenever unittest-on-tester is live and green, we disable unittest-on-builder to reduce wait times for builds.
  • The trend of “what time of day is busiest” changed again this month. Not sure what this means, but it is worth pointing out that each month seems to be different. This makes finding a “good” time for a downtime almost impossible.
  • The entire series of these infrastructure load blogposts can be found here.
  • We are still not tracking any l10n repacks, nightly builds, release builds or any “idle-timer” builds.

Detailed breakdown:

Here’s how the math works out (descriptions of build, unittest and performance jobs triggered by each individual push are here):

[UPDATE: thanks to jhford for catching some copy-paste typos! joduinn 15-jul-2010]