Infrastructure load for April – December 2011

Note: This is the first “infrastructure load” report I’ve done since we switched to the new rapid release model. It’s worth noting that because of the rapid release model, we’ve jumped to 30+ active project branches. Plotting all of these branches would make the charts too noisy and hard to interpret. Therefore, while I still look at the metrics for all branches, I now only chart the busiest or the most strategically important branches.

Having said all that, let’s get to the interesting stuff!

  • #checkins-per-month: We finally broke the 3,000 checkins-per-month barrier. August set a new record with 3,089 checkins, then November set a new record with 3,209 checkins, and finally December set yet another new record with 3,262 checkins! (For one rough way to pull together per-month tallies like these, see the sketch after this list.)
  • #checkins-per-day: We set a new record of 169 checkins-per-day on 06-nov-2011, only to then match or exceed that new record 4 times in December (184 on 08-dec, 169 on 15-dec, 171 on 19-dec, 184 on 21-dec).
  • I find these records all the more impressive because they were set in November and December, when we expected to see *low* checkin volumes. Both months have major vacations in them, so historically we see checkin volume decrease as people take time off. Further, both months had multiple prolonged tree closures, caused by various colo outages, db server outages, etc. (The trees were able to remain open during the email outage because of the proactive work by RelEng and WebDev to refactor tbpl.mozilla.org earlier this year! Topic for another blog post.) Between the vacations and the outages, we expected to see significantly decreased checkins in both months, so setting new records like this was unexpected. My current hypothesis is that while the outages caused a backlog of pending fixes, we were able to quickly handle the load spikes when infrastructure came back online and developers all resumed work… until the next outage. This “full-go, full-stop, full-go” oscillation still managed to set a new record. With some of these issues now resolved, and with developers coming back from vacations, I’m eager to see what numbers we are capable of handling in January.
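
For anyone curious how per-month tallies like these could be pulled together, here is a minimal sketch, assuming only a local mozilla-central clone and `hg` on your path; it is not the actual script behind our reports. Note that it counts changesets, and since a single push can bundle several changesets, its totals will only roughly track the push counts discussed above.

```python
#!/usr/bin/env python3
"""Minimal sketch: tally checkins per month from a local hg clone.

Illustrative only; the repo path and date range are assumptions, and
"checkins" here means changesets as reported by `hg log`.
"""
from collections import Counter
import subprocess

def checkins_per_month(repo_path, start="2011-04-01", end="2011-12-31"):
    # Emit one YYYY-MM-DD line per changeset in the date range.
    log = subprocess.run(
        ["hg", "-R", repo_path, "log",
         "--date", f"{start} to {end}",
         "--template", "{date|shortdate}\n"],
        capture_output=True, text=True, check=True).stdout
    # Bucket by YYYY-MM and count how many changesets landed in each month.
    return Counter(line[:7] for line in log.splitlines() if line)

if __name__ == "__main__":
    for month, count in sorted(checkins_per_month("mozilla-central").items()):
        print(month, count)
```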

mozilla-inbound, fx-team:
It’s very cool to see how developers have taken to using mozilla-inbound (and more recently fx-team) as integration branches, instead of having everyone land directly on mozilla-central.

  • In the chart above, note that the number of mozilla-central checkins has decreased significantly as the number of checkins on mozilla-inbound has increased.
  • Another interesting effect showed up while unwinding the backlog of pending checkins after the outages. In the past, whenever we had to unwind a large backlog of pending checkins, we’d keep the trees closed while developers did metered checkins to work through the backlog. However, after each of these recent outages, developers’ use of mozilla-inbound and fx-team as integration branches meant that the backlogs were cleared more quickly, and with less manual metering of checkins on mozilla-central.

mozilla-aurora, mozilla-beta:
I note that ~2% of our monthly checkins land on mozilla-aurora, and about half of those (so ~1% of all checkins) then also land on mozilla-beta. Part of me feels this is a healthy, low number of fixes landing on aurora, and a healthy, even lower number of fixes landing on beta; this was part of the plan for the rapid release model. At the same time, part of me also feels like too many fixes are still needed on mozilla-beta, and I fear this is a sign that too many bugs are making it through mozilla-central / mozilla-aurora to mozilla-beta before being detected. In which case, what could we do differently to change this? Honestly, I can’t tell for certain, and I’d be curious what others think. (Oh, and for the record, I’m always glad whenever we catch a problem *before* we ship a release; it avoids us having to do a chemspill release, and it’s better for Firefox users too!)
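
As a back-of-the-envelope illustration of those percentages (the monthly total below is just an assumed round number, not a measured figure):

```python
# Rough arithmetic only: with ~3,200 checkins in a busy month,
# ~2% landing on mozilla-aurora and about half of those reaching
# mozilla-beta works out to roughly 64 and 32 checkins respectively.
monthly_checkins = 3200                  # assumed monthly total
aurora = round(0.02 * monthly_checkins)  # ~2% of all checkins
beta = round(aurora / 2)                 # about half of the aurora fixes
print(aurora, beta)                      # -> 64 32
```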

misc other details:
Pushes per day

Pushes by hour of day

One thought on “Infrastructure load for April – December 2011”

  1. Impressive stats, John. I’m always impressed by your team’s ability to move production code through the Mozilla systems. I’m looking to understand the behavioral effects of the rapid release cycle on the developers. Can you comment on code volume per check-in? That is, are we seeing more, smaller patches, or is it historically the same amount of code per hg commit as in previous years? Is the overall bug fix/close rate matching the % increase in checkin volume? Thanks!