Infrastructure load for October 2013

  • Overall this month was quieter than usual. I’d guess that this was caused by a combination of fatigue after the September B2G workweek, the October stabilization+lockdown period for B2Gv1.2, and Canadian Thanksgiving. Oh, and of course, Mozilla’s AllHands Summit in early October. Data for November is already higher, back towards more typical numbers. A big win was turning off obsolete builds and tests, which reduced our load by 20%.
  • #checkins-per-month: We had 6,807 checkins in October 2013. This is ~10% below last month’s 7,580 checkins.

    Overall load since Jan 2009

  • #checkins-per-day: Overall load was down throughout the month. In October, 15-of-31 days had over 250 checkins-per-day, 8-of-31 days had over 300 checkins-per-day. No day in October was over 400 checkins-per-day.
    Our heaviest day had 344 checkins on 28oct; impressive by most standards, yet well below our single-day record of 443 checkins on 26aug.
  • #checkins-per-hour: Checkins are still mostly mid-day PT/afternoon ET. For 7 of every 24 hours, we sustained over 11 checkins per hour. Our heaviest load time this month was 2pm-3pm PT, at 12.77 checkins-per-hour (a checkin every 4.7 min) – below our record of 15.73 checkins-per-hour.
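
For anyone who wants to double-check the arithmetic in the bullets above, here is a small sketch in Python. The function names are mine, purely for illustration; the inputs are just the figures quoted in this post, not a fresh pull from the pushlog.

```python
def percent_change(previous, current):
    """Relative change from previous to current, as a percentage."""
    return (current - previous) / previous * 100.0


def minutes_between_checkins(checkins_per_hour):
    """Average gap between checkins, in minutes, at a given hourly rate."""
    return 60.0 / checkins_per_hour


# Figures quoted in the post (assumed to be accurate as written).
september_checkins = 7580
october_checkins = 6807
peak_rate = 12.77    # checkins-per-hour, 2pm-3pm PT in October
record_rate = 15.73  # checkins-per-hour, all-time record

print(round(percent_change(september_checkins, october_checkins), 1))  # -10.2
print(round(minutes_between_checkins(peak_rate), 1))                   # 4.7
print(round(minutes_between_checkins(record_rate), 1))                 # 3.8
```

So the "~10% below last month" and "a checkin every 4.7 min" figures both hold up, and the record rate works out to roughly one checkin every 3.8 minutes.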

mozilla-inbound, b2g-inbound, fx-team:

  • mozilla-inbound continues to be heavily used as an integration branch. As developers use other *-inbound branches, the use of mozilla-inbound at 15.8% of all checkins is much reduced from typical, and also reduced from last month – which was itself the lowest ever usage of mozilla-inbound. The use of multiple *-inbounds is clearly helping relieve bottlenecks (see pie chart below) and the congestion on mozilla-inbound is being reduced significantly as people switch to using other *-inbound branches instead. This also reduces stress and backlog headaches on sheriffs, which is good. All very cool to see.
  • b2g-inbound continues to be a great success, now up to 10.3% of this month’s checkins landing here, a healthy increase over last month’s 8.8% and further evidence that use of this branch is helping.
  • With sheriff coverage, fx-team is clearly a very active third place for developers, with 5.6% of checkins this month. This is almost identical to last month, and may become the stable point for this branch.
  • The combined total of these 3 integration branches is 30.2%, which is fairly consistent. Put another way, sheriff moderated branches consistently handle approx 1/3 of all checkins (while Try handles approx 1/2 of all checkins).

    Infrastructure load by branch

mozilla-aurora, mozilla-beta, mozilla-b2g18, gaia-central:
Of our total monthly checkins:

  • 2.6% landed into mozilla-central, slightly higher than last month. As usual, very few people land directly on mozilla-central these days, when there are sheriff-assisted branches available instead.
  • 3.2% landed into mozilla-aurora, much higher than usual. I believe this was caused by the B2G branching, which had B2G v1.2 checkins landing on mozilla-aurora.
  • 0.8% landed into mozilla-beta, slightly higher than last month.
  • 0.2% landed into mozilla-b2g18, slightly lower than last month. This should quickly drop to zero as we move B2G to gecko26.
  • 0.4% landed into mozilla-b2g26_v1_2, which was only enabled for checkins as part of the B2Gv1.2 branching involving Firefox25. This should quickly grow in usage until we move focus to B2G v1.3 on gecko28.
  • Note: gaia-central, and all other gaia-* branches, are not counted here anymore. For details, see here.

misc other details:
As usual, our build pool handled the load well, with >95% of all builds consistently being started within 15 minutes. Our test pool is getting up to par and we’re seeing more test jobs being handled with better response times. Trimming out obsolete builds and tests reduced our load by 20% – or put another way – got us 20% extra “free” capacity. Still more work to be done here, but very encouraging progress. As always, if you know of any test suites that no longer need to be run per-checkin, please let us know so we can immediately reduce the load a little. Also, if you know of any test suites which are perma-orange and hidden on tbpl.m.o, please let us know – those are the worst of both worlds – using up scarce CPU time *and* not being displayed for people to make use of. We’ll make sure to file bugs to get tests fixed – or disabled – because every little bit helps put scarce test CPU to better use.

[UPDATE: added mention of Mozilla Summit in first paragraph. Thanks to coop for catching that omission! joduinn 12nov2013.]

6 thoughts on “Infrastructure load for October 2013”

  1. A lot of Mozilla projects seem to be using git for version control these days. What’s the ratio of git versus mercurial usage across Mozilla? Will there come a time when mercurial will be dropped purely because most people are using git (which isn’t necessarily a bad thing)?

  2. Notice also the larger-than-normal dip in checkins-per-day around the Summit. There are at least 2 or 3 days of reduced weekday load, and the weekend itself is even lower than normal.

  3. John,

    I’m trying to reproduce your results from the raw pushlog data on the Mercurial servers and my numbers are consistently coming up short – about 25-35% short! I’m scratching my head trying to identify the discrepancy.

    I was hoping you could better explain what a “checkin” is. What trees does it encompass? Are there any non-obvious/private trees in that set? Do you count job retries multiple times?
