RelEng group gathering in Boston

Last week, 18-22 November, RelEng gathered in Boston. As usual for these work weeks, it was jam-packed: there was group planning, and lots of group sprints – coop took on the task of blogging with details for each specific day (Mon, Tue, Wed, Thu, Fri). The meetings with Bocoup were a happy, unplanned surprise.

Given the very distributed nature of the group, and the high-stress nature of the job, a big part of the week is making sure we maintain our group cohesion so we can work well together under pressure after we return to our respective homes. When we’re all together in person, the trust, respect, and love for each other is self-evident and something I’m truly in awe of. I don’t know how else to describe this except “magic” – this is super important to me, and something I’m honored to be a part of.

Every gathering needs a group photo, and these are never first-shot-good-enough-ship-it, so while aki was taking the group photo, Massimo quietly set up his GoPro to timelapse the fun.

This is Mozilla’s Release Engineering group – aki, armenzg, bhearsum, callek, catlee, coop, hwine, joey, jhopkins, jlund, joduinn, kmoir, mgerva, mshal, nthomas, pmoore, simone, rail. All proudly wearing our “Ship it” shirts.

Every RelEng work week is an exhausting, hectic week, and yet, at the end of each one, as we are saying our goodbyes and heading for various planes/cars/homes, I find myself missing everyone deeply and feeling so, so proud of them all.

Proposed changes to RelEng’s OSX build and test infrastructure

tl;dr: In order to improve our osx10.6 test capacity and to quickly start osx10.9 testing, we’re planning to make the following changes to our OSX build and test infrastructure.

1) convert all 10.7 test machines into 10.6 test machines, in order to increase our 10.6 capacity. Details in bug#942299.
2) convert all 10.8 test machines into 10.9 test machines.
3) do most 10.7 builds by cross-compiling OSX on Linux on AWS, and repurpose the 10.7 builder machines as additional 10.9 test machines. This cross-compiler work is ongoing; it will take time to complete and to transition into production, hence it is listed last. The curious can follow bug#921040.

Each of these items is a large stand-alone project involving the same people across multiple groups, so we’ll roll them out in the sequence above.

Additional details:
1) Removing specific versions of an OS from our continuous integration systems based on vendor support and/or usage data is not a new policy. We have done this several times in the past. For example, we have dropped WinXPsp0/sp1/sp2 for WinXPsp3; dropped WinVista for Win7; dropped Win7 x64 for Win8 x64; and soon we will drop Win8.0 for Win8.1; …
** Note for the record that this does *NOT* mean that Mozilla is dropping support for osx10.7 or 10.8; it just means we think *automated* testing on 10.6 and 10.9 is more beneficial.

2) Firefox’s minimum OS requirements are listed at: https://www.mozilla.org/en-US/firefox/25.0.1/system-requirements

3) Apple is offering osx10.9 as a free upgrade to all users of osx10.7 and osx10.8, and 10.9 runs on any machine that can run 10.7 or 10.8. Because the upgrade is free, users are upgrading quickly: we are seeing a drop in both 10.7 and 10.8 users, and in just a month since the 10.9 release we already have more 10.9 users than 10.8 users.

4) Distribution of Firefox users across OSX versions, from the most to the least (data from 15-nov-2013):
10.6 – 34%
10.7 – 23% – slightly decreasing
10.8 – 21% – notably decreasing
10.9 – 21% – notably increasing
more info: http://armenzg.blogspot.ca/2013/11/re-thinking-our-mac-os-x-continuous.html

5) Apple is no longer providing security updates for 10.7; any user looking for OS security updates will need to upgrade to 10.9. Because osx10.9 is a free upgrade for 10.8 users, we expect 10.8 to be in a similar situation soon.

6) If a developer lands a patch that works on 10.9, but it fails somehow on 10.7 or 10.8, it is unlikely that we would back out the fix; instead, we would tell users to upgrade to 10.9 anyway, for the security fixes.

7) It is no longer possible to buy any more of the 10.6 machines (known as revision 4 minis), as they are long desupported. Recycling the 10.7 test machines means that we can continue to support osx10.6 at scale without needing to buy and rack new hardware, or recalibrate test and performance results.

8) Like all other large OS changes, this change would ride the trains. Most 10.7 and 10.8 test machines would be reimaged when we make these changes live on mozilla-central and try, while we’d leave a few behind. The few remaining would be reimaged at each 6-week train migration.

If we move quickly, this reimaging work can be done by IT before they all get busy with the 650-Castro -> Evelyn move.

For further details, see armen’s blog post: http://armenzg.blogspot.ca/2013/11/re-thinking-our-mac-os-x-continuous.html. To make sure this is not missed, I’ve cross-posted it to dev.planning, dev.platform, and this blog. If you know of anything we have missed, please reply in the dev.planning thread.

John.

[UPDATED 29-nov-2013 with link to bug#942299, as the 10.7->10.6 portion of this work just completed.]

The financial cost of a checkin (part 1)

This earlier blog post allowed us to do some interesting math. Now we can mark each different type of job with its cost-per-minute to run, and finally calculate that a checkin costs us at least USD$30.60, broken out as follows: USD$11.93 for Firefox builds/tests, USD$5.31 for Fennec builds/tests, and USD$13.36 for B2G builds/tests. A rough sketch of this arithmetic appears after the notes below.

Note:

  • This post assumes that all inhouse build/test systems have zero cost, which is obviously incorrect. Cshields is working with mmayo to calculate TCO (Total Cost of Ownership) numbers for the different physical machines Mozilla runs in our colos. Once those TCO costs are figured out, I can plug them into this grid and create an updated blogpost with revised costs. Meanwhile, calculating this TCO continues to take time, so for now I’ve intentionally excluded the cost of running on any inhouse machines. Those machines are not “free”, so this is obviously unrealistic, but it is better than confusing this post with inaccurate data. Put another way, the costs which *are* included here understate the overall cost.
  • Each AWS region has different prices for instances. The Amazon prices used here are for the regions that RelEng already uses: the two cheapest AWS regions (US-west-2 and US-east-1) carry our daily production load, and we keep a third region as a hot backup just in case we need it.
  • The Amazon prices used here are “OnDemand” prices. For context, Amazon Web Services has 4 different price brackets available for each type of machine:
    ** OnDemand Instance: The most expensive. No need to prepay; you get an instance in your requested region within a few seconds of asking. Very high reliability – out of the hundreds of instances that RelEng runs daily, we’ve only lost a few instances over the last ~18 months. Our OnDemand builders cost us $0.45 per hour, while our OnDemand testers cost us $0.12 per hour.
    ** 1 year Reserved Instance: Pay in advance for 1 year of use and get a discount from the OnDemand price. Functionally identical to OnDemand; the only change is in billing. Using 1 year Reserved Instances, our builders would cost us $0.25 per hour, while our testers would cost us $0.07 per hour.
    ** 3 year Reserved Instance: Pay in advance for 3 years of use and get a larger discount from the OnDemand price. Functionally identical to OnDemand; the only change is in billing. Using 3 year Reserved Instances, our builders would cost us $0.20 per hour, while our testers would cost us $0.05 per hour.
    ** Spot Instances: The cheapest. No need to prepay. Like a live auction, you bid how much you are willing to pay, and so long as you are the highest bidder, you’ll get an instance. The price varies throughout the day, depending on what demand other companies place on that AWS region. Unlike the other types above, a spot instance can be deleted out from under you at zero notice, killing your job-in-progress, if someone else bids more than you; this requires additional automation to detect and retrigger the aborted jobs on another instance. Also unlike the others, a spot instance can take anywhere from a few seconds to 25-30 minutes to be created, which requires additional automation to handle the unpredictability. The next post will detail the costs when Mozilla RelEng is running with spot instances in production.
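
To make the arithmetic above concrete, here is a minimal sketch of how a cost-per-checkin figure like this can be computed from per-job compute hours and hourly instance prices. The hourly prices are the OnDemand figures quoted above; the per-product compute-hour splits are purely illustrative placeholders I’ve made up, not RelEng’s real per-job numbers.

```python
# Rough sketch of the cost-per-checkin arithmetic described above.
# The hourly prices are the OnDemand figures quoted in this post; the
# per-product compute-hour splits are illustrative placeholders only.

ON_DEMAND_PRICE_PER_HOUR = {
    "builder": 0.45,  # USD/hour for an OnDemand build instance
    "tester": 0.12,   # USD/hour for an OnDemand test instance
}

# Hypothetical compute hours consumed by one checkin, per product:
EXAMPLE_HOURS_PER_CHECKIN = {
    "Firefox": {"builder": 10.0, "tester": 60.0},
    "Fennec":  {"builder": 5.0,  "tester": 25.0},
    "B2G":     {"builder": 12.0, "tester": 55.0},
}

def cost_per_checkin(hours_per_product, price_per_hour):
    """Return (total_cost, per_product_cost) in USD for one checkin."""
    per_product = {
        product: sum(hours[machine] * price_per_hour[machine] for machine in hours)
        for product, hours in hours_per_product.items()
    }
    return sum(per_product.values()), per_product

total, breakdown = cost_per_checkin(EXAMPLE_HOURS_PER_CHECKIN, ON_DEMAND_PRICE_PER_HOUR)
for product, cost in sorted(breakdown.items()):
    print(f"{product}: ${cost:.2f} per checkin")
print(f"Total: ${total:.2f} per checkin")
```

Swapping in the 1 year Reserved, 3 year Reserved, or spot prices is just a matter of changing the price dictionary, which is the kind of comparison the next post gets into.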

Being able to answer “how much did that checkin actually cost Mozilla” has interesting consequences. Cash has a strange cross-cultural effect – it helps focus discussions.

Now we can see the financial cost of running a specific build or test.

Now it’s easy to see the cold financial saving of speeding up a build, or the cost saving gained by deleting invalid/broken tests.

Now we can determine approximately how much money we expect to save with some cleanup work, and can use that information to decide how much human developer time is worth spending on cleanup/pruning.

Now we can make informed tradeoff decisions between the financial & market value of working on new features and the financial value of cheaper+faster infrastructure.

Now, it is no longer just about emotional, “feel good for doing right” advocacy statements… now each piece of cleanup work has a clear, cold, hard cash value for us all to see, and to help justify the work as a tradeoff against other work.

All in all, it’s a big, big deal, and we can now ask “Was that all worth at least $30.60 to Mozilla?”.

John.
(ps: Thanks to Anders, catlee and rail for their help with this.)

Infrastructure load for October 2013

  • Overall, this month was quieter than usual. I’d guess that this was caused by a combination of fatigue after the September B2G workweek, the October stabilization+lockdown period for B2Gv1.2, Canadian Thanksgiving, and, of course, Mozilla’s AllHands Summit in early October. Data for November is already higher, back towards more typical numbers. A big win was turning off obsolete builds and tests, which reduced our load by 20%.
  • #checkins-per-month: We had 6,807 checkins in October 2013. This is ~10% below last month’s 7,580 checkins.

    Overall load since Jan 2009

  • #checkins-per-day: Overall load was down throughout the month. In October, 15-of-31 days had over 250 checkins-per-day, and 8-of-31 days had over 300 checkins-per-day. No day in October was over 400 checkins-per-day.
    Our heaviest day had 344 checkins on 28oct; impressive by most standards, yet well below our single-day record of 443 checkins on 26aug.
  • #checkins-per-hour: Checkins are still mostly mid-day PT/afternoon ET. For 7 of every 24 hours, we sustained over 11 checkins per hour. Our heaviest load time this month was 2pm-3pm PT, with 12.77 checkins-per-hour (a checkin every 4.7 minutes) – below our record of 15.73 checkins-per-hour. The arithmetic behind these summaries is sketched below.
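
For anyone who wants to reproduce summary numbers like these from raw pushlog data, here is a minimal sketch of the arithmetic, assuming you already have per-day checkin counts and the peak hourly rate; the daily counts below are made-up placeholders, not October’s real numbers.

```python
# Sketch of the per-day / per-hour summary arithmetic above.
# daily_counts is made-up placeholder data, not the real October 2013 numbers.

daily_counts = [220, 260, 310, 344, 180, 95, 240]  # hypothetical checkins per day

days_over_250 = sum(1 for c in daily_counts if c > 250)
days_over_300 = sum(1 for c in daily_counts if c > 300)
heaviest_day = max(daily_counts)

peak_rate = 12.77                          # checkins per hour at the busiest hour
minutes_between_checkins = 60 / peak_rate  # ~4.7 minutes between checkins

prev_month_total, this_month_total = 7580, 6807
pct_change = 100 * (this_month_total - prev_month_total) / prev_month_total  # ~ -10%

print(f"days over 250: {days_over_250}, days over 300: {days_over_300}")
print(f"heaviest day: {heaviest_day} checkins")
print(f"one checkin every {minutes_between_checkins:.1f} minutes at peak")
print(f"month-over-month change: {pct_change:.1f}%")
```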

mozilla-inbound, b2g-inbound, fx-team:

  • mozilla-inbound continues to be heavily used as an integration branch. As developers use other *-inbound branches, mozilla-inbound’s share, at 15.8% of all checkins, is much reduced from typical, and also reduced from last month – which was itself the lowest-ever usage of mozilla-inbound. The use of multiple *-inbounds is clearly helping relieve bottlenecks (see pie chart below), and congestion on mozilla-inbound is being reduced significantly as people switch to other *-inbound branches instead. This also reduces stress and backlog headaches for sheriffs, which is good. All very cool to see.
  • b2g-inbound continues to be a great success, with 10.3% of this month’s checkins landing there – a healthy increase over last month’s 8.8%, and further evidence that use of this branch is helping.
  • With sheriff coverage, fx-team is clearly a very active third place for developers, with 5.6% of checkins this month. This is almost identical to last month, and may become the stable level for this branch.
  • The combined total of these 3 integration branches is 30.2% of all checkins, which is fairly consistent. Put another way, sheriff-moderated branches consistently handle approx 1/3 of all checkins, while Try handles approx 1/2 of all checkins. A sketch of this branch-share arithmetic follows below.
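
These branch percentages all come from the same simple calculation: each branch’s checkins divided by the month’s total. Here is a minimal sketch of that arithmetic; the per-branch counts are small made-up placeholders, not the real October data.

```python
# Sketch of the branch-share arithmetic above; the counts are placeholders.

checkins_by_branch = {
    "mozilla-inbound": 160,
    "b2g-inbound": 100,
    "fx-team": 55,
    "try": 500,
    "other": 185,
}

total = sum(checkins_by_branch.values())
shares = {branch: 100.0 * count / total for branch, count in checkins_by_branch.items()}

integration_branches = ("mozilla-inbound", "b2g-inbound", "fx-team")
integration_share = sum(shares[b] for b in integration_branches)

for branch, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{branch}: {pct:.1f}%")
print(f"combined integration branches: {integration_share:.1f}%")
```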

    Infrastructure load by branch

mozilla-aurora, mozilla-beta, mozilla-b2g18, gaia-central:
Of our total monthly checkins:

  • 2.6% landed into mozilla-central, slightly higher than last month. As usual, very few people land directly on mozilla-central these days, when there are sheriff-assisted branches available instead.
  • 3.2% landed into mozilla-aurora, much higher than usual. I believe this was caused by the B2G branching, which had B2G v1.2 checkins landing on mozilla-aurora.
  • 0.8% landed into mozilla-beta, slightly higher than last month.
  • 0.2% landed into mozilla-b2g18, slightly lower than last month. This should quickly drop to zero as we move B2G to gecko26.
  • 0.4% landed into mozilla-b2g26_v1_2, which was only enabled for checkins as part of the B2Gv1.2 branching involving Firefox25. This should quickly grow in usage until we move focus to B2G v1.3 on gecko28.
  • Note: gaia-central, and all other gaia-* branches, are not counted here anymore. For details, see here.

misc other details:
As usual, our build pool handled the load well, with >95% of all builds consistently being started within 15mins. Our test pool is getting up to par, and we’re seeing more test jobs being handled with better response times. Trimming out obsolete builds and tests reduced our load by 20% – or, put another way, got us 20% extra “free” capacity. There is still more work to be done here, but this is very encouraging progress. As always, if you know of any test suites that no longer need to be run per-checkin, please let us know so we can immediately reduce the load a little. Also, if you know of any test suites which are perma-orange and hidden on tbpl.m.o, please let us know – those are the worst of both worlds: using up scarce CPU time *and* not being displayed for people to make use of. We’ll make sure to file bugs to get those tests fixed or disabled; every little bit helps put scarce test CPU to better use.

[UPDATE: added mention of Mozilla Summit in first paragraph. Thanks to coop for catching that omission! joduinn 12nov2013.]

Now saving 47 compute hours per checkin!

While researching this “better display for compute hours per checkin” post, I noticed that we now “only” consume 207 compute hours of builds and tests per checkin. A month ago, we handled 254 compute-hours-per-checkin, so this is a reduction of 47 compute-hours-per-checkin.

No “magic silver bullet” here, just people quietly doing the detailed, unglamorous work of finding, confirming, and turning off no-longer-needed jobs. For me, the biggest gains were turning off “talos dirtypaint” and “talos rafx” across all desktop OSes, a range of b2g device builds, all Android no-ionmonkey builds and tests, and a range of Android armv6 and armv7 builds and tests. At Mozilla’s volume of checkins, saving 47 hours-per-checkin is a big, big deal.

This reduced our overall load by 23%. Or put another way – this work gave us 23% extra “spare” capacity to better handle the remaining builds and tests that people *do* care about.

Great, great work by sheriffs and RelEng. Thank. You.

How many hours of builds and tests do we run per commit?

  • 207 compute hours = ~8.6 compute *days* (nov2013)
  • 254 compute hours = ~10.5 compute *days* (sep2013)
  • 137 compute hours = ~5.7 compute *days* (aug2012)
  • 110 compute hours = ~4.6 compute *days* (jan2012)
  • ~40 compute hours = ~1.6 compute *days* (2009)
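
For completeness, here is a back-of-the-envelope sketch of the arithmetic behind these numbers, using only the figures quoted above and reading the “23% extra capacity” as the saved hours relative to the new, smaller load.

```python
# Sketch of the compute-hours arithmetic quoted above.

hours_sep2013 = 254.0   # compute hours per checkin in sep2013
hours_nov2013 = 207.0   # compute hours per checkin in nov2013

saved_hours = hours_sep2013 - hours_nov2013             # 47 hours per checkin
compute_days_now = hours_nov2013 / 24                   # ~8.6 compute *days*
extra_capacity_pct = 100 * saved_hours / hours_nov2013  # ~23% extra "spare" capacity

print(f"saved per checkin: {saved_hours:.0f} compute hours")
print(f"current load per checkin: {compute_days_now:.1f} compute days")
print(f"extra spare capacity: ~{extra_capacity_pct:.0f}%")
```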

There’s still more goodness to come, as even more jobs continue to be trimmed; the curious can follow bug#784681. Of course, if you see any build/test here which is no longer needed, or is perma-failing-and-hidden on tbpl.mozilla.org, please file a bug linked to bug#784681 and we’ll investigate/disable/fix as appropriate.