The financial cost of a checkin (part 1)

This earlier blog post allowed us to do some interesting math. Now, we can mark each different type of job with its cost-per-minute to run, and finally calculate that a checkin costs us at least USD$30.60; the cost was broken out as follows: USD$11.93 for Firefox builds/tests, USD$5.31 for Fennec builds/tests and USD$13.36 for B2G builds/tests.
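
As a quick check of that arithmetic, here is a minimal sketch that adds up the three per-product figures quoted above and shows each product's share of the total. The dictionary layout and function names are only for illustration; they are not how RelEng stores or computes these numbers.

```python
from decimal import Decimal

# Per-product costs quoted in this post (USD per checkin, AWS OnDemand only).
COST_PER_CHECKIN = {
    "Firefox": Decimal("11.93"),
    "Fennec": Decimal("5.31"),
    "B2G": Decimal("13.36"),
}


def total_cost(costs):
    """Sum the per-product costs into one per-checkin figure."""
    return sum(costs.values(), Decimal("0"))


def cost_shares(costs):
    """Return each product's share of the total, as a percentage to one decimal."""
    total = total_cost(costs)
    return {
        name: (value / total * 100).quantize(Decimal("0.1"))
        for name, value in costs.items()
    }


if __name__ == "__main__":
    print("Total per checkin: USD$%s" % total_cost(COST_PER_CHECKIN))
    for name, share in cost_shares(COST_PER_CHECKIN).items():
        print("%s: %s%%" % (name, share))
```

Decimal is used here instead of float so that the cents add up exactly.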

[Image: checkin load desktop]

Note:

  • This post assumes that all inhouse build/test systems have zero cost, which is obviously incorrect. Cshields is working with mmayo to calculate TCO (Total Cost of Ownership) numbers for the different physical machines Mozilla runs in our colos. Once those TCO costs are figured out, I can plug them into this grid and write an updated blogpost with revised costs. Meanwhile, calculating this TCO continues to take time, so for now I’ve intentionally excluded all costs of running on inhouse machines. They are not “free”, so this is obviously unrealistic, but better than confusing this post with inaccurate data. Put another way, the costs which *are* here are only part of the overall cost, so the total is underreported.
  • Each AWS region has different prices for instances. The Amazon prices used here are for the regions that RelEng is already using. We already use the two cheapest AWS regions (US-west-2 and US-east-1) for daily production load, and keep a third region on hot-backup just in case we need it.
  • The Amazon prices used here are “OnDemand” prices. For context, Amazon WebServices has 4 different price brackets for each type of machine available:
    ** OnDemand Instance: The most expensive. No need to prepay. Get an instance in your requested region within a few seconds of asking. Very high reliability: out of the hundreds of instances that RelEng runs daily, we’ve only lost a few instances over the last ~18 months. Our OnDemand builders cost us $0.45 per hour, while our OnDemand testers cost us $0.12 per hour.
    ** 1 year Reserved Instance: Pay in advance for 1 year of use, get a discount from OnDemand price. Functionally identical to OnDemand; the only change is in billing. Using 1 year Reserved Instances, our builders would cost us $0.25 per hour, while our 1 year Reserved Instance testers would cost us $0.07 per hour.
    ** 3 year Reserved Instance: Pay in advance for 3 years of use, get a discount from OnDemand price. Functionally identical to OnDemand; the only change is in billing. Using 3 year Reserved Instances, our builders would cost us $0.20 per hour, while our 3 year Reserved Instance testers would cost us $0.05 per hour.
    ** Spot Instances: The cheapest. No need to prepay. Like a live auction, you bid how much you are willing to pay, and so long as you are the highest bidder, you’ll get an instance. This price varies throughout the day, depending on what demand other companies place on that AWS region. Unlike the other types above, a spot instance can be deleted out from under you at zero notice, killing your job-in-progress, if someone else bids more than you. This requires additional automation to detect and retrigger the aborted jobs on another instance. Also unlike the others, a spot instance can take anywhere from a few seconds to 25-30 mins to get created, so additional automation is needed to handle this unpredictability. The next post will detail the costs when Mozilla RelEng is running with spot instances in production.
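
To make the difference between the brackets concrete, here is a rough sketch that turns the hourly rates listed above into a per-job cost and a discount relative to OnDemand. It assumes the quoted hourly figures are the effective rates (any upfront prepayment already folded in) and reads the $0.07 tester figure as the 1 year Reserved rate. Spot instances are left out because their price varies. The names `HOURLY_RATE_USD`, `job_cost` and `discount_vs_on_demand` are made up for this example.

```python
# Hourly rates quoted in this post, in USD. Spot is omitted: its price varies.
HOURLY_RATE_USD = {
    "on_demand": {"builder": 0.45, "tester": 0.12},
    "reserved_1yr": {"builder": 0.25, "tester": 0.07},
    "reserved_3yr": {"builder": 0.20, "tester": 0.05},
}


def job_cost(tier, machine, minutes):
    """Cost in USD of running one job for `minutes` on the given machine type."""
    return HOURLY_RATE_USD[tier][machine] / 60 * minutes


def discount_vs_on_demand(tier, machine):
    """Fractional discount of `tier` compared with the OnDemand rate."""
    base = HOURLY_RATE_USD["on_demand"][machine]
    return 1 - HOURLY_RATE_USD[tier][machine] / base


if __name__ == "__main__":
    for tier in HOURLY_RATE_USD:
        for machine in ("builder", "tester"):
            print("%-12s %-7s %.0f%% cheaper than OnDemand" % (
                tier, machine, 100 * discount_vs_on_demand(tier, machine)))
```

For example, `job_cost("on_demand", "builder", 60)` gives back the $0.45 hourly builder rate, and a shorter job scales down linearly with its run time.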

Being able to answer “how much did that checkin actually cost Mozilla” has interesting consequences. Cash has a strange cross-cultural effect – it helps focus discussions.

Now we can see the financial cost of running a specific build or test.

Now it’s easy to see the cold financial saving of speeding up a build, or the cost saving gained by deleting invalid/broken tests.

Now we can determine approximately how much money we expect to save with some cleanup work, and can use that information to decide how much human developer time is worth spending on cleanup/pruning.
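
A back-of-the-envelope version of that calculation might look like the sketch below. The inputs in the usage example (10 minutes saved per checkin, 1000 checkins, $75 per developer hour) are purely hypothetical placeholders, not measured numbers; only the $0.45 per hour OnDemand builder rate comes from this post.

```python
def estimated_saving_usd(minutes_saved_per_checkin, hourly_rate_usd, checkins):
    """Estimated USD saved by shaving build/test minutes off every checkin."""
    return minutes_saved_per_checkin / 60 * hourly_rate_usd * checkins


def break_even_dev_hours(saving_usd, dev_hourly_cost_usd):
    """Developer hours that could be spent on cleanup before it costs more than it saves."""
    return saving_usd / dev_hourly_cost_usd


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    saving = estimated_saving_usd(
        minutes_saved_per_checkin=10,  # hypothetical
        hourly_rate_usd=0.45,          # OnDemand builder rate from this post
        checkins=1000,                 # hypothetical
    )
    hours = break_even_dev_hours(saving, dev_hourly_cost_usd=75)  # hypothetical
    print("Saving: USD$%.2f, break-even: %.1f developer hours" % (saving, hours))
```

This only counts the AWS cost, so like the rest of this post it understates the real saving.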

Now we can make informed tradeoff decisions between the financial & market value of working on new features and the financial value of cheaper+faster infrastructure.

Now, it is no longer just about emotional, “feel good for doing right” advocacy statements… now each piece of cleanup work has a clear, cold, hard cash value for us all to see, which helps justify the work as a tradeoff against other work.

All in all, it’s a big, big deal, and we can now ask “Was that all worth at least $30.60 to Mozilla?”.

John.
(ps: Thanks to Anders, catlee and rail for their help with this.)

10 Comments

  1. Kyle Huey
    20 Nov 2013 @ 01:17:18

    $30.60 per checkin is actually much lower than I expected. Assuming a loaded cost of 150k to MoCo for an engineer for 2000 hours of work annually, that means it’s cheaper to burn an extra checkin than spend 25 minutes of an engineer’s time. A narrowly tailored try push is worth far less time than that. If you can get a try push to use 1/10th of the resources of a full checkin it’s basically always worth pushing to try rather than running 5 minutes worth of tests locally.

    (I’ve ignored second order effects like wait times here obviously)

    • Gregory Szorc
      20 Nov 2013 @ 08:15:22

      Kyle,

      Nice analysis! But your estimate of $150k/employee is too low. I’d use at least $200k.

      Along a similar vein, you can perform math like this to compute the effectiveness of new machines for people who build a lot. E.g. $2000/year is only 1% of total employee cost.

  2. Tim Chevalier
    20 Nov 2013 @ 01:19:10

    Heh, I was expecting this to take reviewers’ time into account.

  3. Ben Hearsum
    20 Nov 2013 @ 04:40:29

    This is great. I would love it if the overall number were made more prominent though. Eg, I can’t find that $30.60 number in the chart (I guess it’s a composite of all platforms+products?).

    Once we have all of the hardware numbers it would also be great to throw these in json/csv/whatever and write some tools to help calculate cost of specific pushes. Eg, different branches have different sets of things, especially try.

  4. Benoit Girard
    20 Nov 2013 @ 07:02:55

    I’m glad that we’re attaching a financial figure. It means that we can start working on the problem and attach some real savings to what we accomplish (or reject a proposal if the savings are negligible). We can get these times down by optimizing the jobs but that has an opportunity cost of taking people away from the product. But with a good understanding of the cost perhaps it’s worth doing.

  5. Phil Ringnalda
    20 Nov 2013 @ 07:58:59

    “Cash has a strange cross-cultural effect – it helps focus discussions.”

    I would not have stopped that sentence there, instead finishing with something like “on nothing but the cash, ignoring both difficult-to-quantify cash costs and impossible-to-quantify costs.”

    I want to see three other numbers besides how much a (full, uncoalesced) push costs: how much an hour of tree closure costs, how much a four or six hour tree closure costs (which is not at all four to six times an hour closure’s cost), and how much a developer who quits in frustration costs.

  6. Steve Fink
    20 Nov 2013 @ 22:35:24

    Yeah, this had better not be the final cost estimate that we’re going to use to make decisions with. It only counts AWS cost, which just happens to be the one I least care about. Heck, if the cost of a full set of jobs is only $30/push, that’s nothing! Let’s eliminate coalescing and trychooser ASAP, and maybe start doing builds on “speculative backout” pushes (if 2 people push, try building with only the 2nd push applied to see what it’ll be like if you have to back out the first push.) Or run the tests twice to detect intermittents better.

    I’m guessing the reason we don’t do such things is limited in-house capacity for certain platforms: every x additional minutes of a non-expandable resource like OSX 10.6 boxes delays results by a certain amount, which leads to y1 minutes of tree closures times the number of devs waiting to push. Well, not “times” — as philor said, it’s superlinear. Add to that y2 minutes of devs waiting for OSX try results to come back before they can move on to other tasks. (Sure, there’s some amount of pipelining aka mental task switching on the devs’ part, but that helps by a small factor, not an order of magnitude. And when a dev hits their bedtime/network unavailable time, they get pushed to the next day, and have to rebase and maybe rebuild, etc.)

    The AWS cost is relevant too, but I doubt the decision is made based on it. It’s not the bottleneck yet.

  7. @mikesorrenti
    30 Jan 2014 @ 19:00:03

    John O’Duinn of @Mozilla release engineering – “Financial cost of a checkin” http://t.co/gEdU5I471O really interesting analysis

  8. @sawrubh
    24 May 2014 @ 04:44:11

    Just came across this and wow : http://t.co/Pk9CUhmsz8 #Mozilla
