Build-always vs Build-on-checkin

We currently build all the time. Fair enough. But I mean ALL the time. We actually generate and publish builds even when we know nothing has changed since the last build. If the computer is sitting idle anyway, what’s the harm in that, right!? Well, actually…

1)  When a developer lands a change, they’d like to see a new build generated right away containing their change. If the build machines are already busy doing a build, then the developer has to wait for the currently-in-progress build to finish first. Fair enough. But waiting for a build which contains nothing new is just a waste of time, imho. While we think of our build time as being approx 20 minutes long, it’s actually 20 minutes counted from when the current build finishes. An unlucky person landing a change 5 minutes after a build starts would have to wait 35 minutes to see their change in a build: 15 minutes for the in-progress build to finish, plus 20 minutes for the build containing their change. These diagrams might help explain:

Obviously, even with this change, a developer could still be unlucky and land a change just after someone else’s change triggered a build-in-progress, in which case things are no better/worse than they are today. But if the developer is lucky and finds the systems idle, builds start immediately, and the build turnaround time is much improved.

2) After Build generates a build, the QA/performance machines take the build and measure its performance. Typically these performance results are then manually compared with previous builds, looking for deviations, etc. “Hey, this new build is x% faster/slower compared to the last build.” Whenever there’s a deviation, humans try to figure out whether it’s caused by a code change, a build change, a perf-infrastructure change, or unreproducible-time-space-continuum problems(!). This human analysis would be simplified (a little) if it were easy to automatically tell when builds were the same or contained different code.

So, what to do here?

We’re proposing changing our build machines to trigger a build only when a checkin is detected. Once a checkin is detected, an idle machine would start building immediately. Once a build finishes, we’d check whether there had been any changes in the interim; if yes, we’d immediately start building again, but if not, the machine would just sit idle, waiting for changes.
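
In rough pseudocode, each build machine would run a loop along these lines (a minimal sketch only; latest_checkin_time and run_build are hypothetical stand-ins for the real VCS query and build step, not actual Buildbot code):

    import time

    def poll_loop(latest_checkin_time, run_build, poll_interval=60):
        # Build only when a checkin is detected; rebuild immediately if
        # more checkins landed while the last build was running.
        last_built = None                   # newest checkin in last build
        while True:
            newest = latest_checkin_time()  # hypothetical VCS query
            if newest is not None and newest != last_built:
                run_build()                 # hypothetical; blocks until done
                last_built = newest         # a checkin landing mid-build makes
                                            # the next pass rebuild right away
            else:
                time.sleep(poll_interval)   # no changes: sit idle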

Rolling this out will require changes in both the Build and QA/Perf infrastructure, and we’re still figuring out all the gotchas, but we think it’s well worth the effort. Developers get builds faster. Build gets cleaner infrastructure. QA/Perf analysis gets (slightly) simplified. For more details, see Rob Helmer’s blog (here and here).

(footnote: Running builds in parallel is a suggestion we are still investigating, and it should help even further. The initial idea of running ‘n’ continuously-building processes staggered a few minutes apart does reduce developer wait time, but it also generates ‘n’ times the number of builds over the course of a 24-hour day, most of them identical, all of which would have a knock-on impact on IT storage capacity and QA/performance testing infrastructure. By contrast, with build-on-checkin, everyone gains immediately during the quieter times of the day, and for the busier times it brings us closer to having a pool of multiple buildbot slaves available to start building in parallel when things are busy and sit idle the rest of the time. More on this in another blog post.)

9 thoughts on “Build-always vs Build-on-checkin”

  1. While I realize I’m an edge case these days, having a day job and only getting to check in at night, I have the feeling that this would be significantly worse for me in the common case where I have two patches to check in. Unless there’s actually a fair bit of lag between a checkin and the start of the triggered builds, if I check in to the quiet nighttime tree and trigger a build, then check in my second patch, I’m certain to have to wait two cycles to get clear on my two patches. In the current system, depending on how my luck is running, I can make a couple of checkins a few minutes apart, wait some fraction of a cycle until they get picked up, and then a single cycle to get clear. I’d be going from an absolute worst case of two cycles (if my first patch went in seconds before the slowest OS’s build started), a common case of around 1.5 cycles, and a lucky case of barely over one, to a complete certainty of two.

  2. I accept that build-on-checkin helps if someone commits something after a long idle period (especially in the 1-tbox case you’ve used as an example), but it seems like there are also many scenarios where developers lose. The most dramatic case is that someone commits a patch right after someone else. The first checkin kicks off all the clients, and the second committer then has to wait nearly two full cycles to get a completed build. Even worse, both committers might be the same person (which is often the case), in which case the developer always gets screwed (by design).

    Beyond that, the chaos of continuous builds (all the tinderboxen asynchronous) has a big advantage: whenever you commit a patch, it’s likely that one client will pick up your checkin quickly. This means everyone gets reasonably fast feedback on build bustages (so long as the bustage is cross-platform). With build-on-checkin, the first committer gets quick feedback (on all platforms) and everyone else gets slow feedback (on all platforms). On a relative basis, I’d expect the initial-feedback time hit to be much more significant (300% perhaps) with build-on-checkin, while the time to finish builds would be marginally better or worse (the first committer only sees a 50% decrease in build time on average).

    The chaos of continuous builds also results in different boxes building with different sets of patches (one might build A, then B+C, then D, while another builds A+B, then C+D). This is often helpful when diagnosing test failures. Build-on-checkin would decrease that heterogeneity (it would bias all the boxes toward building A+B and C+D).

    I’d expect parallel builds from a buildbot pool to resolve these problems so long as the pool is sufficiently large, but without that, I don’t see much advantage to build-on-checkin (a few people benefit, the rest are worse off). I’m also not quite sure how the build machines would distinguish checkins (since CVS has no atomic commits).

    What would be helpful is to measure the distribution of time-to-initial-feedback and time-to-finished-builds for both modes, either using historical data from the tinderbox and buildbot clients, or perhaps by simulating each mode with historical checkin data and hypothetical build times; a rough sketch of such a simulation follows at the end of this comment.

    Finally, (depending on how jumpy they are) build-on-checkin might result in more mid-checkin bustages, since the client would (by design) be trying to jump in right after someone commits something.
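
    Here’s a minimal sketch of that simulation, replaying made-up checkin times (sorted, in minutes) against both modes, with a hypothetical fixed 20-minute build; all the numbers below are illustrative, not real data:

        BUILD_TIME = 20  # minutes; hypothetical fixed build length

        def build_always(checkins):
            # Builds run back-to-back at t=0, 20, 40, ...; a checkin is
            # picked up by the first build that starts after it lands.
            waits = []
            for t in checkins:
                start = (t // BUILD_TIME + 1) * BUILD_TIME
                waits.append(start + BUILD_TIME - t)
            return waits

        def build_on_checkin(checkins):
            # Machine idles until a checkin, builds, then immediately
            # rebuilds if more checkins landed mid-build; every checkin
            # landed by the time a build starts goes into that build.
            waits, done, i = [], 0, 0
            while i < len(checkins):
                start = max(done, checkins[i])
                done = start + BUILD_TIME
                while i < len(checkins) and checkins[i] <= start:
                    waits.append(done - checkins[i])
                    i += 1
            return waits

        checkins = [5, 8, 55, 140, 141, 300]   # made-up quiet-ish day
        print(build_always(checkins))      # [35, 32, 25, 40, 39, 40]
        print(build_on_checkin(checkins))  # [20, 37, 20, 20, 39, 20]

    Even on this toy data both effects show up: the idle-tree checkins win big under build-on-checkin, while the second committer in a burst (t=8) actually waits longer than today.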

  3. There is also the practical problem with build-on-checkin that if a build busts without the patch having anything to do with it (impossible in theory, happens fairly often in practice), then it’s going to sit there being busted until someone kicks it. I’ve seen this happen a lot on the testing boxes, which already do build-on-checkin. It’s really annoying – in the “normal” continuous system, they continue building and often the next build will be fine. In this new system, we would need to kick it with some not-so-interesting patch or whatever to see if it’s really spurious, and in the meantime nobody can check in because the tree is red/orange.

  4. I have a suggestion: a delayed build-on-checkin.

    For example, if nothing gets checked in, no builds are done. However, once there is a check-in, the build machine “waits” for a period of time (e.g. 1 hour), so that all checkins occurring within that timeframe get in before a build is done. This way, a dev with 3 or 4 checkins has that time to check in everything he wants, and so do other devs who want to check in at the same time. (A tiny sketch of this follows below.)

    Parallel machines could also work this way.

    Just an idea…
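
    As a rough sketch, using the 1-hour window from above and hypothetical wait_for_next_checkin / run_build hooks (note this is a fixed window after the first checkin, slightly different from a rolling quiet period):

        import time

        def delayed_build_on_checkin(wait_for_next_checkin, run_build,
                                     window=60 * 60):
            # Hypothetical hooks: wait_for_next_checkin() blocks until a
            # checkin lands. Everything landing inside the fixed window
            # after the first checkin joins the same build.
            while True:
                wait_for_next_checkin()  # idle until a checkin arrives
                time.sleep(window)       # let follow-up checkins pile in
                run_build()              # one build for the whole batch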

  5. John, I have mixed feelings about this: the current system of only kicking off unit-test builds on checkin has been problematic if the machines misbehave at all: the first build is red, and it doesn’t automatically cycle. Also, because CVS commits are not atomic, I’m worried about builds kicking off before the checkin is complete: sometimes a large CVS checkin can take several minutes.

    As for the question about stacked checkins, are you envisioning a system where each stacked commit would be built, or one where, if the build machines were running behind, several commits would be batched? e.g.

    10am: commit A
    10:01am: builds A kick off
    10:02am: commit B
    10:03am: commit C
    10:20am: after builds ‘A’ finish, would they start on a tree with checkin ‘B’ or skip directly to ‘C’?

    And, just to be sure, you’re not talking about doing this with the perftest machines, are you, just the build machines? I think it is valuable to get as many perftest runs as possible to reduce noise, even if there have been no checkins.

  6. @bsmedberg:
    10:20am: after builds ‘A’ finish, would they start on a tree with checkin ‘B’ or skip directly to ‘C’?

    It can be set up in one of two ways:
    1) the next build would build B+C
    or
    2) the next build would build B, and the build after that would build C.

    @everyone:
    ‘build-on-checkin’ in Buildbot supports a ‘treeStableTimer’, which is the amount of quiet time (i.e. no checkins) required before a build starts. This could be set to 5 min, 1 hour, etc. So if we end up going to a ‘build-on-checkin’ system in the future, the problem some of you describe can be made less of one.
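
    For example, in a Buildbot master.cfg this looks roughly like the following sketch (the builder names here are made up; c is the usual BuildmasterConfig dict):

        from buildbot.scheduler import Scheduler

        # Fire the named builders once the tree has been quiet for
        # treeStableTimer seconds; the timer restarts on each new
        # checkin, so closely-spaced checkins batch into one build.
        c['schedulers'] = [
            Scheduler(name="on-checkin",
                      branch=None,              # i.e. trunk
                      treeStableTimer=5 * 60,   # seconds of quiet required
                      builderNames=["linux", "mac", "win32"]),
        ]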
