We currently build all the time. Fair enough. But I mean ALL the time. We actually generate and publish builds even if we know nothing has changed since the last build. If the computer is sitting idle anyway, what's the harm in that, right!? Well, actually…
1) When a developer lands a change, they'd like to see a new build generated right away containing their change. If the build machines are already busy doing a build, then the developer has to wait for the currently-in-progress build to finish first. Fair enough. But waiting for a build which contains nothing new is just a waste of time, imho. While we think of our build time as being approx 20mins long, it's actually 20mins counted from when the current build finishes. An unlucky person making a change 5 minutes after a build starts would have to wait 35mins to see their change in a build: 15mins for the in-progress build to finish, then 20mins for the build containing their change. These diagrams might help explain:


Obviously, even with this change, a developer could still be unlucky and land a change just after someone else's change triggered a build-in-progress, in which case things are no better or worse than they already are today. But if the developer is lucky and finds the systems idle, then the build starts immediately, and the build turnaround time is much improved.
2) After Build generates a build, the QA/performance machines take the build and measure its performance. Typically these performance results are then manually compared with previous builds, looking for deviations, etc. "Hey, this new build is x% faster/slower compared to the last build". Whenever there's a deviation, humans try to figure out if it's caused by a code change, a build change, a perf infrastructure change, or unreproducible-time-space-continuum problems(!). This human analysis would be simplified (a little) if it was easy to automatically tell whether two builds contained the same code or different code.
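To make the wait-time arithmetic from point 1 concrete, here is a rough sketch in Python. The function names are made up for illustration, and it assumes a single build machine and a flat 20 minute build, which is only the approximate figure quoted above.

```python
BUILD_MINUTES = 20  # approximate build duration quoted in this post


def wait_continuous(minutes_into_build, build_minutes=BUILD_MINUTES):
    """Minutes until a change shows up in a finished build when the
    machine builds back to back, whether or not anything changed."""
    remaining = build_minutes - minutes_into_build  # in-progress build finishes
    return remaining + build_minutes  # then the build containing the change


def wait_on_checkin(machine_idle, minutes_into_build=0, build_minutes=BUILD_MINUTES):
    """Same question under build-on-checkin (hypothetical helper)."""
    if machine_idle:
        # Lucky case: nothing is running, so the build starts right away.
        return build_minutes
    # Unlucky case: someone else's build is in progress, same as today.
    return wait_continuous(minutes_into_build, build_minutes)


print(wait_continuous(5))         # 35
print(wait_on_checkin(True))      # 20
print(wait_on_checkin(False, 5))  # 35
```

The unlucky case is identical in both schemes. The gain comes entirely from the times when the machine would otherwise be rebuilding unchanged code.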
So, what to do here?
We’re proposing changing our build machines to trigger a build only when a checkin is detected. Once a checkin is detected, an idle machine would start building immediately. Once a build finishes, we’d check to see if there had been any changes in the interim, and if yes, then immediately start building again…but if no changes, then just sit idle, waiting for changes.
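As a rough illustration of that proposal (not our actual buildbot configuration), the sketch below simulates a single build machine. Everything in it is an assumption made for the example: the function name, representing checkins as minute offsets, and the fixed 20 minute build.

```python
def build_on_checkin(checkin_times, build_minutes=20):
    """Simulate one build machine that only builds when there is a checkin.

    checkin_times: minute offsets at which checkins land (illustrative data).
    Returns a list of (start, end, included_checkins), one tuple per build.
    """
    pending = sorted(checkin_times)
    builds = []
    free_at = 0  # minute at which the machine next becomes idle
    i = 0
    while i < len(pending):
        # If idle, wait for the next checkin; if that checkin landed during
        # the previous build, start as soon as that build finishes.
        start = max(free_at, pending[i])
        # Every checkin that has landed by the start time goes into this build.
        j = i
        while j < len(pending) and pending[j] <= start:
            j += 1
        end = start + build_minutes
        builds.append((start, end, pending[i:j]))
        free_at = end
        i = j
    # No pending checkins left: the machine just sits idle.
    return builds


for start, end, changes in build_on_checkin([3, 8, 50]):
    print(start, end, changes)
# 3 23 [3]
# 23 43 [8]
# 50 70 [50]
```

In this run the checkin at minute 8 lands mid-build, so its build starts the moment the first one finishes at minute 23. The checkin at minute 50 finds the machine idle and starts building immediately. Nothing is built between minutes 43 and 50, so every build produced contains at least one new change.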
Rolling this out will require changes in both the Build and QA/perf infrastructure, and we're still figuring out all the gotchas, but we think it's well worth the effort. Devs get builds faster. Build gets cleaner infrastructure. QA/Perf analysis gets (slightly) simplified. For more details, see Rob Helmer's blog (here and here).
(footnote: Running builds in parallel is a suggestion we are still investigating, which should help even further. The initial idea, of running 'n' continuously building processes staggered a few minutes apart, does reduce developer wait time, but it also generates 'n' times the number of builds over the course of a 24 hour day. Most of those builds would be identical, and all of this would have a ripple-on impact on IT storage capacity and QA/performance testing infrastructure. By contrast, if we change the infrastructure to build-on-checkin, everyone gains immediately during the quieter times of the day, and for the busier times this brings us closer to having a pool of multiple buildbot slaves that are available to start building in parallel when things are busy, and are idle most of the rest of the time. More on this in another blog post.)
