Making unittest life better…

During the Open Design Lunch last week, one topic that came up frequently was unittests. Most questions were variations of “intermittent unittest failures block developers from landing”, and “unittests take too long to run”.

Hopefully this blog post will explain some of the work already done or in progress to make this better.

Short answer is:

fix up unittest machines & toolchain
fix unittest framework so each unittest run does not require a rebuild
run unrelated unittest suites concurrently
split out big suites like mochitest into multiple smaller suites

Solving these problems will get us muchly improved end-to-end turnaround time, simplify debugging intermittent failures, and allow us to start running unittests on nightly and release builds.

A longer, more detailed answer needs more text, some diagrams… and obviously coffee!

Each “unittest run” actually does the following steps sequentially: pull tip-of-tree, build (with modified mozconfig), TUnit, reftest, crashtest, mochitest, mochichrome, browserchrome, a11y. However, this means if you run unittests twice in a row, even without any code change, you are actually doing: pull tip-of-tree, build (with modified mozconfig), TUnit, reftest, crashtest, mochitest, mochichrome, browserchrome, a11y, pull tip-of-tree, build (with modified mozconfig), TUnit, reftest, crashtest, mochitest, mochichrome, browserchrome, a11y. Note the double pull, and double build.
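To make the double pull and double build concrete, here is a minimal sketch that lists the steps for `n` back-to-back runs. The step names are taken from the paragraph above; the function names and the "build once" variant are only illustrative assumptions, not how the real automation is written.

```python
# Step names come from the post; everything else here is a hypothetical sketch.
SETUP = ["pull tip-of-tree", "build (with modified mozconfig)"]
SUITES = ["TUnit", "reftest", "crashtest", "mochitest",
          "mochichrome", "browserchrome", "a11y"]


def coupled_runs(n):
    """Steps executed today: every unittest run repeats the pull and the build."""
    steps = []
    for _ in range(n):
        steps.extend(SETUP + SUITES)
    return steps


def build_once_runs(n):
    """Steps executed if the suites could reuse one existing build (assumed design)."""
    steps = list(SETUP)
    for _ in range(n):
        steps.extend(SUITES)
    return steps


if __name__ == "__main__":
    for label, steps in (("coupled", coupled_runs(2)),
                         ("build once", build_once_runs(2))):
        builds = steps.count("build (with modified mozconfig)")
        print(f"{label}: {len(steps)} steps, {builds} build(s)")
```

The only difference between the two functions is where the setup steps sit relative to the loop, which is the whole point: the suites themselves do not change.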

This causes several important problems:

each unittest cycle takes a long time, because it's doing a build every time.
it was not practical to run each unittest suite as a separate concurrent job, because:

each unittest suite would need its own build step (costing more overall CPU time) and
because each build would have its own BuildID (complicating work of reassembling together all the test results afterwards).

developers have to wait until the *last* suite is completed before they see results from *any* suite.
it has never been possible to run unittests on nightly builds or release builds. (because it would require rebuilding, which defeats the purpose!)
it complicates debugging intermittent failures because:

crashes for each rebuild get different memory stackdumps
each build pulls tip-of-tree, so if a change lands while you are re-running tests, each build could get a different pull of tip-of-the-tree source code, and you’d be testing different things.
each build has a different BuildID, so harder to confirm if all builds have same code.
having new builds each time makes it hard to spot any machine or compiler problems.
typical way to find an intermittent problem is to run a test ‘n’ times. If you run “reftest” 5 times in a row, that's quick and useful. However, the wasted time of rebuilding and then running all suites serially, even if you are only interested in rerunning just one suite, really adds up. Running build+all unittest suites 5 times in a row quickly becomes impractical, especially when you require the tip-of-tree to remain constant for the duration.
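The "run it 5 times" cost can be sketched as simple arithmetic. The durations in the usage example are invented placeholders, not measurements from our machines, and the function names are hypothetical; only the shape of the two formulas matters.

```python
def rerun_cost_coupled(n, setup_minutes, suite_minutes):
    """Minutes to rerun when each run repeats pull+build and every suite."""
    return n * (setup_minutes + sum(suite_minutes.values()))


def rerun_cost_single_suite(n, suite, suite_minutes):
    """Minutes to rerun only one suite against an already existing build."""
    return n * suite_minutes[suite]


if __name__ == "__main__":
    # Placeholder durations in minutes, purely for illustration.
    durations = {"reftest": 10, "mochitest": 30}
    print(rerun_cost_coupled(5, setup_minutes=20, suite_minutes=durations))
    print(rerun_cost_single_suite(5, "reftest", durations))
```

Whatever the real numbers are, the coupled cost grows with the build time and with every suite you did not care about, while the single-suite cost grows only with the suite you are chasing.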

Our plan to fix these is:

Make sure that the spec of machines/VMs being used is sufficient for either build or unittest jobs. Also, consolidate both toolchains into one toolchain suitable for both builds and unittests.
- There was lots of work done by lsblakk, robcee, schrep and others during summer 2008 to make unittest machines identical to build machines in one general purpose pool-of-slaves.
- There was also a lot of work done by robcee, schrep, mrz, justin and myself to see if the intermittent tests would be solved by moving to faster VMs or dedicated physical hardware. While it's true that we can always make incremental improvements in turnaround time by spec-ing faster VMs or buying faster dedicated physical machines, those experiments found (different!) intermittent unittest failures each time.
- I assert that fixing the system design problems outlined above will get us significantly better turnaround time, and also solve other problems that brute force alone can't fix, so it should be done first. Only after that global (large) optimization is done should we revisit the discussion about local (smaller) optimizations.
Consolidate the two toolchains, and consolidate the two sets of machines in one production pool-of-slaves. This was finished just before Christmas 2008 and means that:
- all build slaves and unittest slaves are now part of the one pool-of-slaves, and all able to do either builds *or* unittests.
- we can enable unittests on a new branch at the same time as we enable builds on that branch
- we have more machines to scale up and handle build&unittest load on whatever branch is the most active branch.
- we can now run unittests everywhere we can run builds. We’re already running unittests on each active code line. We’re nearly finished enabling unittests on try server (see bug#445611)
Separate out build from unittest
- consolidate build mozconfig with unittest mozconfig
- cleanup test setup assumptions about what files/env.settings are needed by a unittest suite, being done by Ted in bug#421611.
- one by one, as each suite is separated out, we enable that standalone suite running by itself in pool-o-slaves, and disable that suite in the “build-and-remaining-unittest-suite” jobs. (see bug#383136)
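The one-by-one separation can be pictured as moving a suite from one list to another. This is only a sketch of the bookkeeping, with hypothetical names; it is not the actual buildbot configuration.

```python
def separate_suite(combined, standalone, suite):
    """Move one suite out of the combined build+unittest job.

    Returns new (combined, standalone) lists; the inputs are not modified.
    """
    if suite not in combined:
        raise ValueError(f"{suite!r} is not part of the combined job")
    remaining = [s for s in combined if s != suite]
    return remaining, standalone + [suite]


if __name__ == "__main__":
    combined = ["TUnit", "reftest", "crashtest", "mochitest",
                "mochichrome", "browserchrome", "a11y"]
    standalone = []
    # Hypothetical order: the post does not say which suite goes first.
    combined, standalone = separate_suite(combined, standalone, "reftest")
    print("still run after the build:", combined)
    print("run standalone in the pool:", standalone)
```

At every step each suite runs in exactly one place, so coverage never drops while the migration is in progress.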

Once we have all the unittest suites running without requiring a build step, then we can:

quickly re-run test suites on the same *identical* build, get easy to compare stack traces, and have no concerns about unexpected landings changing what we build from tip-of-tree.
re-run a specific test suite of interest much more quickly (if you only care about reftest, only rerun reftest…)
run tests on older builds to figure out when a test started failing intermittently.
run each separate test suite concurrently on different machines, and post results for each suite as each individual suite completes.
split the longest running suites into smaller bite-size suites, for better efficiency.
start running unittests on nightly and release builds.
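Several of the items above (same identical build, concurrent suites, results posted as each suite completes, big suites split into smaller ones) can be sketched together. This is a toy model under stated assumptions: worker threads stand in for machines in the pool-of-slaves, `run_suite` is a hypothetical placeholder rather than a real test runner, and the build ID and test names are made up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def split_into_chunks(tests, total_chunks):
    """Split a list of tests into total_chunks contiguous, near-equal slices."""
    size, extra = divmod(len(tests), total_chunks)
    chunks, start = [], 0
    for i in range(total_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(tests[start:end])
        start = end
    return chunks


def run_suite(build_id, suite_name, tests):
    """Hypothetical stand-in for running one suite against an existing build."""
    # A real runner would fetch the build identified by build_id and execute
    # the tests; this sketch only records what would have been run.
    return {"build_id": build_id, "suite": suite_name, "tests_run": len(tests)}


def run_concurrently(build_id, jobs, max_workers=4):
    """Run (suite_name, tests) jobs on one build, yielding results as they finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_suite, build_id, name, tests)
                   for name, tests in jobs]
        for future in as_completed(futures):
            yield future.result()


if __name__ == "__main__":
    mochitests = [f"test_{i}" for i in range(10)]  # made-up test names
    jobs = [("reftest", ["r1", "r2"])]
    for n, chunk in enumerate(split_into_chunks(mochitests, 3), start=1):
        jobs.append((f"mochitest-{n}", chunk))
    for result in run_concurrently("example-build-id", jobs):
        print(result)  # each suite reports as soon as it is done
```

Because every job carries the same build ID, the results can be reassembled afterwards without the "which build was this?" problem described earlier.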

All in all, this is very exciting stuff! Not sure how much of that came across in the Open Design Lunch, but hopefully that all makes sense – let me know if you have questions/comments?

tc
John.
=====
ps: An early attempt at reducing the build+unittest time was to adjust some compile options to reduce build time, but that actually complicates matters (Win32 unittest tests non-PGO builds; Mac unittest tests intel-only builds, not universal builds, etc). We’re still investigating what to do here; any suggestions?

4 thoughts on “Making unittest life better…”

Axel Hecht says:

February 26, 2009 at 4:01 am

Sounds like great stuff, good to see that we’re fixing problems instead of throwing hardware at it. Also interesting to see how long we’re already laborating on that issue.

Looking at the complexity of your simple waterfalls, it sounds to me like we got to fix our reporting infrastructure before we can parallize tests runs, though. Folks already have a hard time to find the results of their builds, adding more boxes without having a display up for those doesn’t sound like it would help and save developer time.

John O’Duinn’s Soapbox » Unittest and l10n moved from “dedicated specialized slaves” to “pool of identical slaves” says:

March 16, 2009 at 10:44 pm

[…] Later, once we get past some unittest framework cleanup, we should be able to run unittests without requiring an additional unittest-specific build first (see blog for details). Once that’s fixed, we can then start running individual unittest suites concurrently on the pool-of-slaves, which means: […]

John O’Duinn’s Soapbox » Making unittest life better - whats next? says:

March 25, 2009 at 2:14 am

[…] Since writing this post and then this post, we now have “unittests on try” running on production. Big tip of the hat to Lukas and Catlee for that! So, whats next on our “make unittest better” ToDo list? […]

John O’Duinn’s Soapbox » Welcome (back) Lukas! says:

May 21, 2009 at 7:27 am

[…] If you’ve never met Lukas, you should know that last summer, Lukas arrived for her internship, and had barely unpacked when we handed her the entire unittest infrastructure without warning. With very little guidance, she took it all on. Lukas worked non-stop stabilizing and streamlining machine configs in a million-and-one little details, chasing intermittent unittest failures to figure out if they were caused by code bugs, testware bugs, RelEng bugs or IT bugs… or a combination! She was so great, we hired her as a part-time contractor when she was heading back to Seneca. In that role, she worked with catlee and myself to consolidate all the build and unittest machines into one shared pool (details here). This project massively simplified life in RelEng, improved end-to-end turnaround time for developers; more importantly, it was a pre-req to getting unittests running on TryServer, and also to separating out build from unittest for faster turnaround times (details here, here and here). All massive stuff. […]
