Cleaning up the build process for Fennec developers

A little over a week ago, ted, kyle, joey, jhford and I got together to see how we could improve the Makefiles, specifically to help Fennec developers.

I thought people might be interested in a quick progress report.

The typical compile-package-deploy-test cycle for Fennec developers is time-consuming and tricky, mostly because of layers of workarounds for dependency misfires in the Makefile logic. Developers repeat this cycle countless times every day, so every developer wants it to be as quick as possible. Over time, each developer grows their own trusted workarounds to generate valid builds as quickly as possible, which is a fair response to suboptimal Makefiles. However, badly documented, poorly understood, fragile workarounds can be a recurring cause of stress and delays, especially when one misstep in a workaround can send you down a long, false debugging trail. Not what people need right now.

To figure out where to start, we:

  1. Asked a few developers for specific pain points that they would like us to start on first. Everyone has their own pet peeve. But if there was something *all* developers mentioned, we looked there first. If there wasn’t a bug already filed, we filed one.
  2. Used some very cool tools Joey created to generate reports of time spent per directory, to help us see where the biggest time sinks are, and hence where to focus first.
  3. Studied the “how to build fennec” wiki page. This let us see what all new Fennec developers learn first, as well as learn some of the more commonly used workarounds and gotchas. Over time, developers have learned their own undocumented workarounds for different gotchas, so different people follow different build process steps, but this wiki was a great starting point.
  4. Triaged existing Core:BuildConfig bugs to find any unresolved ones that identify problem areas.
  5. Timed full clobber builds. On my MBP, clobber Fennec builds complete in 20-40 mins. It's not yet clear to me why the timings vary so widely. Joey, jhford and I all got a similar range of timings, which I found even more interesting because our machines range from MBP laptops up to high-spec 8-way desktop machines.
  6. Timed depend builds. Of course, depend builds are trickier to measure, because what gets built depends on what you touch. So I tried the “simplest” case: if I do a full clobber build, change nothing, and start a depend build, how long does that depend build take? Note: nothing changed after the previous clobber build, so if everything were working right, this should take minimal time to traverse dependencies and should not recompile or relink anything. On my MBP, these “nothing-to-do depend builds” take 2m45s to 3m15s. And worst of all, imho, they always did a bunch of recompiling, rebuilding manifests… even though we know nothing has changed… urgh.
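The “nothing-to-do depend build” test relies on the usual make rule: a target is redone only if it is missing, or if one of its prerequisites (directly or through another rule) is newer than it. The sketch below is a minimal Python model of that check, assuming a hypothetical three-file chain (foo.cpp → foo.o → libfoo.so); it is not the Mozilla Makefile logic. With correct dependencies, a depend build run straight after a full build finds nothing stale, and touching one source marks only its dependents.

```python
import os
import tempfile
from typing import Dict, List


def is_stale(target: str, rules: Dict[str, List[str]], cache: Dict[str, bool]) -> bool:
    """Make-style check: stale if missing, if a prerequisite is stale, or if one is newer."""
    if target in cache:
        return cache[target]
    if not os.path.exists(target):
        result = True
    else:
        target_mtime = os.path.getmtime(target)
        # A prerequisite that is itself a rule target propagates staleness downstream.
        # Plain source files (not in `rules`) must exist, or getmtime raises.
        result = any(
            (p in rules and is_stale(p, rules, cache)) or os.path.getmtime(p) > target_mtime
            for p in rules.get(target, [])
        )
    cache[target] = result
    return result


def stale_targets(rules: Dict[str, List[str]]) -> List[str]:
    """Targets a depend build would redo, in rule order."""
    cache: Dict[str, bool] = {}
    return [t for t in rules if is_stale(t, rules, cache)]


def demo():
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical file names, for illustration only.
        src, obj, lib = (os.path.join(d, n) for n in ("foo.cpp", "foo.o", "libfoo.so"))
        rules = {obj: [src], lib: [obj]}
        # Simulate a finished clobber build: each output is newer than its input.
        for i, path in enumerate([src, obj, lib]):
            open(path, "w").close()
            os.utime(path, (1000 + i, 1000 + i))
        noop = [os.path.basename(p) for p in stale_targets(rules)]
        # "Edit" the source so it is newer than everything built from it.
        os.utime(src, (2000, 2000))
        after_edit = [os.path.basename(p) for p in stale_targets(rules)]
        return noop, after_edit


if __name__ == "__main__":
    print(demo())  # ([], ['foo.o', 'libfoo.so'])
```

By this logic, a rule that recompiles or repackages during a no-change depend build typically points to a missing or wrong dependency, or to a target that is treated as always out of date (for example, a phony one).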

To make things better, so far, we:

  • Set up, tested and deployed the new Android r7b NDK and the faster “gold” linker to production RelEng machines (bug#675572). We also added the same NDK and linker to the posted pre-built toolchain for easy developer use (bug#745956). As well as making compiling and linking faster, this also gives us support for Android 4.0 (Ice Cream Sandwich).
  • Filed and landed bug#746741, which adds a new makefile target, “build_and_deploy”, to encapsulate the rebuild/repackage/install steps on Android. While this might seem like an odd place to start, it matters because each developer has evolved their own makefile workarounds over time, and the differences between them cause a lot of confusion. Some of these workarounds become folklore that people follow on faith, even after the original need has been fixed. Figuring out what is “supposed to be done”, and publishing one clear target that does it consistently, gives us a baseline to keep improving. As we improve things inside the Makefiles, this “build_and_deploy” target will keep getting better, handling more of the current workarounds and getting faster each time. As developers discover that this one supported target safely does everything they need, and is a fast, safe alternative to their workarounds, they will gradually stop needing those workarounds… which means they can focus instead on making the shipping product better.

Having said all that, we’re still only scratching the surface. There’s lots more to fix. Much much more. So we filed bug#748452 to track the list of most-urgent things we’ve found so far.

If you are building Fennec and know of a problem with the Makefiles that impacts your ability to get work done, please file a bug in Core:BuildConfig for us. Of course, the trick is to fix these without breaking any other groups that use the same Makefiles. If you already filed a bug and it's still open, please cc us on the bug, OR at least email any of us a brief description of the problem, and we’ll triage.

Thanks for the patience. More news soon as we make things better.

5 thoughts on “Cleaning up the build process for Fennec developers”

    • hi glandium;

      Yep, there are similar inefficiencies in the desktop Makefiles, for sure. Some of the Fennec Makefiles share logic with the desktop Makefiles, so the Fennec improvements might have the happy side effect of also helping desktop. Once we get the Fennec Makefiles into better shape, I hope we can focus attention on the desktop Makefiles; any improvements there would be a big win for the large number of desktop Firefox developers!

      Stay tuned
      John.

  1. This might be an obvious comment, but focusing on slow directories sort of misses the point. The fact that we can’t build stuff concurrently across the whole build-tree is a big bottleneck.

    The other obvious problem is that our jar packaging code runs on a per-directory basis; just starting python once per directory is a bad idea.

    Also see https://plus.google.com/u/0/108996039294665965197/posts/SfhrFAhRyyd for some potential goals, e.g. no-op builds under 3 seconds would be nice

    • This might be an obvious comment, but focusing on slow directories sort of misses the point. The fact that we can’t build stuff concurrently across the whole build-tree is a big bottleneck.
      Agreed.

      The other obvious problem is that our jar packaging code runs on a per directory basis, just starting python on a per directory basis is a bad idea.
      Yep, simple startup-shutdown process overhead adds up.

      Also see https://plus.google.com/u/0/108996039294665965197/posts/SfhrFAhRyyd for some potential goals, eg< 3second nop builds would be nice
      Thanks for the pointer. And yes, sub-3-seconds for a nothing-to-do depend build would be great. Sadly, right now, we’re just fixing more basic problems, like fixing the dependencies so that a nothing-to-do depend build will not spend ~3minutes incorrectly deciding to compile and then repackage some files! 🙁

      You can follow along (or help!) in bug#748452.

      John.

    • The Python overhead can be removed by using pymake and making the various Python scripts we use during the build pymake “native” commands, that is, function calls instead of fork/exec. We do that for cl.py, for example (iirc).
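The pymake suggestion above is about avoiding process startup: running a build script as a function call inside the already-running make process, instead of forking a new Python interpreter for every directory. The sketch below illustrates only that cost difference, in plain Python; the packaging step is a hypothetical stand-in, and it does not use pymake's own API. Absolute timings vary by machine, so none are claimed here.

```python
import subprocess
import sys
import time

# Hypothetical per-directory packaging step, standing in for a real build script.
SCRIPT = "import sys; print('packaged ' + sys.argv[1])"


def package_in_process(directory: str) -> str:
    """Same work as SCRIPT, done as a plain function call (no new interpreter)."""
    return "packaged " + directory


def package_via_subprocess(directory: str) -> str:
    """Fork/exec a fresh Python interpreter for one directory, like a per-directory rule."""
    out = subprocess.run(
        [sys.executable, "-c", SCRIPT, directory],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def compare(directories):
    """Run both variants over the same directories and time each one."""
    start = time.perf_counter()
    forked = [package_via_subprocess(d) for d in directories]
    forked_secs = time.perf_counter() - start

    start = time.perf_counter()
    native = [package_in_process(d) for d in directories]
    native_secs = time.perf_counter() - start
    return forked, native, forked_secs, native_secs


if __name__ == "__main__":
    dirs = [f"dir{i}" for i in range(20)]
    forked, native, forked_secs, native_secs = compare(dirs)
    assert forked == native  # identical results either way; only the overhead differs
    print(f"subprocess per directory: {forked_secs:.3f}s")
    print(f"in-process calls:         {native_secs:.6f}s")
```

Both paths produce the same output; the difference is the interpreter startup and shutdown paid once per directory in the forked version, which is the overhead that adds up across the whole tree.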
