Catlee made an interesting discovery while digging through historical data in the buildbot db. It’s not just that builds feel slower; they *are* slower!
It’s important to point out a few things about this chart:
- The machines used over the year are identical for each OS.
- The times are explicitly for the compile+link of full clobber nightly mozilla-central builds only. Times for doing “hg clone” beforehand, or for uploading completed builds afterwards, are excluded.
- Full clobber builds were measured because incremental builds take wildly different times depending on what was being changed.
- Nothing else is running on these machines.
Linux times wobbled for a bit but remain about the same duration, while OSX and win32 times basically doubled in the last year. Win32 went from ~1h25m to over 3 hours, and then back down to 2h30m!? OSX went from ~1h15m to >2h30m, with an expected dip as we transitioned from “PPC+intel32” to “intel64” to “intel64+intel32” builds. Sure, we’ve added more code for Firefox 4.0, but I find it hard to believe that we added *that* much, and only on OSX and Win32!
What’s going on? Well, therein lies the problem. It’s hard to tell what is actually happening during the compile-and-link. Because the hardware, OS, and toolchain were consistent, I find myself looking at the makefiles with fresh interest. A quick scan of my mozilla-central clone on my laptop finds 1,797 files (Makefile, Makefile.in, and *.mk files) with a combined total of 152,123 lines – and I’m not sure I found everything?!?
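For the curious, a tally like this is easy to reproduce. Here’s a minimal sketch (the clone path is a placeholder, and it may count slightly differently than my quick scan did):

```python
#!/usr/bin/env python
# Minimal sketch for reproducing the tally above; the clone path below
# is a placeholder, not necessarily where your clone lives.
import os

ROOT = os.path.expanduser("~/src/mozilla-central")  # hypothetical clone path

files = 0
lines = 0
for dirpath, dirnames, filenames in os.walk(ROOT):
    dirnames[:] = [d for d in dirnames if d != ".hg"]  # skip VCS metadata
    for name in filenames:
        if name in ("Makefile", "Makefile.in") or name.endswith(".mk"):
            files += 1
            with open(os.path.join(dirpath, name)) as f:
                lines += sum(1 for _ in f)

print("%d build files, %d total lines" % (files, lines))
```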
In the past we’ve stumbled across and fixed some bugs in Makefiles which helped speed up compile/link time, but this tangled web of makefiles needs some serious spring cleaning. We don’t know where to start yet, but the payback will be totally worth it. If you are interested in helping, or have any ideas, please let me know.
https://bugzilla.mozilla.org/show_bug.cgi?id=623617
Some random thoughts:
First, I don’t think we have major differences in Makefiles per OS. There might be a few different files compiled, but overall we build the same code. So OS X and Windows changing so much, and Linux so little, indicates to me an OS difference.
Second, the gradual rise on both OS X and Windows suggests to me that something is changing gradually. It seems unlikely that we are making a few hundred small OS-specific inefficient changes to Makefiles (which would explain a gradual OS-specific rise). So some underlying OS issue that grows gradually seems relevant.
Code size does rise gradually, so that is a suspect. It might just be that, if we charted our code size against this graph, we would see that it closely matches in shape. If so, maybe the rise on OS X and Windows is simply due to the toolchain there (compiler, linker, etc.) being less efficient.
Or maybe it is something like disk fragmentation, which would also rise gradually. Comparing graphs of different machines (with the same OS) could rule that out.
That builds are reasonably fast on Linux suggests to me that the issue isn’t in the Makefiles but in the other OSes – either inherent inefficiencies, or they are not tuned somehow. Is a heavy antivirus running on the Windows ones? 😉
Last thought: do we know why Linux builds are so much faster? That should be fairly easy to figure out (we can dump out how long each compilation and each link takes, and compare them 1:1 between OSes). It might help us understand the gradual slowdown.
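One way to collect that data (a sketch, not an existing Mozilla tool): wrap the compiler so every invocation is timed and logged, then match the logs up between OSes. The wrapper name, log path, and CC override here are all illustrative assumptions:

```python
#!/usr/bin/env python
# timed-cc.py - hypothetical compiler wrapper. Point the build at it,
# e.g. CC="timed-cc.py gcc", so every compile/link is timed and logged;
# the logs from two OSes can then be compared entry by entry.
import subprocess
import sys
import time

LOG = "compile-times.log"  # assumed log location

start = time.time()
rc = subprocess.call(sys.argv[1:])   # run the real compiler or linker
elapsed = time.time() - start

with open(LOG, "a") as log:
    # Record the duration plus the full command line so entries can be
    # matched 1:1 across platforms.
    log.write("%.2f\t%s\n" % (elapsed, " ".join(sys.argv[1:])))

sys.exit(rc)
```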
This reflects my recent experience: 2.5 hours on Windows compared to 20 minutes on Linux.
If I knew more about the build system I would help out … I really wish you luck on this one.
Have you spoken to khuey and ted? I know khuey is thinking about redoing the makefiles so that we don’t parse things three times over.
There’s also the work that glandium is doing, which should stop us needing to build intermediate libraries and should speed things up a lot (especially on Mac, I’m told).
Will a new bug be necessary to find out the cause of this increase in build time?
bug 584474 might help a little when it lands (hopefully soon on the build-system branch)
I don’t think we use that many .mk files in a regular build, but that still leaves 1377 Makefile(.in)s in the tree…
I notice that many of the Makefiles are used to install tests. We still end up scanning them during the export phase, which seems to be a bit of a waste. I don’t know if this can be improved at all.
A minor win would be to consolidate themes into 10 or even 2 Makefiles rather than the present 27.
The next time I have free time, I’ll spend it porting some subdirectory to CMake to see the build speed impact.
Two things I want to add:
– Make is known to be slow on Windows, which is why pymake exists. I don’t know if the buildbots use it (I think they don’t). Part of the problem is the number of Makefiles and the number of times the same information needs to be parsed and resolved (and the heavy usage of stat() in make doesn’t help; stat() is very slow on Windows – a rough way to measure that is sketched after this list). There is work in progress to make things better there in bug 623617.
– (stating the obvious here) OSX64 opt builds actually do 2 builds (one for 32-bit, one for 64-bit), so per architecture they’re actually twice as fast as the Windows builds.
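To put a rough number on the stat() point above, here is an illustrative micro-benchmark (my own sketch, not anything the build uses): walk a source tree, stat() every file, and compare the per-call cost on Windows versus Linux.

```python
#!/usr/bin/env python
# Illustrative micro-benchmark: time one stat() per file in a tree.
# Run it at the top of the same source tree on Windows and on Linux
# and compare the per-call cost.
import os
import time

paths = []
for dirpath, dirnames, filenames in os.walk("."):
    for name in filenames:
        paths.append(os.path.join(dirpath, name))

start = time.time()
for path in paths:
    try:
        os.stat(path)
    except OSError:
        pass  # ignore files that vanished mid-walk
elapsed = time.time() - start

print("%d stat() calls in %.2fs (%.1f us each)"
      % (len(paths), elapsed, 1e6 * elapsed / max(len(paths), 1)))
```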
[…] on from this blogpost about how much time is spent in Makefiles, pcwalton sent me a link to his blog. He has a great breakdown of mac osx builds, showing where all […]
[…] When I introduced Joey on 23rd May, I started by asking “What has 1,979 files, and now runs about twice as slow compared to a year ago“? […]
[…] code they needed to get something working. There has been a lot of noise recently (mostly from me, but not all) about cleaning this up, but it’s been hard to sustain momentum for various […]