Updating Mozilla’s Update server

(Followup from Tuesday’s Platform meeting as well as bhearsum’s blog post#1 and blog post#2.)

Updating our users is something we do very carefully.

Updating *how* we update our users is something we do very *very* VERY carefully.

On Monday, 30 September, nightly users on mozilla-central will be served updates from the new AUS server. Users don’t need to do anything different – it should “just work”. (Of course, if you see any problems with updating, we’d like to hear about them… please file a bug!)

Users on aurora, beta and release channels are *not* being switched over yet. All in good time.

Additional context:
Update servers have to work accurately, securely, consistently and at scale. One of the big scary things that any software company has to do is update the system by which all of the company’s users are served updates. The same is true here at Mozilla. After all, if anything goes wrong, we can’t physically go around to each user’s house/office, everywhere around the world, to fix the problem caused by a bad update to their Firefox or Thunderbird installation. (aside: Mobile-app-only software companies avoid this, by manually uploading to Apple/Google stores, and relying on Apple/Google to do the update distribution for them. This helps small companies that *only* ship mobile apps keep their users up-to-date, but is not without risk. Offtopic, so watch for separate blog post.)

Since before 2007, Mozilla has been serving all updates using AUS (Application Update Service), an update server that was originally written by @Morgamic. Even when I joined Mozilla in 2007, there were ongoing jaded discussions about a repeatedly deferred project called “AUS2 – The next big rewrite”. All to say, the current production AUS code has served all of Mozilla’s users (and Mozilla’s RelEng!) well, even so many years after it was originally written and put into production. Big hat tip to @Morgamic.

As we scaled up the rest of RelEng infrastructure, some early design decisions that used to be fine became trickier for us. Manually updating the live code on a live production server with different version numbers was ok-to-do when Mozilla was back on the old-traditional-slower-release-cadence… but became a fragile concern when we moved to rapid-release-cadence. On top of that came new requirements. New requirements like being able to throttle specific operating systems at different update rates. New requirements like custom updates for users of specific custom builds. New requirements like dropping support for specific sub-variants of an OS…
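To make those requirements a little more concrete, here is a rough sketch of how per-OS throttling and dropping an OS sub-variant could be expressed as data rather than as hand-edited server code. This is purely illustrative and is not the AUS or Balrog implementation: the `Rule` fields, the OS strings, the build names and the `choose_update` function are all made up for this example.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical update rule; every field name here is an assumption."""
    channel: str
    os_prefix: Optional[str]  # None matches every OS
    rate: int                 # percent of matching requests served (0-100)
    target: Optional[str]     # build to offer; None means "offer nothing"
    priority: int = 0         # higher priority wins when several rules match


def pick_rule(rules: Sequence[Rule], channel: str, os_name: str) -> Optional[Rule]:
    """Return the highest-priority rule matching this channel and OS, if any."""
    matches = [
        r for r in rules
        if r.channel == channel
        and (r.os_prefix is None or os_name.startswith(r.os_prefix))
    ]
    return max(matches, key=lambda r: r.priority, default=None)


def choose_update(rules: Sequence[Rule], channel: str, os_name: str,
                  roll: Optional[int] = None) -> Optional[str]:
    """Return the build to offer, or None if no update should be served.

    `roll` is a number in [0, 100); it is random unless passed explicitly,
    which keeps the throttling decision easy to test.
    """
    rule = pick_rule(rules, channel, os_name)
    if rule is None or rule.target is None:
        return None
    if roll is None:
        roll = random.randrange(100)
    return rule.target if roll < rule.rate else None


# Made-up example rules: everyone on nightly gets "build-A", one OS family
# is throttled to 25%, and one OS sub-variant is no longer offered updates.
EXAMPLE_RULES = [
    Rule("nightly", None, 100, "build-A"),
    Rule("nightly", "WINNT_5", 25, "build-A", priority=10),
    Rule("nightly", "Darwin_9", 100, None, priority=10),
]
```

With rules like these, changing a throttle rate or retiring an OS variant becomes a data change instead of an edit to live code, which is the kind of flexibility the requirements above were asking for.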

Finally, we found time last year and again earlier this year to dedicate people in RelEng to make concrete progress on this. As we got interrupted by other big externally-facing projects, we’d suspend work until later. Then resume work a few months later. Then suspend again. Then resume again. In so doing, the name and requirements kept evolving: AUS2… AUS3… AUS4… BALROG. More details in bug#832454 and bug#583244. RelEng (and some brave volunteers) have been dogfooding for a while, using Balrog to get updates for nightly mozilla-central builds. We’re ready to roll this out to the next most adventurous population – nightly users on mozilla-central.

This has been the result of years of planning, and quiet focused coding by bhearsum, rail, catlee and nthomas over multiple quarters. This is a ReallyBigDeal™ – for RelEng, for all Mozilla developers, and for Mozilla’s users – so it is exciting to see it come into production.
