Is it git or is it github?

Background:
At the Mozilla AllHands in Sept 2011, I was surprised to find that my proposed session on git was accepted, and even more surprised at how well this session was attended! The room was full of people from all different groups across Mozilla. And boy, were people passionate. The slides don’t capture the lively back-and-forth discussions, but this is obviously a topic that people care deeply about, hence my blog post.

Here’s a quick summary of the main speaking points:

  • Some projects in Mozilla are now basing their code in github.com, not on Mozilla’s existing release infrastructure. They do this despite the fact that github in one sense has reduced functionality (doesn’t do all builds/test/automated-regression alerts/etc), and despite the fact that using github like this causes extra manual headaches of periodically importing/exporting code, and complicates the branch mechanics of releases. So why do this?
  • Theory#1: git vs hg
    • Some people prefer to use the git commandline tool. At the same time, some people prefer to use the hg commandline tool. Each Distributed Version Control System has various technical merits, and drawbacks, so I put this git-vs-hg debate into the same category as an emacs-vs-vi debate. (I do *not* mean that as a put down – I mean this in the nicest possible way – the reality of life is that people have their own unique hard-earned preferences)
  • Theory#2: github vs hg+bugzilla+graphserver+tbpl
    • Instead of this debate being about the mechanical differences of the command line tools, I instead believe the debate is actually about Developer Workflow. This is not just theoretical; it makes a day-to-day difference to how developers get their job done.
    • Every mozilla developer ends up using hg+bugzilla+graphserver+tbpl. These are organically grown, over the years, and each went from hacky-experiments-to-mission-critical once people started to rely on them. However, they are not smoothly cross-integrated.
    • Mozilla saw a jump in checkins-per-day when tbpl.mozilla.org went live; tree closures were shorter, in part because people could more easily figure out who broke the build, and who/what to back out. (Of course, there’s more to it than *just* tbpl, but the usability issue here is the key point).
    • While different people have talked about this, in different companies over the years, I think github.com is a really compelling proof point that good developer workflow makes a difference. If you can make it easier for a developer to find a bug, get the bugfix reviewed and landed, and be able to see if it made a difference, developers will WANT to use your production systems.
    • Apart from workflow, there are also security and autonomy concerns. How valuable are your bug discussions, review history, regression tracking? Who else would you entrust this valuable data to? If you find someone you trust, how do you know they’ll be around as long as, or longer than, Mozilla?

    Hal Wine has now got up-to-speed in this area because of his work on git-staging. Next, Hal is starting to see how feasible it would be to support both git command line tools and hg command line tools in our production RelEng + WebDev + IT systems. And meanwhile, Hal, LauraT, myself and some others are gathering to see what we can do to improve developer workflow. As Mozilla grows, it’s important to make it easier to do coding work here – after all, this is one way to encourage more people to contribute, and to help Mozilla scale.

    If you have suggestions, or want to help, we’d love to hear from you – or if you prefer, you can just follow the tip of the iceberg in bug#713782.

Welcome Hal Wine

I’m really excited to welcome Hal Wine to Release Engineering.

Hal has lots of experience in the trenches of Release Engineering, with a rare combination of experience working in distributed groups (not just solo!), working on small/embedded systems and working at scale. To make him even more unique, he’s a really nice guy *and* he even wears an occasional Hawaiian shirt! The only downside is that we’re now rethinking our previously lenient policy on bad puns 🙂

He’s reporting to me, and already helping bring more tegras online to help close out the “android as tier 1” project, while he’s also got his teeth sunk into a large interesting skunkworks project (more details coming soon in a separate post).

Hal’s based here in the San Francisco office, but you can find him on irc as “hwine”. Please do stop by and say hi – he’s already getting quite settled in as part of the family.

[UPDATE: Hal’s blog is now up and running here. joduinn 03-jan-2012 ]

Welcome John Hopkins and Thunderbird to RelEng

I’d like to welcome John Hopkins to Release Engineering. I apologize that this welcome is a very belated welcome; John started in RelEng on 19th Sept and coop already posted a welcome here.

From one perspective, it’s not really any change here: John had already been coming to the RelEng group meetings since earlier this year when Mozilla Messaging folded back into Mozilla Corporation. Even before that, John was a familiar face to us, working on lots of the same buildbot and release automation code as the rest of RelEng.

From another perspective, there is potential for significant change here. Are there economies of scale if we merge the existing Thunderbird release engineering systems and the existing Firefox release engineering systems together? Now that we are in rapid-release cycles for both products, could we sim-ship Thunderbird with Firefox? How would we handle long term support issues for both products? Some things seem obvious, like how the tool chains for compiling-and-linking are identical for both products, so sharing machines and setup would help, and how we can improve Mozilla’s bus factor – jhopkins is the only person in RelEng who has shipped a Thunderbird release since the old TB2 automation. Some things are less obvious, like what to do with the different spec hardware which was being used by MoMo. All this will play out over the coming months. Suggestions, comments, ideas all welcome!

As a good sign of how well everyone already works together, the same week jhopkins officially joined RelEng, we discovered a serious problem: the production MoMo machines for Thunderbird were sitting on a shelf in the old Vancouver office, and these production machines were a few days away from being powered off as part of the move to a new Vancouver office. As the new office wasn’t ready, Vancouver people had to move to a temp-workspace for some weeks, and the Thunderbird production machines couldn’t come to the temp-workspace offices. This meant the Thunderbird production machines would be offline, and the trees closed for the duration. No-one liked the idea of closing the Thunderbird trees for weeks, so we all piled in to help.

An immense volume of behind-the-scenes work took place: the Thunderbird continuous integration processes were migrated to our existing RelEng colos, toolchains were set up on slightly different spec machines, and then production builds+tests were enabled in the RelEng colo – all before the old Vancouver office was vacated, and all without closing the Thunderbird trees at all.


Bug#688230 has all the details. As I said in the “Friends of the Tree” section of the Mozilla Foundation weekly call at the time, this was an impressive volume of behind-the-scenes work, and developers using production systems never hit any problems!

Welcome John!

Food experiments in Japan: continued in California

(long overdue post; found these photos while writing another travel post for my website.)

Immediately after I returned from Japan last year, I was really happy to find these fruit flavoured, gummy chews also available in SF. Weirdly addictive, and I’ve made specific trips out to buy more from the only shop I know that sells them in SF.

In the same week that I returned to California, someone else (who shall remain nameless by request!) brought the following back from his trip.

These tasted awful, and the flavour lingered for hours, despite vigorous toothbrushing 🙁

Mozilla visit to Dublin City University and Dublin Institute of Technology

About the same time that Tristan was at the OpenWeb conference in Dublin, Ireland, I was also in Dublin, for a slightly different reason.

The record heavy rain complicated logistics a bit, but it was great to meet with people at Dublin City University and also Dublin Institute of Technology about opportunities in open source projects. There was lots of curiosity and interest about Mozilla, the work that goes into Firefox and even the very idea of working at a company that was all about open source. The impromptu corridor discussions about increasing involvement in open source were exciting. It was a jam-packed day, but I was really grateful it all came together.

Big thanks to Mike Scott, Markus Helfert, David Sinclair, Martin Crane, Maeve Long, Geraldine Farrell, Mary Regan and Damhait Harvey for making this possible.

(ps: when I say rain, I mean *rain*: “More than one month’s rain fell on Dublin in 24 hours…”. Click photo for full story.)

How do enterprises handle “rapid releases” of other software products?

There’s been a lot of discussion in the Enterprise Working Group emails, and on the monthly EWG calls, about how Mozilla’s change to rapid release cadence is impacting enterprises. While brainstorming in a recent EWG call, I asked:

“how do enterprises handle rapid releases of other software products?”

After a few minutes’ discussion, we agreed to continue the discussion afterwards. I’ve since emailed this same question to some lists, but am now cross-posting to my blog, in the hope that even more people will see this question.

Does anyone have examples of other software products that do “rapid releases” and are being successfully handled by enterprises? If so, can you share some details? (If you prefer to email me privately, that is totally cool, please use my joduinn [at] mozilla [dot] com address). I ask because maybe we don’t need to reinvent the wheel here. If there is a tried-and-tested approach that already works well for other applications, that same approach might also work for Mozilla’s Firefox.

Some obvious examples of “rapid release software” in enterprises are OS patch updates, but what about Microsoft Office? Google Chrome? Flash? Java? Anything else? (In my mind updating anti-virus signature files is a different scenario, but I could be misreading this scenario).

Thanks.
John.

Solving three different problems on the Enterprise Working Group?

(I raised this in the Enterprise Working Group (EWG) call this week, and it resonated strongly with some people. Therefore, I’m posting this out more widely to hopefully get more feedback.)

After all the recent discussions about “what enterprise users wanted”, I found myself wondering if we were all even attempting to solve the same problem, so I stepped back, and re-read *lots* of posts from different enterprises over the last few months.

I now believe Mozilla, and the enterprises in the Enterprise Working Group, are working to solve three overlapping but orthogonal problems.

1) Cost of verifying that a new version of Firefox is safe to deploy.
Some enterprises verify with a quick running of an ACID test. Some SaaS vendors verify by doing wider testing, and deploying bugfixes to their products. One complication for SaaS vendors is that end users may be running on newer versions of Firefox anyway, on non-enterprise machines. This can cause problems that make both the SaaS vendor, and Mozilla, look bad. We haven’t spent much time on this so far.

(I still wonder if we could design a compatibility test suite, in the same mindset as HTML5, JavaCompatibilityKit, etc, that might help speed up this verification step?)

2) Cost of deploying a new version of Firefox to all supported users
Once an enterprise has verified a specific version of Firefox, how much effort does it take to deploy that new version onto all their machines/users? This discussion typically quickly focuses on MSI and similar technologies for doing widespread deployments, although there are some other options like an inhouse AUS or equiv. Regardless of the technology used, the idea here is to have a centralized way to move all users forward to a newer version of Firefox, without having to walk/drive/fly a human to every computer in order to manually do a new install. Sometimes this also includes discussions about silent updates.

3) Frequency of doing this all over again
The frequency of the Firefox release cadence directly impacts how often enterprises have to go back to do (1) and (2) all over again.

The verify+deploy work is typically so painful that most enterprises only do this for “new feature” releases, and not for “security only dot-releases”. For most enterprises, it seems that Mozilla’s cadence of “new feature” releases every 12-18-24 months was infrequent enough that the verify+deploy work was tolerable. However, Mozilla’s more frequent feature releases mean the verify+deploy cost is paid more frequently, which can become a significant business problem.

The ESR proposal is attempting to address this increased recurring cost and this is where most of the discussions have been taking place so far.

(It’s worth noting that everyone involved from Mozilla and different enterprises understands and agrees that Mozilla’s faster cadence of “new feature” releases is important for Mozilla to remain relevant in the browser marketplace.)

Just my thoughts, but I’d be curious to hear what others think.

John.

Modification to the Extended Support Release proposal

As some of you reading this may already know, there’s a proposal under discussion for how Mozilla could support releases for longer durations as requested by some enterprises.

I’d like to modify the latest Extended Support Release (ESR) proposal as follows:
1) Mozilla would not generate overlapping Extended Support Release (ESR) builds
2) Change the timing of when enterprises start to deploy new versions of Firefox.

The details of this are subtle, so please bear with me while I try to explain with a brief example:

1) Mozilla anoints a specific Firefox release to be supported for a total of 42 weeks.
For the purposes of discussion, let’s say this is Firefox 8.0. This means Firefox 8.0 users would be guaranteed to receive seven scheduled security-only dot-releases (plus, of course, any unplanned security chemspills that came up in that 42-week timeframe). Before the end of the 42 weeks, Mozilla would anoint another release to be supported for 42 weeks. To continue this example, after Firefox 8.0, the next release to be anointed would be Firefox 15.

Schedule-wise, this means:
** 8.0.1 would sim-ship with 9.0.
** 8.0.2 would sim-ship with 10.0.

** 8.0.7 would sim-ship with 15.0
** 15.0.1 would sim-ship with 16.0

2) Enterprises would start to verify/certify with the 8.0.0 release.
However, enterprises would *not* deploy 8.0.0. Specifically, enterprises would only start deployments of 8.0.1 at the time that 9.0 is released. (This is important for mechanical details about how updates are served – see more below.) (To be precise, enterprises deploy the latest 8.0.x available at the time 9.0 is released; if there are no chemspills, this would be 8.0.1, but if there are chemspills, it is always the latest 8.0.x available at the time of the 9.0 release.)

3) When doing releases, RelEng makes a small change to how we publish updates between releases:


* 8.0.1 would sim-ship with 9.0.
** Mozilla would NOT enable updates from 8.0.0 -> 8.0.1
** Mozilla would enable updates from 8.0.0 -> 9.0.0
* 8.0.2 would sim-ship with 10.0.
** Mozilla would enable updates from 8.0.1 -> 8.0.2
** Mozilla would enable updates from 9.0.0 -> 10.0.0

* 8.0.7 would sim-ship with 15.0
** Mozilla would enable updates from 8.0.6 -> 8.0.7
** Mozilla would enable updates from 14.0.0 -> 15.0.0
* 15.0.1 would sim-ship with 16.0
** Mozilla would NOT enable updates from 15.0.0 -> 15.0.1
** Mozilla would enable updates from 8.0.7 -> 15.0.1
** Mozilla would enable updates from 15.0.0 -> 16.0.0
* 15.0.2 would sim-ship with 17.0
** Mozilla would enable updates from 15.0.1 -> 15.0.2
** Mozilla would enable updates from 16.0.0 -> 17.0.0

That’s it.
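
For anyone who finds code easier to follow than bullet lists, here is a rough Python sketch of the update rules in the example above. It is only an illustration under some assumptions: the function name `update_rules` and its parameters are made up for this post and are not part of RelEng's release automation, and it ignores chemspills, which would shift the dot-release numbers.

```python
def update_rules(mainline, esr, prev_esr=None):
    """Update paths published when `mainline`.0 ships with an ESR dot-release.

    mainline: major version of the regular release being shipped (e.g. 9)
    esr:      major version of the currently anointed release (e.g. 8)
    prev_esr: major version of the previously anointed release, if any
    Returns (esr_build, rules); each rule is (from_version, to_version, enabled).
    Hypothetical helper for illustration only; chemspills are ignored.
    """
    dot = mainline - esr  # 9.0 ships with 8.0.1, 10.0 with 8.0.2, ...
    if dot < 1:
        raise ValueError("mainline must be newer than the anointed release")
    esr_build = f"{esr}.0.{dot}"
    rules = []
    if dot == 1:
        # Users on x.0.0 stay on the regular train, so no x.0.0 -> x.0.1 update.
        rules.append((f"{esr}.0.0", esr_build, False))
        if prev_esr is not None:
            # Users on the last build of the previous anointed release move over.
            last_prev = f"{prev_esr}.0.{esr - prev_esr}"
            rules.append((last_prev, esr_build, True))
    else:
        rules.append((f"{esr}.0.{dot - 1}", esr_build, True))
    # The regular release train always updates to the new release.
    rules.append((f"{mainline - 1}.0.0", f"{mainline}.0.0", True))
    return esr_build, rules


# Reprint a few steps from the example above.
for mainline, esr, prev in [(9, 8, None), (10, 8, None), (15, 8, None), (16, 15, 8)]:
    build, rules = update_rules(mainline, esr, prev)
    print(f"* {build} would sim-ship with {mainline}.0")
    for src, dst, enabled in rules:
        verb = "would enable" if enabled else "would NOT enable"
        print(f"** Mozilla {verb} updates from {src} -> {dst}")
```

The only special case is the first dot-release of a newly anointed version: that is the one moment where the x.0.0 -> x.0.1 update is withheld, and where users of the previous anointed version are moved across.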

There are a few reasons why I recommend these modifications to the proposal:

1) Minimal changes to RelEng release automation or our update infrastructure. This means mechanically, we can put this new proposal into action more easily.

2) No need for any metrics infrastructure changes – all current infrastructure should just work as-is.

3) No need for Mozilla to generate overlapping concurrent ESR releases. This is significant because:
3a) the original proposal would have Mozilla sim-ship Firefox13.0, Firefox8.0.5esr and Firefox13.0esr at the same time as we also migrate aurora->beta and central->aurora. This is a significant increase in the mechanical work for RelEng in a *very* tight timeframe.
3b) this reduces the number of landings developers have to do for security fixes for 12-weeks-in-every-42-weeks. This also reduces the number of release builds to be generated if we have a chemspill in that 12-weeks-in-every-42-weeks window.

I believe these modifications still meet all the same objectives of the original proposal, yet are mechanically easier to implement. Therefore, I believe Mozilla could put this modified ESR proposal into action more easily than the existing ESR proposal.

Let me know if I missed anything, or if you have any concerns.

Thanks
John.

Marc Jessome playing with fire

After a great internship, we sadly waved goodbye to mjessome a few weeks ago when he went back to UWaterloo to continue his studies.

Marc spent the summer working with Lukas Blakk on integrating two large parts of our developer infrastructure: TryServer and Bugzilla. Marc’s work is running in staging right now, and if all continues to go well, this should start to see light-of-day in production soon.

While it might sound strange to tie these two very different systems together like this, the idea here is quite fundamentally important to everyone who uses TryServer for Mozilla, and should roll out to other RelEng production systems soon afterwards. The curious can click here for more details or follow bug#657828.

I am happy to report that Marc seemed to still have lots of fun during his internship. Marc’s fire eating class at The Crucible was a side effect of a discussion he and I had during the RelEng work week in Toronto. By comparing these photos with photos taken when I took the class, you can easily tell that mjessome is much better at this than I ever was. I also discovered that interview questions about juggling and riding a unicycle were somehow missing from RelEng’s list of interview questions, but I’ve now fixed that.

Thanks for all your great work, Marc, we really enjoyed working with you, and hope we cross paths again soon.

ps: if you are reading this, and are interested in an internship at Mozilla, have a look at our internship job postings.