“we are all remoties” at MozCamp Singapore

[UPDATE: a newer version of this presentation is here. joduinn 25feb2013]

In November, I was asked to present “we are all remoties” at MozCamp Singapore. In the end, I presented twice! The second time was on the main stage, in the largest room, where the keynote was held.

Giving a presentation in a room that big is always daunting, but during the presentation, it was encouraging to see the people that had been hovering at the back near the coffee machines + snacks gradually move to the remaining empty seats near the front and start taking notes. After the talk, I spent the rest of the day answering lots of questions and getting encouraging feedback.

Interestingly, most of the people who came looking to attend “the remoties talk” had either heard an early version of it at Mozilla Summit in 2010, or heard about it from someone who was there. The people who had heard it before thought it a good refresher; the people who were hearing it for the first time found it immediately useful to their day-to-day working lives! All confessed that this was a talk they never thought they’d find interesting, so they almost skipped it… but now thought it was essential, and wanted to know if I would give the same presentation to their group?!?

Humbling. And encouraging. All at the same time.

Every time I get to talk about “remoties”, whether in a formal setting like MozCamp, or in discussions with people in other companies, I have two strong feelings:

  1. Passionate: I feel more and more convinced this topic is super important to the Mozilla community. In the changing face of the software industry, I feel this is becoming important to an increasing proportion of workplaces outside of Mozilla. Given Mozilla’s origins, we have a long-standing reputation for successfully working with people in different physical locations. As we grow, we need to learn how to scale this part of our DNA. I feel if any organization can do this right, and show the way for other organizations to do it right, Mozilla can. The impact on the industry cannot be overstated.
  2. Embarrassed: In preparation for each talk, I pore over the slides, fix typos, rehearse and generally try to make it better. Every time, I fix lots of errors. And literally every time on stage, I find even more errors. Feedback and questions afterwards make me tweak the presentation every time. After my recent presentation at Netflix, I completely rewrote most of the presentation. Each time I do this, I feel better about the revised version, and embarrassed by the earlier versions.

Therefore, I’ve posted a PDF of the slides here.

Please do ask questions and/or give feedback/corrections/suggestions – either in comments below, or by emailing me (“joduinn” at mozilla dot com). I’ll do my best to work them all into a revised presentation before the next talk which is already scheduled for outside Mozilla (more news soon!).

Thanks
John.

Infrastructure load for December 2012

  • #checkins-per-month: We had 5,333 checkins in December 2012. This is down from our all-time record of 5,893 in October, but still the 5th highest number of checkins in 2012. As you’ll see below, we were on track to set another record for the month when checkins declined because of the “holiday effect”.
  • As usual, we handled this load with >95% of all builds consistently being started within 15mins. Sadly, our test pools continue to have a hard time, both with the increased rate of checkins, and the ever-increasing number of test suites being run per checkin. We’re continuing to work on buying and powering up more test machines, so please continue to bear with us. Meanwhile, if you know of any test suites that no longer need to be run per-checkin, please let us know so we can put scarce test CPU to better use.
  • #checkins-per-day: During December, 14-of-30 days had over 200 checkins-per-day, and we peaked on 11dec with 296 checkins. Of interest here is the “holiday effect”: the significant drop in checkins which started on the 21st (the Friday of the weekend before Christmas Eve), and continued to New Year’s Eve at the end of the month. If the rate of checkins in the first 20 days of the month had continued, we’d have easily exceeded 6,000 checkins for the month and set a new all-time record.
  • #checkins-per-hour: Checkins are still mostly mid-day PT/afternoon ET, although they seem to be flattening out a bit. Instead of one specific hour spiking load above the others, on average 5 hours of every day sustained over 10 checkins-per-hour, and an additional 3 hours of every day sustained 9 checkins-per-hour.
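
The “holiday effect” projection in the bullets above is a straight-line extrapolation, and can be sketched in a few lines of Python. This is only an illustration: it assumes a 31-day December and a constant daily rate, the function names are made up for this sketch, and the real per-day counts live in the graphs rather than in the text.

```python
def project_month_total(checkins_so_far, days_elapsed, days_in_month):
    """Linearly extrapolate a partial-month checkin count to a full month."""
    if not 0 < days_elapsed <= days_in_month:
        raise ValueError("days_elapsed must be between 1 and days_in_month")
    return checkins_so_far / days_elapsed * days_in_month


def min_count_to_exceed(target, days_elapsed, days_in_month):
    """Smallest partial-month count whose linear projection exceeds target."""
    # Integer arithmetic avoids float rounding right at the boundary.
    return (target * days_elapsed) // days_in_month + 1


# Figures from the post: 6,000 checkins and "the first 20 days";
# the 31-day month length is an assumption of this sketch.
needed_for_6000 = min_count_to_exceed(6000, 20, 31)

# The all-time record quoted in the post (October, 5,893 checkins).
needed_for_record = min_count_to_exceed(5893, 20, 31)

print(needed_for_6000, needed_for_record)
```

Under those assumptions, the first 20 days would have needed at least 3,871 checkins (roughly 194 per day) for the projection to pass 6,000, and at least 3,802 to pass the October record.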

Overall load since Jan 2009

mozilla-inbound, mozilla-central, fx-team:
Ratios of checkins across these branches remain fairly consistent. mozilla-inbound continues to be heavily used as an integration branch, with 27.8% of all checkins, consistently far more than the other integration branches combined (fx-team has 0.7% of checkins, mozilla-central has 2.1% of checkins). As usual, very few people land directly on mozilla-central – in fact more people go through the approval process to land on mozilla-aurora.

Infrastructure load by branch

mozilla-aurora, mozilla-beta:

  • 4.6% of our total monthly checkins landed into mozilla-aurora. This is a decrease from last month, but still higher than is typical. I believe this is caused by the number of b2g changes being landed into aurora and beta.
  • 3.2% of our total monthly checkins landed into mozilla-beta. As predicted last month, the recent transition of b2g to beta in late November caused increased checkins on beta for December.
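
To get a feel for what these percentages mean in raw checkins, here is a rough sketch that converts the shares quoted above into approximate counts for December. The percentages are already rounded in the post, so the results are estimates only; the names in the code are made up for this sketch.

```python
TOTAL_CHECKINS = 5333  # December 2012 total, from the post

# Branch shares in percent, as quoted in the post (already rounded).
BRANCH_SHARE_PERCENT = {
    "mozilla-inbound": 27.8,
    "mozilla-central": 2.1,
    "fx-team": 0.7,
    "mozilla-aurora": 4.6,
    "mozilla-beta": 3.2,
}


def approx_checkins(total, share_percent):
    """Approximate number of checkins behind a rounded percentage share."""
    return round(total * share_percent / 100)


approx_counts = {
    branch: approx_checkins(TOTAL_CHECKINS, percent)
    for branch, percent in BRANCH_SHARE_PERCENT.items()
}

for branch, count in approx_counts.items():
    print(f"{branch}: ~{count} checkins")
```

So mozilla-inbound’s 27.8% works out to roughly 1,480 checkins in the month, against roughly 110 for mozilla-central.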

(Standard disclaimer: I’m always glad whenever we catch a problem *before* we ship a release; it avoids us having to do a chemspill release and also we ship better code to our Firefox users in the first place.)

misc other details:

  • Pushes per day
    • You can clearly see weekends through the month.

    #Pushes this month

  • Pushes by hour of day
    • It is interesting that mid-morning PT is consistently the biggest spike of checkins during the day. I wonder if this is caused by ET developers doing checkins immediately after lunch, at the same time as PT developers have just settled into the office after coffee and initial emails?

    #Pushes per hour

“we are all remoties” at Netflix

[UPDATE: a newer version of this presentation is here. joduinn 25feb2013]

Netflix asked me to present about how Mozilla handles distributed work groups – “we are all remoties” – in October. This invitation came about because the Netflix RelEng team were impressed by the scale and efficiency of Mozilla’s RelEng group – and then totally impressed when they found out that Mozilla’s RelEng group was physically all remoties. Unheard of in Netflix.

Exciting, and a little daunting, all at the same time. Oh, and by the way, could it be recorded as part of their Netflix University training series?

To set context, it’s worth noting that Netflix has an explicit zero-remoties hiring policy, so this presentation generated quite some debate beforehand, during the Q+A sessions, and afterwards.

Big thanks to everyone from Netflix who attended – the genuine curiosity and very direct, honest questions, with me and with each other, were great. After 5.75 years (and counting) in Mozilla’s very-distributed RelEng, I still forget that what feels “normal” for me is atypical for a lot of other companies. All the discussions helped me identify a bunch of assumptions that need to be called out in the presentation. Every time I have a chance to talk about remoties like this, I end up restructuring the presentation yet again to highlight missed assumptions. Thanks to all the Q+A here, the “remoties” presentation at MozCamp Singapore a month later was quite different and I hope much better (separate blog post coming).

It’s still surprising to me how much I care about group organization. Done badly, it’s a big impediment to people getting their work done. Done well, it helps people be more effective. And, as noted by several people at Netflix, many aspects of our we-are-all-remoties group organization practices help even zero-remotie groups be more effective.

Many thanks to Curt Patrick, Gareth Bowles, Carl Quinn and Adrian Cockcroft for helping make this happen, as well as for all the lively discussions before and since.

Infrastructure load for November 2012

  • #checkins-per-month: We had 5,646 checkins in November 2012. This is slightly down from last month’s record of 5,893, but still the 3rd highest number of checkins in 2012, after October (5,893 checkins) and August (5,803 checkins). Again, we handled this load with >95% of all builds consistently being started within 15mins. Sadly, our test pools continue to have a hard time, both with the increased rate of checkins, and the increased number of test suites being run per checkin. We’re continuing to work on buying and powering up more test machines, so please continue to bear with us. Meanwhile, if you know of any test suites that no longer need to be run per-checkin, please let us know.
  • #checkins-per-day: During November, 19-of-30 days had over 200 checkins-per-day. Worth noting were spikes of 295 checkins on 13nov and 301 checkins on 27nov.
  • #checkins-per-hour: The peak across this month was a new record 13.36 checkins per hour between 10-11am. Of interest, I note that for 25% of every day (6 of every 24 hours), we sustained over 10 checkins per hour.

Overall load since Jan 2009

mozilla-inbound, mozilla-central, fx-team:
Ratios of checkins across these branches remain fairly consistent. mozilla-inbound continues to be heavily used as an integration branch, with 26.9% of all checkins, consistently far more than the other integration branches combined (fx-team has 0.8% of checkins, mozilla-central has 2.2% of checkins). As usual, very few people land directly on mozilla-central – in fact more people go through the approval process to land on mozilla-aurora.

Infrastructure load by branch

mozilla-aurora, mozilla-beta:

  • 4.9% of our total monthly checkins landed into mozilla-aurora. This is a decrease from last month’s high, but still higher than is typical. I believe this is caused by the number of b2g changes being landed into aurora, which then stopped ~20th November, with the migration of b2g from aurora to beta.
  • 2.2% of our total monthly checkins landed into mozilla-beta. This is typical for beta. I suspect the recent transition of b2g to beta, just before the holiday season here in the US, means we didn’t see many b2g-related checkins on beta for November, and I do expect beta to show increased checkins for December.

(Standard disclaimer: I’m always glad whenever we catch a problem *before* we ship a release; it avoids us having to do a chemspill release and also we ship better code to our Firefox users in the first place.)

misc other details:

  • Pushes per day
    • You can clearly see weekends through the month.

    #Pushes this month

  • Pushes by hour of day
    • It is interesting that mid-morning PT is consistently the biggest spike of checkins during the day. I wonder if this is caused by ET developers doing checkins immediately after lunch, at the same time as PT developers have just settled into the office after coffee and initial emails?

    #Pushes per hour

Android x86 builds now on tbpl.mozilla.org

This week, kmoir quietly enabled Android x86 builds on tbpl.m.o. This was already announced in this week’s Mozilla developer platform meeting but is important enough to repeat here too.

These builds are being run per-checkin, as well as nightly. There are updates for nightly builds. Obviously, there are a lot of details still to work through (crash symbols, release builds, signing, etc…), but this is a major milestone worth noting. You can follow progress, and offer help to kmoir, on the remaining work in bug#750366.

As always with standing up new platforms in production, if you see problems on tbpl.m.o, please do not just hide the breakages; instead, please file bugs in mozilla.org:ReleaseEngineering and we’ll be happy to investigate.

Big thanks to kmoir, callek, jmaher, blassey and others for making this happen.

Yoda does powerpoint

Concise. Accurate. Perfect. Just perfect.

(credits: Looks like graphjam.com is now part of the cheezburger.com empire, which confused tracking down the original author. Digging around, I found a few versions of this on different sites going back through 2011. I *think* it’s originally from Nathan Yau on flowingdata.com, or Garr Reynolds, posted here, but if you know anything about the original author, please let me know.)

multi-locale b2g builds now on tbpl.mozilla.org

(This is so important to FirefoxOS localization that it’s worth cross-posting.)

In case you missed it, bhearsum just announced that b2g builds on tbpl.m.o are now multi-locale.

There are two sets of locales here, so to be precise:

  • existing desktop and unagi, otoro builds: ar, en-US, fr, es, pt-BR, zh-TW. These locales are specifically chosen because they help developers debug various font-specific issues (right-to-left, etc).
  • new desktop builds: ar, as, bn-BD, ca, cs, cy, de, el, en-US, eo, es, et, eu, ff, fr, fy-NL, ga-IE, gd, gl, he, hi-IN, ht, hu, id, it, ja, ko, lij, ml, nl, or, pa, pl, pt-BR, ro, ru, sl, sq, sv-SE, te, tr, ur, zh-CN, zh-TW. This is the list of locales being actively worked on. We expect this list to continue growing. For progress, track this file: https://github.com/mozilla-b2g/gaia/blob/master/shared/resources/languages-all.json
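
If you want to compare the two locale sets, a quick sketch like the one below does it with plain Python sets. The locale codes are copied from the two bullets above; the variable names are made up for this sketch, and it says nothing about how the build system itself stores these lists.

```python
# Locale codes copied from the two bullets above.
EXISTING_LOCALES = "ar, en-US, fr, es, pt-BR, zh-TW".split(", ")

NEW_DESKTOP_LOCALES = (
    "ar, as, bn-BD, ca, cs, cy, de, el, en-US, eo, es, et, eu, ff, fr, "
    "fy-NL, ga-IE, gd, gl, he, hi-IN, ht, hu, id, it, ja, ko, lij, ml, "
    "nl, or, pa, pl, pt-BR, ro, ru, sl, sq, sv-SE, te, tr, ur, zh-CN, zh-TW"
).split(", ")

existing = set(EXISTING_LOCALES)
new_desktop = set(NEW_DESKTOP_LOCALES)

# Locales that only the new desktop builds carry.
only_in_new = sorted(new_desktop - existing)

# Locales in the existing set that the new desktop list lacks (if any).
missing_from_new = sorted(existing - new_desktop)

print(len(NEW_DESKTOP_LOCALES), "locales in the new desktop builds")
print(len(only_in_new), "of them are not in the existing set")
print("missing from the new list:", missing_from_new)
```

By this count, the new desktop list has 44 locales, and every locale from the existing set is also in the new one.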

For more info, check out bhearsum’s blogpost1 or blogpost2 or bug#766962.

Big thanks to bhearsum, stas, aki and others for making this happen. And hey, if you want to help with localizing boot2gecko / FirefoxOS into your locale, please contact the friendly folks in dev.l10n newsgroups – they’d be delighted with your help.

Otoro and Unagi builds now on tbpl.mozilla.org

In case you missed it, Catlee has quietly enabled Otoro builds on mozilla-beta. This means that now:

  • per checkin, we generate a full otoro build. This otoro build is on the same changeset as the unagi build, the gecko-compiled-for-arm-with-b2g-enabled builds, and all the usual desktop Firefox and mobile Fennec builds that we already generate per checkin. Having these builds on tbpl helps narrow down build regressions. However, as these full otoro builds are ~164MB each, and we do a *lot* of builds per day, we do not plan on uploading these until we have tests to run against them. Of course, if you have a need for these, please let us know, and we’ll enable them; it’s an easy change.
  • every night, we generate and publish an otoro build, an unagi build, the gecko-compiled-for-arm-with-b2g-enabled builds, and b2g desktop builds, all on the same changeset. As usual, all the desktop Firefox and mobile Fennec builds are built on the one changeset.

These new otoro builds have already been quickly evaluated by RelEng and by QA, and as far as we can tell, they look fine to us. However, if you have an Otoro phone, and have access to the private share, you can help! Please grab one of these new Otoro builds, which you’ll find alongside the previous otoro builds, install it on your Otoro phone and give it a try. We’re expecting to transition everyone over to these builds in the coming days, so if you see anything which makes you think we should stop rollout, please file a bug and we’ll get right on it.

RelEng production systems now in 3 AWS regions

tl;dr: We’re now running our build infrastructure across 3 different Amazon regions. This makes us more robust and *cheaper*?! What’s not to love?! :-)


Mozilla-to-AWS network diagram


This is important for 3 reasons:

1) It means RelEng can keep up with load, and hence keep all trees open, even if an Amazon region goes offline, or if a VPN link fails. Amazon doesn’t lose a region often, but it can happen. Mozilla doesn’t lose a VPN link often, but it can happen. Using 3 different regions, with 3 different VPN links, makes it unlikely we’d lose all at one time. In fact, multi-region outages on AWS are so rare that the most recent multi-region outage I could find was in June 2008.

2) As our first “go hybrid on AWS VPC” is a clear success, we’re experimenting with VPCs that are further away (i.e. slightly slower connection) from our inhouse colos. This allows us to start using regions that are cheaper (good!), and not inside the same earthquake zone (also good!).

3) We have a month or so of realistic usage data on AWS, as all the builds which we can run on AWS are now running on AWS. (The only exceptions are recent requests for new B2G builds, which we’re still setting up). This means that we can now make decisions about bulk-purchase-in-advance instances (called “reserved instances” in AWS lingo), which buys us the same compute power for ~1/4 the price. As these reserved instances are region-specific, and cannot be refunded or swapped around to other regions later, it was worth bringing new cheaper regions online first before we start bulk purchases in those cheaper regions.
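
To show why realistic usage data matters before buying reserved instances, here is a deliberately simplified sketch. It assumes a reserved instance is paid for over the whole term whether or not it is busy, that on-demand is paid only for hours actually used, and that the reserved price is the “~1/4” ratio mentioned above; real AWS pricing (upfront fees, per-region rates) is more detailed than this, and the function names and unit price are hypothetical.

```python
RESERVED_PRICE_RATIO = 0.25  # "~1/4 the price", from the post


def on_demand_cost(hours_used, hourly_price):
    """On-demand: pay only for the hours the instance actually runs."""
    return hours_used * hourly_price


def reserved_cost(hours_in_term, hourly_price, price_ratio=RESERVED_PRICE_RATIO):
    """Reserved (simplified): pay the discounted rate for the whole term."""
    return hours_in_term * hourly_price * price_ratio


def cheaper_option(utilization, price_ratio=RESERVED_PRICE_RATIO):
    """Pick the cheaper option for a given utilization (0.0 to 1.0)."""
    if not 0 <= utilization <= 1:
        raise ValueError("utilization must be between 0 and 1")
    if utilization > price_ratio:
        return "reserved"
    if utilization < price_ratio:
        return "on-demand"
    return "equal"


# Hypothetical numbers, for illustration only.
HOURS_IN_TERM = 1000
HYPOTHETICAL_HOURLY_PRICE = 1.0  # arbitrary currency unit

for utilization in (0.10, 0.50, 0.90):
    used = HOURS_IN_TERM * utilization
    print(
        utilization,
        on_demand_cost(used, HYPOTHETICAL_HOURLY_PRICE),
        reserved_cost(HOURS_IN_TERM, HYPOTHETICAL_HOURLY_PRICE),
        cheaper_option(utilization),
    )
```

In this simplified model the break-even point is simply the price ratio: a machine that is busy more than about a quarter of the time is cheaper as a reserved instance, which is why a month of real usage numbers is worth having before committing to a non-refundable, region-specific purchase.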

All in all, this is a big deal.

Oh, and it is worth noting for the record that these additional regions were brought online, and into production, without needing any downtime, and without any hiccups! Big thanks to catlee and rail and ravi for their work.

Khmer team at MozCamp Singapore

At MozCamp Singapore, I was delighted to meet up with some of the Khmer localization team again. From left to right, here’s Vannak Eng, Sokhem Khoem, John O’Duinn, Sophea Sok, Mark West and Piseth Kheng. (Javier Sola was unable to make the trip at the last minute, so is sadly missing from the photo.)

They are very modest, great fun, and super sharp. As a sneak surprise, it was great to see them hack together a Khmer version of FirefoxOS during a quick hackfest arranged by Stas at MozCamp.

Since we last met at my workshop in Phnom Penh in Jan 2012, we officially released Khmer as part of the Firefox 13.0 release, and government policy now has Khmer Firefox being used in government departments in Cambodia! Great stuff.

If you want to find out more about the Khmer team, or want to join them in localizing, please look here: https://wiki.mozilla.org/L10n:Teams:km or http://www.mozillakm.org/
