R.I.P. Carroll Shelby

On Thursday, 10-may-2012, Carroll Shelby died, aged 89.

Originally a chicken farmer, he became a race car driver until 1959, when heart problems ended his successful racing career. He then switched again, this time to designing and building the powerful, fast, brash muscle cars that he loved to drive, including some great icons:

AC Cobra/Shelby Cobra

Ford Mustang GT390 (made famous by Steve McQueen in the movie “Bullitt”)
…and several other Mustang variants.

Dodge Viper

Ford GT

Even at 84 years of age, while consulting on the development of the new high-powered Ford GT, he still loved driving fast, and test-drove the new Ford Mustang GT500 on a race track at 150 mph.

I didn’t realize until today that he was also one of the world’s longest-living heart transplant recipients, having received a heart transplant in 1990.

A more detailed bio is available on his Shelby American company website, on Wikipedia, and in The New York Times and The Washington Post.

Are you a night owl?

I was not a participant in either of these two medical studies.

Caffeine Disrupts Sleep for Morning People, but Not Night Owls
“…for the early risers, the more caffeine in their bodies, the more time they spent awake during the night after initially falling asleep. This was not seen in the night owls.”

‘Morning people’ and ‘night owls’ show different brain function
“…In morning people their cortical excitability actually decreased throughout the day. It was highest in the morning and lowest in the evening,… It was the opposite for evening people; their brain activity was highest at 9 p.m.”

Those of you who know me know that I am a night owl who drinks a lot of coffee and routinely finishes a pot of coffee before going to sleep. I used to think it was just me. However, after reading these studies, I wonder: how many others out there are also night owls with a high tolerance for caffeine?

Mozilla’s Release Engineering published in AOSA (vol2)

I’m excited to say that The Architecture of Open Source Applications (vol2) is now available.

This book is a collection of great chapters, each written by different people from different parts of the open source world. For armenzg, catlee, lsblakk and me, this was a great opportunity to write a chapter describing the release automation behind Mozilla’s Firefox.

If you were ever curious about the process (and the code!) that allows us to do things like sim-ship a Firefox release in 93 locales, or ship 8 emergency chemspill releases in 42 hours, then please have a read. Hopefully, this will also help others who are doing release automation at scale for other products. If you find a typo in the book, or something that you think could be improved in our automation, please be kind and let us know.

Our release automation constantly evolves, as new product requirements arise or we find new ways to obsessively streamline things, so it’ll be interesting to see how this chapter holds up over time.

In addition to the print version (buy here), the book will soon also be available for purchase as a PDF, as an ebook from Amazon, and as a free HTML download (links coming). All royalties go to Amnesty International.

Big thanks to Greg Wilson and Amy Brown, who did a great job of making all this happen, explaining the mysteries of the book publishing world to us, and generally cat-herding armenzg, catlee, lsblakk and me through the publishing process, within deadlines, all while we were also doing our “day jobs” at Mozilla.

(Interesting coincidence: kmoir, who recently joined us at Mozilla’s RelEng, is an author of a chapter in the earlier Architecture of Open Source Applications (vol1) – another interesting read.)

Thank you Armen, Chris and Lukas for helping make this book a reality.

40,207 test jobs in a 24 hour day

On 03-may-2012, RelEng infrastructure processed over 40,000 test jobs in a single 24-hour day: 40,207, to be precise.

#test jobs in a day

For comparison, 30,000 test jobs a day was a big milestone for us only 5 months ago (09-dec-2011). The milestone of 20,000 test jobs a day was only two months before that (19-oct-2011).
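The spacing between these milestones is easy to check with a few lines of Python. This is just a back-of-the-envelope sketch that uses only the dates and single-day totals quoted above; the names `MILESTONES` and `milestone_gaps` are made up for illustration.

```python
from datetime import date

# Daily test-job milestones quoted in this post: (day, test jobs run that day).
MILESTONES = [
    (date(2011, 10, 19), 20_000),
    (date(2011, 12, 9), 30_000),
    (date(2012, 5, 3), 40_207),
]

def milestone_gaps(points):
    """For each consecutive pair of milestones, return
    (days elapsed, extra jobs per day, extra jobs per day gained per elapsed day)."""
    gaps = []
    for (day0, jobs0), (day1, jobs1) in zip(points, points[1:]):
        days = (day1 - day0).days
        added = jobs1 - jobs0
        gaps.append((days, added, added / days))
    return gaps

for days, added, rate in milestone_gaps(MILESTONES):
    print(f"{days} days later: +{added:,} jobs/day (~{rate:.0f} per elapsed day)")
# 51 days later: +10,000 jobs/day (~196 per elapsed day)
# 146 days later: +10,207 jobs/day (~70 per elapsed day)
```

So the step from 20,000 to 30,000 jobs a day took 51 days, while the step from 30,000 to 40,207 took 146 days.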

The initial value of ~100 for May 2007 is a complete guess at how many test jobs our two (or three?) test machines could handle back when I first started at Mozilla, in between starting to bring up unittest automation, all the tree closures, and needing to rebuild Firefox as part of the tests. Even if the initial value is slightly wrong, the order of magnitude is right. The gap from 2007 to 2010 is because we did not have any metrics in place.

#test jobs in a day

Exciting stuff!

Infrastructure load for April 2012

NOTE: In April, we lost 20% (6 of 30 days) of metrics data during the switchover of the backend databases from the sjc1 to the scl3 data center. Despite missing 20% of the month’s data, we still recorded more jobs in April than in all of November 2011 or all of December 2011.

  • #checkins-per-month: April looks like a drop, with “only” 3,327 checkins, but this is under-reported because of the lost 20% of data. For comparison, previous months were: March 2012 (4,508 checkins), February 2012 (4,027 checkins), January 2012 (3,962 checkins), December 2011 (3,262 checkins), and November 2011 (3,209 checkins).
  • #checkins-per-day: We set three new records in April: 280 checkins-per-day on 25-apr-2012, 278 checkins-per-day on 26-apr-2012, and 246 checkins-per-day on 24-apr-2012.
  • #checkins-per-hour: We peaked at 7.1 checkins-per-hour, which is lower than usual, but to be expected given the lost 20% of data.
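To get a rough feel for how much the lost data hides, here is a naive estimate in Python. It assumes (a big assumption) that the 6 missing days were as busy as the average recorded day, and simply scales April’s 3,327 recorded checkins up to the full 30 days; the function name is hypothetical.

```python
# Figures quoted above for April 2012.
RECORDED_CHECKINS = 3_327
DAYS_IN_APRIL = 30
DAYS_WITHOUT_METRICS = 6  # lost during the sjc1 -> scl3 database switchover

def naive_full_month_estimate(recorded, days_total, days_missing):
    """Scale recorded checkins up to the whole month, assuming the missing
    days were as busy as the average recorded day."""
    days_recorded = days_total - days_missing
    return recorded * days_total / days_recorded

estimate = naive_full_month_estimate(RECORDED_CHECKINS, DAYS_IN_APRIL, DAYS_WITHOUT_METRICS)
print(f"~{estimate:,.0f} checkins")  # ~4,159 checkins
```

Under that assumption, April would land between the February 2012 (4,027) and March 2012 (4,508) totals rather than looking like a drop. Treat it as a sanity check, not a measurement.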

Overall load since Jan 2009

mozilla-inbound, fx-team:
mozilla-inbound continues to be heavily used as an integration branch, with 22% of all checkins, compared with the fx-team branch (~2% of checkins) or mozilla-central (~4% of checkins).

Infrastructure load by branch

mozilla-aurora, mozilla-beta:

  • ~3% of our total monthly checkins landed into mozilla-aurora, consistent with previous months.
  • ~1% of our total monthly checkins landed into mozilla-beta, consistent with previous months.

(Standard disclaimer: I’m always glad whenever we catch a problem *before* we ship a release; it saves us from having to do a chemspill release, and we ship better code to our Firefox users in the first place.)

misc other details:

  • Pushes per day

  • Pushes by hour of day

Welcome Kim Moir

On Monday, we were delighted to have Kim Moir join Mozilla’s Release Engineering group. She’ll be working with coop, who is (coincidentally) also based in Ottawa, but the rest of us can find her on IRC as “kmoir”. Please do say “hi” and welcome her to Mozilla.

Kim brings great perspective to the group, as she has worked on Eclipse release engineering for years, has worked with distributed groups, and has also done lots to raise awareness about release engineering with the open source community. If you are not already familiar with her work, you should read her blog here (great title, by the way!).

Welcome, Kim!

ps: Yes, we’re still hiring. Our Release Engineering group helps Mozilla’s developer community write great code, and then efficiently gets that code into the hands of our Firefox users. If you are passionate about open source, and about building large complex distributed infrastructure, we’d love to hear from you.

Cleaning up the build process for Fennec developers

A little over a week ago, ted, kyle, joey, jhford and I got together to see how we could improve the Makefiles, specifically to help Fennec developers.

I thought people might be interested in a quick progress report.

The typical compile-package-deploy-test cycle for Fennec developers is time-consuming and tricky, mostly because of layers of workarounds for dependency misfires in the Makefile logic. Because developers go through this cycle countless times every day, every developer wants it to be as quick as possible, so they can make progress quickly. Over time, every developer grows their own trusted workarounds to generate valid builds as quickly as possible, which is a fair response to suboptimal Makefiles. However, badly documented, poorly understood, fragile workarounds can be a recurring cause of stress and delays, especially when one misstep in the workarounds can send you down a long false debugging trail. Not what people need right now.

To figure out where to start, we:

  1. Asked a few developers for specific pain points that they would like us to start on first. Everyone has their own pet peeve. But if there was something *all* developers mentioned, we looked there first. If there wasn’t a bug already filed, we filed one.
  2. Used some very cool tools Joey created to generate reports on time-usage-by-directories, to help us decide where the biggest time-sink is, and hence where to focus first.
  3. Studied the “how to build fennec” wiki page. This let us see what all new Fennec developers learn first, as well as learn some of the more commonly used workarounds and gotchas. Over time, developers have learned their own undocumented workarounds for different gotchas, so different people follow different build process steps, but this wiki was a great starting point.
  4. Triaged existing Core:BuildConfig bugs to see if there were any unresolved bugs that identified problem areas.
  5. Timed full clobber builds. On my MBP, clobber Fennec builds complete in 20-40 minutes. It’s not yet clear to me why the timings vary so widely. Joey, jhford and I all got a similar range of timings, which I found even more interesting because our machines range from MBP laptops up to high-spec 8-way desktop machines.
  6. Timed depend builds. Of course, depend builds are trickier to measure, because what gets built depends on what you touch. So I tried the “simplest” case: if I do a full clobber build, change nothing, and start a depend build, how long does that depend build take? Note: nothing changed after the previous clobber build, so if everything was working right, this should take minimal time to traverse dependencies and should not recompile/relink anything. On my MBP, these “nothing-to-do depend builds” take between 2m45s and 3m15s. And worst of all, imho, they always did a bunch of recompiling, rebuilding manifests… even though we know nothing has changed… urgh.
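The “nothing-to-do depend build” check in step 6 is easy to repeat. This post doesn’t say exactly which commands we ran, so the Python sketch below is a hypothetical harness: you pass it your own objdir and build command (both assumptions), it times one more build on an already-built tree, and it lists every file under the objdir that was created or rewritten. On a tree whose dependencies are working correctly, that list should be empty.

```python
import os
import subprocess
import sys
import time

def snapshot_mtimes(objdir):
    """Map every file under objdir to its modification time in nanoseconds."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(objdir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            snapshot[path] = os.stat(path, follow_symlinks=False).st_mtime_ns
    return snapshot

def rewritten_files(before, after):
    """Files that are new, or whose mtime changed, between two snapshots."""
    return sorted(path for path, mtime in after.items() if before.get(path) != mtime)

def time_noop_build(build_cmd, objdir):
    """Run build_cmd on an already-built tree; return (seconds, rewritten files)."""
    before = snapshot_mtimes(objdir)
    start = time.perf_counter()
    subprocess.run(build_cmd, check=True)  # raises if the build fails
    elapsed = time.perf_counter() - start
    return elapsed, rewritten_files(before, snapshot_mtimes(objdir))

if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python noop_build.py OBJDIR make -C OBJDIR
    objdir, *cmd = sys.argv[1:]
    seconds, touched = time_noop_build(cmd, objdir)
    print(f"no-op depend build took {seconds:.1f}s and rewrote {len(touched)} files")
    for path in touched[:20]:
        print("  ", path)
```

Listing the rewritten files, rather than only timing the build, points straight at the directories whose dependency rules misfire.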

To make things better, so far, we:

  • Set up, tested and deployed the new Android r7b NDK and the faster “gold” linker to production RelEng machines (bug#675572). We also added the same NDK and linker to the posted prebuilt toolchain for easy developer use (bug#745956). As well as compiling and linking faster, this also gives us support for Android 4.0 (Ice Cream Sandwich).
  • Filed and landed bug#746741, which adds a new makefile target, “build_and_deploy”, to encapsulate the rebuild/repackage/install steps on Android. While this might seem like an odd place to start, it matters because there is a lot of confusion caused by the different makefile workarounds that each developer has evolved over time. Some of these workarounds become folklore that people follow on faith, even when the original need has since been fixed. Figuring out what is “supposed to be done”, and publishing one clear target that does it consistently, gives us something to keep working on. As we improve things inside the Makefiles, this “build_and_deploy” target will keep getting better, handling more of the current workarounds and getting faster every time. As developers discover that this one supported “build_and_deploy” target safely does everything they need, and is a fast, safe alternative to their workarounds, they will gradually stop needing their own workarounds… which means developers can instead focus on making the shipping product better.
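The idea behind a single “build_and_deploy” entry point can be illustrated outside the Makefiles too. The Python sketch below is not the change from bug#746741; it is a hypothetical illustration of the same principle: one function that runs the rebuild, repackage and install steps in a fixed order and stops at the first failure. The objdir, the `package` make target and the APK path are assumptions; `adb install -r` is the standard way to reinstall an APK on an attached device.

```python
import subprocess

def build_and_deploy(objdir, apk_path, run=subprocess.run):
    """Hypothetical stand-in for one supported rebuild/repackage/install sequence.

    Every developer gets the same steps in the same order, and the first failing
    step stops the sequence instead of leaving a half-updated device behind.
    """
    steps = [
        ["make", "-C", objdir],               # rebuild (assumed command)
        ["make", "-C", objdir, "package"],    # repackage (assumed target)
        ["adb", "install", "-r", apk_path],   # reinstall on the attached device
    ]
    for cmd in steps:
        run(cmd, check=True)  # subprocess.run raises CalledProcessError on failure
    return steps
```

The `run` parameter exists only so the sequence can be exercised without a real build tree or device; improving what happens inside each step doesn’t change what developers type.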

Having said all that, we’re still only scratching the surface. There’s lots more to fix. Much much more. So we filed bug#748452 to track the list of most-urgent things we’ve found so far.

If you are building Fennec and know of a problem with the Makefiles that impacts your ability to get work done, please file a bug in Core:BuildConfig for us. Of course, the trick is to fix these problems without breaking any other groups that use the same Makefiles. If you already filed a bug and it’s still open, please cc us on the bug, OR at least put a brief description of the problem into an email to any of us, and we’ll triage.

Thanks for your patience. More news soon as we make things better.

Kilimanjaro: “trains, planes and automobiles”

Until recently, Mozilla has mostly focused on shipping one product: Firefox. (Yes, I know Mozilla shipped other products like Thunderbird, SeaMonkey and Camino, but they used the same toolchain and the same or a similar release cadence, so they can be thought of as similar, if not identical, for the purposes of this discussion.) I think the tight formation flying of the Blue Angels is a good analogy!

Times are changing.

Mozilla now ships multiple very different products: Firefox, Fennec, Sync, BrowserID and soon Boot2Gecko. Each of these products is built by a different group of people, with different toolchains, different features, different processes for tracking blocking/shipping criteria and, most importantly, different release schedules.

If we want to ship a new feature that requires work coordinated across different products, we need all the different parts of the feature to ship at the same time. Coordinating this means each product needs to plan backwards in time, to coordinate when they start working on their part of the shared feature. Also, any schedule slip in any one product needs to be cross-coordinated across all products.

Coordinating the different parts of a new feature into each of these different products is tricky.

Having all products ship their part of the overall feature in a coordinated manner is even trickier.

To me, it feels like arranging transportation for a family event. Some guests live in the same town and can walk over at short notice. Some guests will drive. Some guests will take a train. Some guests will fly in, and some of those will have to get visas. All have to arrive at the same location by the “release date” (i.e. the day of the family event). This does not mean everyone starts using airline ticket agents to pay for train tickets or to refuel their cars. However, it does mean everyone has to work out, according to their own schedule and transportation of choice, when they need to start making travel plans in order to still arrive at the event on time.
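The “plan backwards” arithmetic can be made concrete with a small Python sketch. The product names come from this post, but the lead times and the release date are made up purely for illustration. Each product works backwards from the shared date to find its latest start day, and because the feature ships only when every part is ready, one product’s slip moves the date for everyone.

```python
from datetime import date, timedelta

# Made-up lead times (days from "start work" to "ready to ship"), for illustration only.
LEAD_TIMES = {"Firefox": 84, "Fennec": 84, "Sync": 30, "BrowserID": 14}

def latest_start_dates(release_day, lead_times):
    """Plan backwards: the last day each product can start and still be ready by release_day."""
    return {product: release_day - timedelta(days=days) for product, days in lead_times.items()}

def coordinated_release_day(start_days, lead_times):
    """The shared feature ships only when every product's part is ready,
    so the latest-finishing product sets the date for all of them."""
    return max(start_days[p] + timedelta(days=lead_times[p]) for p in lead_times)

release = date(2012, 9, 1)  # made-up "family event" date
starts = latest_start_dates(release, LEAD_TIMES)
for product, day in sorted(starts.items(), key=lambda kv: kv[1]):
    print(f"{product:<10} start by {day}")

# If Sync starts a week late, the whole coordinated release slips a week.
late = dict(starts, Sync=starts["Sync"] + timedelta(days=7))
print("coordinated release:", coordinated_release_day(late, LEAD_TIMES))
```

Real schedules have dependencies between the parts, which this sketch ignores. Still, it shows why a slip in any one product has to be cross-coordinated with all the others.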

As Mozilla starts to build more complex features across our range of shipping products, we’ll need to learn this new cross-product coordination skill and get better at it, so we can do it again. And again. And again.

Kilimanjaro is the first “coordinate-what-parts-need-to-go-into-which-products-and-by-when-so-they-all-ship-as-one-coordinated-feature” project. There will be others; it’s great to see the start of this really cool new phase for Mozilla.

New Mozilla mirror in Cambodia

A few weeks ago, sabay.com.kh became an official mirror node for Mozilla in Cambodia. This will give Firefox users in Cambodia faster downloads of security updates, and will also help anyone in Cambodia looking to download a fresh install of Firefox. Given the market presence and bandwidth capacity of Sabay (and parent company CIDC-IT) in Cambodia, this is great news.

Many thanks to Mike Gaertner, COO of CIDC-IT, for taking the time to meet with me in Phnom Penh in January 2012, and then taking the personal interest to work through the mechanics to make this new mirror node a reality.

We are all “remoties”

At the Mozilla Summit in September 2011, we ran a session on working remotely at Mozilla.

I was surprised/stunned/honored by needing to run this session *twice* because of popular demand, the sheer volume of interaction in each session and the ongoing interest since the summit.

Writing these slides, I realized how much I care about this topic… and how many careful, subtle habits we’ve developed within RelEng over the last ~5 years.

During the summit, and again last week in Toronto, I had a chance to meet with Homa Bahrami (Senior Lecturer, Haas Management of Organizations Group, Haas School of Business, Berkeley). Apart from being a great person to talk with, she has lots of organizational and behavioral science background that helps explain why the things we felt were helping were, in fact, things she would expect to help!

PDF of slides
(click image for PDF of slides; Keynote version available on request, but it’s large!)

As I said at the start of each session, at first it felt odd for a Release Engineer to be talking about work habits of distributed groups… until you think about how physically distributed Mozilla’s Release Engineering group is. I note, for the record, that *none* of RelEng are “in headquarters”. While there are occasional miscommunications, RelEng is fairly well plugged into what’s going on… after all, we *need* to be in order to do our job of shipping software quickly, reliably and accurately.

To me, this is really about working together in clearly understood ways. The suggestions here have helped “remote” RelEng people in clear and obvious ways, but they *also* help “local” RelEng people work together better.

Please let me know what you think. And of course, if you have ideas or suggestions that I missed, I’d love to hear them.

(Apologies to those who’ve been pestering me to post these over the last few months. Last week’s “remoties” day reminded me how important this is to post – even in its rough state. I’ve fixed the most egregious errors/typos, and merged in some feedback I got in the Q&A sessions. However, these slides still need further work. If you spot anything to fix, please let me know!)
