Infrastructure load for July 2012

  • #checkins-per-month: We had 5,635 checkins in July2012, another new record, and well above our previous record of 5,246 checkins in May2012.
  • #checkins-per-day: We had consistently high load across the month, and 19-of-31 days had over 200 checkins-per-day. Put another way, we had over 200 checkins-per-day every work day in July except Canada Day (02jul2012) and US Independence Day (04jul2012).
  • #checkins-per-hour: The peak this month was 11.35 checkins per hour, and throughout the month, we sustained over 10-checkins-per-hour for 5 out of 24 hours in a day.

mozilla-inbound, fx-team:
mozilla-inbound continues to be heavily used as an integration branch, with 26% of all checkins, far more than the other integration branches fx-team (1.5% of checkins) or mozilla-central (~3% of checkins). For comparison, I note that more checkins landed on mozilla-aurora than on mozilla-central.

mozilla-aurora, mozilla-beta:

  • 3.8% of our total monthly checkins landed into mozilla-aurora.
  • 2.1% of our total monthly checkins landed into mozilla-beta. This is higher than previous months, but I guess this is related to the NativeFennec-landing-on-beta work this month.

(Standard disclaimer: I’m always glad whenever we catch a problem *before* we ship a release; it avoids us having to do a chemspill release and also we ship better code to our Firefox users in the first place.)

misc other details:

  • Pushes per day
    • You can clearly see weekends through the month, as well as the impact of the national holidays on 02jul2012, 04jul2012.

  • Pushes by hour of day
    • It is worth noting that for 5 hours in every 24 hour day, we did over 10 checkins-per-hour. Phrased another way, that’s one checkin every 6mins for 5 hours.
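
The "one checkin every 6mins" figure is just the hourly rate inverted. Here is a minimal sketch of that arithmetic; the function name is made up for illustration and is not part of any RelEng tooling.

```python
def minutes_between_checkins(checkins_per_hour: float) -> float:
    """Average gap, in minutes, between checkins at a given hourly rate."""
    if checkins_per_hour <= 0:
        raise ValueError("checkins_per_hour must be positive")
    return 60.0 / checkins_per_hour


# The sustained rate quoted above: 10 checkins-per-hour.
print(minutes_between_checkins(10))                # 6.0
# The July peak quoted above: 11.35 checkins-per-hour.
print(round(minutes_between_checkins(11.35), 2))   # 5.29
```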

RelEng production systems go hybrid… now available on AWS

As of Friday afternoon (06jul2012), RelEng started generating a small number of production builds and try builds on Amazon Web Services.

(Terminology alert: this means Mozilla’s network of RelEng machines is now considered a mix of a private cloud and a public cloud, …which is called a hybrid cloud.)

catlee already covered this in Mozilla’s Platform Meeting, but this multi-month project is a massively important step forward for Mozilla’s Release Engineering infrastructure as well as for all Mozilla’s developers, so it is worth calling attention to three important details:

  • Security
  • Seamless integration
  • Dynamic allocation

Security

The security of our RelEng infrastructure is obviously important to Mozilla, so we set up these Amazon-based VMs inside a Virtual Private Cloud (VPC). While it is technically possible to have the VMs inside the VPC connect directly to the external internet, we felt it was safer to prohibit any access from the VPC to or from the internet. Therefore the only connection we have to/from our VPC is over a VPN link directly into Mozilla’s existing Build network, within Mozilla’s secured infrastructure.

If an Amazon VM needs to reach an external site for any reason, it can only do this by going from Amazon over VPN to Mozilla’s Build Network and then out through Mozilla’s firewalls. If a Mozilla person wants to access one of our Amazon VMs, they have to do this by going through Mozilla’s Build Network over the VPN link to the VPC. We designed this very restricted access to help protect these vital systems. It was reassuring to also see all the security audits that Amazon has done.


Seamless integration
We integrated Amazon’s VMs nicely into our existing mix of VMs and physical machines in the Mozilla build network. The easiest way to see if your specific build was handled by an Amazon VM (called an “EC2 instance” in Amazon-speak) is to look at the machine name on tbpl.mozilla.org.

The only other way that you can tell we are using AWS for some of the builds is that the additional compute capacity is helping reduce wait times for our builds!


Dynamic allocation
As you can see from looking at our monthly load posts, load on our RelEng infrastructure varies over different times of the day, and over different days of the week. To handle this efficiently, we now dynamically add and remove Amazon VMs from production at any given time to match the demand at that time. We do this as follows:

  • Our automation monitors the queue for pending builds
  • If there is a backlog of pending builds in the queue, our automation dynamically starts reviving enough VMs in our Virtual Private Cloud to handle the backlog.
  • As each of these VMs comes online, it connects to a buildbot master, indicating it is idle and ready to process jobs.
  • A buildbot master assigns a pending build job to the newly available idle slave.
  • Once the build job is completed, the slave goes back to the master looking for another job.
  • If there is no backlog of pending jobs for 60 minutes, then our automation starts suspending the idle Amazon VMs. Suspending VMs like this allows us to quickly bring VMs back into production in a few seconds to handle any new backlog, while also reducing costs during low load times. Also, note that the 60-minute threshold worked well for us in staging, but we’ll likely adjust this in the near future as we gain more experience with real-world load.
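
The steps above can be sketched as a small decision function that runs once per polling cycle. This is only an illustration of the logic as described in this post, not the actual RelEng automation: the names (PoolState, plan, IDLE_THRESHOLD_MINUTES) are hypothetical, and it assumes one VM handles one build at a time.

```python
from dataclasses import dataclass

# 60 minutes is the threshold mentioned in the post; it is expected to be tuned.
IDLE_THRESHOLD_MINUTES = 60


@dataclass
class PoolState:
    """Hypothetical snapshot of the queue and the Amazon VM pool."""
    pending_builds: int           # jobs waiting in the queue
    idle_vms: int                 # running VMs with no job assigned
    suspended_vms: int            # suspended VMs that could be revived
    minutes_without_backlog: int  # time since the queue last had a backlog


def plan(state: PoolState) -> dict:
    """Decide how many VMs to revive or suspend in this polling cycle."""
    # A backlog exists when pending jobs outnumber the idle VMs available.
    backlog = max(0, state.pending_builds - state.idle_vms)
    if backlog > 0:
        # Revive just enough VMs to cover the backlog, capped by what exists.
        return {"revive": min(backlog, state.suspended_vms), "suspend": 0}
    if state.minutes_without_backlog >= IDLE_THRESHOLD_MINUTES:
        # No backlog for long enough: suspend idle VMs that no pending job needs.
        return {"revive": 0, "suspend": state.idle_vms - state.pending_builds}
    # No backlog, but not idle for long enough yet: leave the pool alone.
    return {"revive": 0, "suspend": 0}


print(plan(PoolState(5, 1, 10, 0)))    # {'revive': 4, 'suspend': 0}
print(plan(PoolState(0, 3, 7, 60)))    # {'revive': 0, 'suspend': 3}
print(plan(PoolState(0, 3, 7, 59)))    # {'revive': 0, 'suspend': 0}
```

Keeping the decision separate from the code that actually starts or suspends VMs makes the threshold easy to adjust and the behaviour easy to test against recorded load.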

As of today, we only let some B2G builds overflow onto AWS like this, and we continue to monitor builds and the dynamic allocation carefully. Assuming this continues to work well, we will soon let the rest of the B2G builds overflow to AWS. Next will be fennec/android builds, and then linux desktop builds. Our focus in the immediate short term will be to siphon excess load from our Mozilla build machines over to AWS, allowing us to better handle the increased number of B2G and Fennec builds being enabled in production recently. This also allows us to reimage some/all of our physical linux builders as physical win64 builders to immediately help with our win64 builder wait times. Eventually, we may start running win64 builds, and maybe even some unittests, on AWS but that needs further investigation – stay tuned!


It’s hard to overstate how important this is for us.

The increase in build types for B2G and fennec and desktop, combined with the increase in number of checkins per day, has RelEng systems continually under heavy load. We first tried using AWS in 2008, but the Amazon VMs that we were using kept being restarted, usually before the build completed; the build would automatically restart everything once revived a few seconds later, but it still blocked us from actually being able to use these in production. Some renewed experiments in summer2011, and discussions with other companies who were doing similar investigations, looked promising, so we started work on this in full force in Feb2012.

We hope you like this, and of course, if you see any problems, let us know asap, or file a bug!

[UPDATED: fixed typos, joduinn 17jul2012]

New builds in production: Fennec-Armv6, B2G-Armv7 and B2G-desktop

In the last two weeks, the following new build types were enabled on our production infrastructure:

  • Fennec Arm v6: These Fennec builds run on Arm v6 chips. Mobile developers asked for these builds because so many people still use Arm v6 phones. We generate these builds as well as continue to generate the existing NativeFennec Arm v7 builds and XULFennec builds. You can find these on tbpl.mozilla.org as “Android Armv6 opt”. More details in bug#723946.
  • B2G Arm v7: These boot2gecko builds run on Arm v7 chips. We generate both Opt and Debug builds. You can find these on tbpl.mozilla.org as “Armv7a GB opt” and “Armv7a GB debug”. More details in bug#758425.
  • B2G desktop: These boot2gecko builds are compiled specifically to run on *desktop* machines, not on boot2gecko devices. The intended users for these builds are developers / QA / localizers and community people who are helping work on B2G and can do a large portion of their work without access to physical devices. You can find these on tbpl.mozilla.org as “Ng” for each of win/mac/linux desktop platforms. More details in bug#744008.

Next time you see them, give thanks to armenzg and bhearsum for their speedy behind-the-scenes work to make all this happen.

John.

Infrastructure load for June 2012

  • #checkins-per-month: We had 5,194 checkins in June2012. This is on par with last month’s record of 5,246 checkins in May2012.
  • #checkins-per-day: We had consistently high load across the month, and 15-of-30 days had over 200 checkins-per-day.
  • #checkins-per-hour: The peak this month was 11.5 checkins per hour. It is worth noting that throughout the month, we sustained over 10-checkins-per-hour for 5 out of 24 hours in a day.

mozilla-inbound, fx-team:
mozilla-inbound continues to be heavily used as an integration branch, with 23% of all checkins, by comparison with the fx-team branch (~2% of checkins) or mozilla-central (~4% of checkins). These ratios have been fairly consistent over the last few months.

mozilla-aurora, mozilla-beta:

  • 3.8% of our total monthly checkins landed into mozilla-aurora, which is higher than previous months. I suspect this is caused by checkins for nativefennec-beta-on-aurora.
  • ~2.6% of our total monthly checkins landed into mozilla-beta. This is higher than previous months, but I guess this is related to the NativeFennec-landing-on-beta work this month.

(Standard disclaimer: I’m always glad whenever we catch a problem *before* we ship a release; it avoids us having to do a chemspill release and also we ship better code to our Firefox users in the first place.)

misc other details:

  • Pushes per day

  • Pushes by hour of day
    • It is worth noting that for 5 hours in every 24 hour day, we did over 10 checkins-per-hour. Or a checkin every 6mins, if that’s easier to wrap your brain around. 🙂

Infrastructure load for May 2012

  • #checkins-per-month: We set a new record of 5,246 checkins in May2012. This is a significant jump above our previous record of 4,508 checkins in March2012.
  • #checkins-per-day: We had consistently high load across the month, and 12-of-31 days had over 200 checkins-per-day.
  • #checkins-per-hour: We set a new record of 12.3 checkins-per-hour.

mozilla-inbound, fx-team:
mozilla-inbound continues to be heavily used as an integration branch, with 23% of all checkins, by comparison with the fx-team branch (~2% of checkins) or mozilla-central (~4% of checkins).

mozilla-aurora, mozilla-beta:

  • 4.3% of our total monthly checkins landed into mozilla-aurora, which is higher than previous months. I suspect this is caused by checkins for nativefennec-beta-on-aurora.
  • ~1% of our total monthly checkins landed into mozilla-beta, consistent with previous months.

(Standard disclaimer: I’m always glad whenever we catch a problem *before* we ship a release; it avoids us having to do a chemspill release and also we ship better code to our Firefox users in the first place.)

misc other details:

  • Pushes per day

  • Pushes by hour of day
    Two details worth noting here:

    • between 11am PDT and noon every day of May, we *averaged* a checkin every 5minutes.
    • It is worth noting that for 5 hours in every 24 hour day, we did over 10 checkins-per-hour.

HOWTO build multi-locale Fennec on Android

Aki recently added some new features to MozHarness which make it much easier to create multi-locale Fennec builds.

Creating multi-locale Fennec builds has always been tricky, but at the same time, it’s also important. Simplifying this process makes it easier for developers to recreate locally what Mozilla will eventually ship to end-users, and makes it easier to debug problems reported by our multi-locale aurora/beta Fennec users – always a good thing!

Aki’s posted a short summary of steps to the Fennec wiki page. For more details, see Aki’s blogpost. Please give this new code a try… and of course, if you find something that you think should be changed to make this better, let Aki know or file a bug in mozilla.org:ReleaseEngineering.

Nice work, Aki!

HOWTO use Narro with Mozilla’s l10n process

If you are doing any localization here at Mozilla, and especially using Narro, please watch this brief screencast by armenzg.

As anyone who’s done localization knows, there are a lot of gotchas to watch out for, and this is also true here at Mozilla. Over the years, armenzg has lived through a lot of these gotchas for his Armenian localization work, as well as when he was helping Vannak, me and the rest of the Khmer team working on the Khmer localization. After watching this screencast, I think armenzg did a great job of summarizing the steps to take to avoid the gotchas and get Firefox localization work done smoothly.

Those 15mins now will teach you lots and save you hours of frustrating debugging later. Trust me on this. Make some coffee, and watch it now. Of particular interest to me personally was ~12mins into the screencast, dealing with some gotchas about keyboard shortcut characters that tripped us up once when exporting Khmer fixes from Narro and committing to hg.m.o.

For more details and code samples about this screencast, see armenzg’s blog post. And if you found that useful, please also check out armenzg’s other posts about Narro:

How we use Narro to localize Firefox
How to exclude a file from being exported in Narro

Big thanks to armenzg for doing these blogposts – they are a great help!

Khmer Firefox shipped!

Today, in the midst of all the other excitement for the Firefox13 release, I’d like to draw attention to a special achievement.

The Khmer version of Firefox shipped today as an officially supported release. No longer “just” a beta, the Cambodian people can now use an official release version of Firefox, in their native Khmer language, with identical features and security as every other Firefox 13 user.

This has been a long, huge undertaking by Vannak, Javier and everyone else on the Khmer localization team. I can’t thank them enough for all their work. It’s been a real privilege to get to know them, online and while I was in Phnom Penh earlier this year. I’m proud of their work, and honored to know them.

Thank you!
John.

ps: If you ever want to meet some cool folks, and help out with future Khmer localization, come over to http://www.mozillakm.org/ or the mozilla.dev.l10n.km newsgroups and say hi – they’d love to meet you!

pps: Updated to add some news articles that mentioned the official support for Khmer [here], [here], [here]

Pina by Wim Wenders

I rarely go to the movies, and even more rarely do any movie reviews on this blog, especially in an area I know little about. However, after seeing this movie in 3D at a local theater recently, I still can’t get it out of my head.

Luscious imagery. Great choreography. And a soundtrack I eventually had to buy and keep looping around and around… I’m still listening to it right now, weeks later.

Very different. Hope you enjoy!

More details about Pina Bausch here on wikipedia.

R.I.P. Carroll Shelby

On Thursday, 10-may-2012, Carroll Shelby died, aged 89.

Originally a chicken farmer, he became a race car driver until 1959, when heart problems brought his successful racing career to an end. He then switched again, to focus on designing and building the powerfully fast, brash muscle cars that he loved to drive, including some great icons:

AC Cobra/Shelby Cobra

Ford Mustang GT390 (made famous by Steve McQueen in the movie “Bullitt”)
…and several other Mustang variants.

Dodge Viper

Ford GT

Even at 84 years of age, while a consultant on developing the new high-powered Ford GT, he still loved driving fast, and test drove the new Ford Mustang GT500 on a race track at 150mph.

I didn’t realize until today that he was also one of the world’s longest-living heart transplant recipients, having received a heart transplant in 1990.

A more detailed bio is available on his Shelby-America company website, on Wikipedia, and in The New York Times and The Washington Post.