Ferrari F430 on 18th street

The old joke is “You know you’ve lived in San Francisco too long when… the sight of a legal parking space can bring you to tears”.

There are two noteworthy things in these photos:
1) These photos were taken on 18th Street, San Francisco, which is always packed because that one block has Dolores Park, Tartine, BiRite market, BiRite ice cream and two Delfina restaurants. The street is super-busy whether you are walking or driving. Parking is… Well, to give you an idea of how hard it is to park here, I’ve lived in the area for years now, and can count on one hand the number of times I’ve parked legally on this street.
2) Oh, and yes, the car is a Ferrari F430 – a rare sight (a 2006 model, I’d guess, from matching the photos). From the new dealer tags, I think someone just bought themselves a very nice pressie.

Reworking RelEng bugzilla components

tl;dr: RelEng has renamed and re-scoped our Bugzilla components, added some new components, and moved them all to a new top-level “Release Engineering” product. This should make it easier for people to file bugs in RelEng, improve triaging so bugs don’t get lost, and help us scale. We think this is a big improvement, and hope you like it.

The scope, and the nature, of Release Engineering has evolved significantly over the last 6 years. As our responsibilities grew, the group grew, and we re-scoped who works on what, our old components simply weren’t working well for us anymore. It was getting tricky to keep track of new incoming bugs, it was unclear (and inconsistent) where new bugs should be filed, existing bugs were getting lost in the noise, and triaging all of this was becoming unwieldy. We already made one big change to our components at the start of the year (2013), which helped a bunch, but which we felt could be improved further. Today, after lots of planning, these changes went live, with thanks to glob for his behind-the-scenes Bugzilla magic.

These new components may not be perfect (!), but we believe they are a big improvement over what we had earlier this year, which was, in turn, an improvement over what we had before that!

Of course, renaming and creating new components doesn’t actually *fix* bugs. But, this change should make it easier for people to file bugs in the right component, easier for RelEng to triage, and harder for bugs to fall through the cracks.

This reshuffle is already helping clean up our backlog – we’ve already found bugs that were DUPs, as well as bugs that were fixed long ago yet left open. Please be patient with us while we triage our way through all our bugs, and if we update a long-sleeping bug of yours asking for more info, please bear with us while we figure all of this out.

ps: You can see more details in bug#898244, but the summary is:

0) Saved searches that relied on the ID of the renamed components should be ok, but saved searches that relied on matching the component name will need to be updated.

1) The new name and scope for each component is as follows:

Release Engineering: Buildduty:
This component tracks the routine care and feeding of RelEng systems (machine management), reconfigs of masters, and any tree closures or downtimes. This includes physical machines in any of Mozilla’s colos, as well as instances in AWS.

Release Engineering: Loan Requests:
This component tracks all requests from developers for the loan of a production machine.

Release Engineering: General Automation:
This component tracks

  • Modifying or removing existing builds, tests and other jobs
  • Adding support for new types of jobs, builds or tests (e.g. opt, pgo,
    debug, ASAN or code coverage builds; b2g device builds, new test suites;
    special builds like spidermonkey or valgrind)
  • Scheduler changes: what jobs get run and when

Release Engineering: Release Automation:
This component tracks bugs related to Release Automation, including, but not limited to, buildbot code.

Release Engineering: Releases:
This component tracks bugs related to specific releases, or updates to those releases.

Release Engineering: Tools:
This component tracks large scale tools used to interact with RelEng systems. Some examples include (but are not limited to):

  • vcs2vcs, balrog, tryserver/trychooser, cloud-tools, buildapi,
    self-serve, hg/git mapper, integration-with-s3, tools-for-sheriffs,
    autoland, kittenherder…
  • alerts for colo outages, long-running-jobs, …
  • reports for wait-times, try-server-top-users, cost reporting, slave health, …

Release Engineering: Platform Support:
This component tracks requests for supporting builds/tests on a new operating system. (Examples include: android x86, win 8.1, osx10.9). This component also tracks requests for toolchain changes (new versions of compilers, python, hg, git, puppet, GPO, etc.)

Release Engineering: Repos and Hooks:
This component handles creating/deleting Mercurial and Git repos on and respectively, any mirroring between those systems, as well as any hooks on those repos. Note: This used to be the unowned component, which has significant production impact, and overlapped with work we do for new repo requests, so it made sense to include it here.

Release Engineering: Other:
This component is used for goals, tracking bugs, and any general RelEng work that spans across different RelEng components. Anything that doesn’t fit in any other component goes here.
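
For anyone updating a saved search that matched on component name, here is a minimal Python sketch of building a search link against the new product and component names listed above. It assumes the standard Bugzilla `buglist.cgi` endpoint with `product` and `component` query parameters; the `search_url` helper is hypothetical and only for illustration, not something RelEng ships.

```python
from urllib.parse import urlencode

# Product and component names exactly as listed in this post.
PRODUCT = "Release Engineering"
COMPONENTS = [
    "Buildduty",
    "Loan Requests",
    "General Automation",
    "Release Automation",
    "Releases",
    "Tools",
    "Platform Support",
    "Repos and Hooks",
    "Other",
]

# Assumption: the standard Bugzilla bug-list endpoint.
BASE_URL = "https://bugzilla.mozilla.org/buglist.cgi"


def search_url(component, base_url=BASE_URL):
    """Return a bug-list URL for one of the new RelEng components."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown RelEng component: {component!r}")
    query = urlencode({"product": PRODUCT, "component": component})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    for name in COMPONENTS:
        print(search_url(name))
```

Matching on the component name like this is exactly the kind of search that breaks on a rename, which is why name-based saved searches need a one-time update.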


Infrastructure load for July 2013

  • #checkins-per-month: We had 7,051 checkins in July 2013. This is 20% above last month’s 5,893 checkins, and 10% above our previous all-time-record of 6,433 in Mar2013.

    Overall load since Jan 2009

  • #checkins-per-day: We had 370 checkins on 02jul. During July, 20-of-31 days had over 200 checkins-per-day, and 18-of-31 days had over 250 checkins-per-day. Of note, 10-of-31 days had over 300 checkins-per-day – a huge jump in load.
  • #checkins-per-hour: Checkins are still mostly mid-day PT/afternoon ET. For 10 of every 24 hours, we sustained over 10 checkins per hour. Heaviest load time this month was 1pm-2pm PT (14.7 checkins-per-hour) – a new record.
  • As usual, our build pool handled the load well, with >95% of all builds consistently being started within 15mins. The use of multiple inbounds is really helping reduce bottlenecks. Our test pool continues to improve. All the hard work by RelEng, ATeam and IT is paying off: we’re seeing more test jobs being handled with better response times. The work on fixing/disabling any tests that are hidden-yet-still-being-run is also improving our test situation. The peak for July was 55,983 test jobs on 18jul. Still more work to be done here, but very encouraging.

    As always, if you know of any test suites that no longer need to be run per-checkin, please let us know so we can immediately reduce the load a little. Also, if you know of any test suites which are perma-orange, and hidden on tbpl.m.o, please let us know – that’s the worst of both worlds: using up scarce CPU time and not being displayed for people to make use of. We’ll make sure to file bugs to get tests fixed or disabled – every little bit helps put scarce test CPU to better use.
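
As a quick sanity check on the month-over-month comparisons in the #checkins-per-month bullet above, here is a small Python sketch of the percentage arithmetic. The checkin counts are the ones quoted in this post; the `percent_change` helper is just an illustrative name.

```python
def percent_change(new, old):
    """Return how much `new` is above (or below) `old`, as a percentage."""
    return (new - old) / old * 100.0


# Figures quoted in this post.
JULY_2013 = 7051        # checkins in July 2013
JUNE_2013 = 5893        # checkins last month
PREVIOUS_RECORD = 6433  # previous all-time record, Mar2013

vs_last_month = percent_change(JULY_2013, JUNE_2013)
vs_record = percent_change(JULY_2013, PREVIOUS_RECORD)

if __name__ == "__main__":
    print(f"vs last month:      {vs_last_month:.1f}%")
    print(f"vs previous record: {vs_record:.1f}%")
```

Both results round to the whole-number percentages quoted in the bullet.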

mozilla-inbound, birch/b2g-inbound, fx-team:
mozilla-inbound continues to be heavily used as an integration branch. It’s noteworthy that as developers start to use other -inbound branches, mozilla-inbound’s share dropped significantly to 21.3% of all checkins. It’s still consistently far more than all other integration branches combined, but you can see the congestion reduced as people use other *-inbound branches.

The “birch as b2g-inbound” experiment is officially a great success: with 7.9% of this month’s checkins landing here, birch has now become the 3rd busiest branch (after try and mozilla-inbound). Birch is also helping reduce the pain of any mozilla-inbound closures, and further proving the lure of sheriff-assisted-landings to developers. As of 01aug, the official “b2g-inbound” branch is open, on a permanent basis, to use instead of birch. I expect the percentage on this branch to stabilize in the coming weeks.

The fx-team branch increased slightly to 2.4% of checkins this month, as sheriff coverage started late in the month. I expect the percentage on this branch to grow over August, as more people rely on sheriff support here.

The combined total of these 3 integration branches is 31.6%, showing just how much our sheriffs are helping.
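
To show where the combined figure comes from, here is a short Python sketch that sums the three branch shares quoted above. It also converts each share into a rough checkin count using July’s 7,051 total; those counts are my own back-of-envelope estimates from rounded percentages, not numbers taken from the raw data.

```python
# Share of July 2013 checkins per integration branch, as quoted above.
SHARES = {
    "mozilla-inbound": 21.3,
    "birch/b2g-inbound": 7.9,
    "fx-team": 2.4,
}
TOTAL_CHECKINS = 7051  # all checkins in July 2013

# Round to one decimal place to avoid floating-point noise in the sum.
combined_share = round(sum(SHARES.values()), 1)

# Approximate counts only: the published percentages are already rounded.
approx_checkins = {
    branch: round(TOTAL_CHECKINS * pct / 100)
    for branch, pct in SHARES.items()
}

if __name__ == "__main__":
    print(f"combined share: {combined_share}%")
    for branch, count in approx_checkins.items():
        print(f"{branch}: roughly {count} checkins")
```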

Infrastructure load by branch

mozilla-aurora, mozilla-beta, mozilla-b2g18, gaia-central:
Of our total monthly checkins:

  • 1.8% landed into mozilla-central, slightly lower than last month. As usual, very few people land directly on mozilla-central these days, when there are sheriff-assisted branches available instead.
  • 2.1% landed into mozilla-aurora, slightly higher than last month.
  • 1.1% landed into mozilla-beta, slightly higher than last month.
  • 1.2% landed into mozilla-b2g18, slightly lower than last month.
  • Note: gaia-central, and all other gaia-* branches, are not counted here anymore. For details, see here.

misc other details:

  • Pushes per day
    • You can clearly see weekends through the month. It’s worth noting that we had >200 checkins-per-day almost every working day in July. This has been true for a few months now, so it is starting to feel like 200 checkins-per-day is the new “normal” for a working day at Mozilla. Having 10-of-31 days over 300 checkins-per-day is a big deal.
  • Pushes by hour of day
    • Mid-day PT is consistently the biggest volume of checkins, specifically between 1pm-2pm PT, with 14.77 checkins-per-hour (the heaviest load time this month, and a new record), and 2pm-3pm PT, with 12.63 checkins-per-hour. It’s interesting to see load spreading out across the day, with 10-of-every-24 hours sustaining over 10 checkins per hour.