20 Jun 2011
JohnMozilla
Between Wed (8th) and Fri (17th), RelEng did ten complete releases. In a few cases, a release was not shipped to end users at the last minute because another release was starting immediately afterwards, but all the work had already been done.
While doing all these releases, RelEng also set several new speed records – Firefox releases in 8-9 hours, Fennec releases in ~5 1/2 hours. Nice work. Calmly done. Great to watch in action.
Firefox5.0beta5:
go to build: 08jun 14:07
all done: 09jun 15:39
end-to-end time: 25h 32m
RelEng time: 9h 01m
Fennec5.0beta5:
go to build: 08jun 14:15
all done: 09jun ???
end-to-end time: ???
RelEng time: 16h 05m (or 05h 10m, depending on whether you include the time spent waiting for someone to wake up to start this non-chemspill release)
Firefox5.0beta6:
go to build: 13jun 15:48
all done: 14jun 17:16
end-to-end time: 25h 28m
RelEng time: 8h 48m
Fennec5.0beta6:
go to build: 13jun 15:48
all done: 14jun 18:11 (stopped because of upcoming Fennec5.0b7).
end-to-end time: 26h 23m
RelEng time: 8h 12m
Firefox5.0beta7:
go to build: 14jun 16:34
all done: 15jun 19:10
end-to-end time: 26h 36m
RelEng time: 9h 08m
Fennec5.0beta7:
go to build: 14jun 16:35
all done: 15jun 10:42
end-to-end time: 18h 07m
RelEng time: 5h 24m
Firefox5.0 rc:
go to build: 15jun 14:34
all done: in progress
end-to-end time: in progress
RelEng time: 14h 50m
Fennec5.0 rc:
go to build: 14jun 16:35
all done: in progress
end-to-end time: in progress
RelEng time: 5h 24m
Firefox3.6.18build1:
go to build: 10jun 20:18
all done: 14jun 17:08 (stopped because of upcoming FF3.6.18build2)
end-to-end time: 3d 20h 50m
RelEng time: 2d 13h 29m (or 7h 22m, depending on whether you include the time spent waiting for someone to wake up to start this non-chemspill release)
Firefox3.6.18build2:
go to build: 14jun 17:08
all done: in progress
end-to-end time: in progress
RelEng time: 10h 23m
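The end-to-end times above are the wall-clock gap between “go to build” and “all done”. A minimal sketch of that arithmetic in Python, assuming both stamps are in the same timezone and in 2011 (the log gives only day, month and HH:MM). Switching to a days format only past 48 hours is an assumption chosen to mimic the figures above. RelEng time cannot be derived this way, since it excludes waiting.

```python
from datetime import datetime, timedelta

# Assumption: stamps like "08jun 14:07" are all in 2011 and in one timezone.
FORMAT = "%d%b %H:%M %Y"


def parse(stamp: str, year: int = 2011) -> datetime:
    """Parse a stamp such as '08jun 14:07' (month name is case-insensitive)."""
    return datetime.strptime(f"{stamp} {year}", FORMAT)


def format_duration(delta: timedelta) -> str:
    """Render as '25h 32m', or '3d 20h 50m' once the run exceeds 48 hours."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    if hours < 48:  # assumption: days are only used for multi-day runs
        return f"{hours}h {minutes:02d}m"
    days, hours = divmod(hours, 24)
    return f"{days}d {hours}h {minutes:02d}m"


def end_to_end(go: str, done: str) -> str:
    return format_duration(parse(done) - parse(go))


print(end_to_end("08jun 14:07", "09jun 15:39"))  # 25h 32m  (Firefox5.0beta5)
print(end_to_end("10jun 20:18", "14jun 17:08"))  # 3d 20h 50m  (Firefox3.6.18build1)
```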
16 Jun 2011
JohnMozilla
This morning, we started running Alice’s new Tp5 performance test suite in production.
From now until 30June2011, we will run both the Tp4 and Tp5 suites on every checkin, on every branch. This adds ~15 minutes of extra load per checkin, which adds up quickly when you consider how many checkins we do daily at Mozilla. We *think* our infrastructure can handle this extra load just fine, thanks to Armen’s super-detailed behind-the-scenes work pruning obsolete tests, consolidating test suites, reassigning idle win64 slaves, disabling perma-orange tests…
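To see how “adds up quickly” plays out, here is a back-of-envelope sketch. It assumes the ~15 minutes is a flat per-push cost and, purely for illustration, borrows the 1,751 pushes reported for March 2011 as a typical month; neither assumption comes from this post.

```python
# Back-of-envelope only. Assumptions (not from the post): ~15 minutes is a
# flat cost per push, and a month has 1,751 pushes (the March 2011 count).
EXTRA_MINUTES_PER_PUSH = 15
PUSHES_PER_MONTH = 1_751


def extra_machine_hours(pushes: int, minutes_per_push: int = EXTRA_MINUTES_PER_PUSH) -> float:
    """Extra machine-hours from running Tp5 alongside Tp4 for `pushes` pushes."""
    return pushes * minutes_per_push / 60


print(extra_machine_hours(PUSHES_PER_MONTH))  # 437.75
```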
If there are no problems detected by 30June, we’ll disable Tp4 across the board and only measure Tp5. Of course, if you see anything wrong with the Tp5 results, or you see any infrastructure load problems, please file a bug in mozilla.org/ReleaseEngineering and we’ll get right on it!
(For those who don’t know: Tp is a performance test suite that measures page load times on the top 100 pages of http://www.alexa.com/topsites. The list of top pages obviously changes over time as people move to different websites, so we refresh the pagesets periodically. Before Tp5, there was Tp4, Tp3…)
16 Jun 2011
JohnMozilla
In case you missed it, coop wrote an important post about some directory restructuring that is about to happen on ftp.mozilla.org. If you have any concerns or questions, please comment in bug#449607.
Depending on release schedules, next week coop plans to change the directory structure of builds on ftp.m.o from using a shorter BuildID (YYYY-MM-DD-HH) to using our “new” full BuildID (YYYY-MM-DD-HH-mm-ss).
This change means a few important things:
* if you look at the buildid in your Firefox, you will be able to programmatically create the full path to where that specific build can be downloaded from ftp.m.o.
* RelEng can programmatically create the full path to where the build should be uploaded on ftp.m.o.
* it reduces the risk of collision if we do two nightly builds within the same hour (rare, but it sometimes happens!).
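As an illustration of building the path programmatically, here is a minimal sketch that maps a BuildID to a nightly directory. It assumes Firefox reports the 14-digit form (YYYYMMDDhhmmss), and the base URL, the YYYY/MM/ parent directories and the “-mozilla-central” suffix are a hypothetical layout rather than a confirmed one.

```python
from datetime import datetime

# Hypothetical layout (assumption): <BASE>/YYYY/MM/YYYY-MM-DD-HH-mm-ss-<branch>/
BASE = "http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly"


def nightly_dir(build_id: str, branch: str = "mozilla-central") -> str:
    """Map a 14-digit BuildID (YYYYMMDDhhmmss) to its nightly directory URL."""
    stamp = datetime.strptime(build_id, "%Y%m%d%H%M%S")
    full_id = stamp.strftime("%Y-%m-%d-%H-%M-%S")  # the "new" full BuildID form
    return f"{BASE}/{stamp:%Y}/{stamp:%m}/{full_id}-{branch}/"


print(nightly_dir("20110616031234"))
# http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011/06/2011-06-16-03-12-34-mozilla-central/
```

Because minutes and seconds are part of the directory name, two nightlies started within the same hour no longer map to the same directory.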
Once this change is in place, we can start removing a bunch of hacky workarounds in our release automation that deal with these mismatches. Removing these, in turn, makes our remaining release automation more streamlined, easier to debug, and safer against surprise breakages whenever we make changes.
This cleanup has been literally years in the unravelling and debugging. For a glimpse of the behind-the-scenes work, have a read of bug#449607, all its connected bugs, and also the dev.planning newsgroup discussions here.
ps: Many many thanks to coop for taking this over and running it to completion. As he said himself earlier today, “we’d have rolled this change out to production weeks ago if only we’d stop doing releases for a few minutes”!
08 Jun 2011
JohnMozilla
(Originally posted in dev.planning; reposting in case a TryServer user missed this important announcement.)
TryChooser was introduced in sept2010 as an optional way to let people skip build/test jobs they did not want, and hence reduce overall load on TryServer. However, we still see a lot of pushes (~40-50%) to TryServer that do not use TryChooser syntax, which results in a “do all platforms (opt & debug) as well as all unittests” request, even though informal discussions suggest that not all of that is needed.
To reduce unwanted jobs on TryServer, lsblakk has a hook ready to land that will immediately reject pushes to try unless the final commit message contains TryChooser syntax. This will encourage people to think about which jobs they really need for their particular situation, and ask for what is appropriate. The http://people.mozilla.org/~lsblakk/trychooser/ website helps developers build the syntax, so there are fewer typos, and lists the test/talos suites for easy picking.
Of course, please do continue to use TryChooser syntax in your commit message to ask for whatever jobs you feel are useful to you. The hope is that this change will encourage people to consider the cost of each push, and do the right thing.
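The check the hook performs can be sketched as a single predicate on the final commit message. This is a hypothetical illustration, not lsblakk’s hook: it assumes “contains TryChooser syntax” means a “try:” token followed by at least one option flag, as in “try: -b do -p linux -u all -t none” (-b build types, -p platforms, -u unittests, -t talos).

```python
import re

# Hypothetical rule: "try:" followed by at least one option flag such as -b.
TRY_SYNTAX = re.compile(r"\btry:\s*-\S")


def accept_push(final_commit_message: str) -> bool:
    """Return True if the push should be accepted by this sketch of the hook."""
    return TRY_SYNTAX.search(final_commit_message) is not None


print(accept_push("Bug 123 - fix leak. try: -b do -p linux -u all -t none"))  # True
print(accept_push("Bug 123 - fix leak"))  # False
```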
Given that tryserver is ~60% of our overall infrastructure load, this should reduce our test load quickly. Details in https://bugzilla.mozilla.org/show_bug.cgi?id=649402 and of course, here: https://wiki.mozilla.org/ReleaseEngineering/TryChooser
If you have any comments or concerns, please comment in the bug. I’ll transcribe any comments below to the bug.
06 Jun 2011
JohnMozilla
Here’s a belated blogpost, welcoming Joey to Coop‘s group in RelEng.
When I introduced Joey on 23rd May, I started by asking “What has 1,797 files, and now runs about twice as slowly as it did a year ago?”

Mozilla’s Makefiles are used every day by every developer, and literally thousands of times every day by RelEng continuous integration machines. Joey will be focused on making these Makefiles more efficient, easier to understand, easier to *safely* change, and faster to build. All super important. Oh, and don’t break anything in the process!
Given how much this impacts RelEng infrastructure, and all the really exciting work on Makefiles that Joey has done in a previous life, we’re really excited to have Joey join our group and turn his quiet laser focus here.
Joey will be another remote RelEng person, working from New York state, but you can find him on irc.mozilla.org in the #build channel as “joey”. Stop by and say “hi”, but please be patient. To start with, Joey is wading through *tons* of stuff with Ted and Kyle to learn his way around our Makefiles. Ted and Kyle know this stuff better than almost anyone, I think. They have already done tons of work here over the years, wrangling this large, complex can of worms and keeping things working while also dealing with lots of other stuff on their plates. Every time I look into these Makefiles “to make a quick simple change”, I come away thinking I need to buy Ted and Kyle a beer for all their hard work so far.
UPDATE: (Joey’s working on setting up a blog, and getting it added to planet. I’ll link to it once it’s up and running.) Joey’s blog is here. joduinn 05aug2011
27 May 2011
JohnMozilla
Late Wednesday night, we completed the project to migrate these repos:
http://hg.mozilla.org/mozilla-aurora –> http://hg.mozilla.org/releases/mozilla-aurora
http://hg.mozilla.org/mozilla-beta –> http://hg.mozilla.org/releases/mozilla-beta
This project started several weeks ago when we moved the repos and put symlinks in place, so that anyone using the repo in either location would see things working just fine during the transition phase. With all infrastructure pointing to the new repo locations, the last few weeks have been mostly wait-and-see, waiting for a quiet time between “the mozilla-central –> aurora merge” on Tuesday and the “enabling of updates on aurora” this morning. Late Wed night we removed the symlinks, and cleaned up http://hg.m.o to remove the duplicate entries.
If you see *any* problems, please reopen and comment in bug#652229. Special thanks to NoahM in IT for all his help in this project.
Other info:
* We added HTTP redirects to hg.m.o, and will leave these in place, in order to maintain existing URLs in any bugs that refer to the old http://hg.mozilla.org/mozilla-aurora and http://hg.mozilla.org/mozilla-beta locations.
* People starting new patches for mozilla-aurora, mozilla-beta should reclone using the new http://hg.mozilla.org/releases/mozilla-aurora or http://hg.mozilla.org/releases/mozilla-beta locations. Alternatively, more fearless hackers can update their .hg/hgrc file.
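For those updating .hg/hgrc instead of recloning, the change is just pointing the [paths] entries at the new URLs. A minimal Python sketch of that rewrite using the standard configparser module, under the assumption of a simple hgrc; configparser drops comments and does not understand hg-specific directives such as %include, so editing by hand is safer for anything elaborate.

```python
import configparser
import io

# Old -> new locations from this post; any other URL is left untouched.
MOVED = {
    "http://hg.mozilla.org/mozilla-aurora": "http://hg.mozilla.org/releases/mozilla-aurora",
    "http://hg.mozilla.org/mozilla-beta": "http://hg.mozilla.org/releases/mozilla-beta",
}


def rewrite_hgrc(text: str) -> str:
    """Point any [paths] entry that uses a migrated repo at its new URL."""
    config = configparser.ConfigParser(interpolation=None)  # URLs may contain '%'
    config.optionxform = str  # keep path names exactly as written
    config.read_string(text)
    if config.has_section("paths"):
        for name, url in config.items("paths"):
            new_url = MOVED.get(url.rstrip("/"))
            if new_url:
                config.set("paths", name, new_url)
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


print(rewrite_hgrc("[paths]\ndefault = http://hg.mozilla.org/mozilla-beta\n"))
```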
For more details see the original post to dev.planning.
24 May 2011
JohnMozilla
Summary:
We had 1,751 pushes in March 2011, a continued significant drop from the last few months. I believe this is because of the continued checkin restrictions during March as we got closer to the Firefox4.0 release. However, I have no way to prove that; it is just my interpretation. If you have other suggestions, please let me know.


Details:
- Load on TryServer is exactly the same as the previous month: 52% of our overall load.
- The numbers for this month are:
- 1,751 code changes to our mercurial-based repos, which triggered 214,670 jobs:
- 32,798 build jobs, or ~44 jobs per hour.
- 103,608 unittest jobs, or ~139 jobs per hour.
- 78,264 talos jobs, or ~105 talos jobs per hour.
- We are still double-running unittests for some OSes, running both unittest-on-builder and unittest-on-tester. This continues while developers and QA work through the issues. Whenever unittest-on-test-machine is live and green, we disable unittest-on-builders to reduce wait times for builds. Any help with these tests would be great!
- The entire series of these infrastructure load blogposts can be found here.
- We are still not tracking any l10n repacks, nightly builds, release builds or any “idle-timer” builds.
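The per-hour figures appear to be each job count divided by the hours in the month. A short sketch of that arithmetic for March 2011, assuming jobs are spread across the whole calendar month (31 days × 24 hours = 744 hours) and rounding to the nearest job:

```python
# March 2011 job counts from this post.
HOURS_IN_MARCH = 31 * 24  # 744

jobs = {"build": 32_798, "unittest": 103_608, "talos": 78_264}


def per_hour(count: int, hours: int = HOURS_IN_MARCH) -> int:
    """Average jobs per hour, rounded to the nearest whole job."""
    return round(count / hours)


total = sum(jobs.values())  # 214,670 jobs in total
for kind, count in jobs.items():
    print(f"{kind}: ~{per_hour(count)} jobs per hour")
# build: ~44, unittest: ~139, talos: ~105
```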
Even more details:


Here’s how the math works out (descriptions of the build, unittest and performance jobs triggered by each individual push are here):

23 May 2011
JohnMozilla
On Friday, we shipped Firefox 5.0beta2 and Fennec 5.0beta2. The cool new features have already been covered elsewhere, but there were two details that I thought were important:
- Firefox and Fennec both used the same changeset http://hg.mozilla.org/releases/mozilla-beta/rev/2b3275216413.
- Firefox and Fennec shipped at the same time. It’s true we’ve sim-shipped these products before, but this was the first time it was planned as part of the new rapid release cadence.
This is an important milestone for us, as we bring Fennec up to parity with Firefox.
Aki led some great behind-the-scenes cleanup work to make this happen. And even with that work in place, we still needed some careful workarounds in place for last week’s builds (details in http://oduinn.com/blog/2011/05/05/branch-mechanics-for-fennec-5-0).
We’ll continue to consolidate the product code bases, and consolidate the release automation behind these two products, so we can do this again more easily next time. Meanwhile, it was really great to see both these releases ship, from the same changeset, on Friday morning.
18 May 2011
JohnMozilla
Summary:
(Sorry for the long delay in posting these; things are starting to calm down again, so I’ll catch up.)
We had 2,060 pushes in February 2011 – a significant drop from the last few months. I believe this is because of the checkin restrictions during February (for Firefox4.0beta11, Firefox4.0beta12, Fennec4.0beta4, Fennec4.0beta5) as we got closer to the Firefox4.0 release, but honestly, that is just my interpretation. I have no way to prove that. If you have other suggestions, please let me know.


Details:
- Load on TryServer is about the same: 52% of our overall load, vs 53% in the previous month. People continue to send their patches to TryServer before landing, so patches that do land are less risky, and the tree stays green more often!
- The numbers for this month are:
- 2,060 code changes to our mercurial-based repos, which triggered 262,340 jobs:
- 39,365 build jobs, or ~58 jobs per hour.
- 123,603 unittest jobs, or ~184 jobs per hour.
- 99,372 talos jobs, or ~148 talos jobs per hour.
- We are still double-running unittests for some OSes, running both unittest-on-builder and unittest-on-tester. This continues while developers and QA work through the issues. Whenever unittest-on-test-machine is live and green, we disable unittest-on-builders to reduce wait times for builds. Any help with these tests would be great!
- The entire series of these infrastructure load blogposts can be found here.
- We are still not tracking any l10n repacks, nightly builds, release builds or any “idle-timer” builds.
Detailed breakdown:


Here’s how the math works out (descriptions of the build, unittest and performance jobs triggered by each individual push are here):

16 May 2011
JohnMozilla
Please join me in welcoming Marc as an intern in RelEng.
Marc comes from UWaterloo, and started last Monday! He’s started working on some release automation around Android signing (details in bug#603826), and will then be helping Lukas with the buildbot-bugzilla-tryserver integration project.
Once Marc has finished fixing up his website / blog, you can follow along on his website! Of course, if you are in 650Castro, or in #build, please do stop by and say hello to “mjessome”.