Stopping Firefox 3.5

Now that Firefox5.0 has shipped, we’ve got time to go back and do some cleanup. We’re going to disable the Firefox3.5 jobs this week.

This was already announced in last week’s platform meeting, but after all these years, it’s quite possible that people are relying on those jobs in ways we do not even know about. Hence this widespread notice. If you have any reason these Firefox 3.5 jobs should be left running, please let us know by commenting in bug#666407.

    What will change:

  • No FF3.5.x incremental/depend/hourly builds will be produced.
  • No FF3.5.x clobber/nightly builds will be produced.
  • No FF3.5.x release builds will be produced.
  • The FF3.5 waterfall page will be removed from tinderbox. Specifically, this page http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox3.5 will go away as it will be empty.

    What will *not* change:

  • Existing FF3.5.x builds will still be available for download from http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/
  • Existing update offers will still be available. For example:
    • FF3.5.14 users can still update to FF3.5.19.
    • FF3.5.19 users can still update to latest FF3.6.x release (which is FF3.6.18 as of this writing).
  • Newly revised major update offers, like from FF3.5.19 -> a future FF3.6.19 release, could still be produced if needed
  • Any mozilla-1.9.1 machines which are not Firefox specific should continue to run as usual.

    Why do this:

  • Free up compute cycles in the shared production pool-of-slaves or try pool-of-slaves. This will help make life better for all other jobs.
  • Reduce manual support workload and systems complexity for RelEng and IT.
  • Allows us to speed up making changes to infrastructure code, as there’s no longer a need to special-case and retest FF3.5-specific situations.

If you have any reasons that these Firefox3.5 jobs should continue running, please comment in bug#666407. Now.

Yes, really.

Now.

Thanks
John.

10 releases in 9 days

Between Wed (8th) and Fri (17th), RelEng did ten complete releases. In a few cases, at the last minute the release was not shipped to end users because of another release starting immediately, but all the work had been done already.

While doing all these releases, RelEng also set several new speed records – Firefox releases in 8-9 hours, Fennec releases in ~5 1/2 hours. Nice work. Calmly done. Great to watch in action.

Firefox5.0beta5
go to build: 08jun 14:07
all done: 09jun 15:39
end-to-end time: 25h 32m
RelEng time: 9h 01m

Fennec5.0beta5:
go to build: 08jun 14:15
all done: 09jun ???
end-to-end time: ???
RelEng time: 16h 05m (or 05h 10m – depending on whether you include the time waiting for person to wake up to start this non-chemspill release)

Firefox5.0beta6
go to build: 13jun 15:48
all done: 14jun 17:16
end-to-end time: 25h 28m
RelEng time: 8h 48m

Fennec5.0beta6:
go to build: 13jun 15:48
all done: 14jun 18:11 (stopped because of upcoming Fennec5.0b7).
end-to-end time: 26h 23m
RelEng time: 8h 12m

Firefox5.0beta7:
go to build: 14jun 16:34
all done: 15jun 19:10
end-to-end time: 26h 36m
RelEng time: 9h 08m

Fennec5.0beta7
go to build: 14jun 16:35
all done: 15jun 10:42
end-to-end time: 18h 07m
RelEng time: 5h 24m

Firefox5.0 rc:
go to build: 15jun 14:34
all done: in progress
end-to-end time: in progress
RelEng time: 14h 50m

Fennec5.0 rc
go to build: 14jun 16:35
all done: in progress
end-to-end time: in progress
RelEng time: 5h 24m

Firefox3.6.18build1
go to build: 10jun 20:18
all done: 14jun 17:08 (stopped because of upcoming FF3.6.18build2)
end-to-end time: 3d 20h 50m
RelEng time: 2d 13h 29m (or 7h 22m – depending on whether you include the time waiting for person to wake up to start this non-chemspill release)

Firefox3.6.18build2
go to build: 14jun 17:08
all done: in progress
end-to-end time: in progress
RelEng time: 10h 23m
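
For anyone who wants to double-check the end-to-end numbers above, here is a minimal sketch of the arithmetic. It assumes every stamp is in the same timezone and in 2011 (the stamps themselves carry no year); the helper names `parse_stamp` and `elapsed` are made up for this illustration and are not part of any RelEng tooling.

```python
from datetime import datetime

# Month abbreviations as they appear in the stamps above (e.g. "08jun 14:07").
MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}


def parse_stamp(stamp: str, year: int = 2011) -> datetime:
    """Parse a 'DDmon HH:MM' stamp; the year is an assumption, not in the stamp."""
    day_month, clock = stamp.split()
    day = int(day_month[:2])
    month = MONTHS[day_month[2:].lower()]
    hour, minute = (int(part) for part in clock.split(":"))
    return datetime(year, month, day, hour, minute)


def elapsed(go_to_build: str, all_done: str) -> str:
    """Return the end-to-end time between two stamps as 'Hh MMm'."""
    delta = parse_stamp(all_done) - parse_stamp(go_to_build)
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


print(elapsed("08jun 14:07", "09jun 15:39"))  # Firefox5.0beta5
print(elapsed("14jun 16:35", "15jun 10:42"))  # Fennec5.0beta7
print(elapsed("10jun 20:18", "14jun 17:08"))  # Firefox3.6.18build1, in hours
```

The sketch always reports hours and minutes, so the multi-day Firefox3.6.18build1 entry comes out as a total number of hours rather than in the "3d 20h 50m" form used above.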

Introducing Talos Tp5

This morning, we started running Alice’s new Tp5 performance test suite in production.

From now until 30June2011, we will run the Tp4 suite and also the Tp5 suite on every checkin, on every branch. This will add ~15mins extra load per checkin, which adds up quickly when you consider the number of checkins we do daily at Mozilla. We *think* our infrastructure can handle this extra load just fine because of Armen’s super-detailed behind-the-scenes work pruning obsolete tests, consolidating test suites, reassigning idle win64 slaves, disabling perma-orange tests…

If there are no problems detected by 30June, we’ll disable Tp4 across the board and only measure Tp5. Of course, if you see anything wrong with the Tp5 results, or you see any infrastructure load problems, please file a bug in mozilla.org/ReleaseEngineering and we’ll get right on it!

(For those who don’t know: Tp is a performance test suite that measures page load times on the top 100 pages of http://www.alexa.com/topsites. The list of the top pages changes over time, obviously, as people change to use different websites, so we refresh the pagesets periodically. Before Tp5, there was Tp4, Tp3…)

Changing ftp.mozilla.org to use the “new” longer BuildID

In case you missed it, coop did an important post on some upcoming directory restructuring work about to happen on ftp.mozilla.org. If you have any concerns/questions, please comment in bug#449607

Depending on release schedules, next week coop plans to change the directory structure of builds on ftp.m.o from using a shorter BuildID (YYYY-MM-DD-HH) to using our “new” full BuildID (YYYY-MM-DD-HH-mm-ss).

This change means a few important things:
* if you look at the buildid in your Firefox, you will be able to programmatically create the full path to where that specific build can be downloaded from ftp.m.o.
* RelEng can programmatically create the full path to where the build can be uploaded onto ftp.m.o.
* it reduces the risk of collision if we do two nightly builds within the same hour (rare, but it sometimes happens!).
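
As a rough illustration of the points above, here is a minimal sketch of turning a BuildID into a directory name. It assumes the BuildID shown in Firefox is a 14-digit YYYYMMDDHHmmss string; the function names and the `base` path in the example are hypothetical, and the real directory layout on ftp.m.o may well differ.

```python
from datetime import datetime


def _parse_buildid(buildid: str) -> datetime:
    """Parse a BuildID, assumed here to be 14 digits: YYYYMMDDHHmmss."""
    if len(buildid) != 14 or not buildid.isdigit():
        raise ValueError(f"expected a 14-digit BuildID, got {buildid!r}")
    return datetime.strptime(buildid, "%Y%m%d%H%M%S")


def long_dirname(buildid: str) -> str:
    """The "new" full form: YYYY-MM-DD-HH-mm-ss."""
    return _parse_buildid(buildid).strftime("%Y-%m-%d-%H-%M-%S")


def short_dirname(buildid: str) -> str:
    """The older, shorter form: YYYY-MM-DD-HH."""
    return _parse_buildid(buildid).strftime("%Y-%m-%d-%H")


def build_path(buildid: str, base: str) -> str:
    """Join a caller-supplied base directory (hypothetical) with the full form."""
    return f"{base.rstrip('/')}/{long_dirname(buildid)}"


# Two made-up builds started within the same hour.
first, second = "20110614031500", "20110614034500"
print(short_dirname(first) == short_dirname(second))  # same short name: collision
print(long_dirname(first) == long_dirname(second))    # distinct full names
print(build_path(first, base="/some/hypothetical/nightly"))
```

The two example BuildIDs map to the same short directory name but to different full names, which is the collision risk the longer form is meant to reduce.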

Once we have this change in place, we can start removing a bunch of hacky workarounds in our release automation dealing with these mismatches. Removing these, in turn, makes our remaining release automation more streamlined, easier to debug, and also safer against surprise breakages whenever we make changes.

This cleanup has been literally years in the unravelling and debugging. For a glimpse of the behind-the-scenes work, have a read of bug#449607, all its connected bugs, and also the dev.planning newsgroup discussions here.

ps: Many many thanks to coop for taking over this and running this to completion. As he said himself earlier today, “we’d have rolled this change out to production weeks ago if only we’d stop doing releases for a few minutes”! 🙂

Changing TryServer to require TryChooser

(Originally posted in dev.planning; reposting in case a TryServer user missed this important announcement.)

TryChooser was introduced in sept2010 as an optional way to let people skip build/test jobs they did not want, and hence reduce overall load on TryServer. However, we still see a lot of pushes (~40-50%) to TryServer that are not using TryChooser syntax, which results in a “do all platforms (opt & debug) as well as all unittests” request, even though we hear, from an informal survey and anecdotal discussions, that not all of that is needed.

To reduce unwanted jobs on TryServer, lsblakk has a hook ready to land that will immediately reject pushes to try unless the final commit message contains TryChooser syntax. This will encourage people to think about what jobs they really need for their particular situation, and ask for what is appropriate. The http://people.mozilla.org/~lsblakk/trychooser/ website helps build the syntax for developers so there are fewer typos and the test/talos suites are listed for easy picking.
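
To give a feel for the kind of check involved, here is a minimal sketch. It is not lsblakk’s actual hook: it assumes TryChooser syntax can be recognised as a `try:` token followed by at least one `-flag`, and the names `has_try_syntax` and `push_allowed` are invented for this example.

```python
import re

# Assumption: TryChooser syntax looks like "try: -b do -p all -u all -t none",
# i.e. the token "try:" followed by at least one dash-prefixed option.
TRY_SYNTAX = re.compile(r"\btry:\s*-\w")


def has_try_syntax(message: str) -> bool:
    """True if the commit message appears to contain TryChooser syntax."""
    return TRY_SYNTAX.search(message) is not None


def push_allowed(commit_messages: list[str]) -> bool:
    """Only the final commit message of the push is checked, as described above."""
    if not commit_messages:
        return False
    return has_try_syntax(commit_messages[-1])


# Made-up pushes, oldest commit first.
print(push_allowed(["wip", "more wip; try: -b o -p linux -u none -t none"]))
print(push_allowed(["try: -b o -p all -u all -t none", "fix typo"]))
```

In this sketch the second push is rejected because the syntax is only in an earlier commit, not the final one.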

Of course, please do continue to use TryChooser syntax in your commit message to ask for whatever jobs you feel are useful to you. The hope is that this change will encourage people to consider the cost of each push, and do the right thing.

Given that tryserver is ~60% of our overall infrastructure load, this should reduce our test load quickly. Details in https://bugzilla.mozilla.org/show_bug.cgi?id=649402 and of course, here: https://wiki.mozilla.org/ReleaseEngineering/TryChooser

If you have any comments or concerns, please comment in the bug. I’ll transcribe any comments below to the bug.

Welcome Joey Armstrong to Release Engineering

Here’s a belated blogpost, welcoming Joey to Coop’s group in RelEng.

When I introduced Joey on 23rd May, I started by asking “What has 1,797 files, and now runs about twice as slow compared to a year ago”?

Mozilla’s Makefiles are used by every developer every day, and by RelEng continuous integration machines literally thousands of times every day. Joey will be focused on making these Makefiles more efficient, easier to understand, easier to *safely* change, as well as making them build faster. All super important. Oh, and don’t break anything in the process! 🙂 Given how much this impacts RelEng infrastructure, and all the really exciting work on Makefiles that Joey has done in a previous life, we’re really excited to have Joey join our group and turn his quiet laser focus here.

Joey will be another remote RelEng person – working from New York state – but you can find him in irc.mozilla.org in the #build channel as “joey”. Stop by and say “hi”, but please be patient. To start with, Joey is wading through *tons* of stuff with Ted and Kyle to learn his way around our Makefiles. Ted and Kyle know this stuff better than most anyone, I think. They have already done tons of work here over the years, wrangling this large complex can of worms, and keeping things working along while at the same time dealing with lots of other stuff on their plates. Every time I look into these Makefiles “to make a quick simple change”, I find myself going away thinking I need to buy Ted and Kyle a beer for all their hard work so far.

UPDATE: (Joey’s working on setting up a blog, and getting it added to planet. I’ll link to it once it’s up and running.) Joey’s blog is here. joduinn 05aug2011