Real-time traffic status: San Francisco and Tokyo

On the bus from Narita into downtown Tokyo, I noticed that the traffic signs show real-time traffic updates. By contrast, in the San Francisco area, traffic signs are fixed displays, so most people use Google Maps to get live traffic updates on their phones.

Both approaches use yellow (slow traffic) and red (stopped traffic) indicators, so they felt very similar to each other. Having the info displayed on traffic signs seems safer – after all, you don’t have to look down at your phone while driving. But I wonder how the Tokyo signs display info about traffic outside the immediate area.

Anyway, the differences and the similarities struck me as noteworthy. Click the thumbnails for more detailed photos, and let me know what you think!

Election pamphlets in San Francisco

Today was election day here in San Francisco.

When I first started getting election pamphlets in the mail months ago, I simply tossed them into recycling along with the junk mail. Eventually, it struck me that there seemed to be a lot, so out of curiosity, two weeks ago I started collecting them, putting all the various election pamphlets I’ve received in a pile. All unread.

Last night, I sat down to read them all, with a fresh cup of coffee.

There were so many that the pile was almost as tall as my coffee mug. Reading them all took hours.

Having avoided all the political TV/radio ads, reading these flyers was my first real exposure to the style of the election ads running throughout the campaign. Reading them all in one large pile like that was a bit of a culture shock, and frankly, disappointing.

Lots of “don’t vote for the other person; they’re corrupt/evil/wrong! Instead you should vote for me; I’m honest/good/correct”. All the pamphlets were very destructive of the other candidates. Calling them “attack ads” glosses over the personally destructive nature of these pamphlets.

Very few candidates showed leadership, with positive, open and constructive discussion.

This doesn’t bode well for my hope of seeing elected officials working together to solve some (any!) of the pressing problems facing us today.

Firefox 3.6.12 and Firefox 3.5.15 by the (wall-clock) numbers

Firefox 3.6.12 was released on Wednesday 27-oct-2010, at 16:48 PST. This was yet another release that shipped within 24 hours, and yet again, it set a new speed record for us.

From “Dev says go” to “release is now available to public” was 21h 32m wall-clock time. The Release Engineering portion of that was 10h 25m. This was faster than our previous fastest-ever release, FF3.6.6, and well inside 24 hours from start to finish. For FF3.6.12, the wall-clock times were:

19:20 26oct: Dev says “go” for FF3.6.12
19:48 26oct: FF3.6.12 builds started
21:55 26oct: FF3.6.12 linux, mac, unsigned-win32 builds handed to QA
00:05 27oct: FF3.6.12 signed-win32 builds handed to QA
03:35 27oct: FF3.6.12 update snippets available on test update channel
09:05 27oct: Dev & QA say “go” for Release; ok to start mirror absorption
10:50 27oct: mirror absorption started
10:55 27oct: mirror absorption good enough for testing
16:00 27oct: website changes finalized and visible. Build given “go” to make update snippets live.
16:21 27oct: update snippets available on live update channel
16:48 27oct: release announced
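
For anyone who wants to redo the wall-clock arithmetic, here is a minimal Python sketch. It assumes the timestamps are taken exactly as listed above (the year 2010 and the shortened labels are filled in for illustration), and since those notes are rounded scribbles, the result may differ by a few minutes from the totals quoted earlier.

```python
from datetime import datetime

# A few milestones copied from the FF3.6.12 list above.
# The year and the shortened labels are added here for illustration only.
MILESTONES = [
    ("2010-10-26 19:20", "Dev says go"),
    ("2010-10-26 19:48", "builds started"),
    ("2010-10-27 09:05", "Dev & QA say go for release"),
    ("2010-10-27 16:21", "update snippets on live channel"),
    ("2010-10-27 16:48", "release announced"),
]

STAMP_FORMAT = "%Y-%m-%d %H:%M"


def elapsed(start, end):
    """Return (hours, minutes) of wall-clock time between two timestamps."""
    delta = datetime.strptime(end, STAMP_FORMAT) - datetime.strptime(start, STAMP_FORMAT)
    total_minutes = int(delta.total_seconds()) // 60
    return divmod(total_minutes, 60)


def report(milestones):
    """Return one line per milestone, showing time elapsed since the first one."""
    start = milestones[0][0]
    lines = []
    for stamp, label in milestones:
        hours, minutes = elapsed(start, stamp)
        lines.append(f"+{hours:02d}h {minutes:02d}m  {label}")
    return lines


if __name__ == "__main__":
    print("\n".join(report(MILESTONES)))
```

The same function works for the FF3.5.15 list, or for any other release, by swapping in that release's timestamps.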

I note that we also simultaneously shipped FF3.5.15 in the same super-fast way.

19:20 26oct: Dev says “go” for FF3.5.15
19:46 26oct: FF3.5.15 builds started
21:55 26oct: FF3.5.15 linux, mac, unsigned-win32 builds handed to QA
23:05 26oct: FF3.5.15 signed-win32 builds handed to QA
04:35 27oct: FF3.5.15 update snippets available on test update channel
09:05 27oct: Dev & QA say “go” for Release; ok to start mirror absorption
10:10 27oct: mirror absorption started
10:50 27oct: mirror absorption good enough for testing
16:00 27oct: website changes finalized and visible. Build given “go” to make update snippets live.
16:21 27oct: update snippets available on live update channel
16:48 27oct: release announced

Obviously, we don’t want to ship this quickly all the time, but it’s nice to know that we can if we have to. Really really nice. And stay tuned – we’ve got more work in progress to make this even faster! :-)

Notes:

1) While there’s been some discussion about how this release was done in “2 f*ing days”, I prefer to measure the time from “dev says go” to “release is available”. Explicitly, I do *not* measure from “fix is reported” to “release is available”, because I don’t want to put any further time pressure on a developer trying to fix a problem under pressure. It feels much better to me to work a little longer to get the fix right instead of adding even more time pressure looking for a quick fix, and then having to do another emergency release a few days later to fix the “quick fix”.

2) The super fast mirror uptake was due to some great work by mrz and justdave. Not that we would do that for every release, but it was great to have the assist when time really mattered!

3) As usual, our blow-by-blow scribbles are public, so you can read all the details here or in tracking bug#607228. For FF3.5.15, the build notes are here or in tracking bug#607240.

Thank you
John.

Breakthrough on Android automation

[Image: Android talos test results on TBPL]

If you are watching closely, you might have noticed a small change recently to TBPL and tinderbox server and graphserver. The circled little green “T”s mean that the Talos suites tdhtml, ts, tsvg are now automatically reporting valid results on Android systems for every checkin on the mobile tree.

Of course, we’ve had automated Android builds for a while now – those little green “B”s on TBPL are Android builds triggered per checkin, and available like all our other builds on ftp.m.o. The big news here is that now we have 3 talos test suites correctly reporting green, each reporting their results to graphserver, tinderbox and TBPL. Just like we do for any other fully supported OS on our infrastructure.

This is still only the beginning. There’s still the rest of the Talos suites and all the unittest suites to get reporting green. But at least, we now know that the basic infrastructure works!

From RelEng this took aki and bear leading a ton of work, both within RelEng and also with ctalbert, jmaher, bmoss, mwu, blassey, and others across Mozilla… the list goes on and on. Thank you – each and all.

If you are interested in helping get more test suites reporting green and showing on TBPL, please follow along in bug#538524, or ping aki or bear. At this point, the tricky part for us is that we can’t enable broken/failing tests in production – that would close the tree. :-( Instead, we post the failing test suites on http://tinderbox.mozilla.org/showbuilds.cgi?tree=MobileTest, look through the logs and then go ask for developers/QA to help fix the broken code/tests. Only after suites are green can we have them report on the production mobile tree, and TBPL.

ps: So far, we have only 3 (soon 4) developer boards in production to run these tests. This means we are struggling with wait times for the checkin load on mobile branch. This also means we cannot realistically enable Android testing on high-load branches like mozilla-central, or TryServer – they simply couldn’t keep up with the number of jobs to test. I’ve been unsuccessfully trying for a month now to unjam this, so if you can help us get more developer boards, please drop me an email – I’d love to hear from you!

Please welcome Dustin Mitchell to Release Engineering

We’re excited to have Dustin join Coop’s group here within RelEng. If you’ve been using Buildbot over the last couple of years, you’d know that Dustin has been maintainer of the Buildbot project through some large new features, while also helping grow the community.

He’ll bring additional buildbot expertise to our group and help make sure our non-Mozilla-specific work continues to be upstream-able to the general buildbot community. Also, part of his time will be spent providing further outreach to the buildbot community, helping others make buildbot even more cool.

He’ll be another remote RelEng person – working from Chicago – but you can find him on irc.mozilla.org in the #build channel as “dustin”. You can also follow his blog.

[Updated to include URL for Dustin's blog. joduinn 24oct2010]

Piano stairs

Here’s a nice experiment showing people drawn to do something good, simply because it’s fun! Check out some of their other experiments on thefuntheory.com.

Anyone know of other examples like this?

How to fix “Things freezes at start of Sync”

A few days ago, my Things-on-Mac stopped synchronizing with my Things-on-iPhone. I tried everything on the CulturedCode forums and from the CulturedCode support emails without success. It took a while to debug this, so here are the details, in case it helps others (or in case I have to do this again!).

What am I running:

  • MacBookPro running OSX 10.6.4
  • Things-for-Mac v1.4.2 (1420)
  • iPhone 3G running v4.1 (8B117)
  • Things-for-iPhone v1.6.1

Symptoms:

Individually, I could use Things on Mac, and on iPhone, just fine. However, any attempt to synchronize between the two would cause a progress dialog box on Mac saying “Preparing…” which would just hang, until I force-quit it. In case I was being too impatient, I left it running overnight once but it was still just as hung in the morning.

This hang happened 100% of the time. It happened regardless of whether I started synchronization on Mac with File->“Sync with … now”, or on iPhone, by starting Things-on-iPhone while on the same wifi network as the Mac. It happened on my home wifi network and also on the office wifi network, and even when I had no other applications running on my Mac.

This setup had been working without problems for months, and I hadn’t installed any new software or updated any existing software, so I’m still baffled about what caused this problem.

Here are some things I tried first, unsuccessfully:

  • rebooting Mac, rebooting phone, clicking sync. Hang.
  • rebooting Mac, rebooting phone, removing phone from list of devices, adding phone into list of devices, re-pairing iPhone to Mac, clicking sync. Hang.
  • removing Things from iPhone and reinstalling through iTunes, rebooting Mac, rebooting phone, removing phone from list of devices, adding phone into list of devices, re-pairing iPhone to Mac, clicking sync. Hang.
  • Repeat all of the above on home wifi, and then again on work wifi.
  • At home I also tried all of this after rebooting my home wifi access point.
  • All to no success.

At that point, I remembered the idea of taking backups, so backed up the entire Things data directory, which in my case, was in /Users/john/Things:
$ cd /Users/john
$ rsync -av Things Things-2010-10-01

Note: using “rsync” preserved the timestamps, in case that was part of the synchronization logic.

Here are the steps that fixed it:

  • remove Things from iPhone
  • exit Things on Mac
  • inside the Things directory on Mac, there is a “Backups” directory. This contains daily backups of your Things data. I copied the oldest backup over the current “latest” Things data file, as follows:
    $ cd /Users/john/Things/Backups
    $ cp DatabaseBackup\ 2010-09-29\ \(653\).xml ../Database.xml

  • reinstall Things on iPhone
  • start Things on iPhone
  • start Things on Mac
  • remove phone from list of devices, add phone into list of devices, re-pair iPhone to Mac
  • exit Things on Mac, Things on iPhone
  • start Things on Mac, Things on iPhone, and click sync.
  • It took several minutes of “Pending…”, but this time the progress bar was moving, which gave me hope. After a few minutes of this, success! I could now see all the items on my ToDo on both devices!! OK, it was all from an almost week-old backup – but still, encouraging progress.

At this point, my theory was that something happened during the week that corrupted the Things data XML file. The files were all still valid XML files, so something more subtle was wrong. To find when the corruption happened, I repeated these steps for each backup in turn, each time copying over the next newest backup. In theory, once I reached the corrupted XML file, sync should fail again. However, following the process above, each restore attempt worked, all the way to the latest backup! I ended up with the latest contents of my Things-on-Mac finally visible again on my Things-on-iPhone.
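
Walking through the backups oldest-first by hand gets tedious, so here is a minimal Python sketch that works out the order to try them in. It assumes every backup is named like the one example above, “DatabaseBackup 2010-09-29 (653).xml”; the other file names in the example, and the use of the number in parentheses as a tie-breaker, are made up for illustration.

```python
import re
from datetime import date

# Pattern inferred from the single example file name in this post,
# "DatabaseBackup 2010-09-29 (653).xml". This is an assumption: other
# Things versions may name their backups differently.
BACKUP_RE = re.compile(r"^DatabaseBackup (\d{4})-(\d{2})-(\d{2}) \((\d+)\)\.xml$")


def backup_key(name):
    """Return (date, counter) for a backup file name, or None if it does not match."""
    match = BACKUP_RE.match(name)
    if match is None:
        return None
    year, month, day, counter = (int(part) for part in match.groups())
    return (date(year, month, day), counter)


def oldest_first(names):
    """Return the recognised backup names sorted from oldest to newest."""
    keyed = [(backup_key(name), name) for name in names]
    return [name for key, name in sorted(pair for pair in keyed if pair[0] is not None)]


if __name__ == "__main__":
    # Hypothetical directory listing; only the 2010-09-29 name comes from the post.
    listing = [
        "DatabaseBackup 2010-10-01 (660).xml",
        "DatabaseBackup 2010-09-29 (653).xml",
        "DatabaseBackup 2010-09-30 (655).xml",
    ]
    for name in oldest_first(listing):
        print(name)
```

Each name it prints is the next file to copy over Database.xml before repeating the reinstall, re-pair and sync steps.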

Final step was to do a quick test update on Mac, along with another test update on iPhone, then syncing to verify that both changes were handled correctly. This worked fine too – so everything is good!

take care
John.

Speed up TryServer by using TryChooser

If you haven’t already started using TryChooser, please try (pun intended!) it next time you are pushing to TryServer. If you have any questions about the syntax, please use the handy command line generator here, or read the docs. If you have difficulty after doing either of those, ping the RelEng buildduty person.

Background

Earlier this summer, we redesigned TryServer so each push to TryServer would run every possible type of job available in mozilla-central. This was the easiest and fastest way to get developers using TryServer as quickly as possible, which helped reduce breakages on mozilla-central. People loved TryServer, usage rocketed and it quickly became the heaviest load on our infrastructure. However, this also meant that every push to TryServer ran a lot of builds/unittests/talos jobs – whether the developer wanted them or not. A lot of CPU time was being wasted on jobs that people did not actually want, which in turn caused delays for other jobs coming next in the queue.

A month ago, lsblakk set up TryChooser [https://bugzilla.mozilla.org/show_bug.cgi?id=473184]. This allows developers to ask, in their push-to-try commit comments, for only the builds and tests that they actually want. This avoids wasting CPU cycles on unwanted jobs – which speeds up TryServer wait times for everyone. Great! Some people quickly started using TryChooser, which was great in terms of freeing up CPU cycles. However, the uptake has leveled off. Last week, and also the week before, only about half the people using TryServer used TryChooser to specify what jobs they wanted. The piechart shows who did/didn’t use TryChooser in the period between Mon, 20 Sep 2010 00:00:00 -0700 (PDT) and Sun, 26 Sep 2010 00:00:00 -0700 (PDT).

Please, use TryChooser to help save CPU cycles, and make TryServer faster for everyone!

Infrastructure load for August 2010

Summary:

There were 2,707 pushes in August 2010. This is well above our previous record of 1,971 in January, and almost 50% above the 1,838 pushes we handled last month. TryServer continues to be the busiest branch of the entire infrastructure, and it’s worth noting that we did more pushes to TryServer during this month than we did to the entire RelEng infrastructure, combined across all branches, in any given month during the first half of 2009.

[Image: Overall load since Jan 2009]

The numbers for this month are:

  • 2,707 code changes to our mercurial-based repos, which triggered 336,910 jobs:
  • 51,217 build jobs, or ~69 jobs per hour.
  • 162,909 unittest jobs, or ~219 jobs per hour.
  • 122,784 talos jobs, or ~165 talos jobs per hour.
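
The per-hour rates are simple averages, and can be rechecked with a minimal Python sketch like the one below. It assumes the jobs are spread over all 744 hours of a 31-day August; the function names are made up for illustration.

```python
# Job counts for August 2010, copied from the list above.
JOBS = {"build": 51_217, "unittest": 162_909, "talos": 122_784}

# Assumption: the rate is averaged over every hour of a 31-day month.
HOURS_IN_AUGUST = 31 * 24


def jobs_per_hour(count, hours=HOURS_IN_AUGUST):
    """Average number of jobs per hour over the whole month."""
    return count / hours


def percent_increase(new, old):
    """Percentage by which `new` exceeds `old`."""
    return (new - old) / old * 100


if __name__ == "__main__":
    print(f"total jobs: {sum(JOBS.values()):,}")
    for kind, count in JOBS.items():
        print(f"{kind}: ~{jobs_per_hour(count):.0f} jobs per hour")
    # Pushes this month versus last month, from the summary above.
    print(f"pushes up {percent_increase(2_707, 1_838):.0f}% on last month")
```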

[Image: Infrastructure load by branch]

Details:

  • You can clearly see the drop in load over the last few days in August – caused by a US national holiday, and a Canadian national holiday, on the same long weekend.
  • The trend of “what time of day is busiest” changed again this month. Not sure what this means, but worth pointing out that each month seems to be different. This makes finding a “good” time for a downtime almost impossible.
  • We are still double-running unittests for some OS; running unittest-on-builder and also unittest-on-tester. This continues while developers and QA work through the issues. Whenever unittest-on-test-machine is live and green, we disable unittest-on-builders to reduce wait times for builds.
  • The entire series of these infrastructure load blogposts can be found here.
  • We are still not tracking any l10n repacks, nightly builds, release builds or any “idle-timer” builds.
  • Anamaria is getting closer to having dashboard reports like this generated automatically – something I’ll rejoice over!

Detailed breakdown:

[Image: #Pushes this month]

[Image: #Pushes per hour]

Here’s how the math works out (descriptions of the build, unittest and performance jobs triggered by each individual push are here):

[Image: the math behind the graphs]

Buildbot job queues now visible from Tinderbox waterfall

The Tinderbox waterfall now has two new links at the top.

  • What jobs are ahead of you in the queue? If you landed a patch, but TBPL doesn’t show it running yet, you should look here. This will show you what build/unittest/talos jobs, on what OS, are ahead of you in the queue, and give you an approximate idea of how long you’ll have to wait to get to the top of the queue.
  • What jobs are running right now? Self-explanatory, and has more in-progress details than Tinderbox/TBPL while the job is still running.

For both of these, you can search by changeset, and you can sort on the different column headings. If people find this helpful, please send beer+chocolate to nthomas. :-)
