De-tangling timestamps: part2

Yesterday, Alice de-tangled one more part of the messy time stamp problem by fixing bug#419487.

All data points for time on Talos and graph server now use “build time”, not “testrun time”.

This will greatly simplify a lot of manual regression triage work for people looking at performance graphs on graph server. Now, if you are tracking down what change caused a perf regression:

  • Before 8am PDT Wednesday, 25 June 2008, all charts use “testrun time”. This means debugging regressions require manually padding a regression range multiple hours wider  – enough to catch from start of build through queue to job starting on available slave. Different O.S. take different amounts of time, and any machine hiccups really complicate this padding-guess-work further. If you get it wrong, you can incorrectly rule out bad changes, so pad out more then you think. It means extra triage work, but is safer.
  • After 8am PDT Wednesday, 25 June 2008, all charts use “build time”. We’re still fixing other problems with timestamps, so you still need *some* extra range padding, but much less padding then before. At most manually pad out to the next/previous hour. This padding should be fixed once the BuildID changes in bug#431270 and bug#431905 are landed.

Anyone curious for details should read Alice’s recent post to mozilla.dev.builds and mozilla.dev.performance (”change in talos time stamps (as of 8am PDT June 25th 2008)”), bug#291167, bug#417633 and bug#419487. This is a continuation of the work described (in tedious detail!) in my previous blog post.

This sounds like a small simple change, but it was not. Its a tricky, complex, area in the infrastructure, with lots relying on it, and lots of different people with different assumptions about how time is used here. There was lots of behind-the-scenes homework on this, and to avoid causing any confusion, we held off landing this until after Firefox3 shipped.

Tip of the hat to Alice for pushing this through to production so smoothly.

One thought on “De-tangling timestamps: part2

  1. […] the Tinderbox, Builds, Unittest, Talos and GraphServer systems all needed ways to uniquely identify builds, so each system invented their own (different!) way of doing this… all because the old BuildID was not good enough. Matching up these different systems – how to post a talos result on tinderbox for example – is quite a headache. We’ve already done some cleanup (here and here), but this new BuildID should let us simplify much more of that cross-system integration. […]