12 Mar 2009
John | Uncategorized
The machines in each set of Talos machines used to report results that differed from each other by unpredictable amounts. This has been true since Talos started. To deal with that, Talos always had three machines per o.s., per branch, running on each build, so that humans could eyeball the three varying results and go with the majority vote.
The Talos machines became more consistent with each other once we started auto-rebooting every Talos machine after every 5 runs (see details in a previous post). We’ve since tweaked it further to reboot each Talos machine after *every* run, and now the results are *very* consistent with each other. This has been working great, so:
- Recently we started running catlee’s “auto-detect result drift” code on those more-consistent Talos machines. So far so good: it has quickly detected regressions after checkins.
- Today Alice enabled the same “auto-reboot-after-every-run” code on the try Talos machines. This should make Talos results on the try server more consistent; if you do two talos-try runs in a row on the same code, you should now get very similar results. Details are in bug 473819, if you’re curious.
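Why does consistency matter so much for automatic detection? Once machines agree closely, a fixed threshold can separate a real regression from noise. The sketch below shows one simple way to spot a step change in a series of results. It is *not* catlee’s actual code; the function name `detect_drift` and the `window` and `threshold` values are made-up assumptions for illustration.

```python
from statistics import mean

def detect_drift(results, window=5, threshold=0.05):
    """Hypothetical sketch: find the most likely step change in a series of
    Talos results. For each candidate point, compare the mean of the `window`
    runs before it with the mean of the `window` runs from it onward, and
    return the point with the largest relative change, provided that change
    exceeds `threshold`. Returns None when no shift is large enough."""
    best_index, best_change = None, threshold
    for i in range(window, len(results) - window + 1):
        before = mean(results[i - window:i])
        after = mean(results[i:i + window])
        change = abs(after - before) / before  # assumes results are positive
        if change > best_change:
            best_index, best_change = i, change
    return best_index

print(detect_drift([100] * 10 + [110] * 5))  # 10: the run where results jumped
print(detect_drift([100] * 15))              # None: no drift
```

With noisy machines, the run-to-run spread alone can exceed a 5% threshold, which is exactly why rebooting after every run makes this kind of check usable.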
There’s more Talos goodness coming soon: stay tuned!
11 Mar 2009
John | Mozilla, Tech tips, Travel
One of the drawbacks of working right after arriving in a new timezone is that it complicates coordinating meetings with people in other timezones. This trip to Tokyo is the first time I’ve worked from the other side of the International Date Line, and it took me a while to get used to that.
Adding to the confusion, I’ve had trouble keeping all my various electronic calendars in sync with each other; some calendars were in one timezone, some in another, while some ignored timezones altogether and displayed meetings at a mix of times. Having finally figured it out, I’m posting here in case others find this useful, and also so I can remember exactly what to undo when I get back to Mountain View.
- In Zimbra: click on the “Preferences” tab, and select the new timezone from the “Default Timezone” popdown list. Click “save”. Log out. Log in. Notice that the calendar maps existing events to the new local time.
- In OS X 10.5/10.6 on my MacBook Pro: click on the clock in the menu bar, and select “Open Date & Time…”. Select the “TimeZone” tab, pick your new home timezone, and then close that dialog box.
- In iCal: go to Preferences->Advanced, make sure that “Turn on timezone support” is enabled, and close the dialog box. Back in the iCal main display of calendar events, in the top right corner, click on whatever timezone is written in gray font above the search box. This shows a popup list of all time zones already enabled in iCal, with “Other…” at the bottom. If your new timezone is not listed, select “Other…”, add it to the list, and click “ok”. Back at the iCal main display, make sure your new timezone is selected in the popup list. Notice that the calendar now displays existing events in the new local time.
- In iPhone: go to “Settings->Mail,Contacts,Calendars”. At the bottom of the list, select “Time Zone Support”. Make sure “Time Zone Support” is “on”, and set “TimeZone” to your new local city. Notice that the calendar now displays existing events in the new local time. UPDATE: With iOS 4.1, I noticed that the timezone did not change as expected. I went into “General->Date&Time”, turned off “Set Automatically”, then turned it back on, and presto, the timezone changed as expected. joduinn 13nov2010
- Add one test calendar entry into the iPhone and another into Zimbra. Force a sync, and confirm you can see both test entries in iPhone, iCal, and Zimbra – and all are at the right time.
Finally: get the FoxClocks add-on. It’s accurate. It takes up very little screen space. It’s a real gem. And it handled all the Daylight Saving Time changes this week perfectly in “real world testing” (i.e., when I watched the US times change on Sunday and then asked someone in each timezone what their new time was!). Because of my calendar woes, I missed a few meetings this week, but things would have been much worse without FoxClocks!
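The trick that makes the calendar settings above work is that each meeting is stored once as an absolute instant, and each calendar renders that instant in whatever local zone it is set to. The sketch below illustrates that, plus why this week was confusing: the US switch to daylight time on Sunday March 8, 2009 changed the Tokyo/Mountain View gap from 17 hours to 16. It is a simplified assumption that only encodes the US Pacific rule in force since 2007; real code should use a proper timezone database (for example Python’s `zoneinfo`) rather than this hand-rolled rule.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed offsets; Japan does not observe daylight saving time.
TOKYO = timezone(timedelta(hours=9), "JST")
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")

def nth_sunday(year, month, n):
    """Date of the n-th Sunday of the given month."""
    first = date(year, month, 1)
    days_until_sunday = (6 - first.weekday()) % 7  # weekday(): Monday=0 ... Sunday=6
    return first + timedelta(days=days_until_sunday + 7 * (n - 1))

def us_pacific(instant):
    """Pacific-time offset in effect at an aware datetime, using the US rule in
    force since 2007: DST from 2:00 PST on the second Sunday of March until
    2:00 PDT on the first Sunday of November."""
    utc = instant.astimezone(timezone.utc)
    s = nth_sunday(utc.year, 3, 2)
    e = nth_sunday(utc.year, 11, 1)
    dst_start = datetime(s.year, s.month, s.day, 2, tzinfo=PST)
    dst_end = datetime(e.year, e.month, e.day, 2, tzinfo=PDT)
    return PDT if dst_start <= utc < dst_end else PST

def hours_tokyo_ahead(instant):
    """How many hours Tokyo's wall clock is ahead of Mountain View's at `instant`."""
    gap = TOKYO.utcoffset(None) - us_pacific(instant).utcoffset(None)
    return gap.total_seconds() / 3600

# A meeting is stored once, as an absolute instant (here in UTC)...
meeting = datetime(2009, 3, 10, 17, 0, tzinfo=timezone.utc)
# ...and each calendar just renders that instant in its own local zone.
print(meeting.astimezone(us_pacific(meeting)))  # 2009-03-10 10:00:00-07:00
print(meeting.astimezone(TOKYO))                # 2009-03-11 02:00:00+09:00
print(hours_tokyo_ahead(datetime(2009, 3, 7, 12, tzinfo=timezone.utc)))  # 17.0
print(hours_tokyo_ahead(datetime(2009, 3, 9, 12, tzinfo=timezone.utc)))  # 16.0
```

A calendar that “ignores timezones” is effectively storing the wall-clock time without an offset, which is why its events end up at mixed times after a move.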
10 Mar 2009
John | Uncategorized
A cafe near my house claims this sign is very effective! 

09 Mar 2009
John | Uncategorized
After the last few blog posts [1], [2], [3] explaining details of some gotchas, here are a few mechanical ideas which might help. I’m not sure which, if any, make sense to try, but I wanted to point out some new options available to us now, thanks to infrastructure improvements over the last year, and I would love to hear what people think.
1) Make the major update available on the day of a release, but visible only to users on the latest dot release who do a manual “check for updates”.
To take a concrete example, this would mean that on the day of the FF3.5 release, any user on the latest FF3.0.x release would be able to “check for updates” and immediately get the major update to FF3.5. Explicitly, only users who were on the latest FF3.0.x and who actively “check for updates” would get this offer. Any users on older FF3.0 releases who “check for updates” would be updated to the latest FF3.0.x release.
This new option is only possible because of pre-release work done by both RelEng and QA to verify that the major update from FF3.0.n->FF3.5rc goes smoothly before the FF3.5 release, and because of an enhancement to AUS by morgamic. We first practiced this in the lead-up to the FF2->FF3.0 release, and were able to do the major update much faster and more smoothly as a result, but the FF3.5 release will be the first time we do this live on release day. Why is doing this on release day so important?
- Better mindshare: Mozilla typically gets more attention/blogs/press coverage on major release days. Making the most of this momentum in user mindshare by having major updates available and waiting for users who want them seems useful.
- Better experience for existing users: During the FF3.0 release, we saw an abnormally high spike in the number of people doing a manual “check for updates” on release day. We also saw a drop in FF2 users and an increase in FF3 users long before we released our first FF2->FF3 major update. One theory was that FF2 users saw press coverage about the new FF3.0 release, did “check for updates”, saw “no update available”, and so manually downloaded FF3 and did a pave-over install. We can make the upgrade experience better for our existing users by letting any existing user who does “check for updates” get the major update, instead of having to manually download and do a pave-over install.
- Safer: A user on an older FF2 release who does “check for updates” would first get the update to FF2.0.0.20, and if they “check for updates” again, would get the FF3.0 update. What we’d like to avoid is users on old FF2.x releases doing pave-over installs of FF3.0, and potentially discovering unknown migration bugs that we have not tested for. Having users follow a tested migration path seems safer.
- This would *not* make major update offers to idle users, only to users who explicitly take manual steps to seek out the new release. This addresses worries about a) too much load on mirrors on release day and b) users upgrading to something that has some critical bug we don’t know about yet.
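To make the release-day rules of option 1 concrete, here is a minimal sketch of the decision an update server would make for one incoming “check for updates”. It is not real AUS code; the function name `offer_for` and its parameters are hypothetical, and the version strings in the usage lines are placeholders.

```python
def offer_for(user_version, latest_dot, major_target, manual_check):
    """Hypothetical sketch of option 1 on release day (not real AUS code).
    Returns the version this client would be offered, or None."""
    if user_version != latest_dot:
        # Older dot release: move forward to the latest dot release first,
        # so the user stays on the tested migration path.
        return latest_dot
    if manual_check:
        # Latest dot release and the user explicitly asked: offer the major update.
        return major_target
    # Idle background check from an up-to-date user: no offer on release day.
    return None

print(offer_for("FF3.0.6", "FF3.0.7", "FF3.5", manual_check=True))   # FF3.0.7
print(offer_for("FF3.0.7", "FF3.0.7", "FF3.5", manual_check=True))   # FF3.5
print(offer_for("FF3.0.7", "FF3.0.7", "FF3.5", manual_check=False))  # None
```

Note the ordering: the minor update check comes first, so users on older dot releases always move forward one tested step at a time.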
2) Every time we do a dot release on the new release line, make a new major update offer available to the older release line.
This approach basically views each major update offer as a new “security release” for the EOL’d older product line. For example, if we did this today, it would mean that whenever we release a new FF3.0.7, we would also release a new FF2.0.0.20->FF3.0.7 major update offer… and later when we release a new FF3.0.8, we also release a new FF2.0.0.20->FF3.0.8 major update offer.
- this would mean that users on the older FF2.0 release would be prompted to major update as often as users on FF3.0.x are prompted for new security releases. If there are concerns about update fatigue, we could make this major update offer always visible to users who do a manual “check for updates”, and only choose to make it visible to idle users at certain times.
- this would mean users who skipped over a major update offer previously (see earlier blog post for details) would have improved odds of seeing *any* major update offer.
- mechanically, we could do this new major update offer at the same time as the actual new dot release, but I think it makes more sense to wait a few days just to make sure there are no respins/zerodays.
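Option 2 can be sketched as a small scheduling rule: each dot release on the new line produces one new offer from the last release of the EOL’d line, published a few days later, always visible to manual checks and visible to idle checks only on some offers. Everything here is a hypothetical illustration: the `MajorUpdateOffer` record, `delay_days=3` (standing in for “a few days”) and the `idle_every` knob are assumptions, not existing tooling.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class MajorUpdateOffer:
    from_version: str       # last release of the EOL'd line, e.g. FF2.0.0.20
    to_version: str         # the new dot release on the current line
    publish_on: date        # a few days after the dot release, to allow for respins
    visible_to_idle: bool   # False: only a manual "check for updates" sees it

def plan_offers(eol_last, dot_releases, delay_days=3, idle_every=None):
    """Hypothetical: one major update offer per dot release on the new line.
    `dot_releases` is a list of (version, release_date); `idle_every` shows
    every n-th offer to idle users too (None = manual checks only)."""
    offers = []
    for n, (version, released) in enumerate(dot_releases, start=1):
        idle = idle_every is not None and n % idle_every == 0
        offers.append(MajorUpdateOffer(eol_last, version,
                                       released + timedelta(days=delay_days), idle))
    return offers

def visible_offer(offers, today, manual_check):
    """Newest already-published offer this kind of check may see, or None."""
    candidates = [o for o in offers
                  if o.publish_on <= today and (manual_check or o.visible_to_idle)]
    return max(candidates, key=lambda o: o.publish_on, default=None)
```

Each new offer simply supersedes the previous one, so a user who skipped one offer gets another chance with the next dot release.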
3) Offer major update for multiple dot releases
Historically, we only provide major update offers from the latest dot release because:
- it has the most users
- those users have already shown they care about updating, and
- that’s all we had time to generate and test updates for.
However, now that we have streamlined the process, we can also cherry-pick some other dot releases to provide major update offers for at the same time. For example, we could provide major updates for the three most populated dot releases (today those are FF2.0.0.20, FF2.0.0.18 and FF2.0.0.14) at the same time. However, there are some concerns:
- this takes some extra work in RelEng, and has more significant testing load on QA, so how often this is even possible depends on other work at the time.
- the users still on FF2.0.0.14 and FF2.0.0.18 have already said “no” to the minor updates for FF2.0.0.19, then FF2.0.0.20, so it’s not clear how many would accept a major update offer.
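A quick way to see both sides of option 3 is to pick the most populated source releases and count how the QA matrix grows, since each source release has to be tested on each o.s. and in each locale. The sketch below is illustrative only: the user counts, platform list and locale subset are made up, and `pick_sources` / `qa_matrix_size` are hypothetical helpers.

```python
def pick_sources(user_counts, top_n=3):
    """Return the top_n most populated dot releases, most populated first."""
    return sorted(user_counts, key=user_counts.get, reverse=True)[:top_n]

def qa_matrix_size(sources, platforms, locales):
    """Migration paths QA would need to cover: every source x platform x locale."""
    return len(sources) * len(platforms) * len(locales)

# Made-up user counts, for illustration only.
counts = {"FF2.0.0.20": 500, "FF2.0.0.19": 40, "FF2.0.0.18": 120, "FF2.0.0.14": 90}
sources = pick_sources(counts)
print(sources)  # ['FF2.0.0.20', 'FF2.0.0.18', 'FF2.0.0.14']

platforms = ["win", "mac", "linux"]
locales = ["en-US", "de", "ja"]  # hypothetical subset of locales
print(qa_matrix_size(sources[:1], platforms, locales),
      qa_matrix_size(sources, platforms, locales))  # 9 27
```

Going from one source release to three triples the test matrix, which is the extra QA load mentioned above.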
4) Change the major update dialog
Changes to the update system are always scary, and slow to roll out, because we have to wait an entire product release cycle. However, while they are kinda beyond the scope of this blog post, I wanted to raise two suggestions here:
Right now, when a user gets a major update offer, and clicks “Later”, that actually means “no and don’t ask me again until you create a new different major update offer”. What if we changed the client updater as follows:
- Rename the “Later” button to “Not this time”; no change in functionality
- Add a new “Ask me again when I next start/exit” button, which would find what major update offer was available on next start/exit of browser, and re-prompt the user.
- or maybe simpler to change the “Later” button functionality to ask user again on next start/exit, and not add a 4th button?
- In a recent Friday-night brainstorming session with beltzner, another possible enhancement came up: track when the user sees a major update offer, and don’t re-prompt that user with another major update offer within ‘n’ days of the last one they saw. This would help with users skipping over major updates, and also address update-fatigue concerns.
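Here is a rough sketch contrasting today’s “Later” behaviour with the proposed “ask again on next start/exit, but not within ‘n’ days” rule. These functions are hypothetical and not the real client updater code; `cooldown_days=7` is an arbitrary stand-in for ‘n’.

```python
from datetime import datetime, timedelta

def should_prompt_today(offer_id, declined_offer_id):
    """Current behaviour as described: "Later" suppresses an offer
    until a different major update offer appears."""
    return offer_id != declined_offer_id

def should_prompt_proposed(last_prompted, now, cooldown_days=7):
    """Proposed: check again on each browser start/exit, but skip if the user
    saw any major update offer within the last `cooldown_days` days."""
    if last_prompted is None:
        return True
    return now - last_prompted >= timedelta(days=cooldown_days)

print(should_prompt_today("offer-A", "offer-A"))                          # False
print(should_prompt_proposed(datetime(2009, 3, 1), datetime(2009, 3, 5)))  # False
print(should_prompt_proposed(datetime(2009, 3, 1), datetime(2009, 3, 9)))  # True
```

The cooldown is keyed on when the user last *saw* an offer, not on which offer it was, so a replaced offer can no longer slip past unseen, while the user is still never nagged more than once per period.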
Hopefully those scenarios all make sense. There’s been a lot of behind-the-scenes plumbing improvements over the last 18 months, so hopefully these blog posts help raise awareness of some new options that are now mechanically possible.
UPDATE: Added dialog box screenshot, along with suggestion of changing functionality of existing “Later” button. John 10mar2009
09 Mar 2009
John | Uncategorized
There were lots of comments and questions raised from the last two blog posts [1] [2]. As some people were asking variations of the same questions, I thought it would be less confusing to summarize the questions and answers all here instead of replying in comment threads. Please let me know if I missed any of the comments, or misunderstood the questions, ok?
Q) We make minor updates to all previous dot releases, why not make major updates available to all previous dot releases also?
A) Every time we do a minor release, we make updates available for *all* previous dot releases in that product release train. For user efficiency, we make partial updates (smaller, faster downloads) available for the previous dot release. For our own sanity, we make full/complete updates (larger, slower downloads) available for all other older dot releases. To give a concrete example, when we released FF3.0.5, we made partial updates available to users on FF3.0.4, and full/complete updates available to users on FF3.0.3, FF3.0.2, FF3.0.1 and FF3.0.0. In the diagram, partial updates are shown as dotted lines and full/complete updates as dashed lines. This works on the assumptions that:
- the release-drivers team only allows a highly restricted set of changes into a security release, none of which impact user migration, and
- the physical bits users get for a “full/complete update” are identical, whichever older dot release they start from.
This means that QA can test partial update and then test some of the full/complete upgrade combinations, but not have to test *all* of them. This is a very significant time saving.
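The scheme above can be sketched as two small functions: one that lists the update paths produced for a new dot release, and one that picks what QA actually tests. This is a hypothetical illustration, not RelEng tooling; `update_paths`, `qa_plan` and the `complete_samples` knob are assumptions. The sampling step relies on the second assumption above: because every complete update is the same bits, testing one complete path stands in for the rest.

```python
def update_paths(new_release, older_releases):
    """Return (from, to, kind) tuples: a partial update from the immediately
    previous dot release and complete updates from every older one.
    `older_releases` must be ordered oldest first."""
    if not older_releases:
        return []
    *earlier, previous = older_releases
    paths = [(previous, new_release, "partial")]
    paths += [(v, new_release, "complete") for v in earlier]
    return paths

def qa_plan(paths, complete_samples=1):
    """Test every partial update, but only a sample of the complete ones."""
    partials = [p for p in paths if p[2] == "partial"]
    completes = [p for p in paths if p[2] == "complete"]
    return partials + completes[:complete_samples]

paths = update_paths("FF3.0.5", ["FF3.0.0", "FF3.0.1", "FF3.0.2", "FF3.0.3", "FF3.0.4"])
print(paths[0])             # ('FF3.0.4', 'FF3.0.5', 'partial')
print(len(paths))           # 5
print(len(qa_plan(paths)))  # 2
```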
When we released the major update from FF2.0.0.20->FF3.0.5 in December 2008, we took the same identical existing full/complete updates already produced as part of the FF3.0.5 release, and made those full/complete updates visible to FF2.0.0.20 users.
If we wanted to, we could certainly make those same full/complete updates available to migrate users from FF2.0.0.19->FF3.0.5, FF2.0.0.18->FF3.0.5, FF2.0.0.17->FF3.0.5…. However, major updates *do* contain changes that impact user migration, which rightly makes QA more concerned about thoroughly testing each possible migration path, on each o.s., in each locale, which implies a *lot* more manual work. To keep this combinatorial explosion practical, we only make major update offers available to one specific dot release at a time. By convention, we’ve always targeted the latest dot release, because:
- that is where most of the users are, and
- those are the users who’ve already shown interest in keeping up to date. This does also mean that someone on an older dot release has to first upgrade to the latest dot release, and *then* gets to see the MU offer. For those laggard users, it may seem like a suboptimal two-upgrades-in-a-row, which would be totally avoidable if they upgraded as soon as we made each offer available. More importantly, this is safer, because the user is moving forward along tested migration paths.
Q) Can we somehow tell users who are prevented from getting major update (“no admin privs”, or “unsupported o.s.”), exactly what is preventing them from getting the updates, and maybe advise them what to do?
A) For users with “no admin privs”, the “check for updates” menu is grayed out, and the background idle checking is disabled. This means that we never hear from them, and therefore can’t send any messages back. It might be possible to have something on the client side which notifies users that they are running on a locked-down system, but that might quickly frustrate them, as they can’t do anything about it! Any and all suggestions are really welcome here.
Users on an “unsupported o.s.” do contact us; we just do not have any FF3.x updates available for them. We could, in theory, generate a fake major update which was only a message dialog, telling people to upgrade to a secure, supported o.s. However, historically, we’ve been cautious about doing that, because users on older computers might not have a choice, or the spare cash to do this, and they’d then get bothered by recurring notices.
Hopefully, that explains how we got to where we are today. However, both of these areas are excellent problems that we’re still wrestling with. If you’ve any ideas/suggestions, please do let us know – we’re all ears!!
Q) Currently, whenever we offer a minor update, it cuts off users from seeing the major update. Shouldn’t it be the other way around, with the major upgrade more important than the minor one?
A) No. As much as we love our shiny new release and would love people to major update to it, doing a major upgrade is more disruptive to users. Profiles migrate. Awesome bars get added. Icons move/change. All this takes some getting used to, and some people may or may not like the changes. By contrast, minor updates contain only critical security fixes, with few (if any) user-visible changes, and the sooner we get those to users, the safer they are.
Q) Could Mozilla always leave a major update option available for users on latest FF1.5.0.x and FF2.0.0.x?
A) Yes, we could, and I believe we should always leave a major update offer standing on the last dot release of a product. There are some limitations, but personally, I really like that idea. It’s something that we were never great at in the past because there were just too many things in the air at once. However, now that we’re stabilizing the infrastructure, we’re able to start paying better attention to things like this. Part of the reason for all these posts about major updates was because I noticed that for the recent FF2->FF3.0 major updates, we had long periods of time where there were *no* major updates available to users. And this was when we were supporting both release trains, were actively trying to move people from FF2->FF3.0, and were trying to figure out why so many people were not migrating!?!?!? Now that producing these major updates is streamlined, we can consider some options that were not possible before.
Answering this question properly is a little more complex, and involves some proposals in my next blog post, so I’ll defer the rest of the answer to there, ok?
Hope all that makes sense.
John.
08 Mar 2009
John | Uncategorized

Tonight’s homework was figuring out the combined washing-machine-and-tumble-dryer here in the hotel room.
Each of the 4 big dials on the left, and the 5 smaller buttons in the middle, are actually interconnected multi-state buttons. By repeatedly pressing one of those buttons, you light up different parts of text on the big buttons, or text above the smaller buttons – and also restrict what choices all the other buttons can make. To add to the fun, there are buttons that seem to duplicate functionality. This all felt needlessly confusing.
On the cool side, you can program this to start washing ‘n’ hours later, it has a button to enable “low noise washing for night operation” and all the buttons have braille on them.
For all the complexity and high tech stuff around here, it feels low-tech to have to guess how long a specific mix of clothes will take to dry, and then set that time on dial#4. By contrast, my dryer at home has a built-in moisture sensor, and will automatically stop when the clothes are dry. Of course, it has a few buttons to allow you to customize cycles if you want, but the defaults are good enough that I usually just throw clothes in the washer or dryer, and press the one “go” button.
After all my clothes came out clean and dry, I treated myself to some coffee with milk and biscuits:
 
08 Mar 2009
John | Uncategorized
Just before leaving San Francisco, John Lilly lent me this brochure.
Not sure how many others already knew of this park, but it was news to me, so I thought I’d share a couple of quick photos.
Of course, Shiretoko National Park is famous for its red (fire?) fox! 

While it looked interesting, my trip is already booked solid, so I’ll have to leave this until another visit.
06 Mar 2009
John | Uncategorized
Here’s another one. The ubiquitous “Do Not Disturb” signs you see hanging on hotel room doors all over the world.

I didn’t think anything of it until I was heading out, tried to use my “Do Not Disturb” sign, and realized that something was wrong. My “Do Not Disturb” sign was not hanging on the inside handle of my hotel room door as expected. Instead it was somehow stuck *on* the inside of my room door.
Turns out the door is actually metal, coated with wood-like veneer, and the “Do Not Disturb” sign is basically a large fridge magnet in the shape of a “Do Not Disturb” sign.
Pros:
- it’s easier to stick this sign anywhere on the door, instead of threading it onto the door handle
- normal signs can swing when opening/closing the door and fall off the handle, or get caught in the door frame. Using a magnetic sign avoids all that.
- this is designed so the old way of using this sign still works – you can still hang it on a door handle if you want to.
- the *shape* of a “Do Not Disturb” sign is important. If they had made this in a regular rectangle fridge-magnet shape, I would have ignored it, assuming it was a permanent sign inside the door, along with the signs for fire escape routes, and posted room rates.
Cons:
- more expensive to make than a “normal” paper sign
- designed too well to look like a “normal” door sign. It looks so convincingly like a “normal” paper sign that every sign I’ve seen in this hotel so far has been hanging on the door handle as usual. I suspect most people don’t even realise the sign is also a magnet.
So, while I like the idea, it seems like people just keep using the sign like it was the cheaper paper version, so not sure if this counts as a “success”.
05 Mar 2009
John | Uncategorized
In all the edits/revisions of my earlier “Fun and Games with Major Updates” blogpost, somehow I dropped the following by accident:
Disclaimer: for the purposes of simplicity, I intentionally avoided mentioning two edge cases:
- We never make major update offers to a user if they are running on an o.s. that is supported in FF2, but *not* supported in FF3. After all, if the user accepted that major update offer, they would be broken as soon as the upgrade completed! A user on an older unsupported o.s. who does “check for updates” would simply see “no updates available”… which is true for that o.s.! As you would expect, if that FF2 user upgrades their o.s. to something that *is* supported in FF3, then when the user next does “check for updates”, they will see the major update. The numbers are small enough not to make a significant difference to the discussion, and including them would have complicated the diagrams even further!
- Firefox installations on locked-down machines never check for updates. This is true for both minor updates and major updates. Typically these are machines that are locked down by an IT dept, or where the user does not have Admin privileges to install new software updates. As we never hear from these users, we have no idea how many users this impacts, and they are not included in any of our user counts.
tc
John.
05 Mar 2009
John | Uncategorized
The data for January on the volume of changes (see here, and here) was quite useful, so here is the data for February.
In February, people pushed 940 code changes into the mercurial-based repos here in Mozilla. That is slightly lower than last month, but then again, February is a shorter month, and there’s been a prolonged freeze for FF3.1b3.

As each of these pushes triggers multiple different types of builds/unittest jobs, the *theoretical* total amount of work done by the pool-of-slaves in February was 9,559 jobs. For each push, we do:
- mozilla-central: 11 jobs per push (L/M/W opt, L/M/W leaktest, L/M/W unittest, linux64 opt, linux-arm)
- mozilla-1.9.1: 10 jobs per push (L/M/W opt, L/M/W leaktest, L/M/W unittest, linux64 opt)
- tracemonkey: 7 jobs per push (L/M/W opt, L/M/W unittest, linux64 opt)
- theoretical total: (579 x 11) + (221 x 10) + (140 x 7) = 9,559 jobs. Or an average of 14.2 jobs per hour for the month. (Considering how many of our jobs take over an hour to complete, this is quite scary!)
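For anyone who wants to re-run the arithmetic, here is the same calculation as a small script. The builder lists mirror the per-repo bullets above (L/M/W expanded to linux, mac, win); the names are just labels for counting, not real builder names.

```python
LMW = ["linux", "mac", "win"]

# Jobs triggered by one push to each repo, as listed above.
BUILDERS = {
    "mozilla-central": [f"{p} {k}" for k in ("opt", "leaktest", "unittest") for p in LMW]
                       + ["linux64 opt", "linux-arm"],
    "mozilla-1.9.1":   [f"{p} {k}" for k in ("opt", "leaktest", "unittest") for p in LMW]
                       + ["linux64 opt"],
    "tracemonkey":     [f"{p} {k}" for k in ("opt", "unittest") for p in LMW]
                       + ["linux64 opt"],
}

# Pushes per repo in February 2009.
PUSHES = {"mozilla-central": 579, "mozilla-1.9.1": 221, "tracemonkey": 140}

total_jobs = sum(len(BUILDERS[repo]) * n for repo, n in PUSHES.items())
jobs_per_hour = total_jobs / (28 * 24)  # February 2009 had 28 days

print(sum(PUSHES.values()), total_jobs, round(jobs_per_hour, 1))  # 940 9559 14.2
```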
Hopefully people find this interesting; it’s certainly useful for RelEng as we try to make sure we have enough machines to keep up.