“Brag! The art of tooting your own horn without blowing it” by Peggy Klaus


Normally, a small book like this (193 pages) would be a quick read for me, but this book took me literally months. Not, I hasten to add, because of any problems with the book or the writing style; those were all fine. The problem was that this book uncovered a bunch of things I am personally working through. I found myself reading a few pages, highlighting some lines, then walking away thinking. Repeat a few times a week. Occasionally, I’d go back and re-read entire chapters.

For me, bragging has negative connotations and is something I avoid like the plague. It conjures up stereotypes of obnoxious, pretentious people, loudly telling all within range just how great they are: the very last thing I ever want to be. Whether that is cultural, learned from family, something I developed myself growing up, or a mixture, I don’t know. But it is part of who I am. This book is all about encouraging people to find a comfortable place between the extremes of obnoxious self-promotion and never mentioning your own work at all. As Peggy is quick to note, this means different things to different people, so you need to pay attention to what is authentic for you, as that authenticity is important. People have generations of experience spotting fakes, and worst of all, deep down, you’ll know you are faking it too.

Because of the book title, it took several people pushing to get me to even start reading this book. Chapter#1 opened with a line that stopped me dead in my tracks.

“Myth#1: A job well done speaks for itself.”

I’ve always thought that if I did a good job, or handled a tricky situation well, people would notice. If I solved some complex problem, people would understand the complexity, understand the importance of the achievement and appreciate the work. In those circumstances, having others recognize and compliment the achievement was fine, but any attempt on my part to “brag” about my work would in some way “cheapen the victory”. After reading this book, I now think that is *sometimes* true, but not always. While the people working side-by-side with me in the same trenches might understand the scale of the accomplishment, most people simply don’t know the details. Over time, people might eventually notice that a recurring problem hasn’t happened in a while, or they might simply forget about a previously-annoying problem… but they’d never stop and wonder why. Another common trend is for people to not notice that one problem is fixed, but instead notice that a different problem has “appeared”. Oh, and meanwhile, people don’t know what you are working on. Over time, this becomes frustrating for everyone. I’ve learned that I need to make sure I inform people of the work I’m doing, and why it’s important to them. I don’t need to go into all the complexities of the project, unless they ask for more details, but it’s important to make sure others are aware of my work, and the impact it has on them and their work.

I found this a tough read, yet super worth the time. And, yes, I strongly recommend it.

“It ain’t bragging if you done it” (Dizzy Dean)

San Francisco Car Culture: Flowers on your antennae


If you’re going to San Francisco
Be sure to wear some flowers on your hair^Wantennae?!?

(With respectful apologies to the late Scott McKenzie’s song from 1967: “If you’re going to San Francisco, be sure to wear some flowers in your hair…”)

protip: I like how the flowers aerodynamically sweep back, and how zipties keep the flowers safely attached when driving fast.

UCBerkeley “New Manager Bootcamp”


Earlier this week, I had the distinct privilege of being invited to be on a panel at UCBerkeley’s “New Manager Bootcamp”.

This was my first time participating on an “expert panel” like this, so I really wasn’t sure what I was getting myself into.

The auditorium was packed with ~90 people, all seasoned professionals from a range of different companies and different industries. They’d spent a bunch of time in workshops, listening and learning in an intensive crash-course. Now the tables were turned – they got to set the pace, and ask all the questions. After intros, and one “warm up” question from the organizer, the free-flow open questions started. From all corners of the room. Non-stop. For 75mins.

panel speakers

The trust and honesty in the room were great, and it was quickly evident that everyone was down-to-earth, asking brutally honest questions simply because they wanted to do right by their new roles and responsibilities.

The first few questions were “easy” black-and-white type questions. Things quickly got interesting with tricky gray-zone questions for the rest of the session. Each of us responded super-honestly about how we’d handled those tricky situations. Given that we all came from different backgrounds, different cultures, different careers, it was no surprise that we had different perspectives and attitudes for these gray-zone questions. We even had panelists asking each other questions, live on stage!?! As individual panelists, we didn’t always agree on the mechanics of what we did, but we all agreed on the motivations of *why* we did what we did: taking care of people’s lives, and careers, individually, as part of the group, and as part of the company.

I found this educational, and I hope it was useful for the people asking the questions! Afterwards, I spent time in a nearby coffee shop quietly thinking about the questions, and reliving the different experiences behind the answers I shared on stage.

Unexpectedly, I was also asked to come back the next day, to talk about “we are all remoties”. Turns out that geo-distributed groups were a popular topic of discussion throughout the bootcamp, but I was still surprised at the level of interest when Homa asked for a quick show of “who would be willing to skip lunch for an extra session on remoties” and almost everyone jumped up! The “remoties” presentation was rushed: we had to squeeze it into the short time for grabbing food-to-go without delaying the other scheduled sessions, and there was a flood of questions. Yet, people were fully engaged, sitting on the floor with food, asking great questions, and really excited by what was possible for distributed groups once the mechanics were debugged.

Distributed work groups are obviously a big issue, not just in open source software projects, but also in a lot of other companies in the bay area.

Big thanks to Homa and Kim for putting it all together. The timing of this was fortuitous, and I found myself thinking about possible ideas for Mozilla’s ManagerHacking series, which morgamic revived recently and which is coming up again in a few weeks.

Infrastructure load for March 2013


  • #checkins-per-month: We had 6,433 checkins in March 2013. This is well past our previous record of 6,247 in Jan2013. Every working day was consistently busy (>200 checkins per working day), and the load was spread across longer periods of each day.


    Overall load since Jan 2009

  • #checkins-per-day: On 18mar, we had 323 checkins – a new record for a single day, breaking our previous record of 307 checkins-per-day on 06jan2013. During March, 20-of-31 days had over 200 checkins-per-day – that’s every working day except 28mar (because of Easter weekend?). 13-of-31 days had over 250 checkins-per-day (3-of-31 days had over 300 checkins-per-day!).
  • #checkins-per-hour: Checkins are still mostly mid-day PT/afternoon ET, but the load has increased across the day. For 9 of every 24 hours, we sustained over 10 checkins per hour, the heaviest sustained use we’ve seen so far across our day. Heaviest load times this month were 2-3pm PT (13.22 checkins-per-hour).
  • As usual, our build pool handled the load well, with >95% of all builds consistently being started within 15mins.

    Our test pool situation continues to improve, as we continue migrating to AWS any test jobs that do not *require* hardware. As before, any test suite which we can run on AWS means double goodness: the AWS-based test suites have great wait times on AWS, and the remaining physical-hardware-based test suites have slightly improved wait times because fewer jobs are being scheduled on our scarce hardware. Even so, it’s not yet as great as the situation with our builders. For the tests that *do* require hardware, it continues to be a slow process to bring those additional physical machines online. Meanwhile, RelEng, ATeam and devs continue the work of finding test suites which should (in theory!) be able to run on AWS, then fixing them to make them run green. Once a test suite runs green on AWS, RelEng stops scheduling that test suite on physical machines.

    If you know of any test suites that no longer need to be run per-checkin, please let us know so we can immediately reduce the load a little. Also, if you know of any test suites which are perma-orange, and hidden on tbpl.m.o, please let us know – that’s the worst of both worlds: using up scarce CPU time and not being displayed. Every little helps put scarce test CPU to better use.
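The scheduling rule described above (a suite moves to AWS only once it runs green there, at which point it stops being scheduled on physical machines) can be sketched as a tiny policy function. This is a hypothetical illustration, not RelEng’s actual scheduler code; the `Suite` type, its fields, `pools_for` and the pool names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Suite:
    # Hypothetical description of a test suite; field names are illustrative.
    name: str
    requires_hardware: bool  # truly needs physical machines
    green_on_aws: bool       # has been fixed and now runs green on AWS


def pools_for(suite: Suite) -> list[str]:
    """Return the pool(s) a suite should be scheduled on for each checkin."""
    if suite.requires_hardware:
        # Suites that truly need hardware stay on the scarce physical pool.
        return ["physical"]
    if suite.green_on_aws:
        # Once green on AWS, stop scheduling it on physical machines,
        # freeing scarce hardware for the suites that need it.
        return ["aws"]
    # A candidate for AWS that is not yet green there stays on hardware
    # while its failures are investigated and fixed.
    return ["physical"]


for s in [
    Suite("suite-needs-hardware", requires_hardware=True, green_on_aws=False),
    Suite("suite-green-on-aws", requires_hardware=False, green_on_aws=True),
    Suite("suite-still-orange-on-aws", requires_hardware=False, green_on_aws=False),
]:
    print(s.name, pools_for(s))
```

The "double goodness" falls out of the second branch: every suite that returns `["aws"]` is one less job competing for the physical pool.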

mozilla-inbound, mozilla-central, fx-team:
Ratios of checkins across these branches remain fairly consistent. mozilla-inbound continues to be heavily used as an integration branch, with 27.9% of all checkins, consistently far more than the other integration branches combined. As usual, fx-team has ~1% of checkins, mozilla-central has 1.6% of checkins.

The lure of sheriff assistance on mozilla-inbound continues to be consistently popular, and as usual, very few people land directly on mozilla-central these days.

Infrastructure load by branch

mozilla-aurora, mozilla-beta, mozilla-b2g18, gaia-central:
Of our total monthly checkins:

  • 2.4% landed into mozilla-aurora, very similar to last month.
  • 1.6% landed into mozilla-beta, very similar to last month.
  • 1.5% landed into mozilla-b2g18, very similar to last month.
  • 4.8% landed into gaia-central, slightly higher than last month. gaia-central continues to be the third busiest branch overall, after try and mozilla-inbound. Obviously, these checkins are *only* for the B2G releases, so worth calling out here.
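Each branch percentage above is that branch’s checkins divided by the month’s total checkins, and “third busiest branch” is just a ranking of those shares. A minimal sketch of the calculation; the per-branch counts below are made-up placeholders, not the real March numbers, and `branch_shares` is a hypothetical helper.

```python
def branch_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each branch's share of total checkins, as a percentage (1 d.p.)."""
    total = sum(counts.values())
    if total == 0:
        return {branch: 0.0 for branch in counts}
    return {branch: round(100 * n / total, 1) for branch, n in counts.items()}


# Hypothetical monthly counts, for illustration only.
example = {"try": 500, "mozilla-inbound": 300, "gaia-central": 50, "other": 150}
shares = branch_shares(example)

# Rank branches from busiest to quietest.
for branch, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{branch}: {pct}%")
```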

misc other details:

  • Pushes per day
    • You can clearly see weekends through the month. It’s worth noting that we had >200 checkins-per-day every working day in March except 28mar (because of Easter weekend?).
      #Pushes this month

    • Pushes by hour of day
        Mid-morning PT is consistently the biggest spike of checkins, although this month the checkin load stayed high throughout the entire PT working day, and particularly spiked between 2-3pm PT, with 13.22 checkins-per-hour.

      #Pushes per hour

Behind the scenes prep for B2G workweek


In case anyone missed this during this morning’s Mozilla Foundation call – here’s a quick summary of all the invisible prep-work that helped make last week’s B2G workweek so awesome.

1) Nightly builds
* now generated for Arm (panda boards), Otoro, Unagi, Unagi-ENG, Inari, Hamachi, Leo
* for that set of devices, we generate “nightly” builds twice a day: one set ready for the 8am PDT morning, and one ready for the 8am CET morning in Madrid.
* … on each of mozilla-central, mozilla-b2g18, mozilla-b2g18_v1_0_1

2) Stood up an extra 250 slaves. More importantly, created 22 masters in AWS so we now have 70 masters total (with 30 in AWS) and can quickly burst-grow-capacity to create more slaves if needed.
* Reimaged 80 in-house build & test machines to optimize for Firefox OS development, based on watching load and usage at the last workweek.

3) Set up an alternate “birch” branch to use instead of mozilla-inbound. Having b2g workweek developers land on “birch” instead of mozilla-inbound gave them a faster, less crowded branch to land on, and reduced the risk of being blocked whenever a non-b2g change blocked mozilla-inbound.
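The twice-a-day “nightly” schedule is easier to reason about in UTC. A small sketch, assuming fixed offsets of UTC-7 for PDT and UTC+1 for CET as labelled above (a real scheduler would use a timezone database so daylight-saving changes are tracked automatically); `ready_times_utc` is a hypothetical helper, not part of any RelEng tool, and the date is an arbitrary example.

```python
from datetime import date, datetime, time, timedelta, timezone

# Fixed offsets as labelled in the post; assumption: no DST handling here.
PDT = timezone(timedelta(hours=-7), "PDT")
CET = timezone(timedelta(hours=1), "CET")


def ready_times_utc(day: date) -> list[datetime]:
    """Return the two 8am 'nightly' deadlines for `day`, in UTC, earliest first."""
    deadlines = [
        datetime.combine(day, time(8, 0), tzinfo=CET),  # Madrid morning
        datetime.combine(day, time(8, 0), tzinfo=PDT),  # California morning
    ]
    return sorted(d.astimezone(timezone.utc) for d in deadlines)


for t in ready_times_utc(date(2013, 3, 20)):
    print(t.isoformat())
```

The two deadlines land 8 hours apart in UTC, which is why one build per day cannot serve both mornings.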

Did all that work help? By all accounts, yes. But of course, the proof is in the numbers. Last week, 1490 checkins were landed, and all systems stayed super-responsive (>95% of jobs handled on time throughout the week, with only one dip down to >90%!). Impressive to see the infrastructure handle the load like that.

Please give a big hug and thanks to RelEng/ATeam/IT, especially the following:

catlee, rail, hwine, armenzg (RelEng)
ctalbert, jmaher, jgriffin, edmorley, ryanvm (ATeam)
dmoore, arr, fox2mike, vinh, jakem, solarce, sheeri, klibby, sal, van (IT)

At Mozilla, ReleaseEngineering == Release Automation + Continuous Integration


Recently, I was asked to lead a discussion with a few VPs within Mozilla about the scope of Release Engineering at Mozilla. Each VP was well established in their career, technically-seasoned and smart, and each brought their own preconceived notions of what RelEng means, with terminology shaped by their own different previous companies. To make things even more interesting, different organizations have different ideas and terminology for what they mean by “Release Engineering”, so getting everyone on the same page was going to be tricky… and important to get right, if we were all to work well together.

This blogpost is a quick summary; if you’re curious, PDFs of the slides are here.

At Mozilla, Release Engineering covers two main topics:

1) Release Automation:
People who are not day-to-day developers typically think of this first. How efficient is the software delivery pipeline within a software organization? How long does it take from “go to build a release” to “users can start downloading updates”? The faster and more reliable this software delivery pipeline, the more competitive the company can be in the marketplace. This used to be where Mozilla’s RelEng, as a group, spent most of their time: sleeping in the office, getting bribes for releases, and all that drama. Now, thankfully, our automation is really great, so chemspills are super-quick (great for our users) and mostly-hands-off (great for the humans in RelEng). There’s still lots to improve, and always some adjustments because of changing product requirements, but it’s already night-and-day improved since 2007. It has continued to improve even since we wrote about it in a book!

2) Continuous Integration:
Day-to-day developers think of this, and deal with this, every single day. Anyone doing code changes at Mozilla keeps an eager eye on tbpl.m.o to see if their change is all green (good!), so they can close out their bug as FIXED and move on to the next bug. Making the Continuous Integration process more efficient has allowed Mozilla to hire more developers to do more checkins, transition developers from all-on-one-tip development to multi-project-branch development, and change the organization from traditional releases to a rapid-release model. This required RelEng to scale up significantly in the last <6 years, from a humble 86 machines to ~3,400 machines spread across 4 physical Mozilla colos as well as 3 Amazon AWS regions. Here’s a quick summary diagram of how all these machines are interconnected, which RelEng knows by heart, but which I couldn’t find posted anywhere, so I drew it as part of doing this presentation.
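The “all green” check that developers watch for on tbpl.m.o boils down to aggregating the results of every job run for a push. A minimal sketch, assuming a simplified set of outcomes (green for success, orange for test failure, red for build failure, plus pending); TBPL’s real result categories are richer than this, and `Result` and `push_status` are hypothetical names.

```python
from enum import Enum


class Result(Enum):
    # Simplified job outcomes; an assumption, not TBPL's full set.
    PENDING = "pending"
    SUCCESS = "green"
    TEST_FAILURE = "orange"
    BUILD_FAILURE = "red"


def push_status(results: list[Result]) -> str:
    """Summarise all jobs for one push into a single verdict."""
    if any(r is Result.BUILD_FAILURE for r in results):
        return "broken"
    if any(r is Result.TEST_FAILURE for r in results):
        return "needs investigation"
    if not results or any(r is Result.PENDING for r in results):
        return "still running"
    # Every job finished green: safe to mark the bug FIXED and move on.
    return "all green"


print(push_status([Result.SUCCESS, Result.SUCCESS, Result.PENDING]))
print(push_status([Result.SUCCESS, Result.SUCCESS, Result.SUCCESS]))
```

Checking failures before pending jobs means a developer hears about a red or orange result as soon as it appears, rather than waiting for the slowest job to finish.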

This was a fun meeting. My favorite quotes from the lively back-forth were: “every software company lives-or-dies by the efficiency of its development process and its software delivery pipeline” …and… “everyone interacts with different parts of the elephant, so everyone has very different ideas of what they are looking at”.

Hopefully, others find this interesting too. Of course, if you have questions or comments, please post them below, or drop me an email.

Firefox 19.0.2 by the (wall-clock) numbers


(It’s been a while since my last “by the wall-clock numbers” post. After last week’s CanSecWest, I thought people might be interested in how much Mozilla’s pipeline for delivering code to users continues to improve. This was even noted by the pwn2own contest sponsors during the CanSecWest conference!)

Firefox 19.0.2 was released on Thursday 07-mar-2013, at 16:40PST. From “go to build” to “release is now available to public” was 16h 26m wall-clock time, of which, the Release Engineering portion was 11h 27m. The wall clock times were:

00:21 07mar: ReleaseCoordinators say “go” for FF19.0.2
02:14 07mar: FF19.0.2 builds started
04:06 07mar: FF19.0.2 android signed multi-locale builds handed to QA
07:07 07mar: FF19.0.2 linux builds handed to QA
08:03 07mar: FF19.0.2 mac builds handed to QA
11:40 07mar: FF19.0.2 signed-win32 builds handed to QA
11:58 07mar: FF19.0.2 update snippets available on test update channel
13:05 07mar: ReleaseCoordinators say “go for release”; “ok to start mirror absorption”
13:11 07mar: ReleaseCoordinators say “go for push to Google play store”
13:29 07mar: FF19.0.2 android pushed to Google play store
13:55 07mar: mirror absorption started
14:00 07mar: mirror absorption good enough for testing
14:31 07mar: QA signoff on updates on test channel
15:26 07mar: ReleaseCoordinators say “go” to make updates snippets live.
15:40 07mar: update snippets available on live update channel
15:54 07mar: QA signoff on updates on live release channel
16:40 07mar: release announced; all done.
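Wall-clock numbers like these come from subtracting consecutive timestamps. A minimal sketch over a subset of the milestones listed above (all on 07mar, PST); `minutes_between` and `fmt_duration` are hypothetical helpers written for this example.

```python
from datetime import datetime

# A subset of the FF19.0.2 milestones above, all on 07-mar-2013 (PST).
MILESTONES = [
    ("00:21", "go for FF19.0.2"),
    ("02:14", "builds started"),
    ("11:40", "signed-win32 builds handed to QA"),
    ("11:58", "update snippets on test channel"),
    ("13:05", "go for release"),
    ("15:40", "update snippets on live channel"),
    ("16:40", "release announced"),
]


def minutes_between(start: str, end: str) -> int:
    """Minutes from `start` to `end`, both 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def fmt_duration(minutes: int) -> str:
    """Format a minute count as e.g. '1h 53m'."""
    return f"{minutes // 60}h {minutes % 60:02d}m"


for (t0, label0), (t1, label1) in zip(MILESTONES, MILESTONES[1:]):
    print(f"{label0} -> {label1}: {fmt_duration(minutes_between(t0, t1))}")

total = minutes_between(MILESTONES[0][0], MILESTONES[-1][0])
print("first to last listed milestone:", fmt_duration(total))
```

Working from the times as listed, first-to-last comes out at 16h 19m, a few minutes short of the 16h 26m quoted above, so that headline figure may be measured from a slightly different starting point.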

In addition to FF19.0.2, I note that we also had to ship FF17.0.4esr, FF20.0b4, Thunderbird 17.0.4, Thunderbird 17.0.4esr, Thunderbird 20.0b1 (build2) in the same super-fast way. Obviously, we don’t want to ship this number of products, this quickly, all the time, but it’s nice to know that we can if we have to. Really, really, nice. And yes, we quietly continue work to make this delivery pipeline even more efficient! :-)

Notes:

1) I continue to measure the time between “dev says go” to “release is available”. Explicitly I do *not* measure from “fix is reported” to “release is available”, because I don’t want to put any further time pressure on a developer trying to fix a problem under pressure. It feels much better to me to work a little longer to get the fix right instead of adding even more time pressure looking for a quick fix, and then having to do another emergency release a few days later to fix the “quick fix”.

2) As usual, if you are curious for more details of the actual work done, you can follow along in tracking bug#848753, and various linked bugs.

Thank you to everyone in OpSec, RelEng, QA, IT and ReleaseCoordinators who made this all possible. It was a really busy few hours, but great to see everyone calmly pile in, doing what they could to help out. The end result is something we can be proud of, and it did right by our users.

John.

Infrastructure load for February 2013


  • #checkins-per-month: We had 5,382 checkins in February 2013. This drop from last month surprised me. Maybe January was abnormally high because of the first-week-back-after-holidays rush, combined with the B2G workweek? Maybe February was abnormally low because it was a short month, combined with restrictions to checkins as we approached B2Gv1.0.0, B2Gv1.0.1 and Mobile World Congress? Next month’s numbers will help show the trend here, but meanwhile, if you have opinions, I’d be curious to hear them.


    Overall load since Jan 2009
    As usual, our build pool handled the load well, with >95% of all builds consistently being started within 15mins.

    Our test pool situation continues to improve, but is not yet as great as the situation with our builders. We’re making good progress, but the rate of checkins, the improved capacity of the build machines to generate more builds that need testing, the ever-increasing number of test suites to run on each build and the hardware specific nature of some test suites make this test capacity problem harder to solve. New hardware is still (slowly) coming. Meanwhile, RelEng, ATeam and devs continue the work of finding test suites which should (in theory!) be able to run on AWS, then fixing them to make them run green. Once a test suite runs green on AWS, RelEng stops scheduling that test suite on physical machines. This means double goodness: the AWS-based test suites have great wait times on AWS, and the remaining physical-hardware-based test suites have slightly improved wait times because fewer jobs are being scheduled on our scarce hardware.

    Of course, some tests *need* hardware, so we’re continuing work to buy and power up more test machines to increase test capacity anyway; please continue to bear with us while this happens. Oh, and of course, if you know of any test suites that no longer need to be run per-checkin, please let us know so we can immediately reduce the load a little. Every little helps put scarce test CPU to better use.

  • #checkins-per-day: During February, 18-of-28 days had over 200 checkins-per-day, and 8-of-28 days had over 250 checkins-per-day (the high-water-mark for the month was 20feb with 270 checkins).
  • #checkins-per-hour: Checkins are still mostly mid-day PT/afternoon ET, but the load has increased across the day. For 7 of every 24 hours (almost 30% of every day), we sustained over 10 checkins per hour. Heaviest load times this month were 10-11am PT (14 checkins-per-hour – a new record, exceeding our previous record of 13.36 checkins-per-hour set in November2012!).
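The “>95% of all builds consistently being started within 15mins” figure is a wait-time target: the fraction of jobs that started no more than 15 minutes after being requested. A minimal sketch, assuming each job is recorded as a (requested, started) pair of datetimes and that exactly 15 minutes counts as on time; the record format and `fraction_started_within` are hypothetical.

```python
from datetime import datetime, timedelta


def fraction_started_within(
    jobs: list[tuple[datetime, datetime]],
    limit: timedelta = timedelta(minutes=15),
) -> float:
    """Fraction of (requested, started) jobs whose wait was at most `limit`."""
    if not jobs:
        return 1.0  # nothing waited, so nothing missed the target
    on_time = sum(1 for requested, started in jobs if started - requested <= limit)
    return on_time / len(jobs)


# Hypothetical sample: three prompt jobs and one that waited 40 minutes.
t = datetime(2013, 2, 20, 10, 0)
sample = [
    (t, t + timedelta(minutes=2)),
    (t, t + timedelta(minutes=14)),
    (t, t + timedelta(minutes=15)),
    (t, t + timedelta(minutes=40)),
]
print(f"{fraction_started_within(sample):.0%} started within 15 minutes")
```

Measuring the time until a job *starts*, rather than until it finishes, isolates pool capacity from how long each test suite takes to run.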

mozilla-inbound, mozilla-central, fx-team:
Ratios of checkins across these branches remain fairly consistent. mozilla-inbound continues to be heavily used as an integration branch, with 28.3% of all checkins, consistently far more than the other integration branches combined. As usual, fx-team has ~1% of checkins, mozilla-central has 2.2% of checkins. The lure of sheriff assistance on mozilla-inbound continues to be consistently popular, and as usual, very few people land directly on mozilla-central these days.

Infrastructure load by branch

mozilla-aurora, mozilla-beta, mozilla-b2g18, gaia-central:
Of our total monthly checkins:

  • 2.5% landed into mozilla-aurora. This is slightly lower than normal
    aurora levels, and expected since b2g changes are no longer being landed
    into aurora and beta.

  • 1.3% landed into mozilla-beta. This is slightly lower than normal
    beta levels, and expected since b2g changes are no longer being landed
    into aurora and beta.

  • 1.8% landed into mozilla-b2g18. These checkins are *only* for the
    B2G releases, so worth calling out here.

  • 3.1% landed into gaia-central, making gaia-central the third
    busiest branch overall, after try and mozilla-inbound. Obviously, these
    checkins are *only* for the B2G releases, so worth calling out here.

misc other details:

  • Pushes per day
    • You can clearly see weekends through the month.

    #Pushes this month

  • Pushes by hour of day
      Mid-morning PT is consistently the biggest spike of checkins, although this month the checkin load stayed high throughout the entire PT working
      day, and particularly spiked between 10-11am PT.

    #Pushes per hour

“We are all remoties” at Haas MBA, U.C.Berkeley


Last weekend, I was super-honored to be a guest speaker at the Haas Berkeley MBA program. My session was part of their “Global Teams” module, where they cover the theory and practice of effective teamwork, managing in global companies, and managing in fluid/rapidly changing environments.

My host, Homa Bahrami, invited me to show how Mozilla’s Release Engineering group has pushed the envelope, and has developed a well-tested, concrete set of tips+tricks which allow a geo-distributed group to work highly effectively.

People’s attention was caught right at the start by my summary and graphic showing just how distributed Mozilla’s RelEng group actually is:

    * 16 people
    * 15 locations
    * 4 non-adjacent timezones
    * 0 in “headquarters”


 

By comparison, most people think of remoties as either:
…or…

The fact that any group could work together this effectively while being so geo-distributed was startling to them. Add to that, the fact that this group has been able to create strategic-level improvements to Mozilla’s software development abilities, hence increasing Mozilla’s options in the marketplace, generated even more interest.

Overall, the entire session was lively and interactive, with great questions, and discussions back-forth across the room. Everyone was fully engaged all the way… even after the lunch food arrived, we continued the discussions in the corridor outside the room.

I was delighted by the insightful questions, and very interested to hear the different perspectives that everyone brought from their varied backgrounds outside the MBA program.

For me, personally, I found it re-affirming to hear that the tips+tricks that we’ve built up within RelEng over the years are applicable to other groups, and other organizations.

It was a thoroughly wonderful experience. Big thanks to Homa for the invite, and to everyone for their full-on engagement.

[For a PDF copy of the entire presentation, click here or on the smiley faces! For the sake of my poor blogsite, the much, much, larger keynote files are available on request.]

Infrastructure load for January 2013


  • #checkins-per-month: We had 6,247 checkins in January 2013. This exceeds our previous all-time record of 5,893 in October2012.

    As usual, we handled this load with >95% of all builds consistently being started within 15mins. Sadly, our test pools continue to have a hard time, both with the increased rate of checkins, and the ever-increasing number of test suites being run per checkin. Some test jobs are now runnable on AWS, and are now running there. Some *should* run on AWS, but fail for some reason – work continues. And some tests *need* hardware, so we’re continuing work to buy and power up more test machines to build out capacity there; please continue to bear with us. Oh, and of course, if you know of any test suites that no longer need to be run per-checkin, please let us know so we can immediately reduce the load a little. Every little helps put scarce test CPU to better use.

  • #checkins-per-day: During January, 18-of-31 days had over 200 checkins-per-day, and 2-of-31 days had over 300 checkins-per-day (06jan had 307 checkins; 10jan had 302 checkins). The pattern is to be expected as both of these days were during our first full week back after holidays – typically our busiest week of the year…and this year coincided with a B2G workweek!
  • #checkins-per-hour: Checkins are still mostly mid-day PT/afternoon ET, but the load has increased across the day. For 33% of every day (8 of every 24 hours), we sustained over 10 checkins per hour. Heaviest load times this month were 1-2pm PT (13 checkins-per-hour) and 2-3pm PT (13.36 checkins-per-hour – which matched our previous record of 13.36 checkins-per-hour set in November2012!).
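Statistics like “18-of-31 days had over 200 checkins-per-day” and “8 of every 24 hours sustained over 10 checkins per hour” are threshold counts over daily and hourly totals. A sketch with made-up inputs (not the real January data), assuming the per-hour figures are monthly averages for each hour of the day; `days_over` and `hours_over` are hypothetical helper names.

```python
def days_over(daily_counts: list[int], threshold: int) -> int:
    """Number of days whose checkin count strictly exceeds `threshold`."""
    return sum(1 for n in daily_counts if n > threshold)


def hours_over(total_per_hour: list[int], days: int, threshold: float) -> int:
    """Hours of the day (0-23) whose average checkins per hour exceed `threshold`.

    `total_per_hour[h]` is the month's total checkins during hour `h`;
    dividing by `days` gives the average for that hour of the day.
    """
    if len(total_per_hour) != 24:
        raise ValueError("expected one total per hour of the day")
    return sum(1 for total in total_per_hour if total / days > threshold)


# Hypothetical week of daily totals (two quiet weekend days).
week = [230, 260, 310, 245, 190, 80, 60]
print(days_over(week, 200), "of", len(week), "days over 200")
print(days_over(week, 300), "of", len(week), "days over 300")

# Hypothetical 31-day month: 16 quiet hours averaging 5/hour, 8 busy hours averaging 12/hour.
hourly = [31 * 5] * 16 + [31 * 12] * 8
print(hours_over(hourly, days=31, threshold=10), "of 24 hours over 10 checkins/hour")
```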

Overall load since Jan 2009

mozilla-inbound, mozilla-central, fx-team:
Ratios of checkins across these branches remain fairly consistent. mozilla-inbound continues to be heavily used as an integration branch, with 29% of all checkins, consistently far more than the other integration branches combined (fx-team has 1% of checkins, mozilla-central has 2.3% of checkins). As usual, very few people land directly on mozilla-central these days.

Infrastructure load by branch

mozilla-aurora, mozilla-beta, mozilla-b2g18:

  • 3.1% of our total monthly checkins landed into mozilla-aurora. This is back down to normal aurora levels. This is expected since b2g changes are no longer being landed into aurora and beta.
  • 1.6% of our total monthly checkins landed into mozilla-beta. This is back down to normal beta levels (maybe even slightly lower). This is expected since b2g changes are no longer being landed into aurora and beta.
  • 4.6% of our total monthly checkins landed into mozilla-b2g18. These are all fixes *only* for the B2G releases, so important enough to be worth calling out here, like the aurora and beta branches.

misc other details:

  • Pushes per day
    • You can clearly see weekends through the month… and which week was the “first-week-after-holidays-combined-with-B2G-workweek”.

    #Pushes this month

  • Pushes by hour of day
      Mid-morning PT is consistently the biggest spike of checkins, although this month the checkin load stayed high throughout the entire PT working day.

    #Pushes per hour
