21 Jun 2016
Mozilla, Release Engineering
I’ve never worked in government before – or even considered it until I read Dan Portillo’s blog post when he joined the U.S. Digital Service. Mixing the technical skills and business tactics honed in Silicon Valley with the domain specific skills of career government employees is a brilliant way to solve long-standing complex problems in the internal mechanics of government infrastructure. Since their initial work on healthcare.gov, they’ve helped out at the Veterans Administration, Dept of Education and IRS to mention just a few public examples. Each of these solutions have material impact to real humans, every single day.
Building Release Engineering infrastructure at scale, in all sorts of different environments, has always been interesting to me. The more unique the situation, the more interesting. The possibility of doing this work, at scale, while also making a difference to the lives of many real people made me stop, ask a bunch of questions and then apply.
The interviews were the most thorough and detailed of my career so far, and the consequence of this is clear once I started working with other USDS folks – they are all super smart, great at their specific field, unflappable when suddenly faced with un-imaginable projects and downright nice, friendly people. These are not just “nice to have” attributes – they’re essential for the role and you can instantly see why once you start.
The range of skills needed is staggering. In the few weeks since I started, projects I’ve been involved with have involved some combinations of: Ansible, AWS, Cobol, GitHub, NewRelic, Oracle PL/SQL, nginx, node.js, PowerBuilder, Python, Ruby, REST and SAML. All while setting up fault tolerant and secure hybrid physical-colo-to-AWS production environments. All while meeting with various domain experts to understand the technical and legal constraints behind why things were done in a certain way and also to figure out some practical ideas of how to help in an immediate and sustainable way. All on short timelines – measured in days/weeks instead of years. In any one day, it is not unusual to jump from VPN configurations to legal policy to branch merging to debugging intermittent production alerts to personnel discussions.
Being able to communicate effectively up-and-down the technical stack and also the human stack is tricky, complicated and also very very important to succeed in this role. When you see just how much the new systems improve people’s lives, the rewards are self-evident, invigorating and humbling – kinda like the view walking home from the office – and I find myself jumping back in to fix something else. This is very real “make a difference” stuff and is well worth the intense long days.
Over the coming months, please be patient with me if I contact you looking for help/advice – I may very well be fixing something crucial for you, or someone you know!
If you are curious to find out more about USDS, feel free to ask me. There is a lot of work to do (before starting, I was advised to get sleep!) and yes, we are hiring (for details, see here!). I suspect you’ll find it is the hardest, most rewarding job you’ve ever had!
16 Jun 2016
Mozilla, Release Engineering
(Suddenly, its June! How did that happen? Where did the year go already?!? Despite my recent public silence, there’s been a lot of work going on behind the scenes. Let me catchup on some overdue blogposts – starting with RelEngConf 2016!)
We’ve got a venue and a date for this conference sorted out, so now its time to start gathering presentations, speakers and figuring out all the other “little details” that go into making a great, memorable, conference. This means two things:
1) RelEngCon 2016 is now accepting proposals for talks/sessions. If you have a good industry-related or academic-focused topic in the area of Release Engineering, please have a look at the Release Engineering conference guidelines, and submit your proposal before the deadline of 01-jul-2016.
2) Like all previous RelEng Conferences, the mixture of attendees and speakers, from academia and battle-hardened industry, makes for some riveting topics and side discussions. Come talk with others of your tribe, swap tips-and-gotchas with others who do understand what you are talking about and enjoy brainstorming with people with very different perspectives.
For further details about the conference, or submitting proposals, see http://releng.polymtl.ca/RELENG2015/html/index.html. If you build software delivery pipelines for your company, or if you work in a software company that has software delivery needs, I recommend you follow @relengcon, block off November 18th, 2016 on your calendar and book your travel to Seattle now. It will be well worth your time.
I’ll be there – and look forward to seeing you there!
08 Oct 2015
Books, Mozilla, Release Engineering, Remoties
My previous post described how O’Reilly does rapid releases, instead of waterfall-model releases, for book publishing. Since then, I’ve been working with the folks at O’Reilly to get the first milestone of my book ready.
As this is the first public deliverable of my first book, I had to learn a bunch of mechanics, asking questions and working through many, many details. Very time consuming, and all new-to-me, hence my recent silence. The level of detailed coordination is quite something – especially when you consider how many *other* books O’Reilly has in progress at the same time.
One evening, while in the car to a social event with friends, I looked up the “not-yet-live” page to show to friends in the car – only to discover it was live. Eeeeek! People could now buy the 1st milestone drop of my book. Exciting, and scary, all at the same time. Hopefully, people like it, but what if they don’t? What if I missed an important typo in all the various proof-reading sessions? I barely slept at all that night.
In O’Reilly language, this drop is called “Early Release #1 (ER#1)”. Now that ER#1 is out, and I have learned a bunch about the release mechanics involved, the next milestone drop should be more routine. Which is good, because we’re doing these every month. Oh, and like software: anyone who buys ER#1 will be prompted to update when ER#2 is available later in Oct, and prompted again when ER#3 is available in Nov, and so on.
You can buy the book-in-progress by clicking here, or clicking on the thumbnail of the book cover. And please, do let me know what you think – Is there anything I should add/edit/change? Anything you found worked for you, as a “remotie” or person in a distributed team, which you wish you knew when you were starting? If you were going to setup a distributed team today, what would you like to know before you started?
To make sure that any feedback doesn’t get lost or caught in spam filters, I’ve setup a special email address (feedback at oduinn dot com) although I’ve already been surprised by feedback via twitter and linkedin. Thanks again to everyone for their encouragement, proof-reading help and feedback so far.
Now, it’s time to brew more coffee and get back to typing.
15 Sep 2015
Books, Mozilla, Release Engineering, Remoties
As a release engineer, I’ve designed and built infrastructure for companies that used the waterfall-model and for companies that used a rapid release model. I’m now writing a book, so recently I have been looking at a very different industry through this same RelEng lens.
In days of old, here’s a simplistic description of what typically happened. Book authors would write their great masterpiece on their computer, in MSWord/Scrivener/emacs/typewriter/etc. Months (or years!) later, when the entire book was written, the author would deliver the “complete” draft masterpiece to the book company. Editors at the book company would then start reading from page1 and wade through the entire draft, making changes, fixing typos, looking for plot holes, etc. Sometimes they could make these corrections themselves, sometimes the changes required sending the manuscript back to the author to rewrite portions. Later, external reviewers would also provide feedback, causing even more changes. If all this took a long time, sometimes the author had to update the book with new developments to make sure the book would be up-to-date when it finally shipped. Tracking all these changes was detailed, time-pressured and chaotic work. Eventually, a book would finally go to press. You could reduce risk as well as simplify some printing and shipping logistics by having bookshops commit to pre-buy bulk quantities – but this required hiring a staff of sales reps selling the unwritten books in advance to bookstores before the books were fully written. Even so, you could still over/under estimate the future demand. And lets not forget that once people start reading the book, and deciding if they like it, you’ll get word-of-mouth, book reviews and bestseller-lists changing the demand unpredictably.
If people don’t like the book, you might have lots of copies of an unsellable book on your hands, which represents wasted paper and sunk costs. The book company is out-of-pocket for all the salaries and expenses from the outset, as well as the unwanted printed books and would hope to recover those expenses later from other more successful books. Similar to the Venture Capital business model, the profits from the successful books help recoup the losses from the other unprofitable books.
If people do like the book, and you have a runaway success, you can recoup the losses of the other books so long as you avoid being out-of-stock, which could cause people to lose interest in the book because of back order delays. Also, any errors missed in copy-editing and proofreading could potentially lead to legal exposure and/or forced destruction of all copies of printed books, causing further back order delays and unexpected costs. Two great examples are The Sinner’s Bible and the Penguin cook book.
This feels like the classic software development “waterfall” model.
By contrast, one advantage with the rapid release model is that you get quick feedback on the portions you have shipped so far, allowing you to revisit and quickly adjust/fix as needed… even while you are still working on completing and shipping the remaining portions. By the time you ship the last portion of the project, you’ve already adjusted and fixed a bunch of surprise gotchas found in the earlier chunks. By the time you ship your last portion, your overall project is much healthier and more usable. After all, for the already shipped portions, any problems discovered in real-world-use have already been fixed up. By comparison, a project where you write the same code for 18-24 months and then publish it all at the same time will have you dealing with all those adjustments/fixes for all the different chunks *at the same time*. Delivering smaller chunks helps keep you honest about whether you are still on schedule, or let you quickly see whether you are starting to slip the schedule.
The catch is that this rapid release requires sophisticated automation to make the act of shipping each chunk as quick, speedy and reliable as possible. It also requires dividing a project into chunks that can be shipped separately – which is not as easy to do as you might hope. Doing this well requires planning out the project carefully in small shippable chunks, so you can ship when each chunk is written, tested and ready for public consumption. APIs and contractual interfaces need to be understood and formalized. Dependencies between each chunk needs to figured out. Work estimates for writing and testing each module need to be guessed. A calendar schedule is put together. The list goes on and on… Aside: You probably should do this same level of planning also for waterfall model releases, but its easy to miss hidden dependencies or impact of schedule slips until the release date when everything was supposed to come together on the day.
So far, nothing new here to any release engineer reading this.
One of the (many) reasons I went with O’Reilly as the publisher for this book was their decision to invest in their infrastructure and process. O’Reilly borrowed a page from Release Engineers and invested in automation to switch their business from a waterfall model to a rapid release model. This change helps keep schedules realistic, as schedule slips can be spotted early. It helps get early feedback which helps ship a better final product, so the ratio of successful book vs unprofitable books should improve. It helps judge demand, which helps the final printing production planning to reduce cost wasting with less successful books, and to improve profits and timeliness for successful books. This capability is a real competitive advantage in a very competitive business market. This is an industry game changer.
When you know you want “rapid release for books”, you discover a lot of tools were already there, with a different name and slightly-different-use-cases. Recent technologies advances help make the rest possible. The overall project (“table of contents”). The breaking up of the overall project into chunks (“book chapters”). Delivering usable portions as they are ready (print on demand, electronic books + readers, online updates). Users (“readers”) who get the electronic versions will get update notices when newer versions become available. At O’Reilly, the process now looks like this:
Book authors and editors agree on a table of contents and an approximate ship date (write a project plan with scope) before signing contracts.
Book authors “write their text” (commit their changes) into a shared hosted repository, as they are writing throughout the writing creative process, not just at the end. This means that O’Reilly editors can see the state of the project as changes happen, not just when the author has finished the entire draft manuscript. It also means that O’Reilly reduces risk of catastrophic failure if an author’s laptop crashes, or is stolen without a backup.
A hosted automation system reads from the repository, validates the contents and then converts that source material into every electronic version of the book that will be shipped, including what is sent to the print-on-demand systems for generating the ink-on-paper versions. This is similar to how one source code revision is used to generate binaries for different deliverables – OSX, Windows, linux, iOS, Android,…
O’Reilly has all the automation and infrastructure you would expect from a professional-grade Release Engineering team. Access controls, hosted repos, hosted automation, status dashboards, tools to help you debug error messages, teams to help answer any support questions as well as maintain and improve the tools. Even a button that says “Build!”!!
The only difference is that the product shipped from the automation is a binary that you view in an e-reader (or send to a print-on-demand printing press), instead of a binary that you invoke to run as an application on your phone/desktop/server. With this mindset, and all this automation, it is no surprise that O’Reilly also does rapid releases of books, for the same reasons software companies do rapid releases. Very cool to see.
I’ve been thinking about this a lot recently because I’m now putting together my first “early release” of my book-in-progress. More info on the early release in the coming days when it is available. As a release engineer and first time author, I’ve found the entire process both totally natural and self-evident and at the same time a little odd and scary. I’d be very curious to hear what people think… both of the content of the actual book-in-progress, and also of this “rapid release of books” approach.
Meanwhile, it’s time to refill my coffee mug and get back to typing.
02 Sep 2015
Mozilla, Release Engineering
The USENIX Release Engineering Summit 2015 (“URES15”) is quickly approaching – this time it will be held in November in Washington DC along with LISA2015. To register to attend URES15, click here or on the logo.
Given the two (!) great URES conferences were last year, I expect URES15 to be fun and informative and very-down-to-earth practical all at the same time. One of the great things about these RelEng conferences is that you get to hear what did, and did not, work for others working the same front lines – all in a factual constructive way. Sharing important ideas help raise the bar for everyone, so every one of these feel priceless. It is also a great way to meet many other seasoned people in this niche part of the computer business,
If you are a Release Engineer, Site Reliability Engineer, Production Operations, deal with DevOps culture or are someone who keeps wanting to find a better way to reliably ship better quality code, you should attend! You’ll be glad you did!
Note: If you have a project that you worked on which others would find informative, you can submit a proposal here. The deadline for proposals is Friday 04sep2015.
See you there!
27 Jun 2015
Books, Mozilla, Release Engineering
Long time readers of this blog will remember when The Architecture of Open Source Applications (vol2) was published, containing a chapter describing the tools and mindsets used when re-building Mozilla’s Release Engineering infrastructure. (More details about the book, about the kindle and nook versions, and about the Russian version(!).
Dr Dobbs recently posted an article here which is an edited version of the Mozilla Release Engineering chapter. As a long time fan of Dr Dobbs, seeing this was quite an honor, even with the sad news here.
Obviously, Mozilla’s release automation continues to evolve, as new product requirements arise, or new tools help further streamline things. There is still lots of interesting work being done here – for me, top of mind is Task Cluster, and ScriptHarness (v0.1.0 and v0.2.0). Release Engineering at scale is both complex, and yet very interesting – so you should keep watching these sites for more details, and consider if they would also help in your current environment. As they are all open source, you can of course join in and help!
For today, I just re-read the Dr. Dobbs article with a fresh cup of coffee, and remembered the various different struggles we went through as we scaled Mozilla’s infrastructure up so we could quickly grow the company, and the community. And then in the middle of it all, found time with armenzg, catlee and lsblakk to write about it all. While some of the technical tools have changed since the chapter was written, and some will doubtless change again in the future, the needs of the business, the company and the community still resonate.
For anyone doing Release Engineering at scale, the article is well worth a quiet read.
11 Jun 2015
Mozilla, Release Engineering, Remoties
After my recent “We are ALL Remoties” presentation at Wikimedia, I had some really great followup conversations with Arthur Richards at WikiMedia Foundation. Arthur has been paying a lot of attention to scrum and agile methodologies – both in the wider industry and also specifically in the context of his work at Wikimedia Foundation, which has people in different locations. As you can imagine, we had some great fun conversations – about remoties, about creating culture change, and about all-things-scrum – especially the rituals and mechanics of doing daily standups with a distributed team.
Next time you see a group of people standing together looking at a wall, and moving postit notes around, ask yourself: “how do remote people stay involved and contribute?” Taking photographs of the wall of postit notes, or putting the remote person on a computer-with-camera-on-wheeled-cart feels like a duct-tape workaround; a MacGyver fix done quickly, with the best of intentions, genuinely wanting to help the remote person be involved, but still not-a-great experience for remoties.
There has to be a better way.
We both strongly agree that having people in different locations is just a way to uncover the internal communication problems you didn’t know you already have… the remote person is the canary in the coal mine. Having a “we are all remoties” mindset helps everyone become more organized in their communications, which helps remote people *and* also the people sitting near each other in the office.
Arthur talked about this idea in his recent (and lively and very well attended!) presentation at the Annual Scrum Alliance “Global Scrum Gathering” event in Phoenix, Arizona. His slides are now visible here and here.
If you work in an agile / scrum style environment, especially with a geo-distributed team of humans, it’s well worth your time to read Arthur’s presentation! Thought provoking stuff, and nice slides too!
24 May 2015
Mozilla, Release Engineering
Last week, I was honored to give the closing talk at RelEng Conf 2015, here in Florence, Italy.
I’ve used this same title in previous presentations; the mindset it portrays still feels important to me. Every time I give this presentation, I am invigorated by the enthusiastic response, and work to improve further, so I re-write it again. This most recent presentation at RelEngConf2015 was almost a complete re-write; only a couple of slides remain from the original. Click on the thumbnail to get the slides
The main focus of this talk was:
1) Release Engineers build pipelines, while developers build products. When done correctly, this pipeline makes the entire company more effective. By contrast, done incorrectly, broken pipelines will hamper a company – sometimes fatally. This different perspective and career focus is important to keep in mind when hiring to solve company infrastructure problems.
2) Release Engineers routinely talk and listen with developers and testers, typically about the current project-in-progress. Thats good – we obviously need to keep doing that. However, I believe that Release Engineers also need to spend time talking to and listening with people who have a very different perspective – people who care about the fate of the *company*, as opposed to a specific project, and have a very different longer-term perspective. Typically, these people have titles like Founder/CxO/VP but every company has different cultural leaders and uses slightly different titles, so some detective work is in order. The important point here is to talk with people who care about the fate of the company, as opposed to the fate of a specific project – and keep that perspective in mind when building a pipeline that helps *all* your customers.
3) To illustrate these points, I then went into detail on some technical, and culture change, projects to highlight their strategic importance.
As usual, it was a lively presentation with lots of active Q+A during the talk, as well as the following break-out session. Afterwards, 25 of us managed to find a great dinner (without a reservation!) in a nearby restaurant where the geek talk continued at full force for several more hours.
All in all, a wonderful day.
It was also great to meet up with catlee in person again. We both had lots to catch up on, in work and in life.
Bram Adams, Christian, Foutse, Kim and Stephany Bellomo all deserve ongoing credit for continuing to make this unusual and very educational conference come to life, as well as for curating the openness that is the hallmark of this event. As usual, I love the mix of academic researchers and industry practitioners, with talks alternating between industry and academic speakers all day long. The different perspectives, and the eagerness of everyone to have fully honest “what worked, what didnt work” conversations with others from very different backgrounds is truly refreshing… and was very informative for everyone. I’m already looking forward to the next RelEngConf!!
[updated to fix some typos. John 28may2015]
04 Jan 2015
HortonWorks, Mozilla, Release Engineering
Preparations for RelEngConf 2015 are officially in full swing. This means two things:
1) RelEngCon 2015 is now accepting proposals for talks/sessions. If you have a good industry-related or academic-focused topic in the area of Release Engineering, please have a look at the Release Engineering conference guidelines, and submit your proposal before the deadline of 23-jan-2015.
2) Both RelEngCon 2014 and RelEngCon 2013 were great. The mixture of attendees and speakers, from academia and battle-hardened industry, made for some riveting topics and side discussions. Its too early to tell who exactly will be speaking in 2015, but its not too early to start planning your travel to Florence, Italy!! Also of note: RelEngCon 2015 will be just a few weeks after the publication of IEEE 1st Special Issue on Release Engineering. Looks like RelEngConf 2015 is going to be special also.
For further details about the conference, or submitting proposals, see http://releng.polymtl.ca/RELENG2015/html/index.html. If you build software delivery pipelines for your company, or if you work in a software company that has software delivery needs, I recommend you follow @relengcon, block off May 19th, 2015 on your calendar and book now. It will be well worth your time.
See you there!
07 Nov 2014
HortonWorks, Mozilla, Release Engineering
I was honored to give the opening keynote for USENIX URES14 East in Philadelphia in June 2014.
“The Value of Release Engineering as a Force Multiplier” keynote built on top of the “RelEng as a Force Multiplier” presentation I gave at RelEngConf 2013 and as then as a Google Tech Talk. (To get the slides in PDF format, click on thumbnail. Happy to share the original 25MB keynote file, just let me know and we’ll figure out a way to share without hammering my poor website.)
Anyone who has ever talked with me about RelEng knows I feel very strongly that Release Engineering is important to the success of every software project. Writing a popular v1.0 product is just the first step. If you want to keep your initial early-adopter users by shipping v1.0.1 fixes, or grow your user base by shipping new v2.0 features to your existing users, you need a reproducible pipeline for accurately delivering software in a repeatable manner. Otherwise, you are “only” delivering a short-lived flash-in-the-pan one-off project. In my opinion, this pipeline is another product that software companies need to develop, alongside their own unique product, if they want to stay in the marketplace, and scale.
Its typical for Release Engineers to talk about the value of RelEng in terms that Release Engineers value – timely delivery, accurate builds, turnaround time, etc. I believe its important to also describe Release Engineering in terms that people across an organization can understand. In my keynote, I specifically talked about the value of RelEng in terms that people-who-run-companies value – unique business opportunities, market / competitive advantages, new business models, reduced legal risk, etc.
Examples included: Mozilla’s infrastructure improvements which reduced turnaround time for delivering security fixes as well as helped deter future attacks… Hortonwork’s business ability to provide enterprise-grade support SLAs to customers running mission critical production “big data” systems on 100% open source Apache Hadoop… and even NASA’s remote software update of the Mars Rover.
People seemed to enjoy the presentation, with lively questions during, afterwards… and even into the end-of-day panel session.
Big thanks to the organizers (especially Dinah McNutt (RelEng at Google), Gareth Bowles) – they did an awesome job putting together a unique and special event.
Oh, and one more thing! Next week, USENIX URES14 West will start on Monday 10nov2014 in Seattle. If you are in the area, or can get there for Monday, you should attend! And make sure to see Kmoir’s presentation “Scaling Capacity While Saving Cash” – if you follow her blog, you know you can expect it to be well worth attending.
[Updated to include links to usenix recordings. joduinn 08nov2014]