27 Nov 2012
tl;dr: We’re now running our build infrastructure across 3 different Amazon regions. This makes us more robust and *cheaper*?! What's not to love?!
This is important for 3 reasons:
1) It means RelEng can keep up with load, and hence keep all trees open, even if an Amazon region goes offline or a VPN link fails. Amazon doesn’t lose a region often, but it can happen. Mozilla doesn’t lose a VPN link often, but it can happen. Using 3 different regions, with 3 different VPN links, makes it unlikely that we’d lose all of them at the same time. In fact, multi-region outages on AWS are so rare that the most recent one I could find was in June 2008.
2) As our first “go hybrid on AWS VPC” deployment has been a clear success, we’re now experimenting with VPCs that are further away from our in-house colos (i.e. with slightly slower connections). This allows us to start using regions that are cheaper (good!) and not inside the same earthquake zone (also good!).
3) We now have a month or so of realistic usage data on AWS, as all the builds that we can run on AWS are now running on AWS. (The only exceptions are recent requests for new B2G builds, which we’re still setting up.) This means we can now make decisions about bulk-purchase-in-advance instances (called “reserved instances” in AWS lingo), which buy us the same compute power for ~1/4 the price. As these reserved instances are region-specific, and cannot be refunded or moved to other regions later, it was worth bringing the new, cheaper regions online first, before we start bulk purchases in those cheaper regions.
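To make the redundancy argument in point 1 a little more concrete, here is a rough back-of-envelope sketch. It assumes (our simplification, not measured data) that each region and its VPN link fail independently, and it uses made-up outage probabilities purely for illustration. Real outages can be correlated, so treat the numbers as illustrative only.

```python
# Back-of-envelope sketch: chance of losing build capacity in every region at once.
# ASSUMPTION: each (AWS region, VPN link) pair fails independently of the others.
# The probabilities used below are hypothetical, not measured Mozilla or AWS figures.


def pair_down_probability(p_region: float, p_vpn: float) -> float:
    """A region is unusable to us if either the region itself or its VPN link is down."""
    return 1 - (1 - p_region) * (1 - p_vpn)


def all_pairs_down_probability(pairs: list[tuple[float, float]]) -> float:
    """Probability that every (region, VPN link) pair is unusable at the same time."""
    result = 1.0
    for p_region, p_vpn in pairs:
        result *= pair_down_probability(p_region, p_vpn)
    return result


if __name__ == "__main__":
    # Hypothetical: 0.1% chance a given region is down, 0.1% chance its VPN link is down.
    one_region = [(0.001, 0.001)]
    three_regions = [(0.001, 0.001)] * 3
    print(f"1 region:  {all_pairs_down_probability(one_region):.2e}")
    print(f"3 regions: {all_pairs_down_probability(three_regions):.2e}")
```

Under these assumptions the chance of losing everything at once shrinks roughly with the cube of the single-region figure, which is why adding regions (each with its own VPN link) helps so much more than hardening any one of them.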
All in all, this is a big deal.
Oh, and it is worth noting for the record that these additional regions were brought online, and into production, without needing any downtime and without any hiccups! Big thanks to catlee, rail and ravi for their work.