16 new slaves – a drop in the pool

Over Monday and Tuesday morning, we added 12 new slaves to the main production pool-of-slaves. This was in addition to the 4 new slaves we added to the tryserver pool-of-slaves last week. Despite adding those 16 slaves, we were totally unable to keep up with the volume of incoming jobs today. This was a problem for the Build/Unittest/Talos production servers and the TryServers.

There’s no nice way to say this – the backlog of incoming jobs today was ugly.

I don’t know if it was because it’s a Tuesday (typically a busy day), made worse by people surging back from Easter vacations… or because of a rush of checkins in the lead-up to FF3.5b4. All I know is that, despite the new extra machines, we were totally overrun today – we simply did not have enough machines to keep up with the sheer number of incoming build/unittest/talos jobs.

  1. Sorry. I know it’s frustrating (to put it very mildly).
  2. Believe me, we are working flat out to ramp up capacity to handle this. The 4 new ESX hosts that IT+RelEng installed last week, along with extra disk space, gave us much-needed extra capacity to set up more slaves.
  3. Another 8 slaves are still finishing their move from staging into tryserver production tonight. Given how today went, we’re already working on bringing up another set of slaves as quickly as possible.

Please hang in there.