Operations is Just Like The Fire Department Minus The Burning Buildings…

Many years ago, when I was working at a startup, my manager at the time (still a mentor to me today) was adamant that I read Failure Is Not an Option by Gene Kranz. It’s about NASA mission control from the earliest days of Mercury through Apollo. You might not know who Gene Kranz is, but if you’ve seen the movie Apollo 13, he is the character portrayed by Ed Harris. The book is a fascinating read for anyone. I found it especially interesting because a lot of what I do in operations is planning for the unexpected and managing incidents when things do go badly.

The person suggesting I read the book wanted me to model our group on mission control. A lot of what I read and learned did carry over into day-to-day life running ops in a startup. What I began to realise, though, was that we are much less like mission control than we are like a fire department. Yes, we have to plan for the unexpected and have clear methods of work for what we do expect when it comes to pass. But we also need to think like the fire department. In the most basic sense, that means if you get a call that something’s wrong, you show up like it’s a five-alarm fire. Even if you think it’s just a cat in a tree, you show up in full turnout gear, ready to go. If it turns out to be just a cat in a tree, one guy stays behind to take care of the situation and everyone else returns to the station. However, if what sounded like a cat in a tree turns out to be something more substantial, you’re ready to jump into action.

What I just described is exactly what we do when starting an incident recovery call. You have to act like emergency services do. You have to constantly drill people. No matter what the situation sounds like, you go in assuming the worst. Even after false positive after false positive, you still have to go into every situation as if it were a major event. The alternative can be disastrous.

I originally wrote a version of this entry years ago, after my team and I had handled several large incidents. Its sentiment still holds true several years and several groups later, and I have used the analogy a lot since then in dealing with incident management. What took me a little while to realise after first writing this is that the same analogy applies to training initiatives as well. When firefighters aren’t out on calls or maintaining their equipment, they’re drilling. They know that to react quickly they have to work together as a team and drill the scenarios that are likely to come up. In the group I managed at the time I wrote this, we did a lot of training, to the point where people were complaining about it. Yet the outcome of the drills and training was that reaction times improved noticeably. No one can be expected to be shown something once and then execute it perfectly six months later when it comes up again. We all need to drill constantly, using the tools we have and working together on the scenarios that are likely to come up.

Around the time I first wrote this, we had a situation that put the fire department mentality to the test. We received an email about a problem that had been kicking around with others for two days. At first pass it didn’t sound like it had much to do with our group; however, it didn’t feel right. We were not sure what was going on, so we made the decision to mobilise in order to rule anything out. It was the right decision. The lead of our incident recovery call confirmed after a few hours that there was a problem, identified the upstream service, and got the right people engaged to solve the issue. The same type of thing could have come in and been nothing. Many times it is nothing. By mobilising, we headed off a potentially worse problem.

In that group back in 2016, my manager at the time gave me a baseball bat that I kept at my desk. He used to use it when talking to people, to tell them that if they ran into issues he could help with getting things done. He gave it to me in a very public way to show my team I could do the same for them. It was an important symbolic gesture. I never really used it, but it was nice to roll around on the floor or otherwise fidget with during a long incident. When I moved out of that operations group to a more delivery-focused role, the person who took over from me was a good friend of mine. I made a very public handover of the same bat to him. While I was still doing day-to-day operations work, I had been meaning to get around to buying a fire hat. I feel that is a more appropriate token to have around. Nowadays it’s less appropriate, since I am not doing operations or application support.