Technical Redundancy – A Crucial Business Requirement

This post comes in light of recent events in New Jersey and New York, which were hit by Hurricane Sandy.  As with Katrina, it has been a very difficult time, and it is nice to see people helping each other.  Businesses were affected by Sandy too.  They suffered power outages or lost hardware to flooding.  Individuals and businesses alike will be changed forever.

While working for General Motors, I was given the opportunity to learn about and work on disaster recovery and business resumption plans.  This meant doing a tremendous amount of research on something I knew little about.  To my surprise, a lot of horror stories came out of Katrina, with many businesses effectively shutting down and liquidating.  Some of these business owners have written about their losses, hoping that others would learn from their mistakes.  GM, as you can imagine, has a significant number of employees, business apps and data required to run day-to-day operations.  If the headquarters is hit by a tornado or blocked by disgruntled union workers, how do we ensure continuity as if nothing happened?  Working on the Disaster Recovery Plan (DRP) and Business Resumption Plan (BRP) was an eye-opening experience for me.

Just to make sure I am not confusing anyone, a DRP is the plan used to recover data and ensure that the tools used by the business are restored.  A BRP is the plan that is executed when the physical business location is no longer operable, and it involves setting up remote locations so that business can resume as normal.  Each business will have different requirements for resuming operations, including timelines and the services that are crucial to operations.
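As a rough illustration of what "timelines and crucial services" can look like once written down, here is a minimal Python sketch. The service names, downtime figures and the `ServiceRequirement` and `recovery_order` names are all hypothetical, made up for this example rather than taken from any real plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRequirement:
    """One line of a recovery plan (all values here are hypothetical)."""
    name: str
    crucial: bool               # is the business unable to operate without it?
    max_downtime_hours: float   # how long the business can tolerate it being down


def recovery_order(requirements):
    """Return crucial services first, then by shortest tolerated downtime."""
    return sorted(requirements, key=lambda r: (not r.crucial, r.max_downtime_hours))


# Hypothetical requirements for a small business.
requirements = [
    ServiceRequirement("internal wiki", crucial=False, max_downtime_hours=72),
    ServiceRequirement("order database", crucial=True, max_downtime_hours=4),
    ServiceRequirement("email", crucial=True, max_downtime_hours=24),
]

ordered_names = [r.name for r in recovery_order(requirements)]
print(ordered_names)
```

Even a list this small forces the useful question: which service has to come back first, and how long can the rest wait?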

I operate under the assumption that anything that can go wrong will go wrong, and that the edge cases, while rare, will happen when you least expect them.  For instance, who knew that, of all things, a CAW blockade would require executing the BRP at GM?  Looking at Amazon over the past few months, they’ve had numerous large-scale failures.  Sandy has caused major disruptions and forced multiple websites and services to shut down as their backup generators ran out of fuel.

I’ve asked many small and medium-sized business owners to describe their disaster recovery process.  To my disbelief, most are unprepared or do not understand the severity of potential events.  I live in a world filled with paranoia, so I ask them “what if your hosting provider disappears tomorrow?”, which is often met with a puzzled look.  Amazon could never crash, right?  What about pushing code to live that accidentally purges live data?  Or an intern who runs a query that deletes data?  Companies and developers assume that edge cases never happen because they pay attention and can fix problems as they arise.  They need plans for when things go terribly wrong, even if they never do.  I won’t try to claim that I haven’t made mistakes or that I have everything implemented, but I have the plans.  Now if I had the money to execute my plans, I’d perhaps be in a better position to convince everyone to follow my lead.

Regardless of your situation, you should plan.  I won’t get into business resumption too much.  Unless you have a decently sized company or a corporation, you won’t necessarily need it; your developers could likely work from home and be as productive as they are in the office.  If you operate behind a VPN and host a variety of services in-house, then you will more than likely need a BRP.  I may cover that in another blog post if I get requests.  Plan to implement the DRP in stages as cash becomes available, and scale the deployment of the plan to match.
