Disaster Recovery Planning Is All About The Application Stack

Having done so much work recently on vCloud Air – Disaster Recovery, and having done quite a bit of DR planning in past lives, I wanted to touch on a few things that have started cropping up.  What I am about to chat about is not new, but history, as always, tends to repeat itself.  Any new technology that comes forward is really only a tool that makes something previously done easier in some way.  Just as the assembly line revolutionized automobile and other manufacturing, many technologies come along as tools to aid processes that already exist.  So where is this process-oriented brain of mine going with this?  Simply put, Disaster Recovery is not just about being able to replicate data from point A to point B, pushing a magic button, and walking away.  I don’t care what you are using to replicate data for disaster recovery purposes; if that’s all you are focusing on, you will never have a DR plan.

I touched on this in a previous article about how failing to plan is planning to fail, but I wanted to take it a step deeper and focus on the applications we are trying to provide disaster recovery for.  I cannot count how many times someone refers to “Failing over a Server” instead of “Providing DR for the application”.  This seems like a subtle difference, but how many individual servers make up the entire application?

What do you know about your applications?

Let’s just take E-Mail as an example.  As an “application” it may consist of far more than one virtual machine.  It could include things like these, just to name a few:

  • Database Server
  • Public Web front end
  • Spam server(s)
  • SMTP relay(s)

To be honest, many people I meet cannot say with certainty what or who is accessing any given application like E-Mail.  When planning Disaster Recovery, knowing the components is important, but getting a grasp on what and who is accessing the application, or a component of it, is just as important.  For example, thinking about E-Mail, you may identify:

  • Users access externally via Web
  • Users access externally via mobile (Using same or different Web API)
  • Users access internally from corporate desktop client or web client
  • Other systems access SMTP capability for notifications
    • Which systems and how many of each?
  • Which systems does this application stack depend on that I also need to have available?

Both of these aspects are just the tip of the iceberg.  The more you know or have documented about your applications and their interactions, the better you can plan for your recovery.  Large organizations that have been doing DR planning know this.  However, as lower-end solutions geared to smaller businesses come out, these details may not even be known by those smaller shops.  Frankly, I think this is where I am seeing the gap in knowledge.
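To make the idea of documenting an application a little more concrete, here is a minimal sketch in Python of the E-Mail example above captured as a single inventory record.  The `ApplicationStack` class, its field names, and the `undocumented` helper are all hypothetical names I made up for illustration; they are not part of any product, and your own inventory may well live in a spreadsheet or a CMDB instead.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ApplicationStack:
    """Hypothetical inventory record for one application, not one server."""
    name: str
    components: List[str] = field(default_factory=list)    # servers that make up the app
    consumers: List[str] = field(default_factory=list)     # who or what accesses it
    dependencies: List[str] = field(default_factory=list)  # other systems it needs

    def undocumented(self) -> List[str]:
        """Return the names of inventory sections that are still empty."""
        sections = {
            "components": self.components,
            "consumers": self.consumers,
            "dependencies": self.dependencies,
        }
        return [section for section, items in sections.items() if not items]


# The E-Mail example from the lists above; dependencies are left empty on purpose.
email = ApplicationStack(
    name="E-Mail",
    components=[
        "Database Server",
        "Public Web front end",
        "Spam server(s)",
        "SMTP relay(s)",
    ],
    consumers=[
        "External users via Web",
        "External users via mobile",
        "Internal users via desktop or web client",
        "Other systems using SMTP for notifications",
    ],
)

gaps = email.undocumented()
print(f"{email.name} is missing: {', '.join(gaps) or 'nothing'}")
```

The point of the `undocumented` check is simply to flag what you have not written down yet.  In this sketch the dependencies section is empty, which is exactly the kind of gap that hurts during a real failover.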

Manual Run Books are Still Key

Even the best of the automation folks out there don’t do anything, I hope, without a validated, documented process first.  That process needs to be tested, vetted, and implemented long before you start automating it, so you have a reference for where the automation came from.  Even if it’s on the back of a napkin, I bet every automation script started from some written process somewhere.  That being said, if you can truly understand the ENTIRE application stack before you start, then you have a fighting chance of having a successful disaster recovery plan.

I don’t care what technology you are using to MOVE data between locations; at the end of the day it’s what you DO with the data during a disaster that matters.  You can only do something if you have thought through all of the interconnection points for that application.  Think in a larger scope than single servers.  Understand your environment.  Document everything and know where you might run into issues.  Will you need hosted desktops for the client to even access the application you have failed over, or is it all web-based?  All of these are factors you need to think about, and if you make disaster recovery all about the application stack first, you will head in the right direction.
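Once the interconnection points are written down, the order of the run book steps more or less falls out of them.  Here is a rough sketch in Python, using the standard library `graphlib` module (Python 3.9 or later), that turns a dependency map into a bring-up order.  The dependency relationships shown, the "Directory / DNS services" entry, and the `recovery_order` function name are assumptions for illustration only; your own application will have its own map, and this does not replace the tested manual process.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical map: each component lists what must be running before it starts.
depends_on: Dict[str, Set[str]] = {
    "Database Server": {"Directory / DNS services"},
    "Public Web front end": {"Database Server"},
    "SMTP relay(s)": {"Database Server"},
    "Spam server(s)": {"SMTP relay(s)"},
}


def recovery_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return components ordered so every prerequisite comes first.

    TopologicalSorter raises graphlib.CycleError if the map contains a
    circular dependency, which is worth finding before a disaster.
    """
    return list(TopologicalSorter(graph).static_order())


order = recovery_order(depends_on)

# Print the result as numbered run book steps.
for step, component in enumerate(order, start=1):
    print(f"Step {step}: bring up {component}")
```

Notice that the output is only as good as the map you feed it.  If a dependency was never documented, no amount of automation will put it in the right place.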

I have tried to show something like this in my vCloud blog posts about dealing with desktops and internally hosted applications.  If you have not read them, they document a pretty interesting exercise.  You will see some of this in my upcoming VMworld presentations as well.

About Chris Colotti

Chris is active on the VMUG and event speaking circuit and is available for many events if you want to reach out and ask. Prior to this, he spent close to a decade working for VMware as a Principal Architect. Before his nine-plus years at VMware, Chris was a System Administrator who evolved his career into that of a data center architect. Chris spends a lot of time mentoring co-workers and friends on the benefits of personal growth and professional development. Chris is also among the first VMware Certified Design Experts (VCDX#37), and the author of multiple white papers. In his spare time he helps his wife Julie run her promotional products business as the accountant, bookkeeper, and IT support. Chris also believes in both a healthy body and a healthy mind, and has become heavily involved with fitness as a Diamond Team Beachbody Coach using P90X and other Beachbody programs. Although technology is his day job, Chris is passionate about fitness after losing 60 pounds himself in the last few years.
