Sunday, January 2, 2011


Importance Of Creating A Recovery Plan

A Few Steps Today Will Save Your Data Center Tomorrow: NO DATA CENTER is exempt from disaster. The truth is that disasters happen all the time, caused by such things as human error, system breakdowns, and natural events. Worst of all, you never know if or when one will happen to you.

The main question is: Are you ready for a disaster if it happens to your data center? Part of being ready for a disaster in the enterprise is knowing how to test a disaster plan and also how often you should test it. Here are a few suggestions to help you prepare for when disaster strikes.

The Plan Is A Priority:
A disaster plan should be a priority in any data center. Disaster preparedness ensures that an enterprise's IT operations can recover from an outage-inducing interruption. Because enterprises today rely almost entirely on IT to run their business, the ability to recover from such outages is what lets them continue to operate as a viable, functioning business. A number of trends are increasing the need for data center disaster planning.
  • First, data is growing at an alarming rate; in fact, many companies are reporting data growth rates of 50% or more per year.
  • Second, the current trend for business is moving toward a geographically dispersed, 24/7 service model. Customers want to submit purchase orders online, pay bills online, and access their accounts through customer-facing online portals from anywhere, at any time.
The productivity cost of unplanned downtime is increasing, and so is the revenue cost from lost transactions and service failure. Although it might be tempting for businesses to improve server performance and storage efficiency by implementing more aggressive deletion policies, regulatory compliance demands are forcing them to keep older data on file for several years. As the rate of data production continues to grow, the retained data piles up in a snowball effect. Another trend is that companies are cutting back on their IT spending, training, and staffing because of difficult economic conditions, which increases the likelihood of unexpected disasters.
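To make the snowball effect concrete, here is a minimal sketch of the compound-growth arithmetic behind a 50% annual growth rate. The 10 TB starting volume and the five-year horizon are illustrative assumptions, not figures from any particular company, and the function name is made up for this example.

```python
from typing import List


def projected_storage_tb(current_tb: float, annual_growth: float, years: int) -> List[float]:
    """Return projected storage (in TB) for year 0 through `years`.

    Assumes simple compound growth: each year's volume is the previous
    year's volume multiplied by (1 + annual_growth).
    """
    return [current_tb * (1 + annual_growth) ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Hypothetical starting point: 10 TB today, growing 50% per year.
    for year, size in enumerate(projected_storage_tb(10.0, 0.50, 5)):
        print(f"Year {year}: {size:.1f} TB")
```

Under these assumptions, 10 TB grows to roughly 76 TB after five years, which is the volume a recovery plan written today would have to restore if it were never revisited.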

Testing Methods:
There is a well-established hierarchy of testing types for disaster recovery infrastructure and operations. Walkthrough tests are essentially document reviews where a hypothetical disaster is posited and the team walks through the resolution according to the details outlined in the plan. These tests should always occur first and are used to find gaps and oversights in the plan itself.

After walkthrough tests, there are simulation tests and parallel tests. In the former, the recovery infrastructure is brought online to make sure processes work and that systems can be made functional. In the latter, historical data is processed to ensure appropriate results are generated. Only after all these types of tests have been conducted should an interruption test (wherein production processing is failed over to the recovery site) even be considered. An effective way to test a disaster plan is through simulation tests that are run at least once a year. And when you test your plan, you should try a number of different scenarios. For example, what if your CEO’s laptop was stolen and it contained important data? Or what if human resources needed to retrieve six years’ worth of old files for a wrongful dismissal suit: how long would it take you to search through six years of historical email and locate all conversations relating to a specific topic, theme, or incident? Another consideration is how quickly you can recover a critical server when a power failure causes a disk failure.

Additionally, how quickly can you rebuild a server from bare metal? If the data center caught fire, how much downtime would the company have to endure before coming back online? And how long would it take you to set up a new server at another location on a moment’s notice? As far as frequency is concerned, enterprises should realistically be testing their plans continually. The point of testing is less about building the skills to operate the plan in the event of a disaster than it is about discovering mistakes, oversights, and errors in the plan and supporting infrastructure. As each error is found and corrected, subsequent testing is needed.
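The testing hierarchy described above can be sketched as a simple gate that refuses to schedule a test until every earlier type has been completed. This is only an illustration: the names `DrillType` and `may_run` are hypothetical, and placing simulation before parallel tests is an assumption based on the order in which they are listed here.

```python
from enum import IntEnum
from typing import Set


class DrillType(IntEnum):
    """Disaster recovery test types, ordered from least to most disruptive."""
    WALKTHROUGH = 1   # document review of a hypothetical disaster
    SIMULATION = 2    # recovery infrastructure is brought online
    PARALLEL = 3      # historical data is processed at the recovery site
    INTERRUPTION = 4  # production is failed over to the recovery site


def may_run(test: DrillType, completed: Set[DrillType]) -> bool:
    """Return True only if every earlier test type has already been completed."""
    return all(earlier in completed for earlier in DrillType if earlier < test)


if __name__ == "__main__":
    done = {DrillType.WALKTHROUGH, DrillType.SIMULATION}
    for drill in DrillType:
        status = "allowed" if may_run(drill, done) else "blocked"
        print(f"{drill.name}: {status}")
```

With only the walkthrough and simulation tests completed, the interruption test stays blocked until a parallel test has also been run.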

Enterprises should try to avoid testing disaster recovery in “broad strokes.” The likelihood that catastrophic failures will occur is far lower than the likelihood of localized, small-scale outages. Further, broad testing requires a tremendous commitment of resources, while scenario testing can be accomplished with less effort. Over time, the sum of the work performed in scenario testing will more than equal the gains that can be made with catastrophic failure testing.

Work With Trustworthy, Capable Vendors:
When it comes to disaster recovery planning, corporate data protection is very complex. You have to deal with many different systems and concerns (email, databases, operating systems, laptops, compliance, high availability, etc.), and each of these requires a different disaster protection approach. The best advice would be to pick a partner that offers many different types of business data protection solutions and to have them put together a tailored disaster recovery plan based on your IT needs. When you work through one trusted vendor for all of your backup, recovery, and availability systems, it simplifies your IT management and reduces or eliminates the possibility of overlap, waste, or system conflicts.

Key Points
  • Walkthrough tests, in which a hypothetical disaster is posited and the team walks through the plan for resolution, should always occur first. These tests can help find gaps and oversights in the plan itself.
  • An effective way to test a disaster plan is through simulation tests, which should be run at least once a year. Try a number of scenarios when you test the plan.
  • Testing a plan is also about discovering mistakes, oversights, and errors in the plan and supporting infrastructure.


About bench3 -

Haja Peer Mohamed H, Software Engineer by profession, Author, Founder and CEO of "bench3". You can connect with me on Twitter, Facebook, and Google+.
