Anand M Naik, director–System Engineering, Symantec, talks to Biztech2.com about the latest in the Disaster Recovery space.
What are the latest trends in the Disaster Recovery (DR) space in India?
DR has been a priority for IT managers as events that cause unplanned business downtime continue to surprise us. Natural disasters, human conflicts and constant exposure to security breaches and attacks have driven organisations of all types and sizes to recognise the need to implement or improve their comprehensive Business Continuity Plan (BCP) that includes a robust IT Disaster Recovery Plan.
According to Symantec’s 2009 Disaster Recovery survey, the most important factor driving DR adoption among enterprises is virtual or physical breaches. Enterprises constantly fear losing data or information during disasters.
Which technologies are being largely considered and purchased for information protection?
To stay competitive in today’s fast-changing global market, the enterprise must guard against downtime and data loss while at the same time boosting productivity and lowering operating costs. In addition to reducing the risk of data loss or corruption, one also needs to ensure that data can be quickly and completely recovered.
As users often store data via different methods, the data protection solution needs to examine everything. The best way to manage data protection is to categorise things into simpler forms - whether it’s an application, database, server, disk, tape, or network, it all produces data. Data should be managed from a single pane of glass.
Often, organisations implement single-point products to address each individual data protection challenge. The result is often a proliferation of tools that ultimately increases the cost of management. Data protection should work across a wide variety of platforms, should handle disk, tape, and virtual machines, and should adapt as the infrastructure changes.
It is important for businesses to keep their infrastructure up and running 24x7 and to be able to access information anytime and anywhere. This means physical systems, operating environments and applications must be protected across all tiers of the infrastructure.
What attributes and capabilities should a DR service provider offer?
The service provider should be dedicated to Disaster Recovery services. It is desirable that the service provider, and in particular their disaster recovery business, be certified for quality. International quality accreditations certify that the service provider will deliver international standard services. Providers should have the expertise to manage ’live’ disasters. They should understand the technology involved in maintaining and restoring vital documents and equipment. There should also be an understanding and fulfillment of the full range of an enterprise’s critical service requirements, such as different operating platforms, communication services, integrated applications and so on.
An untested recovery plan is useless. The service provider should have a proper plan to run fire drills and test resources under conditions that meet an enterprise’s recovery planning requirements. Service providers should have proper security arrangements at the sites where critical servers and applications are hosted. The service level agreement (SLA) should be read carefully. It is important that expectations for the recovery point objective (RPO) and the recovery time objective (RTO) are defined clearly and stated in definite terms.
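To illustrate what stating RPO and RTO in definite terms can look like, here is a minimal Python sketch that scores a recovery drill against agreed targets. The class names, field names and the 15-minute and four-hour targets are hypothetical, chosen only for illustration; they are not taken from any Symantec product or from the survey.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time to restore service


@dataclass
class DrillResult:
    last_good_copy: datetime    # timestamp of the newest recoverable data
    outage_start: datetime      # when the simulated disaster began
    service_restored: datetime  # when users could work again


def evaluate_drill(objectives, drill):
    """Compare measured data loss and downtime with the agreed targets."""
    data_loss = drill.outage_start - drill.last_good_copy
    downtime = drill.service_restored - drill.outage_start
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


# Hypothetical targets and drill timings.
objectives = RecoveryObjectives(rpo=timedelta(minutes=15), rto=timedelta(hours=4))
drill = DrillResult(
    last_good_copy=datetime(2009, 6, 1, 9, 50),
    outage_start=datetime(2009, 6, 1, 10, 0),
    service_restored=datetime(2009, 6, 1, 15, 0),
)
report = evaluate_drill(objectives, drill)
```

In this made-up drill, ten minutes of data are lost, which is inside the RPO, but service takes five hours to return, which misses the RTO. Writing the targets down as numbers is what makes such a comparison possible.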
Why are enterprises switching from the old tick mark system to automated processes?
For many enterprises, system recovery is a manual process. It often requires time-consuming troubleshooting to identify and solve the problem. An automated approach, such as high availability clustering, eliminates much of the downtime associated with a traditional manual recovery process. If a system fails in the primary data centre, the software can restart the application automatically on another server. The administrator may be notified by a text message or an e-mail and has visibility into problems at all times. The series of activities required to maintain business continuity is handled by the software, with limited action required from IT employees.
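The restart-and-notify behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function fail_over, its parameters and the server and application names are hypothetical, and real clustering software does considerably more (fencing, storage handover, repeated health probes).

```python
from typing import Callable, List, Optional


def fail_over(
    app: str,
    servers: List[str],
    is_healthy: Callable[[str], bool],
    start_app: Callable[[str, str], None],
    notify: Callable[[str], None],
) -> Optional[str]:
    """Return the server now running the app, or None if none is healthy.

    servers[0] is treated as the primary; the rest are standbys in
    order of preference.
    """
    primary, standbys = servers[0], servers[1:]
    if is_healthy(primary):
        return primary  # nothing to do
    for standby in standbys:
        if is_healthy(standby):
            start_app(app, standby)  # restart the application elsewhere
            notify(f"{app} failed on {primary}; restarted on {standby}")
            return standby
    notify(f"{app} failed on {primary}; no healthy standby available")
    return None


# Hypothetical wiring: in-memory stand-ins for health checks,
# application start-up and the text message or e-mail alert.
health = {"node-a": False, "node-b": True}
started = []
messages = []
active = fail_over(
    "billing",
    ["node-a", "node-b"],
    is_healthy=lambda server: health[server],
    start_app=lambda app, server: started.append((app, server)),
    notify=messages.append,
)
```

Here the primary is marked unhealthy, so the application is started on the standby and a single notification is recorded, without anyone having to intervene.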
If a disaster threatens to cripple an entire data centre, an automated approach can eliminate human error and reduce downtime by triggering failover of the critical applications to the secondary site. The failover solution should determine which replicated data the application needs to continue operating. A single click starts an automated procedure that restarts the application and connects the users to the secondary site. Automated failover also addresses a common weakness in many disaster recovery plans – the assumption that key employees will be available to physically enter the data centre and manually restart applications. If the employees are unavailable, business continuity suffers. Automation helps reduce this potential point of failure.
How many organisations have a tested DR plan in place?
One cannot provide an absolute figure in terms of how many organisations have a tested DR plan in place. However, all major financial institutions and firms in the BFSI sector are required to have a tested DR plan in place. According to our Disaster Recovery India survey 2009 findings, 27 percent of organisations carry out full-scenario testing of their DR plans, involving relevant people, processes and technologies every three months, while 21 percent claimed to do it on a yearly basis. Another 21 percent said they carry out DR testing every month. About three percent claimed to do it on an ad-hoc basis.
How do organisations carry out DR system tests?
Recent studies have shown that few companies test their DR plans on a regular basis. Companies have been reluctant to conduct DR testing because it often involves bringing down production systems, mobilising a large segment of the workforce and thereby taking them off more urgent projects, and forcing employees to work during inconvenient hours such as weekends or nights.
With automated failover capabilities, IT organisations can test recovery procedures using a copy of the production data while keeping the integrity of that data intact. This capability means that tests can be run during business hours instead of over the weekend, reducing staff overtime. As an added benefit, automated tests can be run during peak production periods and so approximate the conditions that would occur during a true failover situation.
Configuration management tools can also give more confidence to IT managers that their DR plans will work by ensuring that servers at DR sites are consistent and in tune with those at production sites. Some of the latest configuration management tools can run consistency checks that will alert administrators when servers have drifted from the standard build.
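As a rough illustration of such a consistency check, the Python sketch below compares a DR server's settings with a standard build and reports every difference. The function name find_drift and the setting names and values are hypothetical; commercial configuration management tools gather this data from the servers themselves and cover far more attributes.

```python
def find_drift(standard_build, server_config):
    """Return {setting: (expected, actual)} for every setting that differs.

    A setting missing on one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(set(standard_build) | set(server_config)):
        expected = standard_build.get(key)
        actual = server_config.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Hypothetical standard build and DR server settings.
standard = {"os_patch": "SP2", "app_version": "5.1", "ntp": "enabled"}
dr_server = {"os_patch": "SP1", "app_version": "5.1"}

drift = find_drift(standard, dr_server)
for setting, (expected, actual) in drift.items():
    print(f"ALERT {setting}: expected {expected!r}, found {actual!r}")
```

In this example the DR server is flagged for an older patch level and a missing time-sync setting, the kind of gap that would otherwise surface only during a real failover.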