What are common disasters that jeopardize IT systems?
Bob Lullo: Depending on the geographic location, any number of natural events may pose a threat, including hurricanes, flooding, tornadoes, earthquakes, and forest fires. Man-made events are harder to pinpoint, as they can range from hardware failures to employee sabotage.
What are the fundamental elements of an IT disaster plan?
BL: A step-by-step task plan for recovery should include detailed jobs assigned to specific resources. A timetable and communication plan should also be established in advance. Let me stress that these plans need to be tested at least once a year. Ideally, they should also be revised and retested whenever there is a significant change to the infrastructure, personnel, or software applications involved. If a recovery plan includes a mirrored, fully replicated standby environment, then any testing should involve a full switch to the standby system to ensure that all systems work as expected. If recovery includes backing up to tape, then those tapes should be tested to verify that they are valid and can be read.
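As a rough illustration of the backup check described above, the following Python sketch reads a backup file from start to finish and compares its SHA-256 digest with one recorded when the backup was written. The function names (sha256_of, verify_backup) and the idea of a stored digest are assumptions made for this example; real tape or backup software would normally supply its own verification tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Read the whole file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path: Path, expected_digest: str) -> bool:
    """Return True only if the backup is readable end to end and its
    digest matches the one recorded at backup time (hypothetical input)."""
    try:
        return sha256_of(path) == expected_digest
    except OSError:
        # A missing or unreadable backup counts as a failed test.
        return False
```

A check like this only confirms that the media can be read and has not changed; a full recovery test would still restore the data and confirm that the applications run against it.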
The most common metrics used to evaluate these plans are recovery time objective (RTO) and recovery point objective (RPO). The RTO is the target time within which the organization expects to return operations to a normal state after a disaster. The RPO is the time period just prior to the disaster whose data the organization accepts it may lose; it covers the most recently entered data at risk of loss.
This most recent data may not yet have been appropriately backed up and therefore would need to be manually re-created in the event of a total system failure. It’s important that the firm is satisfied with the current RTO and RPO of a plan.
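To make the two metrics concrete, here is a minimal Python sketch that compares the results of a recovery test against a firm's targets. The class and function names (RecoveryTargets, evaluate_test) and all of the times shown are hypothetical values chosen for illustration, not figures from any real plan.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTargets:
    rto: timedelta  # longest acceptable time to restore normal operations
    rpo: timedelta  # longest acceptable window of data at risk of loss


def evaluate_test(targets: RecoveryTargets, last_backup: datetime,
                  outage_start: datetime, service_restored: datetime) -> dict:
    """Compare one recovery test (or real outage) against the targets."""
    recovery_time = service_restored - outage_start
    # Data entered after the last good backup would have to be re-created.
    data_loss_window = outage_start - last_backup
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "meets_rto": recovery_time <= targets.rto,
        "meets_rpo": data_loss_window <= targets.rpo,
    }


# Hypothetical test: a 4-hour RTO and a 1-hour RPO.
targets = RecoveryTargets(rto=timedelta(hours=4), rpo=timedelta(hours=1))
result = evaluate_test(
    targets,
    last_backup=datetime(2024, 1, 10, 8, 30),
    outage_start=datetime(2024, 1, 10, 9, 15),
    service_restored=datetime(2024, 1, 10, 14, 0),
)
print(result["meets_rto"], result["meets_rpo"])  # False True
```

In this made-up test, 45 minutes of data were at risk, which is inside the one-hour RPO, but recovery took 4 hours 45 minutes and missed the four-hour RTO. That is the kind of gap an annual test is meant to surface.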
What are some common blind spots in developing these plans?
BL: Of the many possible blind spots, three critical areas often get overlooked. First, make sure that each member of the recovery team knows his or her own area of responsibility and the method of communication with other members of the team in the event of a disaster.
Second, include a detailed step-by-step task plan for recovery in the event of a disaster. This becomes the script for testing and should be familiar to each member of the recovery team before a disaster occurs.
Finally, one of the most important planning items is the location of IT assets. If a firm is fortunate enough to have two or more data centers, then those should be adequately separated. A best practice is to make sure that each data center is in a separate Federal Emergency Management Agency flood zone. IT staff love to be able to touch their own hardware, but this is not necessary in today’s world. Proper separation between data centers keeps the primary and standby locations from being affected by the same natural disaster.