The Road to Business Resilience: December 2008

Monday, December 29, 2008

What is the difference between a disaster recovery plan and a business continuity plan?

Given the human tendency to look on the bright side, many business executives are prone to ignoring "disaster recovery" because disaster seems an unlikely event. "Business continuity planning" suggests a more comprehensive approach to making sure you can keep making money, not only after a natural calamity but also in the event of smaller disruptions including illness or departure of key staffers, supply chain partner problems or other challenges that businesses face from time to time.

Despite these distinctions, the two terms are often married under the acronym BC/DR because of their many common considerations.

Disaster recovery is the process by which you resume business after a disruptive event. The event might be something huge, like an earthquake or the terrorist attacks on the World Trade Center, or something small, like software that malfunctions because of a computer virus.

What do these plans include?
They describe how employees will communicate, where they will go and how they will keep doing their jobs.
The details can vary greatly, depending on the size and scope of a company and the way it does business. For some businesses, issues such as supply chain logistics are most crucial and are the focus of the plan. For others, information technology may play a more pivotal role, and the BC/DR plan may focus more on systems recovery. For example, a large company's plan might call for recovering critical mainframes with vital data at a backup site within four to six days of a disruptive event, obtaining a mobile PBX unit with 3,000 telephones within two days, recovering the company's 1,000-plus LANs in order of business need, and setting up a temporary call center for 100 agents at a nearby training facility.
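Recovering systems "in order of business need" implies a ranked list and a running clock. The sketch below is one way to model that, assuming a single recovery team that restores one system at a time. The `System` fields, the priority scale and the hour estimates are hypothetical, not taken from any real plan.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    priority: int         # hypothetical scale: 1 = most critical to the business
    restore_hours: float  # estimated hours to restore this one system


def recovery_schedule(systems):
    """Return (name, finished_at_hour) pairs, restoring one system at a time
    in order of business need; ties go to the faster restore."""
    elapsed = 0.0
    schedule = []
    for s in sorted(systems, key=lambda s: (s.priority, s.restore_hours)):
        elapsed += s.restore_hours
        schedule.append((s.name, elapsed))
    return schedule


plan = [
    System("file-servers", 3, 10),
    System("mainframe", 1, 96),
    System("call-center-lan", 2, 24),
]
print(recovery_schedule(plan))
# [('mainframe', 96.0), ('call-center-lan', 120.0), ('file-servers', 130.0)]
```

With the mainframe estimated at 96 hours (four days), everything behind it in the queue inherits that delay, which is why the ranking has to be agreed on before the event.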

But the critical point is that no element can be ignored; physical, IT and human resources plans cannot be developed in isolation from each other. (In this regard, BC/DR has much in common with security convergence.) At its heart, BC/DR is about constant communication. Business leaders and IT leaders should work together to determine what kind of plan is necessary and which systems and business units are most crucial to the company. Together, they should decide which people are responsible for declaring a disruptive event and mitigating its effects. Most importantly, the plan should establish a process for locating and communicating with employees after such an event. In a catastrophic event (Hurricane Katrina being a relatively recent example), the plan will also need to take into account that many of those employees will have more pressing concerns than getting back to work.
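One common way to locate and reach employees after an event is a call tree, in which each person calls a short list of others. The sketch below walks such a tree breadth-first and, when someone cannot be reached, has their caller take over that person's list so nobody is skipped. The tree, the names and the `answered` set are hypothetical; in a real event the answers come from the calls themselves. It assumes the tree has no cycles.

```python
from collections import deque


def run_call_tree(tree, coordinator, answered):
    """Breadth-first walk of a call tree starting at the coordinator.
    tree maps each person to the people they are responsible for calling;
    answered is the set of people who picked up. If someone cannot be reached,
    their caller takes over that person's call list so nobody is skipped.
    Returns (reached, unaccounted, calls), where calls lists (caller, callee) pairs."""
    reached, unaccounted, calls = [coordinator], [], []
    queue = deque((callee, coordinator) for callee in tree.get(coordinator, []))
    while queue:
        person, caller = queue.popleft()
        calls.append((caller, person))
        if person in answered:
            reached.append(person)
            next_caller = person
        else:
            unaccounted.append(person)
            next_caller = caller  # escalate: the caller takes over this person's list
        for callee in tree.get(person, []):
            queue.append((callee, next_caller))
    return reached, unaccounted, calls


tree = {"coordinator": ["ann", "bob"], "ann": ["cho", "dev"], "bob": ["eve"]}
reached, unaccounted, calls = run_call_tree(tree, "coordinator", {"ann", "cho", "eve"})
print(reached)      # ['coordinator', 'ann', 'cho', 'eve']
print(unaccounted)  # ['bob', 'dev']
print(calls[-1])    # ('coordinator', 'eve'): bob's list escalated to the coordinator
```

The `unaccounted` list is the part that matters most after a catastrophe: it is the list of people someone still needs to find.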

Backups? I can hear it now, "Not another security pundit talking at me about why I should do something I already do..."

But that's not what this is about.
When was the last time you sat down with other members of the IS team to discuss WHY you perform backups? Those tapes sitting in off-site storage might just cause you and the other members of IS, as well as Legal and Internal Audit, more pain than you realize.

The Challenge
I've been involved in many recovery activities over the past 26 years. In many instances, we found we needed to change our backup processes so that we didn't expend an extraordinary amount of resources or risk sanctions for being unable to recover discoverable data. One truth that has penetrated my gray matter after all these years is that we can always do a better job of anticipating and testing for the types of requests coming in from internal and external sources. The activities that provide the most valuable information about backup process issues are proper data backup design and periodic testing. Both should be based on expected restoration scenarios. The following four scenarios seem the most common:
1) Recovering failed systems
2) Replying to litigation discovery requests
3) Recovering from user or programmer error
4) Ensuring data integrity

Recovering databases for failed systems:
When we talk about backups, most of us immediately think system recovery. Initial backup solution configurations usually focus on this aspect of data recovery, planning for the day when systems fail. The reason we create these backups, usually on tape, is to recover from catastrophic events. However, many organizations keep tapes in off-site storage for years. Regulatory requirements (e.g., payroll records retention) are often used as the excuse. But information you might have to provide to auditors or government agencies should be easily accessible. Instead of letting the bits deteriorate over time on forgotten tapes on dusty shelves, consider a better data archival system: one that uses inexpensive near-line storage (e.g., cheap magnetic or optical disk) and contains only the information you absolutely need to meet legal requirements.

Further, you need to ensure quick recovery when the time comes to rebuild critical systems. The amount of negative impact on a business from a server or data center failure is directly proportional to how long it takes to recover. Recovery solutions must restore critical systems before the maximum tolerable downtime is reached. Innovative approaches can even allow restoration of environments to different hardware platforms.
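A back-of-the-envelope check of whether a full restore fits inside the maximum tolerable downtime (MTD) can be sketched as below. It assumes a single restore stream at a sustained throughput plus a fixed setup time (retrieving tapes, provisioning hardware); real restores also need time for verification and application recovery, and the figures in the example are hypothetical.

```python
def estimated_restore_hours(data_gb, throughput_mb_s, setup_hours):
    """Setup time plus transfer time for data_gb at a sustained throughput_mb_s."""
    transfer_seconds = data_gb * 1024 / throughput_mb_s
    return setup_hours + transfer_seconds / 3600


def fits_mtd(data_gb, throughput_mb_s, setup_hours, mtd_hours):
    """True if the estimated restore finishes within the MTD."""
    return estimated_restore_hours(data_gb, throughput_mb_s, setup_hours) <= mtd_hours


# 900 GB at 80 MB/s is 3.2 hours of transfer; with 4 hours of setup that is 7.2 hours.
print(round(estimated_restore_hours(900, 80, 4), 2))  # 7.2
print(fits_mtd(900, 80, 4, mtd_hours=8))  # True
print(fits_mtd(900, 80, 4, mtd_hours=6))  # False
```

If the estimate is close to the MTD on paper, it is almost certainly over it in practice, which is an argument for the faster restore approaches mentioned above.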

Another problem with most DR backups is the tendency to copy everything to one or two sets of tapes. So if you decide to keep those tapes indefinitely, you expose yourself to problems related to the number two reason tapes are pulled and placed back into tape drives: litigation discovery requests.

Replying to litigation discovery requests:
Similar to the security adage, "You don't have to secure what you don't store," you don't have to produce for discovery what you don't have. Information stored on tape or other backup media is potentially subject to discovery. Backup processes should conform to company records retention and legal hold policies. What to store, how long, and on what media are decisions based on regulatory constraints and the potential for litigation. Information destroyed in the normal course of business, in compliance with records retention policies, is not subject to discovery. And you don't have to incur six-figure costs trying to extract it from outdated or poorly indexed backups intended for DR.
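The rule that routine disposal follows the retention schedule, except for anything under legal hold, can be sketched as a simple filter. The record types, retention periods and matter names below are hypothetical placeholders, not legal guidance; actual periods come from your records retention policy and counsel.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    record_type: str  # hypothetical categories such as "email" or "payroll"
    created: date
    matters: set = field(default_factory=set)  # litigation matters the record relates to


def disposal_candidates(records, retention_days, active_holds, today):
    """IDs of records past their retention period and not under any active legal hold.
    retention_days must have an entry for every record_type."""
    eligible = []
    for r in records:
        keep_until = r.created + timedelta(days=retention_days[r.record_type])
        on_hold = bool(r.matters & active_holds)
        if today > keep_until and not on_hold:
            eligible.append(r.record_id)
    return eligible


retention = {"email": 2 * 365, "payroll": 7 * 365}
records = [
    Record("r1", "email", date(2005, 1, 1)),
    Record("r2", "email", date(2005, 1, 1), {"smith-v-acme"}),
    Record("r3", "payroll", date(2005, 1, 1)),
    Record("r4", "email", date(2008, 6, 1)),
]
print(disposal_candidates(records, retention, {"smith-v-acme"}, date(2008, 12, 29)))
# ['r1']
```

Record r2 is just as old as r1 but survives because it relates to an open matter; once that hold is released it becomes eligible too.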

If your records retention policies call for saving several years' worth of messages, word processing documents, spreadsheets, PDFs, etc., be sure to evaluate the best way to search for and recover this information, and design your electronic archive solutions accordingly. Consider solutions that index information based on content, use integrated search capabilities to protect documents and messages found to be relevant to a legal hold, and remain usable for as long as you need to get to the information. In other words, the technology used should not be obsolete and unusable when the time comes to use it.
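To make content indexing and search-driven legal holds concrete, here is a toy inverted index. The `ArchiveIndex` class and its methods are hypothetical; commercial archiving products add stemming, metadata, access control and durable storage, none of which is modeled here.

```python
import re
from collections import defaultdict


class ArchiveIndex:
    """Toy inverted index over archived documents, with a legal-hold flag."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.held = set()                 # ids of documents under legal hold

    def add(self, doc_id, text):
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, *terms):
        """Ids of documents containing every term (case-insensitive)."""
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t.lower(), set()) for t in terms))

    def place_hold(self, *terms):
        """Mark every document matching the terms as held; return the matching ids."""
        hits = self.search(*terms)
        self.held |= hits
        return hits


idx = ArchiveIndex()
idx.add("msg1", "Q3 pricing for the Acme contract")
idx.add("msg2", "Lunch on Friday?")
idx.add("doc3", "Acme contract renewal draft")
print(sorted(idx.search("acme", "contract")))  # ['doc3', 'msg1']
print(idx.place_hold("pricing"))               # {'msg1'}
```

The point of indexing by content up front is that a discovery request becomes a query against the index rather than a restore of every old tape.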

Recovering from programmer or user error:
If I told you that during my 14 years as a programmer I never caused database issues in a production environment, would you believe me? Would you believe anyone who told you? If you answered yes, you haven't been in the IT business very long.

Programmers and users occasionally whack (technical term) databases, resulting in system failure or questionable data integrity. Data restore solutions must allow "surgical recovery," enabling an administrator to quickly search for and restore only what is needed. Recovering small amounts of easily located information reduces recovery time and downtime costs.
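A surgical restore copies back only the damaged rows instead of the whole database. The sketch below shows the idea with Python's built-in `sqlite3` module, using a second in-memory database as a stand-in for a restored backup; `surgical_restore` is a hypothetical helper. It assumes both databases share the same schema and column order, that `key_column` is the table's primary key (so `INSERT OR REPLACE` overwrites the bad row), and that the table and column names come from the administrator rather than user input, since they are interpolated into the SQL.

```python
import sqlite3


def surgical_restore(prod, backup, table, key_column, keys):
    """Copy only the rows whose key_column value is in keys from backup into prod.
    Table and column names are trusted input; key values are bound as parameters."""
    keys = list(keys)
    if not keys:
        return 0
    placeholders = ",".join("?" for _ in keys)
    rows = backup.execute(
        f"SELECT * FROM {table} WHERE {key_column} IN ({placeholders})", keys
    ).fetchall()
    if rows:
        values = ",".join("?" for _ in rows[0])
        prod.executemany(f"INSERT OR REPLACE INTO {table} VALUES ({values})", rows)
        prod.commit()
    return len(rows)


schema = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)"
prod, backup = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (prod, backup):
    db.execute(schema)
backup.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                   [(1, "Acme", 100.0), (2, "Globex", 250.0), (3, "Initech", 75.0)])
# Production after a bad UPDATE zeroed row 2; row 3 changed legitimately since the backup.
prod.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Acme", 100.0), (2, "Globex", 0.0), (3, "Initech", 80.0)])

print(surgical_restore(prod, backup, "customers", "id", [2]))  # 1
print(prod.execute("SELECT id, balance FROM customers ORDER BY id").fetchall())
# [(1, 100.0), (2, 250.0), (3, 80.0)]
```

Row 3 keeps its newer value, which a full restore of the backup would have silently rolled back.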

Ensuring data integrity:
This recovery category is similar to programmer/user error, covering anything else that might taint data integrity. Again, surgical recovery methods should be available to administrators to restore confidence in the data without severe financial business ramifications.
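One simple way to restore confidence in the data is to record a checksum for every file when a backup is taken and compare against it later, so that anything altered or missing stands out. The sketch below uses SHA-256 from Python's standard `hashlib`. The manifest format (a dict of relative path to hex digest) and the helper names are assumptions, and a checksum only shows that a file changed, not whether the change was legitimate.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in chunks so large files are not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(root, manifest):
    """Relative paths from the manifest that are now missing or whose contents changed.
    Files added since the manifest was built are not reported."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

Storing the manifest separately from the data it describes matters; a manifest that sits next to the files can be altered along with them.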

Backup/Recovery Plan Maintenance
A backup solution is a potential business vulnerability unless it's been tested and the rough spots smoothed. Regular testing against recovery scenarios defined during the design activities is the final step in ensuring recovery processes actually work as expected. Further, review system modifications, changes to retention policies, or shifts in the legal climate to ensure backup and recovery design remains acceptable.
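Keeping the regular testing honest can be as simple as tracking when each recovery scenario last passed a restore test. The helper below is a hypothetical sketch: it flags scenarios that have never passed, whose last pass is older than a chosen age, or that are affected by a system, retention-policy or legal change since then. The 90-day threshold and the dates in the example are illustrative only.

```python
from datetime import date


def scenarios_needing_test(last_pass, max_age_days, today, changed=frozenset()):
    """last_pass maps scenario name -> date of its last successful restore test,
    or None if it has never passed. changed holds scenarios affected by a system,
    retention-policy or legal change since that test."""
    due = []
    for name, passed_on in last_pass.items():
        if passed_on is None or name in changed or (today - passed_on).days > max_age_days:
            due.append(name)
    return sorted(due)


last_pass = {
    "failed systems": date(2008, 11, 30),
    "litigation discovery": None,
    "user or programmer error": date(2008, 6, 1),
    "data integrity": date(2008, 12, 15),
}
print(scenarios_needing_test(last_pass, 90, date(2008, 12, 29)))
# ['litigation discovery', 'user or programmer error']
```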

Do you still need to run security scans if you are using packaged software to protect your environment?

While it is true that antivirus packages will scan most files against known conditions established by the most recently installed signatures, they do not "scan" the entire file system in real time. Instead, the file system is monitored for accesses and file manipulations that the antivirus program considers to be a threat.
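The difference between real-time monitoring and a scheduled scan is that a scheduled scan sweeps every file whether or not anything touches it. The toy sweep below shows only that structure: it walks a directory tree and compares each file's SHA-256 digest against a set of known-bad digests. Real antivirus engines use far richer signatures and heuristics; `scheduled_scan` and the digest set are hypothetical.

```python
import hashlib
from pathlib import Path


def scheduled_scan(root, known_bad_sha256):
    """Return sorted paths under root whose SHA-256 digest is in known_bad_sha256.
    Every file is checked, not just the ones accessed since the last scan."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest in known_bad_sha256:
                hits.append(str(path))
    return sorted(hits)
```

In practice a scheduler such as cron or Windows Task Scheduler would run a sweep like this during off-hours.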

Okay, let's analyze these facts. I have attended several conferences on IT security and read more than my fair share of reference material on hacking and forensic techniques to protect computers from intrusion. While there is no gospel on this subject, most IT pros I know who have a fair amount of exposure to these topics agree that no single antispam or antivirus product can catch everything.

Here is an example. I am not picking on Symantec specifically, nor will I cite the precise case, but this actually happened. The antivirus signatures were current to within a few hours. The server was patched with the most current critical and recommended updates. Yet there was suspiciously high memory usage on the server in question. Only after scrutinizing it with Process Explorer and PsList from Sysinternals, Netstat, Task Manager, a remote UNC file connection, and a remote port scanner was I able to confirm that an intrusion attempt was in progress.

The server had been patched only 16 hours after a known, exploited vulnerability was published. Through this pinhole, an elevation-of-privilege attack had occurred. Then a hack tool was installed and a rootkit planted.

The rootkit hid registry keys, processes, and files from view. Once it was discovered, it was removed easily enough with known tools.
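A rootkit that hides registry keys, processes and files works by filtering what the normal APIs report. Cross-view detection, the approach Sysinternals RootkitRevealer uses for files and registry keys, compares the API's answer with a lower-level read of the same data and flags the difference. The sketch below shows only the comparison step; gathering the raw listing (reading disk structures or registry hives directly) is the hard part and is not shown, so the listings here are hypothetical sample data.

```python
def cross_view_diff(views):
    """views maps a category (e.g. "processes", "files", "registry") to a pair
    (api_listing, raw_listing). Items present in the raw listing but absent from
    the API listing are candidates for something hiding them."""
    report = {}
    for category, (api_listing, raw_listing) in views.items():
        hidden = sorted(set(raw_listing) - set(api_listing))
        if hidden:
            report[category] = hidden
    return report


views = {
    "processes": (["explorer.exe", "svchost.exe"],
                  ["explorer.exe", "svchost.exe", "svch0st.exe"]),
    "files": (["C:\\boot.ini"], ["C:\\boot.ini"]),
}
print(cross_view_diff(views))  # {'processes': ['svch0st.exe']}
```

Differences can also come from files created between the two reads, so each hit is a lead to investigate, not proof of infection.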

However, other problems were left behind (confirmed by file date stamps and by checking backups): another trojan, which the AV supposedly knew about and had cleaned, still had hold of the machine. This is where it gets interesting. The trojan had not actually been cleaned. Through human error, the logs were not scrutinized, so no one confirmed that the clean attempt had failed. This trojan was also not the same iteration listed in the AV package. As the server was monitored with Filemon, Process Explorer (watching threads), and Netstat, it became clear that the original infection remained.
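The failure here was human: nobody confirmed in the logs that the clean attempt had succeeded. A small script can do that check routinely. The log format below is entirely hypothetical (each AV product has its own format and export options); the sketch keeps the last recorded action for each file and threat and reports anything that did not end in a successful clean, quarantine or delete.

```python
import re

# Hypothetical pipe-delimited log line:
# 2008-12-20 14:02:11 | C:\WINDOWS\system32\a.dll | Trojan.Example | Clean failed
LINE = re.compile(r"^(?P<when>[^|]+)\|(?P<path>[^|]+)\|(?P<threat>[^|]+)\|(?P<action>.+)$")
SUCCESS = {"cleaned", "quarantined", "deleted"}


def unresolved_detections(log_lines):
    """Return sorted (path, threat) pairs whose most recent action was not a success.
    Lines that do not match the expected format are ignored."""
    last_action = {}
    for line in log_lines:
        m = LINE.match(line.strip())
        if m:
            key = (m["path"].strip(), m["threat"].strip())
            last_action[key] = m["action"].strip().lower()
    return sorted(key for key, action in last_action.items() if action not in SUCCESS)


log = [
    "2008-12-20 14:02:11 | C:\\WINDOWS\\system32\\a.dll | Trojan.Example | Clean failed",
    "2008-12-20 14:05:00 | C:\\temp\\b.exe | Trojan.Other | Quarantined",
]
print(unresolved_detections(log))
```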

A copy was submitted to the AV vendor anonymously, and within a couple of hours the vendor put out a rapid release that caught the file through real-time protection. The AV vendor said it was the same iteration of a known virus, but a programmer from a competing vendor pointed out the mutation differences.
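Why a mutation can slip past a signature is easiest to see with the crudest kind of signature, an exact file hash. In this hypothetical sketch, appending a single byte to a sample makes it invisible to a known-bad hash list. Commercial engines use pattern and heuristic signatures that tolerate some variation, so this exaggerates the problem, but the principle that a changed variant can fall outside existing signatures is the same.

```python
import hashlib


def matches_known_hash(sample: bytes, known_bad: set) -> bool:
    """Crudest possible signature check: exact SHA-256 match."""
    return hashlib.sha256(sample).hexdigest() in known_bad


original = b"hypothetical trojan body v1"
variant = original + b"\x90"  # one extra byte: a trivially "mutated" copy
known_bad = {hashlib.sha256(original).hexdigest()}

print(matches_known_hash(original, known_bad))  # True
print(matches_known_hash(variant, known_bad))   # False
```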

While this was happening, another system was infected, so the same process was used to monitor it. A real-time scan was performed before the rapid release came out, and the file was quarantined successfully.

Clearly, the AV companies are doing their best to keep their documentation current as information comes out, but the fix itself is what is critical, and it usually gets published first. This is likely part of why vendors accept anonymous file submissions: to help keep up with viruses in the wild.

My point is simply that real-time AV scanning does not catch everything. To be fair, a scheduled scan could miss a virus as well, but a file that shows symptoms similar to a known virus may still carry additional hidden code or functionality that hides it from current real-time scanners.

So my answer to the question is YES: scheduled third-party scans are highly recommended as part of your defense-in-depth strategy against spyware, malware, trojans, and viruses.