Friday, June 18, 2010

Data Recovery: 5 Things You Must Know

Hard drive data recovery is one of the most difficult tasks a sysadmin can attempt, so it should be treated as a last resort: after many frustrating hours, it may not work at all. Data recovery, or disaster recovery, falls into several categories, and it is important to have a plan in place and to know the options before resorting to a local hard disk data recovery service. Many people fail to consider the value of their data until it is too late. This article discusses general principles, with some specific examples from experience with Novell, Microsoft, and Linux operating systems. Detailed tool discussions will be reserved for separate posts.

Here is a short list of categories, roughly in order of increasing difficulty of recovery.

  • Backups and off-site storage
  • RAID
  • Documents
  • File Systems
  • Data Recovery from Hard Drives

There is software available that can recover from several types of failures, but it does not always work when there is corruption or a failure of disk drives, media, or other hardware. Worse, professional recovery can cost hundreds of dollars per disk and is not guaranteed to recover any data. This is why backups are essential. Even if you are not a sysadmin and just have a single computer to work on, there is probably some data that you consider important. If nothing else, copy it to another location such as a USB drive, another computer, a CD, or an FTP site.
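
As a minimal sketch of that last piece of advice, the script below copies a folder to a second location and checks that the copy matches. The names SRC and DEST are our own placeholders; here they point at temporary directories so the example runs anywhere, but in practice they might be a documents folder and a mounted USB drive.

```shell
#!/bin/sh
# Sketch: copy a directory of important files to a second location.
# SRC and DEST are hypothetical; temporary directories stand in for
# a real documents folder and a real USB drive.
SRC=$(mktemp -d)
DEST=$(mktemp -d)

# Sample data, including a subdirectory.
echo "quarterly numbers" > "$SRC/report.txt"
mkdir "$SRC/photos"
echo "not really a photo" > "$SRC/photos/img001.txt"

# -R copies subdirectories too; -p preserves times and permissions.
cp -Rp "$SRC/." "$DEST/"

# Quick sanity check: diff -r exits 0 only if both trees match.
if diff -r "$SRC" "$DEST" >/dev/null; then
    COPY_STATUS="ok"
else
    COPY_STATUS="mismatch"
fi
echo "copy status: $COPY_STATUS"
```

A copy like this is not a substitute for a real backup scheme, but it is far better than nothing.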

Reliable Backups

Obviously, when there are reliable backups, there should be little difficulty in restoring the data, but it may not be as easy as it sounds. Some of the problems with backup and restore procedures could be:

  • The data may be outdated.
  • The missing data may not have been backed up yet.
  • The tape or other media may fail.
  • The tape or other media may not be found because it was misfiled, mislabeled, or discarded as too old to keep.
  • The time lost in restoring data may be important.
  • The backup software may have failed, or not been set up properly.
  • Directories, drives, or systems may not have been selected to be backed up.
  • The backed up files may have been corrupted, or infected with a virus.

Reliable backups are an important part of disaster recovery, but how can they be verified? Try a restore once in a while to make sure it works. I have seen restores fail when data was backed up from one file system and restored to another: the file was restored, but its contents were corrupt. It took a while to figure out, but when an Excel file was backed up from NetWare and restored to NT, some of the cells were corrupted. Another time, a problem surfaced only when we attempted a restore: some new directories had never been selected for backup. Some software will let you select directories but will not automatically include their subdirectories. These are the kinds of problems that an occasional test restore will catch.
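
A trial restore can be as simple as the sketch below, which uses tar purely as a stand-in for whatever backup software you run. It archives a directory, restores it to a scratch location, and compares the result with the original. All paths and variable names are our own placeholders built on temporary directories.

```shell
#!/bin/sh
# Sketch of a periodic restore test. DATA, SCRATCH and ARCHIVE are
# hypothetical names; tar stands in for the real backup tool.
DATA=$(mktemp -d)                 # the "live" data
SCRATCH=$(mktemp -d)              # where the trial restore lands
ARCHIVE="$SCRATCH/backup.tar"

# Sample data, including a nested directory that is easy to miss.
mkdir -p "$DATA/projects/new"
echo "budget v2" > "$DATA/projects/budget.txt"
echo "created after the backup job was set up" > "$DATA/projects/new/notes.txt"

# Back up, then restore somewhere other than the original location.
tar -cf "$ARCHIVE" -C "$DATA" .
mkdir "$SCRATCH/restored"
tar -xf "$ARCHIVE" -C "$SCRATCH/restored"

# diff -r reports files that differ or that exist on only one side,
# which catches both corrupt files and directories that were skipped.
if diff -r "$DATA" "$SCRATCH/restored"; then
    RESTORE_STATUS="verified"
else
    RESTORE_STATUS="FAILED"
fi
echo "restore test: $RESTORE_STATUS"
```

Restoring to a different machine or file system than the one backed up, as in the NetWare-to-NT case, makes the test more demanding and more useful.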


RAID Systems

One of the best inventions for sysadmins was the RAID system. RAID, a Redundant Array of Inexpensive Disks, provides a means to recover quickly from a drive failure. A common configuration is RAID 5, which requires three or more disks in an array. When any one drive fails, the system keeps running, usually with reduced performance, and the drive can be replaced at a time that suits the system administrator. It should not wait long, though, because a second drive failure before the rebuild completes means losing the array. On good hardware, there is usually no downtime. The drive can be "hot-swapped", meaning that the defective drive can be removed, the replacement drive inserted, and the array rebuilt -- all without taking down the computer system.
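
The reason a RAID 5 array survives a single drive failure is parity: one block in each stripe holds the XOR of the others, so any one missing block can be recomputed. The sketch below illustrates the arithmetic with one byte per "disk". The variable names and values are made up for the illustration, and real controllers work on whole stripes, not single bytes.

```shell
#!/bin/sh
# Sketch: how single-parity RAID (such as RAID 5) rebuilds a lost block.
# Each "disk" holds one byte here; the values are arbitrary.
D1=165   # data on disk 1 (0xA5)
D2=60    # data on disk 2 (0x3C)
D3=15    # data on disk 3 (0x0F)

# The parity block is the XOR of all the data blocks.
PARITY=$(( D1 ^ D2 ^ D3 ))

# Suppose disk 2 fails. XOR-ing the surviving data with the parity
# yields the missing value.
REBUILT_D2=$(( D1 ^ D3 ^ PARITY ))

echo "parity=$PARITY rebuilt=$REBUILT_D2 original=$D2"
```

This prints parity=150 rebuilt=60 original=60. It also shows why a second failure is fatal: with two values missing, one parity block is not enough to recover both.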

Here is an example of how RAID arrays saved time when moving servers. We had a couple dozen servers to move across town, and were concerned about the data on the drives while the servers were in transit. There were reliable weekly and daily backups, but the current day had not been backed up, and the standard tape backup process would take several hours. The solution was to remove a drive from each system and replace it with a spare. How did that work? Each system had a RAID 1 array, which consists of two mirrored SCSI drives. Removing one drive gave us an instant copy of the data as of that moment, which could be used to restore the array if the other drive failed while being transported. Replacing the drive with a spare caused the array to rebuild in a matter of minutes, so each server traveled with a complete mirror and there was even less chance of failure. Of course we were careful in moving the hardware, and none of the drives failed, but we saved several hours of time-wasting tape backups. All we had to do was reconnect the servers, assign new IP addresses, and let the nightly tape backup continue as scheduled.

Document Autosave

Documents are in a slightly different category of data recovery, since we are discussing recovery of documents that have not yet been picked up by the daily backup process. It is common to work on a document during the day and have backups run at night. If there is corruption or a crash while working on a new document, there may be a way to recover the most recent work. We will look at vi as an example, but other text editors and word processors often have a similar feature.

Using Autosave in vi.

Most modern vi implementations, such as Vim, record your unsaved changes in a swap file as you type, but it is still a good idea to save your work as you go. Simply press Esc and type :w to save the current file.

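Try this. Open a file, type some text, and save it without quitting.

$ vi test.txt
i
This is a test
Esc:w

Now, from a second terminal, look at the directory while the editor is still open.

$ ls -a
. .. .test.txt.swp test.txt test.txt~

Notice that there is a hidden swap file, your original file, and a backup file. The swap file (.test.txt.swp) holds changes that have not yet been written to test.txt. Vi deletes it automatically on a normal exit. If it still exists the next time you open the original file, usually because of a crash, vi will offer to recover from it, and you can request this explicitly with vi -r test.txt. The file ending with a tilde (~) is a backup copy of the file as it was before the last write. It appears only when an existing file is overwritten and the editor is configured to keep backups, so after a crash it holds the previous saved version, not your latest edits.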
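
After a crash, it can help to search for swap files that were left behind. The sketch below is one way to do that; WORK is a placeholder directory, and the swap file in it is an empty file created only to simulate a crash. It assumes Vim's default naming (a leading dot and a .swp suffix) and paths without spaces.

```shell
#!/bin/sh
# Sketch: list leftover vi/Vim swap files under a directory.
# WORK is a hypothetical working directory; the swap file is faked.
WORK=$(mktemp -d)
echo "saved text" > "$WORK/test.txt"
: > "$WORK/.test.txt.swp"         # simulate a swap file left by a crash

# Swap files are hidden and, by default in Vim, end in .swp.
LEFTOVER=$(find "$WORK" -type f -name '.*.swp')

# Word splitting is acceptable here only because the paths have no spaces.
for swap in $LEFTOVER; do
    name=$(basename "$swap" .swp)   # .test.txt.swp -> .test.txt
    name=${name#.}                  # .test.txt     -> test.txt
    echo "found $swap; try: vi -r $(dirname "$swap")/$name"
done
```

In Vim, running vi -r with no file name also lists the swap files it can find, which may be simpler on a single machine.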


File Systems

File system integrity can be checked and repaired with several utilities, each of which deserves a discussion of its own.

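One of the most commonly used Linux utilities is fsck. On many systems, fsck runs automatically at boot if the file system has not been checked in a while, or if the system was not shut down properly. More advanced users will need to learn this and other utilities that are run on unmounted file systems, often from a boot disk. Do not run a repairing fsck on a file system that is mounted read-write, because changing a file system while it is in use can corrupt it. Be equally careful with RAID arrays: on software RAID, check the file system on the assembled array device, not on the individual member disks.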
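
One way to guard against checking a mounted file system is to look in the mount table first. The sketch below does that on Linux. The is_mounted function and the DEVICE value are our own hypothetical names, and the script only prints the check command without running it. Devices mounted under another name (for example a /dev/mapper path) would not be caught by this simple comparison.

```shell
#!/bin/sh
# Sketch: refuse to check a device that appears in the mount table.
# is_mounted is a hypothetical helper, not part of fsck.
is_mounted() {
    # $1 = device name, $2 = mount table (defaults to /proc/mounts on Linux)
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

DEVICE=/dev/sdb1   # hypothetical partition

if is_mounted "$DEVICE"; then
    echo "$DEVICE is mounted; unmount it or boot from rescue media first"
else
    # e2fsck -n opens an ext2/3/4 file system read-only and answers
    # "no" to every prompt, so nothing is changed on disk.
    echo "would run: e2fsck -n $DEVICE"
fi
```

Starting with a read-only pass such as e2fsck -n shows what is wrong before anything is modified.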

Data Recovery Software and Services

Before resorting to a data recovery service, it may be worth the time to look for software that can recover the data you need. A data recovery service can be helpful when there is no other way to retrieve the data. They are able to open drives that no longer spin and read the platters with specialized equipment, but the process can cost thousands of dollars. If the data can be recovered by using software, the recovery service may be able to restore it for a few hundred dollars. Wait a minute. If they are using software, can't I do the same thing? The short answer is - maybe. Some software is free, and some costs several hundred to several thousand dollars. As our readers know, we love free open source software. The first thing to try is some of the recovery software and boot disks listed on freshmeat.net. Then search for data recovery software on search engines such as Google, Bing, or Yahoo. If the free tools don't work, some of the others might do the trick, but the free software should at least show whether a partition exists on the drive and whether there might be something to recover. Remember, if the hardware is noisy or unreliable, it is best to power it down and take it to a data recovery service, not keep it running.
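
As a rough illustration of what "showing that a partition exists" can start from, the sketch below looks for the classic MBR boot signature, the bytes 0x55 0xAA at offset 510 of the first sector. IMG is a placeholder: a one-sector fake disk built for the demonstration. On real hardware you would read the device (or, preferably, an image copied from it) as root. Finding the signature is only a hint that a partition table was written, not proof that the partitions are intact.

```shell
#!/bin/sh
# Sketch: look for the MBR boot signature (0x55 0xAA at offset 510).
# IMG is a hypothetical one-sector "disk" created for this demo.
IMG=$(mktemp)
dd if=/dev/zero of="$IMG" bs=512 count=1 2>/dev/null

# Write the signature into the last two bytes of the sector.
# Octal 125 and 252 are 0x55 and 0xAA.
printf '\125\252' | dd of="$IMG" bs=1 seek=510 conv=notrunc 2>/dev/null

# Read the two bytes back as hex with spaces and newlines removed.
SIG=$(dd if="$IMG" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')

if [ "$SIG" = "55aa" ]; then
    echo "boot signature present"
else
    echo "no boot signature found (got: $SIG)"
fi
```

Dedicated recovery tools go much further than this, but the same idea applies: read the drive without writing to it, and see what structures are still recognizable.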
