WebWalker's World October 1998
Just Say No! Where Backups Are Concerned...

"Where Backups are concerned, there are two kinds of people: Those who do them, and those who wish they had."
- Rick Webber
PAUG President

Disaster Recovery. Just the phrase makes accountants and system administrators cringe. 75% of companies that are struck by major data loss never recover. Small wonder, since it costs approximately $17,000 per megabyte to recreate the missing data.

The worst call I ever got while working for Microsoft was one from a junior partner in a law firm who had just lost the database containing the case notes for current clients and cases, including cases as far back as ten years ago. The poor guy was crying on the other end of the line, begging me to show him how to get the data back. After about ten minutes of diagnosis, I had even worse news for him: the (single!) hard drive for the server had experienced a physical failure and corrupted the File Allocation Table. The FAT is like a Table of Contents of the drive; when it is gone, the data on the disk may as well be random ones and zeros.

Fortunately for this legal nebbish, there are ways to recreate a FAT, but they are expensive and provided by ambulance chasing services that see other people's calamity as their bread and butter. Oh, and they don't usually work.

To avoid falling prey to either the danger of data loss or the pricey vultures that claim to be able to get it back, you must be vigilant and informed.

Unfortunately, vigilance is a tough row to hoe these days. With all the stress most Information Technology folks are under, it is easy to get sloppy and let "non-critical" issues slide. The unfortunate fact of Disaster Recovery is that most management doesn't consider it "critical" until something goes wrong. Then it is too late.

If you are responsible for your organization's data integrity, you need to be sure you understand the issues, dangers and technologies associated with Disaster Recovery. Being out of touch with current data protection technologies could be detrimental to your data and your job. Apply these basic principles of Disaster Recovery to your data integrity planning and you'll avoid most of the obvious potholes.

Avoid any Single Point of Failure (SPF)
This is the number one wrestling match you'll have with accountants, for they see every duplication of hardware as an extra and unnecessary expense: "Why buy two when you only need one to do the job?" Reducing the number of Single Points of Failure is the Golden Rule of data integrity planning. The others are just applications of this maxim.

An SPF can be software, hardware, network infrastructure, company organization, or the knowledge of a single individual. If the backup software your small business uses was written by a company that went out of business ten years ago, you're tempting fate. By the same token you may have redundancy built in to all of your servers, but if there is only one network router to connect them all, you fail the SPF test. The scariest SPF scenario is when one person has been doing your company's data integrity planning in his head for the last five years and gets hit by a cross-town bus. Redundancy must be built into every facet of your data management or you will discover (the hard way) where you failed to do so. Remember the old proverb about the weakest link.

Geographically Isolate Your Data
If your accounting business is on the coast and gets wiped out by a flood, the best onsite backup in the world won't save your data. Move your backup media to a remote location; that way, if your office gets wiped out, your records survive. Hardware can be replaced easily when the waves have receded. It is theoretically possible for a service business to be running again within a week after a natural disaster IF their data survived.

A footnote to geographic isolation: Some people work out a backup scheme that moves their data to a safety deposit box. This is fine if you're using non-volatile media like Writeable CD (CD-R), Rewritable CD (CD-RW), or Floptical. If you're using volatile magnetic media, avoid bank depositories; they are not always environmentally controlled. This can lead to tape de-spooling as the catalyst of the glue that attaches the ends of the media to its spools burns off over time. And since depositories are designed to limit physical access only, they don't take into account magnetic fields, thus potentially leaving your data exposed to corruption or erasure.

My former employer, Boeing, makes very sure its data survives in the event of a natural disaster. (When you are located in earthquake/tidal wave/volcano country, it pays to be prepared.) Each of their regional data centers in the Puget Sound transmits frequently to a vault in Moses Lake, a rural speedbump hundreds of miles inland. The decentralization of the data centers and the vault makes for excellent redundancy.

Plan Your Backup System for Success
Just following the recommended backup scenario that comes with the software may not meet your needs. The more production-critical the data is, the more frequently it needs to be backed up.

When I worked for the Security office at Seattle Pacific University, I was put in charge of being sure that the incident report database was securely backed up so that no more than 24 hours worth of data could ever be lost. The list of disaster scenarios I was given read like a Tom Clancy novel: Terrorist bombing of the Security office, Network hacked by a malicious student, Destruction Due to Student Riots. Your average series of worst case scenarios.

The system I designed was excessive, and probably more than a Security Office at a medium sized conservative religious school needed. But it was very secure: Each workstation backed itself up. The server backed itself and the workstations up. Then a machine across campus backed up the server. The tapes in the office were rotated off site at the end of each week and the tape in the remote location was replaced daily on a 3 day rotation. Short of a bomb that would level 10 city blocks, that data was safe.

Choose Your Media to Meet Your Needs
While I've had some tough things to say about tape media, it isn't all bad. But it isn't the only choice any more. Media that requires frequent changes or is otherwise "high maintenance" isn't real good for overworked, overstressed administrators. Total cost is another factor: Some solutions are costly in up-front investment, but are less expensive in the long run. Systems that don't cost much up front usually require continued expense in consumable media. If you can sell it to your management, go for the more expensive hardware up front. It is easier and cheaper to maintain, and it takes less time to straighten problems out when they occur.

If you follow the preceding principles, you have a very good chance of surviving a major data disaster. Now let's get to the particulars of backing up your data. Some people labor under the presumption that you should archive every scrap of data on your computer or database every time you run your backup. That is not only time consuming, it is neither financially viable nor practically necessary. Consider: If you have to replace a dead hard drive, you are going to have to install an Operating System (OS) before you install the backup software needed to get the contents of your old drive back. For single application users, the installation of the OS could constitute more than half of the total reload. So, don't back up your OS or other easy-to-reinstall applications if you don't have to. Reinstalling applications from CD is fairly fast, so don't waste time backing them up. Focus on the expensive, labor intensive information: the data that you have generated using your applications. This will significantly cut back the amount of backup media you'll be using, lowering your total cost without affecting your disaster preparedness.

Unless your job involves digital photography or programming, chances are your data files are going to be a lot smaller than you think. I just finished overhauling the system for an author/publisher. The total backup of all of his books, his dealer lists, and his administrative correspondence came to a mere 10 megabytes. Part of that is because text compresses very well, but also because the backup industry encourages users to back up more data than they truly need in an attempt to sell more media. AutoCAD data backups are probably going to be larger, owing to the complexity of the file types, but most systems can be very content with a 1.6 to 4.5 gigabyte tape system. Add up the files that you would need to get back in business and then add 60% as "room to grow" and you have a good ballpark for your needs.
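If you'd rather not add up file sizes by hand, a few lines of script will do it. What follows is only a sketch of the arithmetic: the function name backup_capacity_needed and its growth_percent parameter are made up for illustration, and the 60% default is simply the "room to grow" figure mentioned above.

```python
import os

def backup_capacity_needed(paths, growth_percent=60):
    """Return (data_bytes, recommended_bytes) for the given files and folders.

    recommended_bytes is the data size plus growth_percent as room to grow.
    Integer arithmetic is used so the result is exact.
    """
    total = 0
    for path in paths:
        if os.path.isfile(path):
            total += os.path.getsize(path)
            continue
        # Walk a folder and add up every regular file inside it.
        for folder, _subfolders, files in os.walk(path):
            for name in files:
                full = os.path.join(folder, name)
                if os.path.isfile(full):
                    total += os.path.getsize(full)
    recommended = total * (100 + growth_percent) // 100
    return total, recommended
```

Point it at the folders that hold the data you would need to get back in business, not at the whole drive. For a 10 megabyte data set like the one described above, the recommendation comes to 16 megabytes.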

Backup and Data Integrity Buzzwords
The first time I tried to work out the difference between the types of backup technologies and techniques, I wound up being a candidate for a rubber room. What is the difference between a Differential and an Incremental backup? What is (I'm not kidding) "shoe-shining"? What are RAID and mirrored drives and are they a better way to protect data than just recording it on to decayable media? What about CD-R and CD-RW? After learning the hard way, I want to decrypt the buzzwords for you.

Differential and Incremental Backups -
Start with a complete (full) backup of your data. Each differential backup you make after that records every change since that complete backup. Thus each differential is compared not to the previous differential, but to the last complete backup. Incremental backups differ in that, instead of comparing the files to be backed up to the last full backup, they check to see only what has changed since the most recent backup of any kind, which is usually the previous incremental. In a nutshell, they couldn't care less about how long it's been since you did a complete backup. Incremental backups tend to be much faster and smaller, but a restore needs the last full backup plus every incremental made since. Differential backups grow a little each time, but a restore needs only the last full backup and the most recent differential.
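The difference is easier to see as a selection rule. Here is a minimal sketch, assuming each file carries a "last modified" time and that an incremental picks up whatever changed since the most recent backup of any kind. The function name, the file names and the day numbers are all hypothetical; real backup packages may track changes differently (the archive bit, for example).

```python
def files_to_back_up(mtimes, last_full, last_backup, mode):
    """Pick the files a backup run would copy.

    mtimes      -- dict of file name -> last-modified time (any comparable number)
    last_full   -- time of the last complete backup
    last_backup -- time of the most recent backup of any kind
    mode        -- "differential" or "incremental"
    """
    if mode == "differential":
        cutoff = last_full       # everything changed since the full backup
    elif mode == "incremental":
        cutoff = last_backup     # only what changed since the last run
    else:
        raise ValueError("mode must be 'differential' or 'incremental'")
    return sorted(name for name, changed in mtimes.items() if changed > cutoff)


# Hypothetical example: full backup on day 10, another backup on day 15.
mtimes = {"briefs.doc": 5, "clients.mdb": 12, "notes.txt": 18}
print(files_to_back_up(mtimes, 10, 15, "differential"))  # ['clients.mdb', 'notes.txt']
print(files_to_back_up(mtimes, 10, 15, "incremental"))   # ['notes.txt']
```

Notice that the differential run copies clients.mdb again even though an earlier run already caught it. That repetition is why differentials take longer to run and why restoring from them takes fewer tapes.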

RAID - Redundant Array of Inexpensive Disks
Despite the word "inexpensive" in the name, installing 3 or more drives in a system can easily put a dent in the finances of a small business. Newer versions of RAID systems work like this: you have an array of 3 or more drives. Data is written in stripes on each drive, i.e., pieces of discrete files are spread across several drives. By saving "parity" information along with the stripes, the disk system not only knows how to reassemble the file from its "striped" pieces, it also has what it needs to reconstruct the file if one piece is lost. Nice. If a drive in the array goes down, the remaining drives keep serving out the data, but at a slower rate, as the missing pieces must be reconstructed on the fly from the parity information. Windows NT Server offers this very secure feature in software. You still need a backup system, but RAID will keep your system functioning and making money when others would have been out of business.
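If parity sounds like magic, it isn't. The usual trick is the exclusive-or (XOR) operation: XOR the data pieces together to get the parity piece, and XOR the survivors together to get a lost piece back. The toy below shows the idea on a three-drive array. The helper name xor_blocks and the sample contents are made up for illustration, and a real array does this on disk blocks in the controller or the OS, not in a script.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two data drives hold one stripe each; the third drive holds the parity.
data_blocks = [b"CASE", b"NOTE"]
parity = xor_blocks(data_blocks)

# Drive 0 dies. Rebuild its stripe from the surviving stripe plus the parity.
rebuilt = xor_blocks([data_blocks[1], parity])
assert rebuilt == data_blocks[0]
```

The same arithmetic explains the slowdown after a failure: every read that touches the dead drive has to be recomputed from all of the survivors.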

Mirroring
Drive Mirroring is where data written to one drive is, at the same time, written to another identical drive. As with RAID, if one goes down you simply lose your safety net until the mirrored drive is replaced. But you KEEP WORKING. Mirroring, while robust, can be a performance drain if two IDE drives share one drive controller. If you are running an IDE system (like the vast majority of us are) put the drives on separate controllers. SCSI drives sharing a single controller are no problem.

CD-R & CD-RW
These two technologies allow you to create your own CDs with data stored on them. They create the CD by focusing a laser on a recording layer sandwiched inside the plastic (polycarbonate) disc. The laser "burns" microscopic spots into that layer, each far smaller than the diameter of a human hair. The series of spots and unburned areas make up the ones and zeros to be read later.

I think CD-R & CD-RW are great if decayable media give you heartburn. They are unaffected by water, vibration, shock, magnetic fields, and campaign promises. Their life span is better than a century. The media price is to die for: One to two dollars for a CD-R (680MB) and about twelve dollars for a CD-RW. The hardware, however, is a bit pricier. A quality CD-R writer will cost about $300, and a CD-RW writer will run about $650. This is steep for a small company, and the amount of data that can be archived per CD isn't as large as tape. For doing your backups on a workstation where you will be reusing the media, CD-RW is a good deal.

As denoted by the names, the difference is the ability to rewrite new data to a disk once it has already been used. If you just want to archive data on a regular basis and don't mind paying a moderate rolling cost for media, get the cheaper CD-R.
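To put rough numbers on that choice, here is a back-of-the-envelope sketch using the prices quoted above. It rests on assumptions that are mine, not gospel: every backup fits on one disc, a CD-R is used once, and the CD-RW user rotates a fixed pool of rewritable discs (three, by default). The function names are made up for illustration.

```python
def cd_r_cost(backups, writer=300, disc=2):
    """Total dollars spent on CD-R after a number of backups (one new disc each)."""
    return writer + disc * backups

def cd_rw_cost(pool_size, writer=650, disc=12):
    """Total dollars spent on CD-RW with a fixed pool of reusable discs."""
    return writer + disc * pool_size

def break_even_backups(pool_size=3, cd_r_disc=2):
    """Smallest number of backups at which CD-R has cost more than CD-RW."""
    backups = 0
    while cd_r_cost(backups, disc=cd_r_disc) <= cd_rw_cost(pool_size):
        backups += 1
    return backups


print(break_even_backups())             # 194 backups with $2 CD-R discs
print(break_even_backups(cd_r_disc=1))  # 387 backups with $1 CD-R discs
```

At one backup per working day, that is most of a year before the CD-RW drive pays for itself, which is why the cheaper CD-R makes sense for occasional archiving.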

Magnetic Media
There are more formats here than I will get into (QIC, TRAVAN, etc.). Suffice it to say, magnetic media remains popular despite its volatile nature. It is cheap, easy to find, can store huge volumes of information, is reusable, and is a "known entity." The hardware is inexpensive as well.

But, as was mentioned above, it can be too easily erased or corrupted, or the media may just plain break from usage. There is also a high rate of media defects right out of the box.

I presently run magnetic media in my backup system, but intend to get off that hobby horse as soon as possible. Besides being painfully slow, it also grieves me to see how many incompatible software packages are floating around for it. Magnetic media needs to get the hint and go away like the 8-track tape did.

ZIP Drives
I'll be perfectly frank here: I don't like ZIP drives. The media price is equivalent to magnetic tape (because it IS magnetic media, with all of its associated dangers), the capacities are smaller, the printer (parallel) port interface seems designed to give your other parallel devices a migraine, and ZIP drives have an MTBF (Mean Time Between Failures) so low that some die within the first year of operation. ZIP owners have come to fear the "click of death", the sharp pop that accompanies the death of their drive. It is an expensive sound because the drive won't disgorge the ZIP disk that was in it when it shuffled off its mortal coil. Better hope your critical data wasn't in the drive that just died.

"Over the Wire"
This is new, hot and controversial. A service company offers a contract to safeguard your data. You pay their monthly fee (about thirty dollars) and in return you download a small program from the Internet. You instruct the program what you want to back up and then turn it loose. At night, it kicks off and (after you've done an initial full backup) collects the differential data, compresses it and encrypts it. Then, using your modem or network connection, the client program connects with a secured web site and sends your backup. At the remote location, data is stored on a fail-safe, physically secured system. To retrieve data, you use that same client piece of software to securely request the file or files you want. You can either get them back over the wire or have the company send you a CD-R of the missing or damaged data.
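For the curious, the nightly "collect and compress" step might look something like the sketch below. This is my own illustration, not the vendor's code: the function name build_nightly_bundle is made up, the change test is a simple "modified since the last full backup" check, and the encryption and upload steps are deliberately left out because they depend entirely on the service you use. The SHA-256 fingerprint is there only so the receiving end could confirm the bundle arrived intact.

```python
import hashlib
import io
import os
import tarfile

def build_nightly_bundle(root, last_full_time):
    """Pack files under root changed after last_full_time into a gzip'd tar.

    Returns (payload_bytes, list_of_packed_relative_paths, sha256_hex).
    Encryption and transmission are NOT handled here.
    """
    buffer = io.BytesIO()
    packed = []
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for folder, _subfolders, files in os.walk(root):
            for name in sorted(files):
                full = os.path.join(folder, name)
                # Differential rule: keep anything modified since the full backup.
                if os.path.getmtime(full) > last_full_time:
                    relative = os.path.relpath(full, root)
                    archive.add(full, arcname=relative)
                    packed.append(relative)
    payload = buffer.getvalue()
    return payload, sorted(packed), hashlib.sha256(payload).hexdigest()
```

Because only the changed files are bundled, and text compresses well, the nightly payload stays small enough to be practical even over a modem.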

I like this. I like it because I don't have to buy any more hardware, media, or software. I like it because my data is hidden from prying eyes. I like it because it solves the problem of geographic isolation that I mentioned earlier. I have yet to see anything about a space limitation. In short, this solves most of the issues that are problematic with other backup technologies.

Some folks might balk at the slow speed of a modem transfer, or get nervous about pumping their data over the wire. My experience with this system, marketed by the company @Backup, is that it solves a multitude of headaches for those of us who are just trying to keep our data safe and would rather pay someone else to manage the infrastructure for us. An 800 number is even provided should you need help setting up or retrieving data.

Whatever solution you choose, keep in mind that data integrity and backups are a critical part of the infrastructure that makes up the modern information driven society. Ignoring them is like leaving your seat belt off. Yes, it takes effort to remember. Yes, it cramps your style. But it can save your hide. And like the seat belt, it is too late when you've already been hit.

Peace,

WebWalker

(R. Marshall Webber is an Independent Technology Consultant. He and his wife, Sarah, make their home near Philadelphia.)