The idea of data compression has been around for a long time. When reporters sent their stories by telegram, they developed abbreviations to shorten their messages: if you wanted your office to send you something, you would say to send it "mewards" instead of "to me," because the price of the message was figured by the number of words used.
While the speed and complexity of compression have developed greatly in the past 50 years, the basic principle hasn't changed: repetition is your friend. When things repeat, patterns emerge. Develop an algorithm to deal with that pattern and you have the basis of data compression.
An example of this would be data tokens. Each character of text is worth 8 bits. Thus the word MISSISSIPPI, at 11 characters, is worth 88 bits all told. Now suppose we replace each REPEATED letter with a token worth 3 bits. Each token is keyed to the character that it replaces. Thus the text string now looks like this: MISsissiPpi. (The capitals are the original 8-bit characters, the lower case are the 3-bit tokens that replace them.) NOW, if we add it all up (4 characters at 8 bits plus 7 tokens at 3 bits) we have a mere 53 bits. That is a better than 1/3 savings of space.
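For the curious, here is a minimal Python sketch of that tally. The function name and its parameters are my own invention for illustration, and it only counts bits: it ignores the cost of storing the table that says which token stands for which letter, which a real scheme would have to pay for.

```python
def token_encoded_bits(text, char_bits=8, token_bits=3):
    """Count bits when the first occurrence of each letter costs char_bits
    and every later repeat costs token_bits.

    Returns the bit total and the text re-marked the way the article shows it:
    capitals for full characters, lower case for tokens.
    (Hypothetical helper; assumes the input is all upper-case letters.)
    """
    seen = set()
    marked = []
    total = 0
    for ch in text:
        if ch in seen:
            # A repeat: pay only for the short token.
            total += token_bits
            marked.append(ch.lower())
        else:
            # First sighting: pay for the full character.
            seen.add(ch)
            total += char_bits
            marked.append(ch.upper())
    return total, "".join(marked)


word = "MISSISSIPPI"
original_bits = 8 * len(word)
bits, marked = token_encoded_bits(word)
print(original_bits, bits, marked)  # prints: 88 53 MISsissiPpi
```

Note that a 3-bit token can only tell 8 different letters apart, so this trick works for a short word but would not stretch to a whole page of text without bigger tokens.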
This is, of course, only an example. Most compression algorithms do a better job than I've illustrated here, but the principle still stands. Binary data (like graphics) compresses well when you have long strings of data that run together and repeat frequently.
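One simple way to cash in on long runs of identical data is run-length encoding: store "60 of these" instead of the same byte 60 times. To be clear, this is a different technique from the token example, and I am not claiming it is what PKZIP or any drive compressor actually uses. It is just a small sketch, with function names of my own choosing, of why a graphic with big blocks of one color squeezes down so well.

```python
def rle_encode(data):
    """Turn a bytes object into a list of (count, value) runs.

    Counts are capped at 255 so each run would fit in two bytes on disk.
    """
    runs = []
    for b in data:
        if runs and runs[-1][1] == b and runs[-1][0] < 255:
            # Same byte as the current run: just bump the count.
            runs[-1] = (runs[-1][0] + 1, b)
        else:
            # A different byte (or a full run): start a new run.
            runs.append((1, b))
    return runs


def rle_decode(runs):
    """Rebuild the original bytes from (count, value) runs."""
    return b"".join(bytes([value]) * count for count, value in runs)


# One row of a made-up black and white image: 60 black pixels, 4 white.
row = bytes([0] * 60 + [255] * 4)
runs = rle_encode(row)
encoded_size = 2 * len(runs)  # one count byte plus one value byte per run
print(len(row), encoded_size, runs)
```

The catch is the flip side of the same coin: data with no runs at all (every byte different from its neighbor) comes out twice as big under this scheme, which is why real compressors are a good deal smarter about it.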
Armed with this technology, PKWARE released PKZIP archive compression in the 1980s. This software crunched normal files into an archive file with the ZIP extension that was unusable (though highly portable) until decoded. Unfortunately, due to the less than snappy processors of the early PCs, compression was so SLOW that archiving data for storage or transport was its only practical use. The decompression process just took too darn long to be used "on the fly."
Eventually, some bright chap solved the problem with an innovative solution: hardware-assisted drive compression. Stac, Inc.'s Stacker drive compression system for the PC added an ISA card that took the math load off of the CPU (Central Processing Unit). The whole drive could be compressed, sometimes by as much as 1.6:1. But there is no free lunch: Stacker, while cheaper than buying a new drive and faster than leaving the poor CPU to do the work, still wasn't blazing. Still, with 40 MB hard drives retailing for $1500, $150 for the Stacker system seemed like a good deal. Then problems cropped up...
Stacker had the bad habit of garbling data. A bit jammed sideways in the compression algorithm tended to make a terrible mess of someone's drawing or letter to mom, or even system files. Several class action lawsuits aimed at Stacker cropped up, and hardware gurus lined up to assert that drive compression was of the devil. (Well, almost...)
Microsoft decided to get into the compression game with the advent of DOS 6, releasing DoubleSpace. It was supposed to be more reliable than Stacker, but Microsoft had its own problems with data corruption when it neglected to mention that it was prudent to defragment the drive first. Later it was discovered that Microsoft engineers had simply recycled (plagiarized) the compression algorithm from Stac. Again, lawsuits flew furiously, but this time it was Stac on the winning end. Microsoft changed its compression algorithm, discontinued sales of that version of DOS, and released the new version under the name DriveSpace.
Now that we're caught up on the history, I'll discuss the drive compression in Windows95 and the newer DriveSpace3.
DriveSpace (DOS 6.22) and DriveSpace2 (Windows95) have come a long way toward making your data safe while compressed. If you want to make a 40 MB compressed drive, DriveSpace creates a 40 MB Compressed Volume File (CVF) called DRVSPACE.000 and then hides it so that unwitting users can't kill their own data accidentally. After having made the file, DriveSpace then adds an executable called DRVSPACE.BIN to the root directory. This is a compression interpreter that is responsible for making your operating system think that it has a new drive. Yes, that's right: a compressed drive is just a BIG file with its name ground off and replaced with a drive letter. Your operating system doesn't know the difference, and DRVSPACE.BIN makes sure data gets in and out of the CVF without letting the OS know that it is being duped.
Most people, when faced with shrinking headroom on their computer, compress their WHOLE drive. This isn't necessarily a bad idea, but it does bring up other questions: What happens if the DRVSPACE.BIN interpreter fails? Why do you only see one drive when DriveSpace creates a file and then says it's a drive? What happened to the original drive?
When compressing a whole drive, DriveSpace first creates the file DRVSPACE.000 and then assigns it the drive name H:. Then it starts grabbing double handfuls of data off of the C: drive and moving them into H:. When it has moved almost all of the data over (it does keep a few system files on C:), it does some razzle-dazzle and SWITCHES the drive letters. C: becomes H:, H: becomes C:, and you appear to have only one much larger drive because H: (the original physical drive) is hidden. C: (the CVF) is all that remains visible.
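If the letter shuffle makes your head spin, here is a toy model of it in Python. This is strictly a sketch of the bookkeeping described above, not of how DriveSpace is actually written: the function name, the dictionary layout, and the sample file names (including which files stay behind) are all my own assumptions for illustration, and no real compression happens.

```python
def compress_whole_drive(files_on_c, stay_behind):
    """Toy model of whole-drive compression as a drive-letter swap.

    files_on_c  -- names of the files on the physical C: drive
    stay_behind -- names of the few system files that are not moved
    Returns a dict mapping drive letters to their files and hidden flag.
    """
    # Step 1: the new CVF gets mounted as H:, empty to start with.
    drives = {
        "C": {"files": list(files_on_c), "hidden": False},
        "H": {"files": [], "hidden": False},
    }

    # Step 2: move almost everything from C: into H:.
    for name in list(drives["C"]["files"]):
        if name not in stay_behind:
            drives["C"]["files"].remove(name)
            drives["H"]["files"].append(name)

    # The CVF itself is just a big file sitting on the physical drive.
    drives["C"]["files"].append("DRVSPACE.000")

    # Step 3: switch the letters, then hide the physical drive.
    drives["C"], drives["H"] = drives["H"], drives["C"]
    drives["H"]["hidden"] = True
    return drives


result = compress_whole_drive(
    ["IO.SYS", "DRVSPACE.BIN", "LETTER.DOC", "DRAWING.BMP"],
    stay_behind={"IO.SYS", "DRVSPACE.BIN"},
)
for letter in sorted(result):
    print(letter, result[letter])
```

After the swap, C: holds your letters and drawings, while the hidden H: holds the leftover system files and the DRVSPACE.000 file that all of C: really lives inside.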
If you have compressed your whole hard drive and you boot your system, the following steps occur:
- Having done its Power-On Self Test (POST), the Basic Input Output System (BIOS) goes to the Master Boot Record (MBR) of the C: drive and executes whatever programs it finds there.
- The first thing that runs is DRVSPACE.BIN. This little fellow renames C: to H: and then finds the hidden DRVSPACE.000 and names it C:.
- The CVF, now named C:, appears as a drive to the BIOS, which happily loads the system files and command interpreter from the MBR on the CVF.
The problem with this is that if something happens to your DRVSPACE.BIN file, the swap never happens: you are left with the original physical drive as C: and no visible data, because all of your data (and most of your OS) is in a hidden CVF. This is the point at which most people call Microsoft, either blubbering or threatening lawsuits. The former will get you quick, sympathetic help. The latter will get you transferred to Microsoft Legal, and they don't care if your drive gets fixed. Advice: be polite.
Thankfully, the DriveSpace designers created a way out of this dilemma: just replace the damaged DRVSPACE.BIN file with the one on the startup disk you made while installing Windows95. If the CVF should fail to mount, restart the computer with the startup disk in the drive and, when you get to a prompt, type SYS C: and press Enter. The DRVSPACE.BIN will be fixed, and you can reboot the system (this time with the disk out) and your drive will be restored to normal operation. If you don't have that startup disk, or can't lay hands on it, you are playing Russian Roulette with a fully loaded gun. The first time something goes critically wrong, the only thing your system will be is dead. Curtain. No flowers.
While there isn't much corruption danger from DriveSpace anymore, I would encourage you to avoid the DriveSpace3 that comes with the Microsoft PLUS! companion pack for Windows95. DriveSpace3 has the capacity to compress at better than 2:1, sometimes as much as 3:1. But you pay, friends and neighbors, you pay. SLOW speed, and algorithms that don't work the way they should. It is risky and not a good step forward for Microsoft. I wholeheartedly recommend avoiding it like the plague. The best way to avoid it is to be sure, when installing PLUS!, that you choose which components to install and uncheck DriveSpace3.
Above all, keep in mind that without backups, done regularly and OFTEN, every system is a train wreck waiting to happen. There is something kindred about cars with recently expired warranties and hard drives that operate one minute beyond their MTBF (Mean Time Between Failures): they both have the karmic predisposition to explode on you during your hour of need. Backups are your only defense.
[As I finish writing this, the new prices for hard drives have just come out for the month of November. When you can get a 2 gig drive (2000 megabytes) for a mere $250, why cripple your computer's performance by making it go through the extra step of drive compression? I predict that drive compression will experience a renaissance some time in the next 10 years, but right now we are making drives bigger and cheaper faster than we can fill them. Perhaps a small drive could be compressed and used as an "on the fly" archive for occasionally referenced data, but resorting to compression in today's market is self-defeating and unnecessary.]
Peace,
Webwalker