In this column, you’ve frequently heard me echo the refrain that "past is future", and that there is "nothing new under the sun." My peers in the computer industry wonder if I am quite right in the head when I make such declarations. I think some clarification is needed: Truly new uses of existing and emerging technology are happening all the time. Markets are being inundated by amazing innovations. No argument from me on these points.
But what isn’t new (and what interests me the most) is the way PC companies exhibit habits of business culture and ethics that were old when the Judeo-Christian Patriarch Jacob stiffed his uncle Laban over some sheep, circa 2200 BC. The modern equivalents have changed little. An example is Intel driving the market to Slot 1 to strangle competitors AMD & Cyrix. And Microsoft? Bill & Co. are following the finest tradition of 1890s robber barons. That is how nothing is "new." New can, same spam.
With that point clarified, let’s move on to this month’s dirt: 64 bit CPU architecture. This is going to be some acronym-heavy material dealing with how the brains of your bit-box work, so we need to cover some history and definitions first.
Architecture can refer to either hardware or software and defines the basic process and constraints of how the computer’s Central Processing Unit (CPU) queues up data to be processed. The motherboard bus is the data highway between components on the motherboard.
A few years ago it was fashionable to judge a computer’s "speed" by the number of Megahertz (or millions of cycles per second, MHz) of the CPU. This has always been a useless benchmark, as there are many other components that must also be measured to determine a computer’s speed. Like a big engine in a car with only one gear, measuring by only one criterion is misleading and a costly mistake.
Like most products or systems, innovations in technology follow cyclical and, in some cases, oscillating advancement patterns. For computing the cycle is:
New Architecture, faster processor, more memory, bigger hard drive, faster bus, start over.
How do processors work? Basically like a bureaucracy. (I saw the libertarian in the back row flinch.) Data stored on the hard drive is routed from the drive to the memory via the bus. How fast it gets from one to another depends on how "wide" the bus is, measured in bits, and how "fast" it is, measured in MHz. When called for, the data again travels out onto the bus, over to the CPU, where it is again stored in a smaller memory called the cache, where it is prepared for processing. The compiler (sort of like a music conductor) decides what pieces of data should go first, and when. The data is then sent from the cache to the actual processor and falls out the other side as usable information, where it is again sent through the bus, and then off to its final destination, be that the screen, printer or other peripheral. The speed at which it is processed is again determined by the "fast" and "wide" measurements of the processor.
Like a bureaucracy, no?
As you can see, a fast processor doesn’t mean zip if the bus is choking the data flow. By the same token, you can have a screaming fast and wide bus, but if your processor is fast and narrow, you’re at another bottleneck. Thus, the most efficient computers have a balance of CPU, memory, hard drive space, and bus capacity. All of this is constrained by the design of the architecture.
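For those who like to see the arithmetic, here is a minimal sketch of the "wide times fast" idea. It assumes one transfer per clock cycle, which is a simplification, and the function names and the figures plugged in are purely illustrative, not measurements of any real machine.

```python
def bandwidth_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in MB/s, assuming one transfer per clock cycle."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz  # (bytes) x (millions of cycles/s)


def bottleneck(stages: dict) -> tuple:
    """Return (name, rate) of the slowest stage; nothing flows faster than that."""
    name = min(stages, key=stages.get)
    return name, stages[name]


# Hypothetical numbers chosen only to show the effect.
stages = {
    "bus": bandwidth_mb_per_s(width_bits=16, clock_mhz=8),
    "cpu": bandwidth_mb_per_s(width_bits=32, clock_mhz=33),
}
print(bottleneck(stages))
```

With these made-up figures the bus is the choke point, no matter how fast the CPU is rated.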
System architecture has come a long way in just 20 years, but the changes were all driven by market needs and constraints, not a daring vision. The following are significant milestones in PC architecture history:
- In 1978, Intel debuted the 8086 CPU, capable of executing 0.5 million instructions per second (MIPS). It had a 16 bit processor and worked with a 16 bit bus (both pretty narrow) at a speed of 5, 8, or 10 MHz. 640K of memory was "more than enough." Hot stuff then. Furniture now.
- In 1985 Intel presented the 80386, with a 32 bit processor and 32 bit bus. The bus was still slow, but the wider path (measured in how many bits it could transport per cycle) meant the system appeared much faster. Two megs of RAM memory was average. Hot Rods had four megs.
- In 1989 the 486 bowed, and brought along a host of speeds, from 25 MHz at its introduction to 120 MHz at its demise. Yet bus speed continued to plod along at a conservative 16 MHz. Hard drives were over 300 megs and growing, and 8 megs of RAM was "just right." Begin to see the bottlenecks?
- In 1993 Intel expanded the 8086 architecture and introduced the Pentium CPU. This new architecture remained 32 bit (the same "width" as the 386) but significantly sped up the rate at which data was retrieved by upping the bus width to 64 bits and speeding it up to 50 MHz. Five hundred meg hard drives were standard, as were 16 megs of RAM. Now the bus was faster than the CPU. Do you see where we are in the cycle?
- In 1997 Intel abandoned Pentium architecture for Pentium II architecture, increasing the bus speed to 100 MHz (with room to go higher) and the speed of its 32 bit processor to 233 MHz. Hard drives were average at 3000 megs and 64 megs of RAM was considered reasonable for a production workstation.
- In 1999...it all changes. Again.
While the bus width has been at 64 bit for about 6 years, the CPU has been strangling within the constriction of the 32 bit processor for the same amount of time. The reason for the long delay was the same as it was from 1978 to 1985: No advanced Operating System to justify the increase in speed. Windows 95, being a partially 32 bit OS, was Microsoft’s attempt to drag the DOS command line zealots into the graphical environment Apple pioneered in the early eighties with the Lisa and Mac computers. And the fact that it was "partially" 32 bit shows that they were hedging, too.
Now Intel, with AMD and Cyrix snapping at its heels, has to find a way to get faster and more proprietary before it drops below that magic 60% market share that turns a giant company into a "has been." In November, I told you about how the Slot 1 initiative is designed to give Intel breathing room to make the next big jump in processing power for the PC environment. The next big jump is called "Merced", a CPU that will be the first of an IA-64 family of processors jointly designed and developed by Hewlett Packard and Intel, with Intel handling the actual production of the chip.
The jump to 64 bit processing is more than just an increase of width or speed; it also includes a fundamental restructuring of how the CPU orders internal processes. The x86 architecture is a CISC (Complex Instruction Set Computer) system that enhances its speed by processing a single, complex instruction during each clock cycle. But the Pentium has some more advanced technology tacked on, called RISC (Reduced Instruction Set Computer). RISC differs by processing a greater number of smaller instructions in parallel. This means that the compiler (remember, the music conductor?) must be much smarter about what instructions go first, and how time consuming it will be for the processor to finish the job.
While Pentium and AMD’s K6 offer a few RISC behaviors, they are very much "tacked on" to a preexisting CISC architecture. IA-64 and Merced get to start from scratch, and even write a few new rules. Using the best of RISC and then improving on the concept, HP & Intel have designed RISC’s progeny: EPIC (Explicitly Parallel Instruction Computing). The primary features that this new architecture offers are:
Inherent Scalability
With CISC and RISC, the parts of the processor (called execution units) that perform calculations are of a fixed number and capacity. IA-64’s design unhooks the number and capacity of execution units from the architecture. Beefier processors can have scores more execution units, while systems for home use just have a few. Both of these systems will be IA-64 systems, but will vary in internal processing capacity based on their target market. This means cheaper machines for the low end.
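A back-of-the-envelope sketch of why more execution units matter, under some generous assumptions: every operation is independent of the others, each takes exactly one cycle, and the `cycles_needed` helper and the workload size are invented for illustration. Real code rarely parallelizes this cleanly.

```python
import math


def cycles_needed(independent_ops: int, execution_units: int) -> int:
    """Cycles to finish a batch of independent, one-cycle operations."""
    return math.ceil(independent_ops / execution_units)


workload = 24  # hypothetical batch of independent operations
for units in (2, 4, 8):
    print(units, "units ->", cycles_needed(workload, units), "cycles")
```

Same workload, same instruction set, different chips: the beefier part simply chews through the batch in fewer cycles.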
Explicit Parallelism
Since EPIC systems like IA-64 Merced will be able to process multiple streams of data at a time, the compiler will be enhanced to organize the incoming jobs in the most efficient manner and schedule tough jobs milliseconds in advance. This enhanced compiler handles most of the rest of the features on this list.
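To give a feel for the job that compiler is taking on, here is a toy scheduler. It is only a sketch under stated assumptions: the instruction format, the `schedule` function and the bundle width are all made up, and it only tracks "this result is needed by that instruction" dependencies. It is not how any real IA-64 compiler works.

```python
def schedule(instructions, width):
    """Group instructions into bundles of at most `width` that can issue together.

    Each instruction is (dest, sources). An instruction must land in a bundle
    later than the bundle that produces any of its sources.
    """
    bundles = []       # each bundle is a list of destination names
    produced_in = {}   # dest -> index of the bundle that computes it
    for dest, sources in instructions:
        # Earliest legal bundle: one past the latest producer of any source.
        earliest = max(
            (produced_in[s] + 1 for s in sources if s in produced_in),
            default=0,
        )
        slot = earliest
        # Skip over bundles that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        if slot == len(bundles):
            bundles.append([])
        bundles[slot].append(dest)
        produced_in[dest] = slot
    return bundles


program = [
    ("a", ["x", "y"]),  # a depends only on inputs
    ("b", ["x", "z"]),  # b depends only on inputs
    ("c", ["a", "b"]),  # c must wait for a and b
    ("d", ["y", "z"]),  # d depends only on inputs
]
print(schedule(program, width=2))
```

The point is that the grouping is decided ahead of time and handed to the processor explicitly, rather than being discovered by the hardware on the fly.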
Predication
A performance limiter for CISC computers is branch prediction, or deciding which of two sets of additional instructions will be required to complete a process and loading that instruction so it is ready to go at the right time. This is great if the compiler guessed right, but if it guesses wrong only 10% of the time, it could slow the processor down by up to 40% during that cycle. EPIC avoids the slowdown by having the compiler prepare both sets of instructions during an idle cycle and then just throwing away the unused one. Predication can decrease mispredicts by up to 40%.
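The "do both, throw one away" trick can be sketched in a few lines. This is an illustration of the shape of the idea only: the two functions are hypothetical, and Python itself gains nothing from writing code this way. On real hardware the win comes from never having to guess.

```python
def branchy(x: int) -> int:
    # Conventional form: the processor has to guess which path will run.
    if x >= 0:
        return x * 2
    else:
        return -x


def predicated(x: int) -> int:
    # Predicated form: compute both paths, then keep one based on the predicate.
    p = x >= 0           # the predicate
    then_value = x * 2   # computed whether or not it is needed
    else_value = -x      # computed whether or not it is needed
    # Select without branching: exactly one of the two terms survives.
    return int(p) * then_value + int(not p) * else_value


for x in (-3, 0, 4):
    print(x, branchy(x), predicated(x))
```

Both versions give the same answers; the second one simply trades a little wasted work for the absence of a guess.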
Speculation
Fetching data from memory takes time, and is another performance limiter of a CISC CPU because the processor "stalls" while waiting for the data to arrive. (Remember, CISC only does one thing at a time: it has no capacity to execute instructions in any order other than what it received them in.) While the parallel processes of EPIC systems will help, that execution unit waiting for data is still "out of service."
To solve this, EPIC uses Speculation to initiate a load (fetching data from slow external memory) far in advance of its need; that way the data arrives just in time to be processed and sent on its way. Thus the compiler can schedule work in advance to avoid stalling one of the execution units.
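The payoff of starting a load early is simple arithmetic, sketched here. The model is an assumption on my part: one independent instruction issues per cycle after the load, and the `stall_cycles` helper and the latency figure are invented for illustration.

```python
def stall_cycles(load_latency: int, independent_work_between: int) -> int:
    """Cycles the consumer waits for its data.

    Assumes one independent instruction issues per cycle between the load
    and the instruction that needs the loaded value.
    """
    return max(0, load_latency - independent_work_between)


latency = 10  # hypothetical memory latency in cycles
print("load right before use:", stall_cycles(latency, 0))
print("load hoisted 4 early: ", stall_cycles(latency, 4))
print("load hoisted 10 early:", stall_cycles(latency, 10))
```

The further ahead the load is moved, the more of the wait is hidden behind useful work.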
As you can imagine, this is a tremendous undertaking. It is made all the more daunting by the fact that the Merced must be able to run 32 bit code (like Windows 95 or NT) so that it is backward compatible with older programs when it hits the shelves, and it must fit into a package not much larger than the current Pentium II. There also has to be enough development time for companies to write software for it. As my Jewish cousin would say, "Oi!"
Fortunately Intel and HP have been smart about this: IA-64 systems will be backward compatible with 32 bit applications, and Intel’s forthcoming 0.18 micron die process should shrink the chip enough to avoid uneven cooling. But the big hurdle has been third party software development for an as yet unbuilt CPU. This has been overcome in a most unusual and innovative way: emulation.
Hardware emulators are software programs that are designed to look like actual running hardware to other components. I have a hardware emulation program in my PC at home that acts like the physical system board for an old Pac-Man game, so when I play Pac-Man, I’m playing the original game. By the same token, this virtual Merced will look and act like the real Merced for the development period. Certainly, it won’t run as fast, but it will run, thus providing a development environment for software makers to create their wares for a CPU that hasn’t even rolled off the assembly line yet.
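If emulation sounds like magic, here is the trick in miniature. This sketch interprets programs for a made-up two-register machine; the instruction names (LOAD, ADD, HALT) and the `run` function are hypothetical and have nothing to do with Merced or any real chip.

```python
def run(program):
    """Interpret a program for a made-up two-register machine; return its registers."""
    regs = {"A": 0, "B": 0}
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        if op == "LOAD":    # LOAD reg, constant
            regs[args[0]] = args[1]
        elif op == "ADD":   # ADD dst, src  ->  dst = dst + src
            regs[args[0]] += regs[args[1]]
        elif op == "HALT":  # stop the machine
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return regs


program = [("LOAD", "A", 2), ("LOAD", "B", 3), ("ADD", "A", "B"), ("HALT",)]
print(run(program))
```

Software written for the pretend machine neither knows nor cares that there is no silicon underneath, which is exactly what lets developers get started before the real chip exists.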
Intel and HP have been working on this project for four years, having begun this job in 1994. The move from platform to platform in the x86 environment has been a bumpy ride. Intel got its teeth jostled along with everyone else and decided to make this one go more smoothly. They’d better get it built right and priced right, because AMD and Cyrix won’t let Intel get too far ahead anymore.
Perhaps there isn’t anything "new" about how companies do business, but the compression of the product development cycle has become staggering. To compete in an increasingly crowded pond, the top frogs are finally having to produce some true innovations to keep their market share. Since more skull sweat invariably produces better products, the future looks very bright for all of us who push a mouse on a daily basis.
Peace,
Webwalker