WebWalker's World May 1999
Just Say No! Intel P-III & Xeon eat AMD Dust

The K7 Floating Point Screamer

I'm becoming a consumer rebel.

It's sad, because it didn't have to happen. I was always a good kid; I ate my peas, said please and thank you and never, ever, questioned authority.

NOT!

The truth is I have been asking "why?" and looking for ways to trip up the establishment since I was in first grade. As you can imagine, it made me a pretty annoying kid and an even more annoying teenager. Now that I've settled into a lifestyle of asking "why," my biases are starting to show: I am almost always rooting for the underdog. I think people (and businesses) ought to be nice to each other. In a perfect world they would. This is NOT that perfect world.

Consequently, whenever I have good news to report, both for the underdog and the consumer, I tend to make more of it than may actually play out in the real world. To a certain extent, I've done that over and over when discussing consumer grade microprocessors for the AutoCAD market: I tend to berate Intel's products largely on the basis of their monopolistic and dirty-pool business practices. But it pains me to admit that, as of right now, the Pentium III and the Xeon are probably the better choice over AMD and Cyrix products based solely on their floating point math speed. (Ouch. That hurts.)

You'll notice I said "as of right now." That's because AMD introduces the K7 next month and THAT is the best news I've had ALL YEAR. There are some cautionary flags that need to be raised about K7 and in an attempt to be as balanced as possible, I'll address them.

So, to the facts: The AMD K7 whoops both the Pentium III and Xeon with one hand tied behind its back. The reason is simple: AMD followed the 18th Century engineering method of improving reliability: if it breaks, double the size or capacity. Most of the capacities of the K7 are either equal to or close to double the Intel offerings. There aren't any hard performance numbers yet, but I do have the specifications for the K7 and even conservative estimates are remarkable.

Before we get into a blow by blow comparison, we need to review processor technology. When you consider performance issues in a CPU, the worst bottlenecks aren't usually the processing units themselves. Most of the time, it is the memory capacity or configuration that is slowing the system down. Still, today's computers actually have a lot more RAM in them than is advertised: there is RAM in the video card, RAM on the mainboard, RAM in the hard drive or controller, and even RAM in the CPU. RAM has become cheap enough to be spread around the components of a computer to smooth out some of the wait time as components pass information between each other. Buffering, if you will.

Most devices use fairly slow RAM (60 or 70 ns) accessed at a crawl of 50 or 66 MHz, on a narrow 16 or 32 bit bus. If the CPU had to rely on memory like that alone, it would die of boredom waiting for data to process or for its next instruction.

To solve this bottleneck, CPUs have two levels of memory cache on or near the chip package itself: the L2 cache, which is near the CPU but not necessarily on it, and the L1 caches (one for data, one for instructions), which are small, fast, and physically on the CPU silicon itself. For efficiency's sake, the number crunching side of the CPU takes a phased approach to retrieving raw data to be processed. First it checks in the small and fast L1 caches (in its living room, you might say), then it goes "out to the garage" to check the larger L2 cache. If it fails to find the data in either of those places, it goes downtown to the warehouse (the system RAM), where it takes quite a while to find what it needs because the system RAM is so slow. The absolute worst case scenario is that it has to fetch the data from disk, which is a dead crawl compared to any of the RAM caches. Therefore, the more "in-house" L1 and L2 cache you have, the less frequently you have to fetch data from "downtown."
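For the curious, the living room / garage / downtown trip can be put into rough numbers. What follows is only a back-of-the-envelope sketch: the hit rates and latencies are made-up illustrative assumptions, not measurements of any real chip, and the function name is my own invention. It simply adds up the cost of each stop, weighted by how often the CPU has to go that far.

```python
def average_access_ns(levels):
    """Estimate the average time to fetch one piece of data.

    levels is a list of (name, hit_rate, latency_ns) tuples, searched in
    order. The last level should have a hit_rate of 1.0 so that every
    lookup eventually succeeds.
    """
    expected = 0.0
    reach = 1.0  # fraction of lookups that get as far as this level
    for name, hit_rate, latency_ns in levels:
        expected += reach * latency_ns  # everyone who gets here pays the trip
        reach *= (1.0 - hit_rate)       # only the misses travel further out
    return expected


# All figures below are assumptions chosen purely for illustration.
small_caches = [
    ("L1 (living room)", 0.90, 2.0),
    ("L2 (garage)", 0.80, 10.0),
    ("system RAM (downtown)", 1.00, 70.0),
]
big_caches = [
    ("L1 (living room)", 0.95, 2.0),
    ("L2 (garage)", 0.90, 10.0),
    ("system RAM (downtown)", 1.00, 70.0),
]

print("small caches:", round(average_access_ns(small_caches), 2), "ns")
print("big caches:  ", round(average_access_ns(big_caches), 2), "ns")
```

The point of the exercise is the trend, not the exact figures: raise the hit rates of the in-house caches and the average trip gets shorter, even though the warehouse downtown is no faster than before.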

In addition to RAM capacity issues, there are also the bus speed (MHz) and width (number of bits), which determine the rate at which data is moved between the CPU and the rest of the components on the mainboard. There is no point in having gobs of RAM in the L2 cache if the pipe running to it from the CPU is slow and narrow.
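To see why both numbers matter, here is a rough sketch of the arithmetic. It assumes the textbook idealization that a bus moves its full width once per clock tick (the transfers_per_clock parameter is my own addition for buses that do better); real buses lose time to overhead, so treat the results as ceilings rather than measurements.

```python
def peak_bandwidth_mb_per_s(clock_mhz, width_bits, transfers_per_clock=1):
    """Theoretical peak bandwidth in megabytes (10**6 bytes) per second.

    Assumes one full-width transfer per clock unless told otherwise.
    """
    bytes_per_transfer = width_bits / 8
    return clock_mhz * transfers_per_clock * bytes_per_transfer


# The slow, narrow end of the range quoted for ordinary devices.
print(peak_bandwidth_mb_per_s(50, 16))  # 50 MHz, 16 bit bus
# The fast, wide end of the same range.
print(peak_bandwidth_mb_per_s(66, 32))  # 66 MHz, 32 bit bus
```

Doubling either the clock or the width doubles the ceiling, which is why a fat cache behind a skinny pipe is wasted money.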

Finally, there is the issue of the floating point processor or "Math processor" that handles the floating point math necessary to make programs like 3D Studio and AutoCAD (not to mention your kid's QUAKE 3 games) scream. The superiority of Intel's FPU (Floating Point Unit) is what has kept it the workstation king for many years.

Now that we've got the right issues in mind, let's compare apples with apples first: the Pentium III and Xeon are 6th generation x86 series processors based on the 32 bit Pentium Pro design. Both of them feature an onboard L2 cache, that is, the separate cache that is usually located near the CPU socket on the motherboard has been moved onto the CPU package to speed the transfer of data between the L2 and the processor core. The L2 on Pentium III is 512K and that gets most people down the road in fine style. Heck, considering that the second generation Celeron (the stripped back Pentium II) has only 128K of L2 cache, most business desktop users aren't stretching their systems in such a way as to make the L2 an issue.

Intel's Xeon, on the other hand, is a Pentium II (or Pentium III) that is available with several different L2 options: 512K, 1 Meg, and 2 Megs. The 2 Meg version offers better than 2X the performance of the standard Pentium II or Pentium III, but at a price. A very considerable price: A 500 MHz Xeon with a 2MB L2 cache comes in at $3000. Ouch. Xeon is also the only Intel processor that can be installed in a 4 or 8 way mainboard that allows many of them to act as one machine. At $3000 a hit, the price soars rapidly. Most reviewers have suggested avoiding the Xeon, as a great deal of the processing power it has to offer won't produce measurably faster performance for most applications. However, AutoCAD and especially 3D Studio do take advantage of the power. It just seems like there ought to be a cheaper way to get that power.

Since Pentium II and III are the primary competitors for K7, let's look at them for a second: Both of these critters are based on the proprietary Intel Slot 1 interface that replaced the older Socket 7. Slot 1 allows the edge (rather than the face) of the CPU to be plugged into the motherboard. Intel claimed this was because Socket 7 couldn't support faster bus speeds. That was two years ago. Now we see clearly that SuperSocket 7 allowed AMD and Cyrix to do just what Intel said wasn't practical.

An additional issue with L2 cache is the speed at which it runs relative to the CPU. The Pentium III runs its L2 memory at half of the CPU speed, so if you have a 450 MHz CPU, you have a 225 MHz path to the L2. The Xeon, on the other hand, runs its L2 at the same speed as the CPU. Another performance benefit.

The issue of Slot 1 has earned Intel a lot of bad press and bad vibes because they made it proprietary in an attempt to freeze AMD and Cyrix right out of the market. AMD took the roundabout approach: if you can't make it, fake it. AMD's "Slot A" format is physically identical to the Intel Slot 1, but it is electrically incompatible. This places the compatibility onus on the motherboard chipset makers, who can make physically identical motherboards with the same CPU slot and just vary the chipset installed based on which CPU the board is designed for. AMD got their design from Digital Equipment Corporation, and now have a very viable platform on which to grow the K7 and its variants. Naturally, Intel isn't too happy about this and will be abandoning the Slot 1 format because its coup didn't work. The Intel Slot 1 runs at 100 MHz with a ramp up to 133 MHz by September. The AMD Slot A already runs at 200 MHz! BANG-ZOOM!

Now that I've given you a picture of the environment that the K7 is coming into, let me share the hard facts about AMD's newest spearhead.

The K7 will have 3 Floating Point Units (FPUs) and will be capable (on a 500 MHz processor) of producing 1 GigaFLOPS (billion Floating Point Operations Per Second) for non-MMX enhanced processing, and 2 GigaFLOPS for MMX and 3DNow! enhanced data. By comparison, the Pentium III at 500 MHz only produces a paltry 500 MegaFLOPS for non-MMX enhanced processing. This adds up to a BIG boost in floating point power that AMD did not previously offer, thus removing one of its primary barriers for use with AutoCAD and especially 3D Studio MAX.
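If you want to check the arithmetic behind those figures, peak FLOPS is just clock speed times the number of floating point results the chip can retire per clock. The per-clock numbers in this sketch are NOT from any spec sheet; I have simply worked them backwards from the figures quoted above, so treat them as assumptions.

```python
def peak_flops(clock_mhz, flops_per_cycle):
    """Theoretical peak floating point operations per second."""
    return clock_mhz * 1e6 * flops_per_cycle


# Per-cycle rates below are back-derived from the quoted figures (assumption).
chips = [
    ("K7 500 MHz, plain FPU code", 500, 2),
    ("K7 500 MHz, MMX/3DNow! code", 500, 4),
    ("Pentium III 500 MHz, plain FPU code", 500, 1),
]

for name, clock_mhz, per_cycle in chips:
    print(name, "->", peak_flops(clock_mhz, per_cycle) / 1e6, "MegaFLOPS")
```

Keep in mind these are peak numbers. Real programs rarely keep every floating point unit busy on every clock, which is exactly why the benchmarks will matter more than the spec sheet.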

Remember the Pentium III and Xeon specs about the amount of L2 cache? Well, the K7 offers everything from 512K all the way up to an 8 MEGABYTE L2 CACHE. Clearly, AMD has their sights set on the Xeon as a target to beat. With 8MB of L2 it would be hard NOT to beat it. Also, while the Intel series offers only dual 16K L1 cache, K7 will debut with a dual 64K L1. Do you feel your eyeballs getting shoved back into your head yet?

The K7 will also offer RISC-like features such as out-of-order execution (which debuted on the K6) and speculative execution. The out-of-order execution numbers are significant: K7 handles 72 OoOEs at once, almost twice that of the Pentium III.

The second RISC feature also bears attention. Speculative Execution (SE) is the silicon equivalent of dice rolling. Essentially, before a decision branch is reached that would require a data load, the SE unit pre-fetches the most likely data and then that data is waiting to be used when the branch occurs, considerably speeding up the execution. If that data wasn't what was necessary, it is dumped and the correct data retrieved. The second possibility is (relatively) quite slow. The SE prefetch takes very little time, as the data has already been prepared for use by the processor. Avoiding "missed speculations" becomes the benchmark of how efficient the SE system is. If these features pan out, many workstations could have this very advanced processing feature for a fraction of the cost of a RISC workstation.
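The dice rolling can be put into numbers with a simple expected-cost model. This is only a sketch: the cycle counts are invented for illustration and the function name is mine, not anything out of AMD's documentation. A correct guess costs the cheap path, a wrong guess costs the expensive path, and the average depends on how often the guess is right.

```python
def expected_cycles(hit_rate, hit_cost, miss_cost):
    """Average cost of one speculated branch.

    hit_rate  -- fraction of speculations that turn out correct (0.0 to 1.0)
    hit_cost  -- cycles when the prefetched data was the right data
    miss_cost -- cycles when it must be dumped and the right data fetched
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0.0 and 1.0")
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


# Illustrative assumption: a good guess costs 1 cycle, a bad one costs 20.
for rate in (0.50, 0.90, 0.99):
    print(f"{rate:.0%} correct -> {expected_cycles(rate, 1, 20):.2f} cycles")
```

With a penalty that lopsided, the last few percent of accuracy buy a surprising amount of speed, which is why the rate of missed speculations is the number to watch.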

So get excited! This is very good news for you, the end user. If AMD can pull this off, Intel will suddenly have the most serious competition to their business and consumer systems ever. AMD outpaced Intel during January and February for sales of sub $1000 PCs, and now it looks like they are aiming at the higher end workstation and server market.

As I promised, however, I'm going to raise some red flags that you need to be aware of. I'd prefer to give unqualified good news, but life in this mortal coil rarely works like that.

AMD has a great reputation for amazing specs that fail to translate into real world performance. Example: The K5 was supposed to be a Pentium killer, as it was enhanced with technology that AMD acquired in their buyout of NexGen. K5 was a dull thud.

Also, despite some great designs, AMD is a fairly small company compared with Intel. That means that while Intel has some thirteen fabrication plants for their products, AMD has...two. And AMD has a reputation for production problems. So while K7 might be a great idea, if AMD can't produce it, K7 doesn't do anyone any good.

The final issue is that, on top of the potential production problems, AMD is trying to make a fairly complex CPU (22 million transistors!) in a package only slightly larger than the Pentium III. The chips are being etched using the .25-micron process, the same the PIII uses. With AMD's limited fab' capacity, this means the yield of processors per wafer will be lower. So the K7, even if it does live up to expectations, will be scarce and expensive. Not at the Xeon prices, but not in the K6 range, either. If you want one, get in line for the June launch and hope nothing goes wrong at the fab' plants.
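To get a feel for why a bigger, more complex chip hurts at the fab, here is a rough sketch using the simple Poisson yield model (yield = e raised to the power of minus defect density times die area). Every number in it is an assumption picked for illustration: I do not have AMD's or Intel's actual die sizes or defect rates, and the estimate ignores the dies lost around the wafer's round edge.

```python
import math


def good_dies_per_wafer(wafer_diameter_mm, die_area_mm2, defects_per_cm2):
    """Return (gross dies, yield fraction, expected good dies) for one wafer.

    Gross dies is a crude area ratio that ignores edge loss.
    Yield uses the simple Poisson model: exp(-defect density * die area).
    """
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    gross = int(wafer_area_mm2 // die_area_mm2)
    die_area_cm2 = die_area_mm2 / 100.0
    yield_fraction = math.exp(-defects_per_cm2 * die_area_cm2)
    return gross, yield_fraction, gross * yield_fraction


# Hypothetical die sizes and defect density, for comparison only.
for label, area_mm2 in (("smaller die", 120), ("larger die", 180)):
    gross, y, good = good_dies_per_wafer(200, area_mm2, 0.5)
    print(f"{label}: {gross} gross, {y:.0%} yield, about {good:.0f} good")
```

The larger die loses twice: fewer candidates fit on the wafer, and each one offers a bigger target for a defect to land on. With only two plants to spread the pain across, that adds up to scarce and expensive.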

After all of these details, I'm sure some workstation jockeys are wondering, "So will this thing work as advertised, or not?" My crystal ball is still in storage, but I can point to a significant predictor that surfaced on April 29th: KryoTech demonstrated an unmodified "right out of fab" K7 that, with the aid of a small cooling system, ran at 1 GigaHertz. The "Super-G" K7 will be available commercially by year's end. Intel won't have their commercial 1 Gig system ready until 2000.

Godspeed to thee, AMD.

Peace,

WebWalker

(R. Marshall Webber is Network Technology Consultant for Omicron Consulting of Philadelphia. He and his wife, Sarah, make their home in Maple Shade, New Jersey.)