Intel introduces 3-GHz desktop chip (2002) (computerworld.com)

29 points by mikektung 12 years ago | 46 comments

rogerbinns 12 years ago |

Instructions per cycle has been getting somewhat better, and it is that number multiplied by the clock speed that is a better indicator of actual performance https://en.wikipedia.org/wiki/Instructions_per_cycle
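A rough sketch of that product, using two hypothetical chips whose IPC and clock figures are invented purely for illustration (they are not measurements of any real part):

```python
def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Throughput estimate: instructions per cycle times cycles per second."""
    return ipc * clock_hz


# Hypothetical chips; the numbers are assumptions chosen only to show the idea.
high_clock = instructions_per_second(ipc=1.0, clock_hz=3.0e9)  # 3 GHz, low IPC
high_ipc = instructions_per_second(ipc=2.0, clock_hz=2.0e9)    # 2 GHz, high IPC

# The lower-clocked chip comes out ahead because its IPC advantage
# outweighs its clock deficit.
print(f"high-IPC chip vs high-clock chip: {high_ipc / high_clock:.2f}x")
```

Real IPC varies by workload, so a single number per chip is a simplification.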

Everything else needs to keep up too - it is pointless having a fast processor if it has to keep waiting on memory, storage and the network. Those are very slowly catching up and also lead to overall improved performance.

I've been hoping that asynchronous implementations would take over. In theory parts of the chip can run at whatever speeds are best for them at that time, and not have to be synchronised with other parts. And when not in use they easily power down. There were some async ARM chips made, but no progress since 2000 https://en.wikipedia.org/wiki/AMULET_microprocessor

nilsbunger 12 years ago | |

This was in Intel's dark days of P4 / "Netburst" microarchitecture. They goosed lots of GHz out of the chip by going with a very deep pipeline, but performance in real-world applications was terrible. (deep processor pipelines kill you when you mispredict a branch).

I sat in on a few sales calls from Intel about their new Pentium M / "Centrino" mobile architecture in 2003. What was amazing was that their performance graphs showed that Centrino had all the performance of P4, but with much lower power.

Basically, the terrible P4 microarchitecture, plus Intel's incompatible 64-bit approach (Itanium, aka "the Itanic"), left a big hole in the market where AMD stepped in and mopped up for 3-4 years with Opterons, the first 64-bit x86 processors.

Even today, x64 architecture is called "AMD64" for this reason -- AMD defined the instruction set, and Intel had to follow (for once).

IPC is undoubtedly much higher today, plus now similar machines would have 4 cores or more.

yuhong 12 years ago | | |

I wonder why it took until later in 2006 before a version of Pentium M showed up with 64-bit support.

victorf 12 years ago |

Light has been stuck at c for the last decade, too. When will it break this barrier?

ioquatix 12 years ago | |

This is the most awesome comment in this post.

axaxs 12 years ago |

Clock speed is an unfortunate marketing gimmick anymore. I dare compare it to peak horsepower. A 3ghz chip from today will run circles around a chip from 2002, and with less power to boot. AMD is ahead in the clock speed race, but gets beat handily by "slower" Intel processors, while using twice the power. The focus going forward is going to be on power efficiency and using more cores, not clock speed.

ajross 12 years ago | |

This is true, but it's missing the point. A modern CPU gets probably 50% more work out of a median clock cycle and runs 33% faster for single threaded (turbo) workloads. So it's twice as fast. And sure, there are four of them on the die.

But back up another decade to 1992, where a top of the line PC was a 50MHz 486 with well under half the IPC of the linked Northwood running 60x slower.
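The arithmetic behind "twice as fast" and "60x slower" can be checked in a few lines. The 3.06 GHz figure is the Northwood clock quoted elsewhere in this thread, and treating "well under half the IPC" as a factor of at least 2 is an assumption made for this sketch:

```python
# 2002 -> today, single-threaded, using the percentages from the comment.
ipc_gain = 1.50    # "50% more work out of a median clock cycle"
clock_gain = 1.33  # "runs 33% faster" under single-threaded turbo
single_thread_speedup = ipc_gain * clock_gain
print(f"2002 -> today: about {single_thread_speedup:.1f}x")

# 1992 -> 2002: a 50 MHz 486 against a 3.06 GHz Northwood.
clock_ratio = 3.06e9 / 50e6
# Assumption: "well under half the IPC" means an IPC ratio of at least 2,
# so the overall gap is at least twice the clock ratio.
min_overall_ratio = clock_ratio * 2
print(f"1992 -> 2002: clock alone {clock_ratio:.0f}x, overall at least {min_overall_ratio:.0f}x")
```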

For those of us who remember the 80's and 90's, it's a very different world we live in.

zokier 12 years ago | | |

The (single threaded) performance improvement is significantly larger. Anandtech has 2005 vintage Pentium in their benchmarks: http://anandtech.com/bench/product/92?vs=836 and there is probably a significant perf difference between 2002 and 2005.

And you can't really ignore the massive improvements gained via GPUs. There are your 100x differences

rwg 12 years ago | |

First off, you're absolutely right. The difference in IPC between a modern CPU and a 2002 vintage Pentium 4 is pretty incredible, and the way forward is all about power efficiency and cramming more cores onto one die.

However, I think it's unfair to say clock speed is an unfortunate marketing gimmick "anymore" when it was a gimmick all the way back in 2002 when Intel released the 3.06 GHz "Northwood" Pentium 4 that the OP's linked article references. In fact, it was a gimmick that caused one of the biggest strategy/roadmap blunders Intel ever made.

Intel designed NetBurst (the architecture that the P4 was based on) to do one thing really well: allow Intel to ramp up clock speeds quickly. The architectural choices they made to enable this severely hobbled the P4's performance, especially the 20 (later 31!) stage pipeline that made the penalty for branch mispredictions pretty awful.
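To see why pipeline depth matters, here is a minimal sketch of the usual textbook estimate for cycles per instruction with misprediction stalls. The workload numbers (20% branches, 5% mispredicted, base CPI of 1.0) are assumptions, and using the full pipeline depth as the penalty overstates the real flush cost somewhat:

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """Average cycles per instruction once misprediction stalls are added."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Assumed workload, not measured data.
BASE_CPI = 1.0
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# 10 is a hypothetical shallower pipeline; 20 and 31 are the NetBurst
# depths mentioned in the comment.
for depth in (10, 20, 31):
    cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, depth)
    print(f"{depth:2d}-stage pipeline: effective CPI {cpi:.2f}")
```

Under these assumptions the same code pays a steadily larger stall cost as the pipeline deepens, which eats into whatever the higher clock bought.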

Intel eventually released P4s that clocked as high as 3.8 GHz and had an unheard of (in the x86 space) 115 watt TDP, but when the Athlon 64 was released, AMD could smoke Intel's fastest P4s using slower clocked CPUs with lower TDPs. Instead of focusing on raw clock speed, AMD focused on architectural improvements like x86-64, HyperTransport, and an integrated memory controller. (Intel CPUs wouldn't see QPI or an integrated memory controller until Nehalem, released five years after the first Athlon 64.)

As you say, the tables are now turned — clock for clock, the IPC of AMD's Piledriver core is behind that of even Intel's (two generations old) Sandy Bridge core, and all AMD seems to be able to do is add more cores and crank up the clock speed. Unfortunately for AMD, adding more cores doesn't help single-threaded performance, and a very nasty side-effect of increasing clock speed is that the processor's TDP increases disproportionately: the 4.7 GHz FX-9590 has a whopping 220 watt TDP, while the 4.0 GHz FX-8350's TDP is 125 watts.
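The "disproportionately" part is easy to quantify from the two parts named above. The sketch below also applies the common simplified model where dynamic power scales with voltage squared times frequency; that model, and the idea that TDP tracks actual power draw, are assumptions here rather than anything AMD has published:

```python
# Figures quoted in the comment.
fx_8350 = {"clock_ghz": 4.0, "tdp_w": 125}
fx_9590 = {"clock_ghz": 4.7, "tdp_w": 220}

clock_ratio = fx_9590["clock_ghz"] / fx_8350["clock_ghz"]  # 17.5% more clock
tdp_ratio = fx_9590["tdp_w"] / fx_8350["tdp_w"]            # 76% more TDP

# Simplified model (assumption): P ~ C * V^2 * f with C unchanged and
# leakage ignored. Whatever power growth is not explained by frequency
# is attributed to a higher supply voltage.
implied_voltage_ratio = (tdp_ratio / clock_ratio) ** 0.5

print(f"clock: +{(clock_ratio - 1) * 100:.1f}%")
print(f"TDP:   +{(tdp_ratio - 1) * 100:.1f}%")
print(f"implied voltage increase under the model: +{(implied_voltage_ratio - 1) * 100:.0f}%")
```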

stephengillie 12 years ago | |

Actually, the performance difference is small. The biggest gain we have today is multi-cores, so no one thread can hog all processing pipelines. If you're pushing through 1 billion instructions, it will still take 1/3 of a second for your CPU to chunk through all of it, but other code can be processed simultaneously on another core.

I know, I know. My laptop (Lenovo T400) has almost the exact same specs as the old Sunfire V20 rackmounts I have (2x2.4ghz cores, 4gb ram, passable video), but the laptop can run on batteries for 5 hours.

But clock speed is still king -- this T400 runs circles around a Lenovo W520 with a 1.6 GHz Core i7 and 3x the RAM. I know because I had a W520 for work, and I could see the difference.

perlpimp 12 years ago | |

If you have more cores and a single task, then you have to deal with partitioning the dataset for that task, if it is partitionable. That's where the problem lies with SMP computing. Branch predictors have gotten a whole lot smarter, just as compilers have gotten better at producing code that fits multiple pipelines.
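One standard way to put numbers on the partitioning problem is Amdahl's law, which the comment does not name but which fits the point. A minimal sketch, with the parallel fractions picked arbitrarily for illustration:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup on `cores` cores when only `parallel_fraction` of the
    work can be split up and the rest stays serial (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative fractions only; real tasks have to be measured.
for fraction in (0.5, 0.9, 0.99):
    print(f"{fraction:.0%} partitionable on 4 cores: "
          f"{amdahl_speedup(fraction, 4):.2f}x")
```

A task that is only half partitionable gets well under 2x from four cores, which is why core count alone does not rescue single-task performance.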

Mikeb85 12 years ago |

The problem is physics. We can't get to higher clock speeds with current materials, due to heat. It's kind of like how fighter jets haven't got any faster (top speed anyway) since the 60's...

marshray 12 years ago | |

The MIG-25 is rated at Mach 2.8 GHz, but can be overclocked to 3.2.

https://en.wikipedia.org/wiki/Mig-25

dingaling 12 years ago | | |

Indeed, and first flew in 1964. Note that the follow-on MiG-31 was considerably slower despite sharing the general aerodynamic platform.

Since then Vmax has been declining, as higher-Mach flight, with its aerodynamic and mechanical complications (e.g. variable intake ramps), was determined to be less useful than transonic manoeuvrability and sustained supercruising.

The exception to this trend has been the superfighter category ( F-111, F-14, F-15, F-22, Su-27 ) which have maintained the same ~ M2.5 Vmax due to their specific role. Yes, even the F-111 was meant to be a fleet fighter.

But none have pushed up past the heady M3.0 level that was routinely broken by a series of prototypes in the 1960s.

stephengillie 12 years ago | |

Well...it's more complex...

With die sizes as small as they are, we have a problem where electrons...jump...through basically solid walls from an electrified wire to an unpowered wire. Now, turning on one circuit means the circuit browns-out and a neighboring circuit gets half-powered.

akira2501 12 years ago | |

With current materials in the CMOS manufacturing process; to be a little nitpicky.

yxhuvud 12 years ago | |

Fighter jets haven't got any faster because more speed is worthless compared to better avionics.

That is also what is happening with computers - it is simply more efficient to cram out more instructions per clock cycle than it is to cram out more clock cycles.

mikektung 12 years ago |

Low power and multicore are cute and all, but imagine the type of machine learning we could do on 384GHz cores.

sliverstorm 12 years ago | |

Ok, now imagine what a 384GHz core would cost.

(hint: probably more than the world GDP. Each)

Ok, NOW imagine what 10,000 3.84GHz cores would cost (100 times more aggregate cycles per second than 1x 384GHz core). What's that, you figure a measly $10-30M instead of more money than exists on earth?
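The "100 times" figure checks out as plain arithmetic. Note that this only counts raw cycles; how much of that a given job can use depends on how well it parallelizes:

```python
# Clock rates in Hz, kept as integers so the division is exact.
MANY_CORES = 10_000
MANY_CORE_CLOCK_HZ = 3_840_000_000    # 3.84 GHz
SINGLE_CORE_CLOCK_HZ = 384_000_000_000  # 384 GHz

aggregate_many = MANY_CORES * MANY_CORE_CLOCK_HZ
aggregate_single = 1 * SINGLE_CORE_CLOCK_HZ

ratio = aggregate_many // aggregate_single
print(f"aggregate cycles per second: {ratio}x in favour of the 10,000 cores")
```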

Any research simulation is going to want to be parallelized anyway. You'll bump into the limits of the 384GHz core, no doubt about that, at which point you are back to distributed computing. For limitless complexity and limitless appetite for computing power, distributed computing will always be the answer.

morkfromork 12 years ago | |

Imagine the type of machine learning we could do on low power 384GHz multi-cores

aheilbut 12 years ago | | |

Imagine the type of machine learning we could do on 384 low power 1GHz cores.

Aardwolf 12 years ago |

Cool. Back then there still were articles about a new faster desktop CPU! Today, whenever there's news about a CPU, it's about some other low power mobile whatever thing that is not faster. Yawn.

derefr 12 years ago | |

I wonder: what ratio of FLOPs would you get, between this chip, and an array of "low power mobile whatever thing"s adding up to an equivalent power-draw?

millstone 12 years ago | | |

Surely DSPs or GPUs achieve the best FLOPs per watt.

Aardwolf 12 years ago | | |

I would be interested in the numbers.

ilaksh 12 years ago |

If we can't make the clock speed faster, what about massively increasing the size of the on-chip cache? I think they call them L2 and L3 or something. If I had 1 GB of cache, then maybe my whole program could run without doing much main memory access. That would be fast, right?
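Whether a huge cache would be fast depends on a trade-off that the textbook average memory access time (AMAT) formula captures: bigger caches miss less often but usually take longer to answer a hit. A minimal sketch, where every latency and miss rate below is a made-up assumption rather than a figure for any real chip:

```python
def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time plus the expected miss cost."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical configurations; all numbers are assumptions for illustration.
small_fast = amat(hit_time_ns=1.0, miss_rate=0.05, miss_penalty_ns=80.0)
huge_slow = amat(hit_time_ns=10.0, miss_rate=0.001, miss_penalty_ns=80.0)

print(f"small, fast cache: {small_fast:.2f} ns per access")
print(f"huge, slow cache:  {huge_slow:.2f} ns per access")
```

With these particular assumptions the huge cache loses despite almost never missing, which is one reason chips use a hierarchy of cache levels instead of a single giant one. Different numbers could flip the result.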

wmf 12 years ago | |

Check out Haswell GT3e with 128 MB of L4 cache. It helps, but probably not enough to justify what they're charging for it.

m_mueller 12 years ago | |

As long as all users of your program also have that $5000 CPU (if it existed), this might be a good idea. Cost-wise, with more than 5 users, it probably starts becoming viable to port that application to something more sane, for example $1000 GPUs like NVIDIA's Titan.

Impossible 12 years ago |

We've made some progress with GHz :). http://www.computerworld.com/s/article/9239098/Desktop_chips...

NoPiece 12 years ago | |

Indeed, AMD announced their first 5ghz CPU in June at E3.

http://www.amd.com/us/press-releases/Pages/amd-unleashes-201...

It is for sale now at Newegg for $699. 4.7ghz with 5ghz turbo.

http://www.newegg.com/Product/Product.aspx?Item=N82E16819113...

veemjeem 12 years ago | | |

Though speed bumps aren't exactly as crazy as in the old days. I remember my next upgrade from a 50 MHz 486 ended up being a 266 MHz Pentium II. This was over a span of about 3 years.

auctiontheory 12 years ago |

Power consumption is much better.

stephengillie 12 years ago | |

x86 & x86-64 cores still create heat like incandescent lightbulbs.