Intel introduces 3-GHz desktop chip (2002)(computerworld.com) |
Everything else needs to keep up too: it is pointless having a fast processor if it has to keep waiting on memory, storage and the network. Those are catching up only very slowly, but every gain there also improves overall performance.
I've been hoping that asynchronous implementations would take over. In theory parts of the chip can run at whatever speeds are best for them at that time, and not have to be synchronised with other parts. And when not in use they easily power down. There were some async ARM chips made, but no progress since 2000 https://en.wikipedia.org/wiki/AMULET_microprocessor
I sat in on a few sales calls from Intel about their new Pentium M / "Centrino" mobile architecture in 2003. What was amazing was that their performance graphs showed that Centrino had all the performance of P4, but with much lower power.
Basically, the terrible P4 microarchitecture, plus Intel's incompatible 64-bit approach (Itanium, aka "the Itanic"), left a big hole in the market where AMD stepped in and mopped up for 3-4 years with Opterons, the first 64-bit x86 processors.
Even today, x64 architecture is called "AMD64" for this reason -- AMD defined the instruction set, and Intel had to follow (for once).
IPC is undoubtedly much higher today, plus now similar machines would have 4 cores or more.
But back up another decade to 1992, when a top of the line PC was a 50MHz 486, with well under half the IPC of the linked Northwood and a clock roughly 60x slower.
For those of us who remember the 80's and 90's, it's a very different world we live in.
And you can't really ignore the massive improvements gained via GPUs. There are your 100x differences.
However, I think it's unfair to say clock speed is an unfortunate marketing gimmick "anymore" when it was a gimmick all the way back in 2002 when Intel released the 3.06 GHz "Northwood" Pentium 4 that the OP's linked article references. In fact, it was a gimmick that caused one of the biggest strategy/roadmap blunders Intel ever made.
Intel designed NetBurst (the architecture that the P4 was based on) to do one thing really well: allow Intel to ramp up clock speeds quickly. The architectural choices they made to enable this severely hobbled the P4's performance, especially the 20 (later 31!) stage pipeline that made the penalty for branch mispredictions pretty awful.
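To put a rough shape on why a deeper pipeline hurts, here is a minimal sketch of the usual back-of-the-envelope model: effective CPI is the base CPI plus the fraction of instructions that are mispredicted branches times the flush penalty. The base CPI, branch fraction and mispredict rate below are illustrative assumptions, not measurements of any real P4, and treating the flush penalty as equal to the stage count is a simplification.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once misprediction flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Illustrative assumptions only (not measured on any real CPU).
BASE_CPI = 1.0          # ideal cycles per instruction with perfect prediction
BRANCH_FRACTION = 0.20  # assume 1 in 5 instructions is a branch
MISPREDICT_RATE = 0.05  # assume 5% of branches are mispredicted

# Simplification: the flush penalty is taken to be the pipeline depth.
for stages in (10, 20, 31):
    cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, stages)
    print(f"{stages:2d} stages: effective CPI ~ {cpi:.2f}")
```

Under these assumed numbers the penalty term grows linearly with pipeline depth, so going from 20 to 31 stages needs a matching clock increase just to break even.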
Intel eventually released P4s that clocked as high as 3.8 GHz and had an unheard of (in the x86 space) 115 watt TDP, but when the Athlon 64 was released, AMD could smoke Intel's fastest P4s using slower clocked CPUs with lower TDPs. Instead of focusing on raw clock speed, AMD focused on architectural improvements like x86-64, HyperTransport, and an integrated memory controller. (Intel CPUs wouldn't see QPI or an integrated memory controller until Nehalem, released five years after the first Athlon 64.)
As you say, the tables are now turned — clock for clock, the IPC of AMD's Piledriver core is behind that of even Intel's (two generations old) Sandy Bridge core, and all AMD seems to be able to do is add more cores and crank up the clock speed. Unfortunately for AMD, adding more cores doesn't help single-threaded performance, and a very nasty side-effect of increasing clock speed is that the processor's TDP increases disproportionately: the 4.7 GHz FX-9590 has a whopping 220 watt TDP, while the 4.0 GHz FX-8350's TDP is 125 watts.
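For a sense of how disproportionate that is, here is a quick sketch using only the clock and TDP figures quoted above. It assumes TDP roughly tracks dynamic power, which scales as C * V^2 * f; that is an assumption, since TDP is a thermal design figure rather than measured draw. Under it, any power increase beyond the clock increase is attributed to higher voltage.

```python
import math


def scaling_ratios(f_lo_ghz, tdp_lo_w, f_hi_ghz, tdp_hi_w):
    """Return (clock ratio, TDP ratio, implied voltage ratio).

    Assumes P ~ C * V^2 * f with constant C, so
    V_hi / V_lo = sqrt(power_ratio / clock_ratio).
    """
    clock_ratio = f_hi_ghz / f_lo_ghz
    power_ratio = tdp_hi_w / tdp_lo_w
    implied_voltage_ratio = math.sqrt(power_ratio / clock_ratio)
    return clock_ratio, power_ratio, implied_voltage_ratio


# Figures from the comment: FX-8350 (4.0 GHz, 125 W) vs FX-9590 (4.7 GHz, 220 W).
clock_ratio, power_ratio, v_ratio = scaling_ratios(4.0, 125.0, 4.7, 220.0)
print(f"clock: {clock_ratio:.3f}x, TDP: {power_ratio:.2f}x, implied voltage: {v_ratio:.2f}x")
```

So about 17.5% more clock costs 76% more TDP, which under this simple model is consistent with a voltage bump of a bit over 20%.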
I know, I know. My laptop (Lenovo T400) has almost the exact same specs as the old Sunfire V20 rackmounts I have (2x 2.4GHz cores, 4GB RAM, passable video), but the laptop can run on batteries for 5 hours.
But clock speed is still king -- this T400 runs circles around a Lenovo W520 with a 1.6GHz Core i7 and 3x the RAM. I know because I had a W520 for work, and I could see the difference.
Since then Vmax has been declining, as the aerodynamic and mechanical complications (e.g. variable intake ramps) of higher-Mach flight were judged to be less useful than transonic manoeuvrability and sustained supercruising.
The exception to this trend has been the superfighter category ( F-111, F-14, F-15, F-22, Su-27 ) which have maintained the same ~ M2.5 Vmax due to their specific role. Yes, even the F-111 was meant to be a fleet fighter.
But none have pushed up past the heady M3.0 level that was routinely broken by a series of prototypes in the 1960s.
With die sizes as small as they are, we have a problem where electrons...jump...through basically solid walls from an electrified wire to an unpowered wire. Now, turning on one circuit means the circuit browns-out and a neighboring circuit gets half-powered.
That is also what is happening with computers - it is simply more efficient to cram out more instructions per clock cycle than it is to cram out more clock cycles.
(hint: probably more than the world GDP. Each)
Ok, NOW imagine what 10,000 3.84GHz cores would cost (100 times more aggregate cycles per second than 1x 384GHz core). What's that, you figure a measly $10-30M instead of more money than exists on earth?
Any research simulation is going to want to be parallelized anyway. You'll bump into the limits of the 384GHz core, no doubt about that, at which point you are back to distributed computing. For limitless complexity and limitless appetite for computing power, distributed computing will always be the answer.
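The cycle arithmetic in the comparison above is easy to check. A tiny sketch follows; the helper name is made up for illustration, and "aggregate GHz" is only meaningful to the extent the workload actually parallelizes across cores.

```python
def aggregate_ghz(cores, ghz_per_core):
    """Total clock cycles per second across all cores, in GHz."""
    return cores * ghz_per_core


many_slow = aggregate_ghz(10_000, 3.84)  # 10,000 cores at 3.84 GHz
one_fast = aggregate_ghz(1, 384.0)       # one hypothetical 384 GHz core
ratio = many_slow / one_fast

print(f"{many_slow:,.0f} GHz vs {one_fast:,.0f} GHz -> {ratio:.0f}x")
```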
http://www.amd.com/us/press-releases/Pages/amd-unleashes-201...
It is for sale now at Newegg for $699. 4.7GHz with 5GHz turbo.
http://www.newegg.com/Product/Product.aspx?Item=N82E16819113...
EDIT: it was actually mid-2004, as noted below.