AMD-powered Frontier supercomputer breaks the exascale barrier (tomshardware.com)

216 points by lelf 4 years ago | 168 comments

briffle 4 years ago |

What blows my mind is the newest NOAA super computer (that triples the speed of the last one) is a whopping 12 petaflops. It comes online this summer.

It kind of shows the difference in priority spending, when nuclear labs get >1000 petaflop super computers, and the weather service (that helps with disasters that affect many Americans each year) gets a new one that is 1.2% of the speed.

https://www.noaa.gov/media-release/us-to-triple-operational-....

jcranmer 4 years ago | |

The national labs aren't purely--or likely even mostly--dedicated to nuclear research. Instead, they cover a lot of the basic science research. These supercomputers will likely be used for projects like exploring cosmological models, or studying intramolecular interactions for chemical compounds, or fine-tuning predictions about properties of the top quark, etc.

gibolt 4 years ago | | |

Because these are so linked to research, everyone and their cousin is vying for time on them. Even though it may be massive, no individual will get anywhere near peak.

gh02t 4 years ago | | |

Oak Ridge in particular is in the DoE Office of Science. They do some national security work, but their primary focus is basic science. Some of the national labs do primarily nuclear-weapons-related research, but not Oak Ridge. Frontier is only doing unclassified work, primarily basic science and engineering.

briffle 4 years ago | | |

I didn't know that, thank you for clarifying for me!

mulmen 4 years ago | |

Would a faster computer improve outcomes for victims of natural disaster? How much is left undiscovered about weather?

Research spending is based on the potential for discovery. As a species we have studied weather since the beginning of time. How long have we been doing nuclear research? A century?

Is there even an opportunity cost here? Or is it an economy of scale? As we build more supercomputers the costs go down. So NOAA and ORNL both get what they need for less.

Twirrim 4 years ago | | |

> Would a faster computer improve outcomes for victims of natural disaster? How much is left undiscovered about weather?

The US is way behind on weather modelling, in part due to lack of computing power available to do the grids at sufficiently small cells compared to Europe and other parts of the world. That means less accurate predictions and less advance notice of impending disasters, which means more risk of loss of life and impact on infrastructure and the economy (and vice versa, inaccuracy can lead to more caution than is necessary, which has economic impact too). The US has to lean on Europe etc. for predictions.

https://cliffmass.blogspot.com/2020/02/smartphone-weather-ap...

Talks about the fact that IBM / Weather.com actually uses a more accurate system than the NWS uses, because the NWS is still stuck on GFS (it's been several years now since Congress passed an act to force NOAA to update away from it, and unfortunately it takes time).

aeroman 4 years ago | | |

Although meteorology is in many ways a much older science, I think you are underselling the difference (and the importance of computers) here. Better computing power means a more accurate forecast, but typically also a longer forecast horizon. That is critical when preparing for natural disasters and absolutely saves lives all the time.

Even at a 3-day lead time, GFS was still suggesting landfall for Hurricane Sandy outside the New York region; the longer lead times provided by other centers (with more computing power) were very important for preparation [1].

Even on the science side, increased computing power enables a host of new discoveries. Even storing the locations for all the droplets in a small cloud would require an excessive amount of memory, let alone doing any processing [2]. Increased computer power enables us to better understand how clouds respond to their environment, which is a key uncertainty in predicting climate change.

Many disciplines of meteorology are also much newer than nuclear physics. Cloud physics (for example) only really got started with the advent of weather radar (so the 1940s). Before that, even simple questions (such as can a cloud without any ice in it produce rain?) were unknown.

Even today, we still have difficulty seeing into the most intense storms. You cannot fly an aircraft in there, and radar has difficulty distinguishing different types of particle (ice, liquid, mushy ice, ice with liquid on the surface, snow) and is not good at counting the number of particles either.

Even after thousands of years, we are only just now getting the tools to understand it. There is a lot left to discover about the weather!

[1] - https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/201...

[2] - https://www.cloudsandclimate.com/blog/clouds_and_climate/#id...
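To put a rough number on the droplet-storage point above, here is a back-of-the-envelope sketch. The cloud volume, droplet concentration, and storage format are all illustrative assumptions of mine, not figures taken from the linked post.

```python
# All three inputs are illustrative assumptions, not measured values.
CLOUD_VOLUME_KM3 = 1.0       # assumed volume of a small cumulus cloud
DROPLETS_PER_CM3 = 100.0     # assumed droplet concentration
BYTES_PER_DROPLET = 3 * 4    # x, y, z stored as 32-bit floats

CM3_PER_KM3 = (1000 * 100) ** 3  # 1 km = 100,000 cm, cubed


def droplet_storage_bytes(volume_km3, droplets_per_cm3, bytes_per_droplet):
    """Bytes needed to store one position per droplet."""
    droplets = volume_km3 * CM3_PER_KM3 * droplets_per_cm3
    return droplets * bytes_per_droplet


total = droplet_storage_bytes(CLOUD_VOLUME_KM3, DROPLETS_PER_CM3, BYTES_PER_DROPLET)
print(f"{total / 1e18:.1f} exabytes for positions alone")
```

Under these assumptions the positions alone land around an exabyte, before storing sizes, velocities, or any history.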

Kuinox 4 years ago | | |

> Erik P. DeBenedictis of Sandia National Laboratories has theorized that a zettaFLOPS (10^21 or one sextillion FLOPS) computer is required to accomplish full weather modeling, which could cover a two-week time span accurately.[121][122][123] Such systems might be built around 2030.

https://en.wikipedia.org/wiki/Supercomputer

ip26 4 years ago | | |

One of the more commonly discussed values is predicting where a major hurricane makes landfall. We can't reliably do that yet, but if we could, evacuation zones would be both smaller & more effective.

phonon 4 years ago | |

You're quite right.

"An estimate of future HPC needs should be both demand-based and reasonable. From an operational NWP perspective, a four-fold increase in model resolution in the next ten years (sufficient for convection-permitting global NWP and kilometer-scale regional NWP) requires on the order of 100 times the current operational computing capacity. Such an increase would imply NOAA needs a few exaflops of operational computing by 2031. Exascale computing systems are already being installed at Oak Ridge National Laboratory (1.5 exa floating point operations per second (EF)) and Argonne Labs (1.0 EF) and it is likely that these national HPC laboratories will approach 100 EF by 2031. Because HPC resources are essential to achieving the outcomes discussed in this report, it is reasonable for NOAA to aspire to a few percent of the computing capacity of these other national labs at a minimum. Substantial investments are also needed in weather research computing. To achieve a 3:1 ratio of research to operational HPC, NOAA will need an additional 5 to 10 EF of weather research and development computing by 2031. Since research computing generally does not require high-availability HPC, it should cost substantially less than operational HPC and should be able to leverage a hybrid of outsourced, cloud and excess compute resources."[1]

[1]https://sab.noaa.gov/wp-content/uploads/2021/11/PWR-Report_2...
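A rough sanity check of the "on the order of 100 times" figure in the quote. This sketch assumes (my assumption, not the report's) that cost grows with the number of grid cells times the number of time steps, and that the time step has to shrink in proportion to the grid spacing, as in a CFL-style stability limit. The 12 PF baseline is the NOAA figure mentioned at the top of the thread.

```python
def compute_scale_factor(refinement, refined_dims=2, refine_time=True):
    """Cost multiplier when grid spacing shrinks by `refinement`.

    refined_dims: how many spatial dimensions are refined (2 = horizontal only).
    refine_time:  whether the time step shrinks by the same factor (assumed).
    """
    factor = refinement ** refined_dims
    if refine_time:
        factor *= refinement
    return factor


horizontal_only = compute_scale_factor(4)                # 4 * 4 * 4
with_vertical = compute_scale_factor(4, refined_dims=3)  # 4 * 4 * 4 * 4

BASELINE_PF = 12.0                        # NOAA system cited earlier in the thread
needed_ef = BASELINE_PF * 100 / 1000      # 100x the baseline, in exaflops

print(horizontal_only, with_vertical, needed_ef)
```

The two multipliers (64 and 256) bracket the quoted "order of 100", and 100 times 12 PF is a bit over one exaflop, which is at least the same order as the "few exaflops" in the report.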

the_svd_doctor 4 years ago | |

DOE computers are used by a wide variety of people/teams/projects, including academics and other institutions though.

systemvoltage 4 years ago | |

This is a weird take. There are so many things behind the scenes to say anything conclusively. Different compute loads, different problem domains, different accuracy and predictability requirements, etc.

Cynicism is unwarranted, but it fits the current zeitgeist, biases and feels good.

arinlen 4 years ago | |

> (..) when nuclear labs get >1000 petaflop super computers (..)

Would you prefer the research being performed based on empirical testing instead of running simulations?

ok_dad 4 years ago | | |

IMO: we had good enough nuclear weapons 50 years ago to glass the whole planet, so why continue to try and improve a weapon of armageddon? Just maintain and build the same old nuclear weapons that are effective enough and try and remove the need for the weapons over time with the diplomatic and political process.

irfn 4 years ago | |

I am curious as to what class of problems are being solved on these supercomputers. Also, what's the abstraction of computation here? Is it a container?

chomp 4 years ago | | |

Weather modeling - X kilometers by Y layers of atmosphere can get expensive to compute really quick. And NOAA does more than just simulate weather, they're running climate/sea level rise/arctic ice modelling, aggregating sensor data from buoys/balloons/satellites, processing maps, and more.

I can't speak for NOAA, but my experience with supercomputing has been that there is no abstraction of computation, your workload is very much tied to hardware assumptions.

jandrewrogers 4 years ago | | |

They are used to do large-scale high-resolution analysis or simulation of complex systems in the physical world. The codes typically run on the bare metal with careful control of resource affinity, often C++ these days.

They aren't just used for global-scale geophysical processes like weather and climate or complex physics simulations. For example, oil companies rent time to analytically reconstruct the 3-dimensional structure of what's underneath the surface of the Earth from seismic recordings.

convolvatron 4 years ago | | |

no, it's a gang-scheduled process - at least that's been the standard model. those processes are run either close to bare metal or as a process on linux. containers would be useful to package up the shared dependencies, so that may have changed.

N1H1L 4 years ago | |

The nuclear lab computers are also rented out to anyone who applies for an XSEDE grant. Anyone with a successful grant gets free access (obviously limited to a reasonable number of core-hours). Anyway, a ton of university researchers, all the way from materials simulations to weather groups, will be using this computer to run their codes, as they have done for the last ones too.

In fact, such use accounts for the vast majority of the compute use.

mrb 4 years ago |

The HN crowd would probably prefer reading the many technical details at the ORNL press release: https://www.ornl.gov/news/frontier-supercomputer-debuts-worl... which I just submitted here: https://news.ycombinator.com/item?id=31573066

Also, yesterday Tom's hardware had a detailed article: https://www.tomshardware.com/news/amd-powered-frontier-super... 29 MW total, 400 kW per rack(!)

And if anyone else is like me and wants to see actual pictures or videos of the supercomputer, instead of a rendering like in the VentureBeat article, head here; ORNL has a very short video: https://www.youtube.com/watch?v=etVzy1z_Ptg We can see, among other things, that it's water-cooled (the blue and red tubing), and at 0m3s we see a PCB labelled "Cray Inc Proprietary ... Sawtooth NIC Mezzanine Card"

pvg 4 years ago | |

Not much point submitting a dupe with the discussion already on the front page, but you can email your better links to the mods, who are looking for a better link:

https://news.ycombinator.com/item?id=31571551

uniqueuid 4 years ago |

Since they are using AMD's accelerators as well [1], I do wonder whether any usage of these will trickle down and give us improvements in ROCm.

Surely the people at these labs will want to run ordinary DL frameworks at some point - or do they have the money and time to always build entirely custom stacks?

[1] AMD Instinct MI250x in this case.

gsibble 4 years ago |

What an incredible achievement. Good for AMD. The Epyc is a fantastic processor.

And there are another 2 (3?) faster systems coming online in the next year or so.

adrian_b 4 years ago | |

Besides being the first system exceeding the 1 Exaflop/s threshold, what is more impressive is that this is also the system with the highest ratio between computational speed and power consumption (i.e. the AMD devices have the first place in both Top500 and Green500).

The AMD GPUs with the CDNA ISA have surpassed in energy efficiency both the NVIDIA A100 GPUs and the Fujitsu ARM with SVE CPUs, which had been the best previously.

Unfortunately, AMD has stopped selling at retail such GPUs suitable for double-precision computations.

Until 5 or 6 years ago, the AMD GPUs were neither the fastest nor the most energy-efficient, but they had by far the best performance per dollar of any devices that could be used for double-precision floating-point computations.

However, when they have made the transition to RDNA, they have separated their gaming and datacenter GPUs. The former are useless for DP computations and the latter cannot be bought by individuals or small companies.

Const-me 4 years ago | | |

> The former are useless for DP computations

Looking at “double-precision GFlops” columns there [1] they don’t seem terribly bad, more than twice as fast compared to similar nVidia chips [2]

While specialized extremely expensive GPUs from both vendors are way faster with many TFlops of FP64 compute throughput, I wouldn’t call high-end consumer GPUs useless for FP64 workloads.

The compute speed is not terribly bad, and due to some architectural features (ridiculously high RAM bandwidth, RAM latency hiding by switching threads) in my experience they can still deliver a large win compared to CPUs of comparable prices, even in FP64 tasks.

[1] https://en.wikipedia.org/wiki/Radeon_RX_6000_series#Desktop

[2] https://en.wikipedia.org/wiki/GeForce_30_series#GeForce_30_(...

visarga 4 years ago | | |

Computational speed is important, but more important is the data transfer speed. At least in ML. Is AMD the best for data transfer speed?

photochemsyn 4 years ago |

I wonder if having one supercomputer with x number of chips or having eight supercomputers each with x/8 number of chips would be the more practical working setup. Weather forecasting for example is basically a complex probabilistic algorithm, and there's a notion that running eight models in parallel and then comparing and contrasting the results will give better estimates of actual outcomes than running one model on a much more powerful machine.

Is it feasible to run eight models on one supercomputer, or is that inefficient?
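A toy sketch of the "run several models and compare" idea. It is not a weather model: the logistic map is only a stand-in for a chaotic system, and the member count, noise level, and function names are all hypothetical. Each member is independent, so in principle members could run side by side as separate jobs on slices of one large machine.

```python
import random
import statistics


def toy_model(x0, steps, r=3.9):
    """Iterate the logistic map, a stand-in for a chaotic forecast model."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def run_ensemble(x0, members=8, steps=10, noise=1e-3, seed=0):
    """Run `members` copies of the toy model from slightly perturbed starts."""
    rng = random.Random(seed)
    results = []
    for _ in range(members):
        # Perturb the initial condition and keep it inside [0, 1].
        perturbed = min(max(x0 + rng.uniform(-noise, noise), 0.0), 1.0)
        results.append(toy_model(perturbed, steps))
    return results


forecasts = run_ensemble(0.4)
print("ensemble mean:  ", statistics.mean(forecasts))
print("ensemble spread:", statistics.stdev(forecasts))
```

The spread across members is the useful extra output: a single high-resolution run gives one answer, while the ensemble also gives a rough sense of how much to trust it.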

timbargo 4 years ago | |

You can partition a large compute cluster into many smaller ones. Users can make a request specifying how many processors they want for how long. Check out this link to see the activity of a supercomputer at Argonne.

https://status.alcf.anl.gov/theta/activity

And I believe it is more efficient to have a single large cluster, as there are large overhead costs for power, cooling, and having a physical space to put the machine in, plus a personnel cost to maintain the machines.

derac 4 years ago | |

You can run many programs on one supercomputer simultaneously, yes. Check out XSEDE. Cost-wise one big is going to be cheaper than 8 small due to infrastructure issues - cooling, maintenance, space, etc.

pphysch 4 years ago | | |

"XSEDE" proper is getting EOL'd in a couple months and transitioning to ACCESS [1].

[1] - https://www.hpcwire.com/off-the-wire/nsf-announces-upcoming-...

scardycat 4 years ago |

Congratulations to AMD, HPE and ORNL! This is an amazing achievement. Can't wait to see the spectacular science results coming from this installation.

Intel was supposed to build the first exascale system for ANL [1] [2], to be installed by 2018. They completely and utterly messed up the execution, partly driven by the 10nm failure, went back to the drawing board multiple times, and now Raja has switched the whole thing to GPUs, a technology that Intel has no previous success with, and rebased it to 2 ExaFlops peak, meaning they probably expect 1 EF sustained performance, a 50% efficiency. No other facility would ever consider Intel as a prime contractor again. ANL hitched their wagon to the wrong horse.

1. https://www.alcf.anl.gov/aurora 2. https://insidehpc.com/2020/08/exascale-exasperation-why-doe-...

gigatexal 4 years ago |

I am still kicking myself every time I look at AMD’s share price. I sold a not-insignificant-to-me amount of shares when the price was basically below 10 a share. Now it’s above 100. All this is to say that the turn around at AMD is good to see and the missteps at Intel are hilarious.

This is like the time the Athlon64 and its on-die memory controller were kicking the Pentiums around.

edm0nd 4 years ago | |

Now would be a pretty decent time to buy back in if you still wanna go long on AMD again.

gigatexal 4 years ago | | |

I did a few weeks ago. It’s the only thing other than Nvidia that is up in my portfolio right now, lol.

BooneJS 4 years ago |

While AMD gets top billing for the compute cores, HPE used the acquired Cray Slingshot network to create this heterogeneous supercomputer. It has a 64-port, 12.8 Tb/s bandwidth switch, it scales to >250,000 host ports with maximum of 3 hops, and it uses Ethernet "plus optimized HPC functionality".
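The switch numbers above can be cross-checked with a little arithmetic. The per-port figure follows directly from the comment; the dragonfly port split (16 host, 31 local, 17 global ports) is an assumption used only to show how a 64-port switch could reach more than 250,000 hosts within three switch-to-switch hops (local, global, local).

```python
SWITCH_PORTS = 64
SWITCH_BANDWIDTH_TBPS = 12.8

# Bandwidth per port in Gb/s, straight from the two figures in the comment.
per_port_gbps = SWITCH_BANDWIDTH_TBPS * 1000 / SWITCH_PORTS


def dragonfly_endpoints(host_ports, local_ports, global_ports):
    """Maximum hosts in a fully connected dragonfly (assumed topology)."""
    switches_per_group = local_ports + 1           # all-to-all inside a group
    hosts_per_group = switches_per_group * host_ports
    groups = switches_per_group * global_ports + 1  # one global link per group pair
    return groups * hosts_per_group


# Hypothetical split of the 64 ports: 16 + 31 + 17 = 64.
max_hosts = dragonfly_endpoints(host_ports=16, local_ports=31, global_ports=17)

print(f"about {per_port_gbps:.0f} Gb/s per port, up to {max_hosts:,} hosts")
```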

dang 4 years ago |

This reads more or less like a corporate press release - (edit: actually, it reads exactly like a corporate press release) - is there a more substantive article on the topic?

eslaught 4 years ago | |

It's not an article, but there's always the front page for the supercomputer (includes some limited specs):

https://www.olcf.ornl.gov/frontier/

There's also detailed architecture specs on Crusher, an identical (but smaller) system:

https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide...

mrb 4 years ago | |

I like this one, it gets into the specifics of the hardware, specifically the 7 slides in the middle of the article: https://www.tomshardware.com/news/amd-powered-frontier-super...

dang 4 years ago | | |

Ok, we changed to that from https://venturebeat.com/2022/05/30/amd-powers-worlds-most-po.... Thanks!

jamesredd 4 years ago |

China has two exaflop supercomputers. It's doubtful whether this is the world's most powerful supercomputer.

https://www.nextplatform.com/2021/10/26/china-has-already-re...

ouid 4 years ago | |

I'm not really sure why you would trust this claim from China. It's not impossible, but it's also not impossible to lie about.

sekia 4 years ago | | |

> I'm not really sure why you would trust this claim from China.

Why not? While I don't remember what the previous US x86 cluster to top the Top500 list was (RoadRunner in 2009?), China's Tianhe-3 and OceanLight are direct successors of Tianhe-2A and TaihuLight, which were once the fastest and are still in the top 10. These seem more promising to me.

neo_blackcap 4 years ago | | |

NYT claimed so

https://www.nytimes.com/2022/05/30/business/us-supercomputer...

dekhn 4 years ago | |

all that matters in this context is whether they run TOP500 or not.

robswc 4 years ago |

Seems I've heard nothing but good things about AMD for the last 10 years or so.

I once had a terrible experience with AMD ~10 years ago that made me swear off them for good. Had something to do with software but I remember it taking several days of work/solutions.

Willing to give them another try soon though. I never seem to even use the full power of whatever CPU I get, lol.

UberFly 4 years ago | |

Lisa Su joined AMD in 2012 and in 2017 the first Zen chips were released. Good people making good decisions.

verst 4 years ago | |

Late 2020 I switched from Intel to AMD Ryzen 5900X for my gaming PC and only had great experiences as far as gaming is concerned.

I should point out that there were significant USB problems on AMD B550, X570 chipsets (eventually addressed via BIOS updates).

Unfortunately some professional audio gear is only certified for use with Intel chipsets and I have experienced some deal-breaking latency issues with ASIO drivers. For gaming I will be happy to continue using AMD - but for music I will probably switch back to Intel for my next rig.

robswc 4 years ago | | |

That actually sucks because I do a lot of music stuff and having any issues with ASIO would be a deal breaker. Thanks for the heads up! One of those things I would have never even thought of to check!

Also sums up my AMD experience 10 years ago. Stuff just wasn't working :/

peter303 4 years ago |

One petaflop on double-precision Linpack was achieved in 2008. Supercomputing's "Moore's Law" is a doubling of speed every 1.5 years: an order of magnitude every five years, a thousand-fold every 15 years. Pretty close to schedule.

Onward to a zettaflop around 2037?
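The arithmetic behind that rule of thumb, as a small sketch. The 1.5-year doubling period and the 2008 petaflop date come from the comment; the function names are just illustrative.

```python
import math

DOUBLING_YEARS = 1.5
PETAFLOP_YEAR = 2008


def growth_factor(years):
    """Speed-up after `years` at one doubling per DOUBLING_YEARS."""
    return 2 ** (years / DOUBLING_YEARS)


def years_to_grow(factor):
    """Years needed to get `factor` times faster at the same rate."""
    return DOUBLING_YEARS * math.log2(factor)


print(f"5 years:  x{growth_factor(5):.1f}")
print(f"15 years: x{growth_factor(15):.0f}")
print(f"exaflop expected around   {PETAFLOP_YEAR + years_to_grow(1e3):.0f}")
print(f"zettaflop expected around {PETAFLOP_YEAR + years_to_grow(1e6):.0f}")
```

Ten doublings in 15 years is a factor of 1024, so the trend puts the exaflop mark at roughly 2023 and the zettaflop mark in the late 2030s, consistent with the guess above.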

vfclists 4 years ago |

I read somewhere that this means the US now has the world's fastest supercomputer.

Does this No. 1 position have something to do with the ban on exporting advanced technology to China?

anuvrat1 4 years ago |

Can someone please explain, how software is made at this scale?

mhh__ 4 years ago | |

Fairly low tech until you get to the super high end.

You have a blend of very specific domain specific knowledge (e.g. they know the hardware - the interconnects more than the CPUs) and old skool Unix system administration.

sydthrowaway 4 years ago | |

Using an HPC framework, such as OpenMP

curiousgal 4 years ago |

How much of that performance will get undone by the software though? Either through AMD's lack of effort or Intel's compiler "sabotage".

ghc 4 years ago | |

It probably won't be a factor. The likelihood of the system using standard compilers or drivers is quite low. It's non-trivial to optimize a compiler and drivers for a supercomputer, so companies like Cray make their own.

mihaic 4 years ago |

The more powerful processors become, the less I feel there's a need to build supercomputers.

Thinking about it, the most powerful supercomputer in the world is pretty much a million consumer processors, working in parallel. That's going to stay pretty constant, since cost scales roughly linearly.

If X is the processing power of $1k of consumer hardware, the bigger X gets, the less there is a difference in the class of problems that you can solve with X or X * 1e6 processing power.

uniqueuid 4 years ago | |

Sure, but consumer hardware does not have infiniband or other high-bandwidth interconnects. That means you can have at most ~1-2TB of ram accessible at any point. Some problems need coordination, and when you're back at OpenMP etc., a supercomputer suddenly makes sense.

mihaic 4 years ago | | |

I agree right now. I'm thinking maybe in 15 years you can have >1PB on a single machine, and then those problems that don't fit in that space but that fit in a supercomputer become fewer. 2050 will be within our lifetime.

Basically I'm estimating the benefit ratio to be (log SupercomputerSize - log ConsumerSize)/log ConsumerSize, and that keeps decreasing.
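A quick sketch of that ratio, to show how it shrinks. It assumes the supercomputer stays a fixed factor (1e6, from the "million consumer processors" remark) larger than one consumer machine, and it measures size in bytes of memory; both choices are assumptions, and since the ratio takes the log of a quantity with units, its absolute value depends on the unit picked. Only the downward trend is meaningful.

```python
import math


def benefit_ratio(consumer_size, cluster_factor=1e6):
    """(log Supercomputer - log Consumer) / log Consumer, sizes in bytes."""
    supercomputer_size = consumer_size * cluster_factor
    return (math.log(supercomputer_size) - math.log(consumer_size)) / math.log(consumer_size)


# Consumer memory growing from 1 TB to 1 PB to 1 EB (hypothetical milestones).
for label, size in [("1 TB", 1e12), ("1 PB", 1e15), ("1 EB", 1e18)]:
    print(f"{label}: ratio = {benefit_ratio(size):.3f}")
```

With a fixed cluster factor the numerator is constant (log 1e6), so the ratio simply falls as log of the consumer size grows.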

mastax 4 years ago | |

The coherent memory interconnects between nodes are typically what makes supercomputers different from just a bunch of consumer hardware. They allow different types of programming, or at least make them easier.

jabl 4 years ago | | |

It's a very fast, very low latency network fabric. But it's not coherent in the sense of cache coherent multiprocessors, and it doesn't offer shared memory style programming where you'd just load/store to addresses that happen to be mapped to another compute node somewhere in the system.

hdjjhhvvhga 4 years ago | |

By the way, while cost may scale linearly, the number of cores doesn't[0]. We have more powerful computers in our pockets than Cray supercomputers from the 80s. And I feel we still haven't learned how to use these cores in an efficient way.

[0] https://i.imgur.com/Gad4cKk.png

xhkkffbf 4 years ago | |

If you think of it this way, aren't some botnets truly the most powerful computing systems?

SoftTalker 4 years ago |

Since Cray stopped making their own CPUs, they have been back and forth between AMD and Intel several times.

wmf 4 years ago | |

It's not really back and forth; Cray supports Intel, AMD, and ARM CPUs equally as well as Nvidia, AMD, and Intel GPUs.

zepmck 4 years ago |

The most powerful and unfortunately unusable supercomputer in the world. AMD's approach to GPUs has been on a failing track since its inception. The only software stack available is super fragile, buggy and barely supported. Rather than building an HPL machine, I would have preferred to see public money spent in a different way.

ghc 4 years ago | |

It's a supercomputer. The programming model is very, very different. The software stack is full of incredibly fragile stuff from any number of manufacturers. It's honestly hard to even describe how much more difficult using MPI with Fortran on a supercomputer is compared to anything I've ever touched elsewhere. Maybe factory automation comes close?

wait_a_minute 4 years ago | | |

How could someone get practical experience in this space?

moffkalast 4 years ago |

Now the real question: Can it run Crysis... without hardware acceleration?

cesarb 4 years ago | |

> Can it run Crysis... without hardware acceleration?

I understand you are joking, but it's a legitimate benchmark, one which I've seen at least Anandtech using. For instance, a quick web search found an article from last year (https://www.anandtech.com/show/16478/64-cores-of-rendering-m...) which shows an AMD CPU (a Ryzen 9) running Crysis without hardware acceleration at 1080p at nearly 20 FPS. As that article says, it's hard to go much higher than that, due to limitations of the Crysis engine.

marcusjramsey 4 years ago |

hmm

belter 4 years ago |

Thank you to the authors for not calling it the fastest computer in the world :-) and instead, as they should, the most powerful. Clock speed is not the only factor of course, as instructions per cycle and cache sizes have an impact, but for a pure measure of speed, the fastest still is:

- For practical use, and non-overclocked, the EC12 at 5.5 GHz: https://www.redbooks.ibm.com/redbooks/pdfs/sg248049.pdf

- An AMD FX-8370 floating in liquid nitrogen at 8.7 GHz: https://hwbot.org/benchmark/cpu_frequency/rankings#start=0#i...