AMD Instinct MI325X in Q4 2024, 288GB of HBM3E (ir.amd.com)

82 points by asparagui 2 years ago | 50 comments

shrubble 2 years ago |

The claim that the next generation would be 35x faster felt like an "Osborne moment" to me, but if demand is robust enough...

Netcob 2 years ago | |

In AI, that doesn't sound too surprising to me right now.

I just experiment with some local LLMs, but the differences are pretty huge:

Llama 3 8B, Raspberry Pi 5: 2-3 Tokens/second (but it works!)

Llama 3 8B, RTX 4080: ~60 Tokens/second

Llama 3 8B, groq.com LPU, ~1300 Tokens/second

Llama 3 70B, AMD 7800X3D: 1-2 Tokens/second

Llama 3 70B, groq.com LPU, ~330 Tokens/second

There seem to be huge gaps between CPU, GPU and specialized inference ASICs. I'm guessing that right now there aren't many genius-level architecture breakthroughs, and that it's more about how much memory and silicon real estate you're willing to dedicate to AI inference.
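
To put numbers on those gaps, here is a minimal sketch that computes relative speedups from the figures quoted above. It assumes the midpoint of each quoted range ("2-3" becomes 2.5) is a fair summary, which is a simplification; the function names are made up for illustration.

```python
# Throughput figures as quoted in the comment above (tokens/second).
# Ranges such as "2-3" are stored as (low, high); single values repeat.
llama3_8b = {
    "Raspberry Pi 5": (2, 3),
    "RTX 4080": (60, 60),
    "groq.com LPU": (1300, 1300),
}
llama3_70b = {
    "AMD 7800X3D": (1, 2),
    "groq.com LPU": (330, 330),
}


def midpoint(rng):
    """Midpoint of a (low, high) range; an assumption, not a measurement."""
    return (rng[0] + rng[1]) / 2


def speedups(results):
    """Return each platform's throughput relative to the slowest platform."""
    mids = {name: midpoint(r) for name, r in results.items()}
    baseline = min(mids.values())
    return {name: m / baseline for name, m in mids.items()}


for label, table in (("Llama 3 8B", llama3_8b), ("Llama 3 70B", llama3_70b)):
    for name, factor in speedups(table).items():
        print(f"{label}, {name}: {factor:.0f}x")
```

With these inputs the 8B gap works out to roughly 24x (RTX 4080) and 520x (groq) over the Raspberry Pi 5, and the 70B gap to roughly 220x over the 7800X3D.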

SushiHippie 2 years ago | | |

What quantization levels did you use?

I think groq doesn't use quantization, so the gap between your hardware and groq would be even wider.

zozbot234 2 years ago | | |

> Llama 3 70B, AMD 7800X3D: 1-2 Tokens/second

How much RAM is required for this result? It's quite impressive that it even works as well as it does.
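
A rough way to bound the RAM question: the weights alone take about (parameter count × bits per weight ÷ 8) bytes. The sketch below assumes a nominal 70 billion parameters and a few common quantization widths; the thread does not say which quantization was actually used, and the estimate ignores the KV cache and runtime overhead.

```python
def weight_gb(params, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Covers the weights only; KV cache, activations and runtime
    overhead come on top of this figure.
    """
    return params * bits_per_weight / 8 / 1e9


NOMINAL_PARAMS = 70e9  # assumed nominal size for "Llama 3 70B"

for bits in (16, 8, 4):  # assumed widths; the commenter did not state one
    print(f"{bits}-bit weights: about {weight_gb(NOMINAL_PARAMS, bits):.0f} GB")
```

Under these assumptions, 4-bit weights need about 35 GB, which fits in a desktop with 64 GB of RAM, while unquantized 16-bit weights need about 140 GB.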

wmf 2 years ago | |

Nvidia is doing the same thing. They announced B100 before H200 shipped and a few hours ago they started talking about R100 before B100 shipped.

ipsum2 2 years ago | |

(Re: Osborne effect) It's going to be released in 2 years. Rarely can businesses wait that long; they're going to be ordering the MI300 now.

karma_pharmer 2 years ago | | |

Or they're trying to distract attention from the fact that they've already sold out 100% of the fab capacity available to produce these chips for the next two years.

So really, they lose nothing. They've already booked sales of everything there is to sell. So might as well now turn attention to those who might be customers two years from now, and make them feel like the wait will be worth it.

latchkey 2 years ago | | |

[deleted] See below, I did not understand the Osborne effect comment.

Havoc 2 years ago |

> 35x increase in AI inference performance compared to AMD Instinct MI300 Series

Even for marketing claims that’s pretty wild.

Still lots of trajectory left in the "just scale up" plan, it seems.

layoric 2 years ago | |

I think the limit is close, considering most of these gains come from the reduced memory bandwidth consumption of the smaller data types. This would line up with Nvidia's crazy graph from yesterday, where data types were specified.

How much lower can these go though? 2bit? 1.58bit? 1bit? It seems these massive gains have a very hard stop, and AMD and Nvidia will use them to raise their stock price before it all comes to a sudden end.
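
The "hard stop" can be sketched with a simple bandwidth-bound model: if generating each token requires reading every weight once, then tokens/second is at most memory bandwidth divided by the size of the weights. The 1000 GB/s bandwidth and 70 billion parameters below are placeholder assumptions, not figures for any specific chip or model.

```python
def decode_bound_tokens_per_s(bandwidth_bytes_per_s, params, bits_per_weight):
    """Upper bound on tokens/second when each token reads all weights once.

    A simplified bandwidth-only model: ignores compute limits, KV cache
    traffic, batching and caching effects.
    """
    bytes_per_token = params * bits_per_weight / 8
    return bandwidth_bytes_per_s / bytes_per_token


def relative_gain(from_bits, to_bits):
    """Speedup available purely from shrinking the data type."""
    return from_bits / to_bits


BANDWIDTH = 1000e9  # placeholder: 1000 GB/s, not a real product spec
PARAMS = 70e9       # placeholder model size

for bits in (16, 8, 4, 2, 1.58, 1):
    bound = decode_bound_tokens_per_s(BANDWIDTH, PARAMS, bits)
    gain = relative_gain(16, bits)
    print(f"{bits:>5} bits: <= {bound:6.1f} tok/s ({gain:.1f}x vs 16-bit)")
```

In this model, going from 16-bit all the way to 1-bit weights yields at most a 16x gain, and each halving of the data type can only be spent once.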

jauntywundrkind 2 years ago |

Such a weird & cruel modernity, where these releases are purely in the abstract. No, you still won't be able to buy an MI300X in Q4 2024. The enhanced edition will absolutely not be available.

(I miss the old PC era where the world at large was benefiting in tandem from new things happening (or falling behind from not adapting)).

nabla9 2 years ago |

AMD comparison:

  8x AMD MI300X (192GB, 750W) GPU
  8x H100 (80GB, 700W) GPU

What would be the result against

  8x H100 NVL (188GB, <800W) GPU

?
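
Performance can't be derived from the spec lines above, but the aggregate memory and power per 8-GPU node can. A minimal sketch using only the per-GPU figures quoted in this comment; the "<800W" entry is treated as an upper bound of 800 W, which is an assumption.

```python
# Per-GPU figures exactly as quoted in the comment: (memory in GB, power in W).
# The H100 NVL power is given as "<800W", so 800 is used as an upper bound.
gpus = {
    "AMD MI300X": (192, 750),
    "H100": (80, 700),
    "H100 NVL": (188, 800),
}

GPUS_PER_NODE = 8


def node_totals(mem_gb, power_w, count=GPUS_PER_NODE):
    """Return (total memory in GB, total power in W) for `count` GPUs."""
    return mem_gb * count, power_w * count


for name, (mem_gb, power_w) in gpus.items():
    total_mem, total_power = node_totals(mem_gb, power_w)
    print(f"8x {name}: {total_mem} GB, {total_power} W, "
          f"{total_mem / total_power:.3f} GB per W")
```

On these figures, the 8x H100 NVL configuration (1504 GB) comes close to the 8x MI300X (1536 GB) in memory capacity, whereas the 8x H100 in AMD's comparison has 640 GB.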

DrNosferatu 2 years ago |

Is the software stack working (for practical use)?

AMD still has to prove themselves in this.