Micron Kicks Off Production of HBM3E Memory (anandtech.com)
Combine this with this[1] interesting paper from summer 2023 on pairing HBM with Xeon processors, and you could now fit 144 GB on a single CPU. In theory, at least.
1: https://lenovopress.lenovo.com/lp1738-implementing-intel-hig...
Not true. They announced completion of development, with intention to begin mass production in H1. So a couple months out, at least.
> Micron's memory roadmap for AI is further solidified with the upcoming release of a 36 GB 12-Hi HBM3E product in March 2024.
So likely competitive timing with Samsung for 36/12.
48GB consumer cards (or 96GB pro cards) would sell like hotcakes if AMD/Intel dare to break the artificial VRAM segmentation status quo.
Personally I'd love to have as much VRAM as possible (and as much bandwidth as possible too) to mess around with simulations in, but that's definitely a pro workload.
I'd love to see a flagship card with a stupid amount of VRAM as a spec option, like an RTX 4090 with 32-48 GB of VRAM, just to see what happens with it on the market.
AMD is doing the same thing; the only high-memory cards they put out (MI300) are for data centers.
https://investors.micron.com/news-releases/news-release-deta...
- CAS latency?
- Wattage?
Also: https://piped.video/watch?v=2G4_RZo41Zw (is this memory the same size as 5nm?)
Not naming the company but seems like HBM manufacturers might be going all-in to benefit from Nvidia's stock surge.
A capable GPU with 24+ GB would sell if it significantly undercuts Nvidia. Just look at geohot building his tinyboxes with AMD cards.
Look at what’s available right now: you need 3 A100 40GB cards to get this amount of VRAM, which will cost you way north of $20,000.
Doing it with A6000s is still about $15k.
There aren’t that many high-VRAM options out there, you know.
But all of these are kludges. The Radeon RX 7900 XTX has more than twice the memory bandwidth of the M3 Max, with much better performance per watt than an array of P40s. What you want is that with more VRAM, not any of this misery.
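To put rough numbers on why bandwidth matters here: for single-stream LLM decoding, a common back-of-the-envelope rule is that every generated token has to stream all the weights from memory once, so tokens per second is capped at roughly bandwidth divided by model size. A minimal sketch of that estimate follows. The bandwidth figures (about 960 GB/s for the 7900 XTX, about 400 GB/s for the top M3 Max) and the 20 GB model size are approximate assumptions for illustration, not measurements, and the function name is made up for this example.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on single-stream decode speed, assuming each token
    requires reading all model weights from memory exactly once
    (i.e. the bandwidth-bound regime; ignores compute, caches, batching)."""
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


# Assumed, approximate spec-sheet bandwidths in GB/s (illustrative only).
assumed_bandwidth = {
    "RX 7900 XTX": 960.0,
    "M3 Max": 400.0,
}

# Hypothetical quantized model that occupies 20 GB of memory.
model_size_gb = 20.0

for name, bandwidth in assumed_bandwidth.items():
    bound = decode_tokens_per_second(bandwidth, model_size_gb)
    print(f"{name}: at most ~{bound:.0f} tokens/s for a {model_size_gb:.0f} GB model")
```

This is only a ceiling: real throughput lands below it, and the estimate says nothing about whether the model fits in VRAM in the first place, which is the whole complaint in this thread.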
But yeah the best option remains an MI300 if you can afford that.
https://www.reddit.com/r/LocalLLaMA/comments/142rm0m/llamacp...
What you are talking about is highly optimized inference using accelerators, batching and speculative decoding to achieve high throughput. Once you have that, compute is irrelevant except in terms of cost, but if all you have is a small consumer-grade GPU, you will be compute-limited at the extreme limits of your context window.
I don't need long answers, I need my site-specific knowledge base
Intel could very easily just put a buttload of VRAM on their existing GPUs to stick it to their competitors and make out like bandits. All they'd have to do is charge a Big markup instead of an Enterprise markup. And Intel has a better history of not making broken libraries.