PyTorch Library for Running LLM on Intel CPU and GPU (github.com)

308 points by ebalit 2 years ago | 95 comments

vegabook 2 years ago |

The company that did 4-cores-forever has the opportunity to redeem itself in its next consumer GPU release by disrupting the "8-16GB VRAM forever" that AMD and Nvidia have been imposing on us for a decade. It would be poetic to see 32-48GB at a non-eye-watering price point.

Intel definitely seems to be doing all the right things on software support.

riskable 2 years ago | |

No kidding... Intel is playing catch-up with Nvidia in the AI space and a big reason for that is their offerings aren't competitive. You can get an Intel Arc A770 with 16GB of VRAM (released in October 2022) for about $300, or an Nvidia 4060 Ti with 16GB of VRAM for ~$500 that is, in practice, twice as fast for AI workloads (see: https://cdn.mos.cms.futurecdn.net/FtXkrY6AD8YypMiHrZuy4K-120... )

This is a huge problem because in theory the Arc A770 is faster! Its theoretical performance (TFLOPS) is more than twice that of an Nvidia 4060 (see: https://cdn.mos.cms.futurecdn.net/Q7WgNxqfgyjCJ5kk8apUQE-120... ). So why does it perform so poorly? Because everything AI-related has been developed and optimized to run on Nvidia's CUDA.

Mostly, this is a mindshare issue. If Intel offered a workstation GPU (i.e. not a ridiculously expensive "enterprise" monster) that developers could use that had something like 32GB or 64GB of VRAM it would sell! They'd sell zillions of them! In fact, I'd wager that they'd be so popular it'd be hard for consumers to even get their hands on one because it would sell out everywhere.

It doesn't even need to be the fastest card. It just needs to offer more VRAM than the competition. Right now, if you want to do things like training or video generation the lack of VRAM is a bigger bottleneck than the speed of the GPU. How does Intel not see this‽ They have the power to step up and take over a huge section of the market but instead they're just copying (poorly) what everyone else is doing.

Workaccount2 2 years ago | | |

Based on leaks, it looks like Intel somehow missed an easy opportunity here. There is an insane demand for high-VRAM cards now, and it seems the next Intel cards will be 12GB.

Intel, screw everything else, just pack as much VRAM in those as you can. Build it and they will come.

ponector 2 years ago | | |

I don't agree. Who will buy it? A few enthusiasts who want to run LLMs locally but cannot afford an M3 or a 4090?

It will be a niche product with poor sales.

glitchc 2 years ago | | |

I think the answer to that is fairly straightforward. Intel isn't in the business of producing RAM. They would have to buy and integrate a third-party product which is likely not something their business side has ever contemplated as a viable strategy.

chessgecko 2 years ago | |

Going above 24GB is probably not going to be cheap until GDDR7 is out, and even that will only push it to 36GB. The fancier stacked GDDR6 stuff is probably pretty expensive and you can’t just add more dies because of signal integrity issues.

frognumber 2 years ago | | |

Assuming you want to maintain full bandwidth.

Which I don't care too much about.

However, even 16->24GB is a big step, since a lot of the models are developed for 3090/4090-class hardware. 36GB would place it close to the class of the fancy 40GB data center cards.

If Intel decides to push VRAM, it will definitely have a market. Critically, a lot of folks will also be incentivized to make software compatible, since it will be the cheapest way to run models.
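
To make the 16GB / 24GB / 36GB steps concrete, here is a minimal back-of-the-envelope sketch of how much memory model weights alone occupy at different precisions. The helper names, the 2GB headroom figure, and the model sizes are illustrative assumptions, not numbers from this thread; the estimate ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
# Weights-only memory estimate. Assumptions (not from the thread):
# 1 GB = 1e9 bytes, and a flat 2 GB headroom stands in for everything
# that is not weights (KV cache, activations, driver overhead).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the weights in GB for a given precision."""
    return params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1e9


def fits(params_billion: float, precision: str, vram_gb: float,
         headroom_gb: float = 2.0) -> bool:
    """True if the weights plus the assumed headroom fit in vram_gb."""
    return weights_gb(params_billion, precision) + headroom_gb <= vram_gb


# Print a small table for the card sizes discussed above.
for vram in (16, 24, 36):
    for size in (7, 13, 34, 70):
        ok = [p for p in BYTES_PER_PARAM if fits(size, p, vram)]
        print(f"{vram}GB card, {size}B params: fits at {ok or 'nothing'}")
```

Under these assumptions a 13B-parameter model in fp16 needs about 26GB for weights alone, which is why each step up in VRAM opens a new class of models.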

sitkack 2 years ago | |

What is obvious to us is an industry standard to Product Managers. When was the last time you saw an industry player upset the status quo? Intel has not changed that much.

zoobab 2 years ago | |

"It would be poetic to see 32-48GB at a non-eye-watering price point."

I heard some ASRock motherboard BIOSes could set the VRAM up to 64GB on Ryzen 5.

Doing some investigations with different AMD hardware atm.

stefanka 2 years ago | | |

That would be interesting information. Which motherboard works with which APU with 32GB or more of VRAM? Can you post your findings please?

LoganDark 2 years ago | | |

When has an APU ever been as fast as a GPU? How much cache does it have, a few hundred megabytes? That can't possibly be enough for matmul, no matter how much slow DDR4/5 is technically addressable.

belter 2 years ago | |

AMD making drivers of high quality? I would pay to see that :-)

haunter 2 years ago | |

First crypto then AI, I wish GPUs were left alone for gaming.

talldayo 2 years ago | | |

Are there actually gamers out there that are still struggling to source GPUs? Even at the height of the mining craze, it was still possible to backorder cards at MSRP if you were patient.

The serious crypto and AI nuts are all using custom hardware. Crypto moved onto ASICs for anything power-efficient, and Nvidia's DGX systems aren't being cannibalized from the gaming market.

azinman2 2 years ago | | |

Didn’t Nvidia try to block this in software by slowing down mining?

Seems like we just need consumer matrix math cards with literally no video out, and then a different set of requirements for those with a video out.

baq 2 years ago | | |

They were.

But then those pesky researchers and hackers figured out how to use the matmul hardware for non-gaming.

OkayPhysicist 2 years ago | |

The issue from the manufacturer's perspective is that they've got two different customer bases with wildly different willingness to pay, but not substantially different needs from their product. If Nvidia and AMD didn't split the two markets somehow, then there would be no cards available to the PC market, since the AI companies with much deeper pockets would buy up the lot. This is undesirable from the manufacturer's perspective for a couple reasons, but I suspect a big one is worries that the next AI winter would cause their entire business to crater out, whereas the PC market is pretty reliable for the foreseeable future.

Right now, the best discriminator they have is that PC users are willing to put up with much smaller amounts of VRAM.

UncleOxidant 2 years ago | |

> Intel definitely seems to be doing all the right things on software support.

Can you elaborate on this? Intel's reputation for software support hasn't been stellar. What's changed?

whalesalad 2 years ago | |

Still wondering why we can't have GPUs with SODIMM slots so you can crank the VRAM.

amir_karbasi 2 years ago | | |

I believe that the issue is that graphics cards require really fast memory. This requires close memory placement (that's why the memory is so close to the core on the board). Expandable memory will not be able to provide the required bandwidth here.

riskable 2 years ago | | |

You can do this sort of thing but you can't use SODIMM slots because that places the actual memory chips too far away from the GPU. Instead what you need is something like BGA sockets (https://www.nxp.com/design/design-center/development-boards/... ) which are stupidly expensive (e.g. $600 per socket).

justsomehnguy 2 years ago | | |

Look at the motherboards with >2 Memory channels. That would require a lot of physical space, which is quite restricted on a 50 y/o standard for the expansion cards.

chessgecko 2 years ago | | |

You could, but the memory bandwidth wouldn’t be amazing unless you had a lot of sticks, and it would end up getting pretty expensive.
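
The bandwidth argument in this subthread can be put into rough numbers. The sketch below uses the standard peak-bandwidth formula (bus width in bytes times transfer rate) and a simplified bound for single-stream token generation, assuming every weight is read once per generated token. The two configurations are illustrative assumptions rather than measurements of any specific product: two 64-bit DDR5 SODIMMs at 5600 MT/s, and a 256-bit GDDR6 bus at 16000 MT/s per pin.

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_sec: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer * MT/s, converted from MB/s."""
    return bus_width_bits / 8 * mega_transfers_per_sec / 1000


def max_tokens_per_sec(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on tokens/s if each token requires reading all weights once.

    This is a simplification: it ignores compute time, caches, and batching.
    """
    return bandwidth_gbs / model_gb


# Illustrative configurations (assumed, not taken from the thread).
sodimm_pair = peak_bandwidth_gbs(bus_width_bits=128, mega_transfers_per_sec=5600)
gddr6_256bit = peak_bandwidth_gbs(bus_width_bits=256, mega_transfers_per_sec=16000)

model_gb = 8.0  # hypothetical model size in GB after quantization
print(f"2x SODIMM:     {sodimm_pair:.1f} GB/s, "
      f"<= {max_tokens_per_sec(sodimm_pair, model_gb):.1f} tokens/s")
print(f"256-bit GDDR6: {gddr6_256bit:.1f} GB/s, "
      f"<= {max_tokens_per_sec(gddr6_256bit, model_gb):.1f} tokens/s")
```

With these assumed figures the soldered memory has several times the bandwidth of the socketed pair, which is the gap that "a lot of sticks" would have to close.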

Hugsun 2 years ago |

I'd be interested in seeing benchmark data. The speed seemed pretty good in those examples.

captaindiego 2 years ago |

Are there any Intel GPUs with a lot of VRAM that someone could recommend that would work with this?

Aromasin 2 years ago | |

There's the Max GPU (Ponte Vecchio), their datacentre offering, with 128GB of HBM2e memory, 408 MB of L2 cache, and 64 MB of L1 cache. Then there's Gaudi, which has similar numbers but with cores specific for AI workloads (as far as I know from the marketing).

You can pick them up in prebuilds from Dell and Supermicro: https://www.supermicro.com/en/accelerators/intel

goosedragons 2 years ago | |

For consumer stuff there's the Intel Arc A770 with 16GB VRAM. More than that and you start moving into enterprise stuff.

ZeroCool2u 2 years ago | | |

Which seems like their biggest mistake. If they would just release a card with more than 24GB VRAM, people would be clamoring for their cards, even if they were marginally slower. It's the same reason that 3090s are still in high demand compared to the 4090s.

DrNosferatu 2 years ago |

Any performance benchmark against 'llamafile'[0] or others?

[0] - https://github.com/mozilla-Ocho/llamafile

VHRanger 2 years ago | |

You can already use Intel GPUs (both Arc and iGPUs) with llama.cpp on a bunch of backends:

- SYCL [1]

- Vulkan

- OpenCL

I don't own the hardware, but I imagine SYCL is more performant for Arc, because it's the one Intel is pushing for their datacenter stuff.

[1]: https://www.intel.com/content/www/us/en/developer/articles/t...
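
For the PyTorch route (as opposed to the llama.cpp backends listed above), a first step is checking which accelerator PyTorch can see. This is a minimal sketch, assuming a PyTorch build that exposes Intel GPUs under the `xpu` device name; whether `torch.xpu` is present depends on the PyTorch version and on Intel's extension being installed, so the code guards for its absence. The function name `pick_device` is hypothetical.

```python
def pick_device() -> str:
    """Return "xpu", "cuda", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is nothing to probe.
        return "cpu"

    # torch.xpu may be missing on older or non-Intel builds (assumption:
    # when present, it offers is_available() like torch.cuda does).
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


print(f"Selected device: {pick_device()}")
```

The returned string can then be passed to `.to(device)` on a model or tensor; on a machine without a supported GPU it simply falls back to the CPU.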

donnygreenberg 2 years ago |

Would be nice if this came with scripts which could launch the examples on compatible GPUs on cloud providers (rather than trying to guess?). Would anyone else be interested in that? Considering putting it together.

antonp 2 years ago |

Hm, no major cloud provider offers Intel GPUs.

belthesar 2 years ago | |

Intel GPUs got quite a bit of penetration in the SE Asian market, and Intel is close to releasing a new generation. In addition, Intel's allowing for GPU virtualization without additional license fees (unlike Nvidia and GRID licenses), allowing hosting operators to carve up these cards. I have a feeling we're going to see a lot more Intel offerings available.

VHRanger 2 years ago | |

No, but for consumers they're a great offering.

16GB of VRAM and performance around a 4060 Ti or so, but for 65% of the price.

_joel 2 years ago | | |

And 65% of the software support, or less, I'm inclined to believe? Although having more players in the fold is definitely a good thing.

anentropic 2 years ago | |

Lots offer Intel CPUs though...

tomrod 2 years ago |

Looking forward to reviewing!