AMD Alveo V70 AI inference accelerator card (xilinx.com)
It seems like pure management incompetence to me. They need to invest a whole lot more in software, integrating their stuff directly into pytorch/TF/XLA/etc and making sure it works on consumer cards too. The investment would be paid back tenfold. The market is crying out for more competition for Nvidia and there's huge money to be made on the datacenter side but it all needs to work on the consumer side too.
Their attempts at entering the ML space so far have been failures, and they are wise to hold off on really competing with Nvidia until they have the bandwidth to go “all in”. Consciously NOT trying to compete with Nvidia is the reason they didn’t go bankrupt. Their Radeon division minted money from 2016 to 2020 because they focused on a niche Nvidia was neglecting: low-end/eSports (while also leveraging their APU expertise to win the PS4/Xbox contracts).
I think Nvidia will eventually lose its monopoly on ML/AI stuff as AMD, Apple, Qualcomm, Amazon and Google chip away at their “moat” with their own accelerators/NPUs. As mentioned, though, Nvidia's edge really comes from CUDA and other software, not the hardware. I doubt that Apple, Qualcomm, Amazon or Google will be interested in selling hardware directly to consumers. They want that sweet, sweet cloud money and/or competitive advantages in their phones (like photo processing). I don’t want to be paying AWS $100/mo for a GPU I could pay $600 for once. I do think AMD/RTG will go hard on Nvidia eventually, and it will not matter whether you have an AMD or Nvidia GPU for TensorFlow or spaCy or whatever else.
No, they need a product that is good at training and GPU compute at a reasonable price.
That product doesn't need to be good at rendering, ray tracing and similar.
Sure, students and some independent contractors probably love getting a good graphics card and a CUDA card in one, and it makes it easier for people to experiment, but companies normally ban playing games on company PCs, and the overlap between "needing max GPU compute" and "needing complicated 3D rendering" is limited.
Though having one product instead of two does make the supply chain and pricing easier.
But the 4090 is by now in a price range where students are unlikely to afford it, and people will think twice about buying it just to play around with GPU compute.
So e.g. the 7900 XTX having GPU compute usability somewhat comparable to a 4080 would have been good enough for the non-company use case, while for the company use case I think a dedicated compute-only card with better compute per dollar would be preferable.
1) Consumer Nvidia GPU cards on custom PCs
2) Self hosted shared server
3) Cloud infrastructure.
There is no "GPU compute only card" that is widely used outside servers.
> company PCs normally ban playing games on company PCs and the overlap of "needing max GPU compute" and "needing complicated 3D rendering tasks" is limited.
The "don't play games" thing isn't a factor. Most companies just buy a 4090 or whatever, and if they have to tell staff not to play games, they say "don't play games". Fortnite runs just fine on pretty much anything anyway.
https://www.amd.com/en/graphics/servers-solutions-rocm-ml
> For this they need a product comparable to NVIDIA 4090, so that entry level researchers could use their hardware.
Why is a high end product a requirement for entry level research?
Also, ROCm is a bit of a mess to set up. With Nvidia I just need to install CUDA and cuDNN and then pip install tensorflow/pytorch.
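Whichever vendor stack you end up installing, a quick sanity check after the `pip install` is to ask PyTorch what it was built for. A minimal sketch, assuming a standard PyTorch wheel (the function name is hypothetical; `torch.version.hip` is set on ROCm builds, `torch.version.cuda` on CUDA builds, and ROCm builds reuse the `torch.cuda` namespace):

```python
import importlib.util


def describe_torch_backend() -> str:
    """Report which GPU backend (if any) the installed PyTorch build targets."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    # ROCm builds set torch.version.hip; CUDA builds set torch.version.cuda.
    if getattr(torch.version, "hip", None):
        backend = "rocm"
    elif getattr(torch.version, "cuda", None):
        backend = "cuda"
    else:
        return "cpu-only"
    # ROCm builds expose the GPU through torch.cuda too, so this covers both.
    if not torch.cuda.is_available():
        return f"{backend} build, no usable GPU"
    return backend


if __name__ == "__main__":
    print(describe_torch_backend())
```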
Are these programmable by the end user? The "software programmability" section describes the "Vitis AI" frameworks supported. But can we write our own software on these?
Is this card FPGA-based?
EDIT: [1] more info on the AI-engine tiles: scalar cores + "adaptable hardware (FPGA?)" + {AI+DSP}.
[1] https://www.xilinx.com/products/technology/ai-engine.html
It's possible that AMD could have reworked an existing Xilinx design to incorporate RDNA chiplets in place of some of the FPGA-gate-grid chiplets, creating a heterogeneous mesh; but I find it just as likely that AMD just took their VLSI for an RDNA core and loaded it onto the existing FPGA.
EDIT: 75 W is a smaller card than I expected. "Inference" also usually means "cheaper", so maybe we can be optimistic with $5,000-ish?
Anyone shocked by the price, remember that this is an FPGA-line from Xilinx. Not a GPU from Radeon. Expect very high prices.
> [...]
> **: @10 fps, H.264/H.265
Is 10 fps a standard measure for this kind of thing?
[1] You could skip the last P frame before an IDR frame, but that doesn't buy you much.
Who owns Lattice?
In my experience, mostly a marketing number, higher TOPS doesn't actually mean it'll be faster than something with a lower TOPS.
As always, you need to do your own benchmarks with your use case in mind.
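For what it's worth, a do-it-yourself benchmark doesn't need much. A minimal, framework-agnostic sketch, where `fn` stands in for your own inference call (hypothetical; for asynchronous GPU work it must block until the result is actually ready, or the numbers are meaningless):

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> dict:
    """Time a zero-argument callable; report median/best latency and throughput."""
    for _ in range(warmup):
        fn()  # warm up caches, JIT compilation, device queues, etc.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    median_s = statistics.median(samples)
    return {
        "median_ms": median_s * 1000.0,
        "best_ms": min(samples) * 1000.0,
        # Calls per second at the median latency (single stream, no batching).
        "per_second": 1.0 / median_s if median_s > 0 else float("inf"),
    }
```

Usage would be something like `benchmark(lambda: model(batch))` with your real model and a realistic batch; a TOPS figure can't tell you that number.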
Hopefully Intel takes a stab at it with their Arc line out now.
16 GB of RAM for 96 video channels... I haven't done any of that work, but it feels like they don't expect all 96 to be fully used in practice.
As a joke I sometimes tell people the automatic flushing toilets in public bathrooms work by having a little camera monitored by someone in a third-world country who remotely flushes as needed while monitoring a whole lot of video feeds. They usually don't buy it, but will often acknowledge that our world is uncomfortably close to having stuff like that become reality.
I'm aware that GPU-compute-only cards are currently not widely used outside of the server space.
But that's not because people need the consumer GPU features (besides video decode); it's because the economics of availability and cost make consumer GPUs the best option (and these economic effects don't just apply to customers but also to Nvidia itself).
Even on the host, 16 GiB is fine for say an NVR. They don't need to keep a lot of state in RAM (or for that matter to do a lot of on-CPU computation either). I can run an 8-stream NVR on a Raspberry Pi 2 without on-NVR analytics. That's about its limit because the network and disk are on the same USB2 bus, but there's spare CPU and RAM.
Not going to be spending $10k like the tortoise-tts guy (was looking into that project last night) but $2k might be doable for a hobby project. Plus I’d have a computer at the end.
Cloud/datacenter based ML is a huge growth market. Having the same software work on a consumer GPUs and enterprise ML cards is one of Nvidia's competitive advantages.
Why can't large companies tap the investment market? E.g. they could sell bonds to fund it, borrow, etc.
The idea (not aiming at you here, you didn't say this) that senior leadership at AMD is unaware of NVIDIA's lead in this space, and haven't repeatedly considered whether to invest in competing, is absurd. Likewise the idea that anyone outside of AMD understands better than AMD does what it would take in terms of investment _and opportunity cost_, is also absurd.
Senior leadership at AMD isn't dumb. The fact that they're not doing something we want doesn't make them dumb, either. Again, not aiming at you with this little rant :)
Let's try this with another company:
The idea that leadership at Lehman Brothers is unaware of the fact that they are trading subprime loans is absurd! The leadership isn't dumb.
The idea that leadership at Boeing is unaware of safety issues with the 737 MAX is absurd! How could you suggest that anyone outside Boeing understands the risks involved better than they do?
I think it's very clear with the benefit of hindsight that not investing enough into the software side of deep learning early on was a bad decision. But it was obvious to me even at the time and I said as much to anyone who would listen (e.g. seven years ago https://news.ycombinator.com/item?id=12258027)
I may not know better. But I know what I like.
It uses DirectML on Windows: https://gist.github.com/averad/256c507baa3dcc9464203dc14610d... This is thanks to Microsoft, not AMD.
On Linux you can use ROCm: https://www.videogames.ai/2022/11/06/Stable-Diffusion-AMD-GP...
The horrible install processes, and the general mess, are all down to AMD.
* "Temporal SVC", in which the frame dependencies are structured so you can discard down to 1/2 or 1/4th of the nominal frame rate and still decode the remainder.
* Three output streams, which you could configure for say forensics (high-bandwidth/high-resolution/high-fps), inference (mid-bandwidth/mid-resolution/low-fps), and viewing multiple streams / over mobile networks (low-bandwidth/low-resolution/mid-fps).
* On-camera ML tasks too. (Although I haven't seen one that lets you upload your own model.)
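The frame structure behind temporal SVC can be sketched as below. This assumes the common three-layer pattern (the one WebRTC calls L1T3: layers 0, 2, 1, 2 repeating); an actual camera may use a different pattern or layer count. Because the encoder only lets a frame reference frames in its own or a lower layer, dropping the higher layers leaves a stream that still decodes:

```python
def temporal_layer(frame_index: int) -> int:
    """Temporal layer ID in a 3-layer pattern that repeats 0, 2, 1, 2."""
    phase = frame_index % 4
    if phase == 0:
        return 0  # base layer: 1/4 of the nominal frame rate
    if phase == 2:
        return 1  # base + layer 1: 1/2 of the nominal frame rate
    return 2      # all layers: full frame rate


def decodable_frames(num_frames: int, max_layer: int) -> list[int]:
    """Frame indices kept after discarding every layer above max_layer."""
    return [i for i in range(num_frames) if temporal_layer(i) <= max_layer]


if __name__ == "__main__":
    for layer in (2, 1, 0):
        kept = decodable_frames(8, layer)
        print(layer, kept, f"{len(kept)}/8 of nominal rate")
```

With 8 frames, keeping layers up to 1 leaves frames 0, 2, 4, 6 (half rate), and keeping only layer 0 leaves frames 0 and 4 (quarter rate).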
But other cameras are less good. E.g. some Reolinks [2] only support two streams, and the "sub" stream is fixed at 640x352, which is uncomfortably low. Your inference network may not take more resolution than that, but even if not, you might want to crop down to the area of interest (where there's motion and/or where the user has configured an alert) to improve quality. (You probably wouldn't pair that cheap Reolink camera with this expensive inference card, but the point stands in general.)
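The crop-to-area-of-interest idea might look roughly like this: grow the motion/alert box by a margin, widen it to the model's input aspect ratio so resizing doesn't distort it, then keep it inside the frame. A minimal sketch; `Box`, `crop_for_inference` and the 20% default margin are hypothetical, not part of Moonfire NVR or any camera API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    w: int
    h: int


def crop_for_inference(frame_w: int, frame_h: int, roi: Box,
                       model_w: int, model_h: int, margin: float = 0.2) -> Box:
    """Pick a crop of the full-resolution frame around roi for the model."""
    cx = roi.x + roi.w / 2
    cy = roi.y + roi.h / 2
    # Grow the ROI so objects near its edge aren't cut off.
    w = roi.w * (1 + margin)
    h = roi.h * (1 + margin)
    # Expand one dimension so the crop matches the model's aspect ratio.
    target = model_w / model_h
    if w / h < target:
        w = h * target
    else:
        h = w / target
    # Shrink uniformly if the crop no longer fits in the frame.
    scale = min(1.0, frame_w / w, frame_h / h)
    w_i, h_i = max(1, int(w * scale)), max(1, int(h * scale))
    # Shift (rather than shrink further) so the crop lies fully inside the frame.
    x = min(max(int(cx - w_i / 2), 0), frame_w - w_i)
    y = min(max(int(cy - h_i / 2), 0), frame_h - h_i)
    return Box(x, y, w_i, h_i)
```

The crop is then resized down to `model_w` x `model_h`, which keeps far more detail on the object than running the model on a fixed 640x352 sub stream.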
Even the "better" cameras' timestamp handling is awful, so it's hard to reliably match up the main stream, sub stream, analytics output, and wall clock time. Given that limitation it'd be desirable to just use the main stream for everything but the on-NVR transcoding's likely unaffordable.
[1] https://github.com/scottlamb/moonfire-nvr
[2] https://github.com/scottlamb/moonfire-nvr/wiki/Cameras:-Reol...
I also have a sneaking suspicion that using lower channel counts lets you raise the FPS, and that the max of 96 channels is a hard limit tuned for use cases like recognition from unprocessed feeds, but documentation access seems to be a manual approval process, so I can't verify for sure.
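If that suspicion holds, the trade-off is simple arithmetic: 96 channels at the footnoted 10 fps is a budget of 960 decoded frames per second to divide among however many channels you use. A sketch under that assumption only (a fixed total frame budget at a fixed resolution, which is unverified and not a documented spec):

```python
MAX_CHANNELS = 96           # channel limit quoted for the card
RATED_FPS_PER_CHANNEL = 10  # the "@10 fps" footnote in the spec


def fps_per_channel(channels: int) -> float:
    """Per-channel frame rate IF total throughput were a fixed frame budget.

    ASSUMPTION: channels * fps stays constant (the unverified hypothesis
    above); resolution and codec are assumed not to change.
    """
    if not 1 <= channels <= MAX_CHANNELS:
        raise ValueError(f"channels must be in 1..{MAX_CHANNELS}")
    budget = MAX_CHANNELS * RATED_FPS_PER_CHANNEL  # 960 frames/s in total
    return budget / channels
```

Under that model, 48 channels would get 20 fps each and 32 channels 30 fps each.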
Good point. At that scale, the price might make sense. (I'd still hesitate to buy this card, though. Based on experience with Amazon VT1 instances, I don't have any faith in Xilinx's software quality.)
There are much lower-cost solutions if you don't need that many cameras, e.g.:
* The Coral TPU is nice and cheap. I keep hoping to see a new version and/or someone making M.2/PCIe cards with several of these chips on it. It doesn't do the video decoding, though, so you need other hardware for that.
* There was an Axelera card just announced. [1] I'm curious to read the reviews when it actually ships to folks.
* The newer Rockchip SoCs advertise decent video decoding and some ML acceleration. I have one and will be trying it out sooner or later.
> The onboard camera processing is usually about justifying a cloud pitch ("we use data to send video when something interesting happens" or "we send only the best picture of the face in HD to save bandwidth but still be able to ID them later") not so much letting you go in and solve your own problem.
My software's more aimed at the home/hobbyist side of things. There, some folks go with the canned/cloud stuff (Ring/Nest/whatever), similar to what you're saying. Some do everything at home with e.g. BlueIris and use the on-camera ML stuff as it is. The lack of flexibility (mostly due to closed-source, low-quality software IMHO) is a real problem, though. Some folks use something like Frigate that does on-NVR analytics, and I'll eventually add that feature to my own software.
> I also have a sneaking suspicion that using lower channel counts lets you raise the FPS, and that the max of 96 channels is a hard limit tuned for use cases like recognition from unprocessed feeds, but documentation access seems to be a manual approval process, so I can't verify for sure.
I bet you're right.
[1] https://www.cnx-software.com/2023/01/02/150-axelera-m2-ai-ac...
You are claiming that AMD leadership is infallible based on no evidence whatsoever. That's what's absurd.
In her area. We have no real info on how proficient she is in other areas like ML.