AMD Unveils Its First Small Language Model AMD-135M

AMD Unveils Its First Small Language Model AMD-135M (community.amd.com)

311 points by figomore 1 year ago | 97 comments

diggan 1 year ago |

> The training code, dataset and weights for this model are open sourced so that developers can reproduce the model and help train other SLMs and LLMs.

Wow, an actual open source language model (maybe even the first of its kind [from a larger company]?) that includes everything you need to recreate it from scratch. Thanks AMD!

Available under this funky GitHub organization it seems: https://github.com/AMD-AIG-AIMA/AMD-LLM

wrs 1 year ago | |

We (developers and tech managers) really need to hold the line on this terminology. This is a full actual open source LLM. The usual “open inference” model is not.

boulos 1 year ago | | |

I assume by "open inference" you mostly mean "weights available"?

amelius 1 year ago | | |

Open source is what you would get if an academic institution released it.

wmf 1 year ago | | |

You're not wrong, but if you come up with a definition that no one is willing to meet you're just making that definition irrelevant.

GeekyBear 1 year ago | |

> Wow, an actual open source language model (first of its kind

Apple research has previously released another example of a model with open training code, data, and weights, but their model was sized for running inference workloads on mobile devices.

However, Apple has a mobile device line of business and AMD has an enterprise AI accelerator line of business, so they are both doing work relevant to their bottom line.

diggan 1 year ago | | |

Thanks, seems you're talking about the OpenELM family of models: https://github.com/apple/corenet/tree/main/projects/openelm

jerrygenser 1 year ago | |

This would be another example of open source. Not from such a large company but a good reference including code, data, weights, etc.

https://allenai.org/olmo

brianjking 1 year ago | | |

Molmo even more so! The 7b is wild.

kypro 1 year ago | |

Smart move from AMD. Helps develop an ecosystem around their tech and for their GPUs.

jeff_carr 1 year ago | | |

Has anyone tried it? I mean, I would, but as far as I can tell I need 4 boxes with 4 GPUs each, plus an interconnect. I could put in an order for my homelab, but at around $80k per box and maybe $20k for the right switches and some other gear, my wife would probably frown at me ordering a $340,000 rig to try code that I wouldn't know what to do with even if it worked.

Maybe some other heavy hitter out there can explain what all this whatchamacallit newfangled synergy producing matrix algebra does after you have it running?

NitpickLawyer 1 year ago | |

> Wow, an actual open source language model

I find it funny that the AI field has somehow normalised the goalpost moving from capabilities all the way to definitions about open source. And people seem really tribal about it...

There absolutely are open source LLMs already. Phi-3.5 (MIT), various Mistral models (Apache 2.0), various Qwen2 models (Apache 2.0) and so on. Llamas are not open source, nor are Gemmas. But to say this is "an actual open source model" is weird nitpicking for the sake of nitpicking, IMO.

Access to the methods and datasets that someone used to create some piece of IP is in no way a requirement for open sourcing said IP. It never has been!

Imagine this analogy:

A dev comes up with a way to generate source code that solves a real problem. This dev uses a secret seed, that only they know. The dev also uses thousands of hours of compute, and an algorithm that they created. At the end of the exercise they release the results on github, as follows:

- here is a project that takes in a piece of text in english, and translates it into french.

- the resulting source code is massive. 10 billion LOC. The lines of code are just if statements, all the way down, with some hardcoded integer values.

- source code licensed under Apache 2.0, written in let's say python.

- users can see the source code

- users can run the source code

- users can modify the source code and re-release the code

Now, would anyone before LLMs have said "this isn't true open source" because it's too complicated? Because no one can reasonably understand the source code? Because it uses hardcoded int values? Because it's 10B LOC? Because the dev never shared how they got those values?

Of course not. The resulting code would have been open source because Apache 2.0 is open source.

It's the same with model weights. Just because they're not source code, and just because you don't know how they were created, it does not mean the weights are not open source.

You can see the weights. You can change the weights. You can re-distribute the weights. It's open source. The definition of something being open source does not cover you understanding why the weights are like they are. Nor does it require you having access to the methods of creating those weights. Or datasets. Or whatever the devs had for breakfast.

Nab443 1 year ago | | |

Great, with that definition we can call all binaries open source!

aloknnikhil 1 year ago | | |

I completely disagree with you. The fundamental problem with your concept of open source is that it goes against what open source really is: the ability to completely change what a piece of software can do. IMO, even with LLMs, models are "executables" and weights are "configuration". Yes, of course you can tune the weights by changing the values, but that's the most I can do. Can I actually add "features" to the model? Say you "open-sourced" an LLM trained on the United States Constitution. Can I change the model to be a specialist in real estate law instead? Not with the weights alone. I need it to learn case histories to extend its "feature set". Without the data and the mechanism to reproduce the model, how is this "open source"?

diggan 1 year ago | | |

> that the AI field has somehow normalised the goalpost moving from capabilities all the way to definitions about open source

The problem is that Facebook and others are trying to move the goalpost, while others like me would like the goalpost to remain where it is: namely, we call a project "open source" when the parts required to build it on our own machines are sufficiently accessible.

As I probably wouldn't be a developer in the first place if it wasn't for FOSS, and I spend literally all day long contributing to others' FOSS projects and working on my own, it's kind of scary seeing these large companies trying to change what FOSS means.

I think you're forgetting about the intent and purpose of open source. The goal is that people can run software for whatever purpose they want, and they can modify it for whatever purpose. This is the intent behind the licenses we use when we "create FOSS".

This means, in practice, that the source code has to be accessible somehow, so the compiler I have on my computer can build a similar binary to the one the project itself offers (if it does). The source code has to be accessible so I can build the project, but also modify it for myself.

Taking this idea, which mostly applied only to software before (FOSS), and applying it to ML instead, it's clear what we need in order to 1) be able to use it as we want and 2) be able to modify it as we want.

> You can see the weights. You can change the weights. You can re-distribute the weights. It's open source.

Right. If I upload a binary to some website, you can see the binary, you can change the binary and you can re-distribute it. Would you say the binary is open source?

The weights are the binary in ML contexts. It's OK for projects to publish those weights, but it's not OK to suddenly change the definition and meaning of open source because companies want to look like they're doing FOSS, when in reality they're publishing binaries without any ways of building those binaries with your own changes.

Imagine if the Linux kernel were just a big binary blob. Yes, you could change it, re-distribute it and what not, but only in binary-blob form. You'd be kind of out there if you insisted on calling this binary-blob kernel FOSS. I'm sure you'd be able to convince some Facebook engineers of it, since they seem to be rolling with that idea already, but the rest of us who exist in the FOSS ecosystem? We'd still have the goalpost in the exact same spot it's been for the two-plus decades I've been involved.

bubaumba 1 year ago | |

No, it's not open source until someone can actually reproduce it. That's the hardest part. For now it's open weights and an open dataset, which is not the same.

diggan 1 year ago | | |

That's... Not how open source works? The "binary" (model weights) is open source and the "software" (training scripts + data used for training) is open source, this release is a real open source release. Independent reproduction is not needed to call something open source.

Can't believe it's the second time I end up with the very same argument about what open source is today on HN.

Jabrov 1 year ago | | |

What’s the difference?

n_ary 1 year ago |

Now this here is the beginning of real innovation in AI. With AMD coming in (albeit late and slowly) and Meta improving Llama, we will soon see some real adaptation and development in the next few thousand days. At this moment, I see OAI as the Yahoo of the pre-Google era.

imjonse 1 year ago | |

"next few thousand days"

can we stick to years as a unit of measure and not spread Sam Altman's phrase :)

washadjeffmad 1 year ago | | |

Twenty two thousand days

It's not a lot, it's all we got

Twenty two thousand days

- Sam Altman?

highfrequency 1 year ago |

Looks like they are using sixteen $13k GPUs [1] (around $210k of hardware) for 6 days of training.

Anyone know the recommended cloud provider and equivalent rental price?

[1] https://www.wiredzone.com/shop/product/10025451-supermicro-g...

layoric 1 year ago | |

MI250s definitely aren’t a common card to rent, so the only provider I can find is Runpod at $2.10 per hour each. That works out to a training cost of $4838 plus $3225 for fine-tuning. However, this doesn’t include the 11TB of storage or the time taken to get the setup actually running the tasks, so you likely wouldn’t see much change from $10k USD, if any.

- https://www.runpod.io/gpu/mi250
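A quick check of the arithmetic above, assuming 16 MI250s at Runpod's $2.10 per GPU-hour. The ~4 days of fine-tuning is back-computed from the $3225 figure and is an assumption, not something taken from the model card; storage and setup time are ignored.

```python
# Rough rental-cost estimate, assuming 16 MI250s on Runpod at $2.10/GPU-hour.
NUM_GPUS = 16
PRICE_PER_GPU_HOUR = 2.10  # USD, the Runpod MI250 price quoted above


def rental_cost(days: float) -> float:
    """Cost in USD of renting NUM_GPUS GPUs around the clock for `days` days."""
    hours = days * 24
    return NUM_GPUS * PRICE_PER_GPU_HOUR * hours


pretrain = rental_cost(6)  # 6 days of pretraining
finetune = rental_cost(4)  # assumed ~4 days, back-computed from the $3225 figure
total = pretrain + finetune
# prints: pretrain $4,838, finetune $3,226, total $8,064
print(f"pretrain ${pretrain:,.0f}, finetune ${finetune:,.0f}, total ${total:,.0f}")
```

The headroom under $10k is what has to cover storage and the time spent getting the cluster working.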

lhl 1 year ago | |

Runpod.io rents the next-gen MI300X for $4/hr, although since they also rent H100s for $3/hr (which are easier to work with and faster for training), it might be more of a novelty.

highfrequency 1 year ago | | |

I thought the whole selling point of AMD GPUs was that they were a lot cheaper than Nvidia GPUs?

wmf 1 year ago | |

Hot Aisle seems to be the (only?) place to rent AMD. (Ryan, please don't spam this thread. It's not a good look.)

benterix 1 year ago |

I'm happy to see a truly open source model.

Actually, AMD has excellent reasons to make this kind of development and I hope they continue.

luyu_wu 1 year ago |

The section on speculative decoding is interesting. "This approach allows each forward pass to generate multiple tokens without compromising performance, thereby significantly reducing memory access consumption, and enabling several orders of magnitude speed improvements."

Does anyone know if the "several orders of magnitude speed improvement" is accurate? I'm doubtful.

Very interesting though! I'll be playing around with this on the weekend!

lhl 1 year ago | |

Orders of magnitude seems a bit ambitious. The implementation from the DeepMind paper achieved 2-2.5X (https://arxiv.org/pdf/2302.01318), and most of the tests I've seen [1][2] have been similar, but there are different variations (Medusa, Ouroboros, etc.) that can do better or be combined. Recently Together.ai published SpecExec, an SD variant that claims 10-18X speedups: https://www.together.ai/blog/specexec

[1] https://www.reddit.com/r/LocalLLaMA/comments/17h4rqz/specula...

[2] https://arxiv.org/pdf/2402.01528v3
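For anyone wondering how "multiple tokens per forward pass" works, here is a minimal sketch of the greedy (argmax) variant of speculative decoding. The "models" are toy functions standing in for a small draft model and a large target model (assumptions, not AMD's code), and the sampling version in the papers uses a rejection-sampling acceptance rule instead of exact matching. The key property is that the output is identical to plain greedy decoding with the target; only the number of target passes changes.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (sequence, number of target rounds)."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks every proposed position. With a real transformer
        #    this is one batched forward pass, so we count it as one round.
        rounds += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(seq + proposal[:i])
            if proposal[i] != expected:
                accepted.append(expected)  # target's token replaces the first miss
                break
            accepted.append(proposal[i])
        else:
            # All k drafts matched, so the same pass yields one bonus token.
            accepted.append(target(seq + proposal))
        seq.extend(accepted)
    # Trim any overshoot beyond n_new new tokens.
    return seq[: len(prompt) + n_new], rounds


# Toy stand-ins: the draft agrees with the target except when the prefix
# length is a multiple of 5 (an arbitrary assumption for illustration).
def toy_target(seq: List[int]) -> int:
    return (3 * sum(seq) + len(seq)) % 10


def toy_draft(seq: List[int]) -> int:
    t = toy_target(seq)
    return t if len(seq) % 5 else (t + 1) % 10


prompt = [1, 2, 3]
out, rounds = speculative_decode(toy_target, toy_draft, prompt, n_new=20, k=4)
print(f"{rounds} target rounds for 20 tokens; matches greedy: "
      f"{out == greedy_decode(toy_target, prompt, 20)}")
```

The speedup comes entirely from how often the draft agrees with the target: each round costs one target pass but can emit up to k + 1 tokens.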

lhl 1 year ago | | |

BTW, I got a chance to read through the model card and there's a section that shows their SD gains: https://huggingface.co/amd/AMD-Llama-135m#speculative-decodi...

- 1.75x-2.80x on MI250

- 2.83x-2.98x on NPU

- 3.57x-3.88x on CPU

Note that they were testing AMD-Llama-135m-code as the draft model for CodeLlama-7b, both of which do similarly badly on HumanEval Pass@1 (~30%). If they were instead using a similarly trained 135m model as the draft for, say, Qwen2.5-Coder (88.4% on HumanEval), the perf gains would probably be much smaller.
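Why a weak draft paired with a strong target hurts can be seen from the simplified analysis commonly used for speculative decoding. Assume each drafted token is accepted independently with probability alpha, the draft proposes gamma tokens per round, and the draft costs `cost_ratio` times as much as the target per token. These are idealized assumptions, and the 0.02 ratio below is only a rough parameter-count guess for 135M vs 7B.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target pass: (1 - alpha**(gamma+1)) / (1 - alpha)."""
    if alpha >= 1.0:
        return gamma + 1.0  # every draft accepted, plus the bonus token
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Walltime speedup vs. plain decoding, charging gamma draft steps per round."""
    return expected_tokens_per_pass(alpha, gamma) / (gamma * cost_ratio + 1)


# Lower acceptance (a draft that disagrees more with its target) means less speedup.
for alpha in (0.5, 0.7, 0.9):
    print(f"alpha={alpha}: ~{expected_speedup(alpha, gamma=4, cost_ratio=0.02):.2f}x")
```

If a 135m draft agrees less often with a much stronger target, alpha drops and the gain shrinks toward 1x.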

Decabytes 1 year ago |

Since most people can’t run these LLMs locally, I wonder what it would look like if we had hyper-tuned models for specific purposes, i.e. a model for code, a model for prose, etc., plus a director model that decides which downstream model should be used and then runs it. That way you could run the models locally without needing beefy GPUs. It’s a trade-off of using more disk space vs. needing more VRAM.
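A toy sketch of that director idea: a cheap router picks one specialist and only that one runs. Everything here is hypothetical: the specialists are placeholder functions standing in for small fine-tuned models, and the keyword heuristic stands in for what would really be a small classifier model.

```python
from typing import Callable, Dict


# Hypothetical specialists: placeholders for small task-specific LLMs.
def code_model(prompt: str) -> str:
    return f"[code model] {prompt}"


def prose_model(prompt: str) -> str:
    return f"[prose model] {prompt}"


SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "code": code_model,
    "prose": prose_model,
}

# Toy routing signal; a real director would be a (small) model, not keywords.
CODE_HINTS = ("def ", "class ", "function", "compile", "bug", "python", "```")


def director(prompt: str) -> str:
    """Pick which specialist should handle the prompt."""
    lowered = prompt.lower()
    return "code" if any(hint in lowered for hint in CODE_HINTS) else "prose"


def answer(prompt: str) -> str:
    # Only the chosen specialist needs to be loaded; the others can stay on disk.
    return SPECIALISTS[director(prompt)](prompt)


print(answer("Fix this Python bug in my parser"))
print(answer("Write a short poem about autumn"))
```

The disk-vs-VRAM trade-off shows up in `answer`: memory is bounded by the largest single specialist plus the director, at the cost of storing all specialists and loading the chosen one.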

wmf 1 year ago | |

The whole point of this model is that it's so tiny that even a weak RPi could run it. Apple has also done some interesting work with a common <4B base model that is customized with different LoRAs for different purposes.
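Roughly how the shared-base-plus-LoRA setup works: the base weight stays frozen and each task only adds a small pair of low-rank matrices. Below is a toy numpy sketch with made-up sizes, a plain `scale` factor in place of LoRA's usual alpha/r scaling, and hypothetical adapter names; it is not Apple's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4  # toy layer sizes; r is the (assumed) LoRA rank

# Frozen base weight shared by every task.
W_base = rng.standard_normal((d_out, d_in))


def make_adapter(rank: int, scale: float = 1.0):
    """Hypothetical per-task adapter: B is (d_out x rank), A is (rank x d_in)."""
    B = rng.standard_normal((d_out, rank)) * 0.01
    A = rng.standard_normal((rank, d_in))
    return B, A, scale


def lora_forward(x: np.ndarray, adapter) -> np.ndarray:
    """y = W x + scale * B (A x); the base weight itself is never modified."""
    B, A, scale = adapter
    return W_base @ x + scale * (B @ (A @ x))


adapters = {"summarize": make_adapter(r), "proofread": make_adapter(r)}
x = rng.standard_normal(d_in)
y1 = lora_forward(x, adapters["summarize"])
y2 = lora_forward(x, adapters["proofread"])

full = d_out * d_in             # parameters in the frozen base layer
per_adapter = r * (d_out + d_in)  # parameters each adapter adds
print(f"base params: {full}, params per adapter: {per_adapter}")  # 4096 vs 512
```

Swapping tasks means swapping the small (B, A) pair, and an adapter can also be merged into the weight as W + scale * B @ A for inference.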

kristianp 1 year ago | | |

About 3b if you're referring to this: https://machinelearning.apple.com/research/introducing-apple...

rkharsan64 1 year ago | |

If you're using a JetBrains IDE, the AI-based autocompletions are powered by super tiny LLMs, each trained on a single language. This allows them to run locally and still produce decent results.

For example, the C++ model is really good at writing both OpenGL+GLFW and Raylib.

Philpax 1 year ago | |

You're essentially describing Apple Intelligence :-)

https://machinelearning.apple.com/research/introducing-apple... (see Model Adaptation)

fennecbutt 1 year ago | | |

A rip-off of LLMs and LoRAs. Wrapping it in a shiny-sounding name for the normies doesn't mean they contributed anything to the space.

Havoc 1 year ago | |

>IE a model for code

That's already very much a thing. Codestral, Phind, Starcoder etc.

Fine-tuning models on whatever you want is quite accessible if you have a good dataset and a budget of 100 bucks.

craftkiller 1 year ago |

I see multiple mentions of NPU on this page, but it's still not clear to me: is this something that can finally use the NPU on my processor?

lhl 1 year ago | |

There actually seems to be a bunch of stuff now:

* https://github.com/amd/RyzenAI-SW - has a list of demos and how to use it directly (including apparently w/ PyTorch and LLMs)

* https://github.com/huggingface/optimum-amd - can use RyzenAI to use the NPU for HF transformers

There's even a Linux driver now: https://github.com/amd/xdna-driver, although it looks like enough of a PITA that I haven't even bothered to try it (my 7940HS only has like 10 TOPS anyway, so there's not much point even if it worked perfectly).

loufe 1 year ago |

It's always encouraging to see wider hardware platform competition for AI inference and training. Access to affordable and capable hardware for consumers will only benefit (I imagine) from increasing competition.

bjt12345 1 year ago |

> [1] The training code for AMD-135M is based on TinyLlama, utilizing multi-node distributed training with PyTorch FSDP.

I thought PyTorch didn't work well with AMD architectures, and I've read of many people using JAX instead?

rsolva 1 year ago |

Can this model run on ollama?

suprjami 1 year ago | |

https://huggingface.co/QuantFactory/AMD-Llama-135m-GGUF

rsolva 1 year ago | |

I tried the Q_8, but it was all over the place, so not really trained for instruction and chat yet, I think :)