Mistral AI Launches New 8x22B MOE Model (twitter.com)
Also, what's the best way to benchmark a model to compare it with others? Are there any tools to use off-the-shelf to do that?
You would have to confirm with someone deeper in the ecosystem, but I think you should be able to run this new model as is against a llamafile?
My recent work optimizing CPU evaluation https://justine.lol/matmul/ may have come at just the right time. Mixtral 8x7b always worked best at Q5_K_M and higher, which is 31GB. So unless you've got 4x GeForce RTX 4090s in your computer, CPU inference is going to be the best chance you've got at running 8x22b at top fidelity.
Really easy to search huggingface for new models to test directly in the app.
I’m sure they are already working on it.
https://api.together.xyz/playground/language/mistralai/Mixtr...
Which has the link to the tweet instead of the profile:
Why would you want another 8x7b, if you already have it ...
Language support is one big thing that is missing from open models. I’ve only found one model that can do anything useful with Norwegian, which has never been an issue with GPT-4.
I think it might be the end for 24GB 4090 cards though :(
Not surprising since GPT-4 is still state-of-the-art and much bigger. Where Mistral has been particularly impressive is when you take the size of the model into account.
But unless you’re running bs=1 it will be painful vs 8x GPU as you’re almost certain to be activating most/all of the experts in a batch.
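To put a rough number on that, here is a minimal sketch. It assumes top-2-of-8 routing (as in Mixtral 8x7B) and, unrealistically, that each token's experts are chosen uniformly at random and independently of other tokens; real routers are learned and not uniform. The function name is made up for illustration.

```python
def expected_active_experts(batch_tokens: int, n_experts: int = 8, top_k: int = 2) -> float:
    """Expected number of distinct experts hit in one MoE layer.

    Assumption (not from the thread): each token picks top_k of n_experts
    uniformly at random, independently of the other tokens in the batch.
    """
    # Probability that a given expert is NOT among one token's top_k picks.
    p_miss = 1 - top_k / n_experts
    # An expert stays idle only if every token in the batch misses it.
    return n_experts * (1 - p_miss ** batch_tokens)


for b in (1, 2, 4, 8, 16):
    print(b, round(expected_active_experts(b), 2))
```

Under these assumptions a single token touches 2 experts per layer, while a batch of 8 tokens already touches a bit over 7 of the 8 on average, so the weights of nearly every expert have to be read.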
Really though if you're just looking to run models personally and not finetune (which requires monstrous amounts of VRAM), Macs are the way to go for this kind of mega model: Macs have unified memory between the GPU and CPU, and you can buy them with a lot of RAM. It'll be cheaper than trying to buy enough GPU VRAM. A Mac Studio with 192GB unified RAM is under $6k — two A6000s will run you over $9k and still only give you 96GB VRAM (and God help you if you try to build the equivalent system out of 4090s or A100s/H100s).
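As a quick sanity check on that comparison, the quoted figures can be turned into a cost per GB of model-accessible memory. This is only a sketch using the rough bounds from the comment ("under $6k", "over $9k"), not actual price quotes, and the helper name is hypothetical.

```python
# Prices and capacities as quoted in the comment above (rough bounds only).
options = {
    "Mac Studio (192GB unified)": (6000, 192),
    "2x A6000 (96GB VRAM)": (9000, 96),
}


def usd_per_gb(price_usd: float, memory_gb: float) -> float:
    """Dollars per GB of memory the model can be loaded into."""
    return price_usd / memory_gb


for name, (price, gb) in options.items():
    print(f"{name}: ~${usd_per_gb(price, gb):.0f}/GB")
```

This ignores memory bandwidth and compute, which differ a lot between the two setups.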
Or just rent the GPU time as needed from cloud providers like RunPod, although that may or may not be what you're looking for.
https://www.reddit.com/r/LocalLLaMA/comments/18ituzh/mixtral...
This model is apparently surprisingly good at chat, even though it is a base model, and will take part in it to some extent. It should be really interesting once it's fine-tuned.
For example, on EQbench[0], Miqu[1], a leaked continued pretrain based on Llama 2, performs extremely similarly to the Mistral Medium model their API offers.
Maybe they're thinking it'd be bad PR for them to release models they didn't create from scratch, or there is some contractual obligation preventing the release.
> Our mission is to make frontier AI ubiquitous, and to provide tailor-made AI to all the builders. This requires fierce independence, strong commitment to open, portable and customisable solutions, and an extreme focus on shipping the most advanced technology in limited time.
Edit: Ah, it's the wrong link. https://news.ycombinator.com/item?id=39986047
Thanks SushiHippie!
Edit: To add to this, I've had good luck getting solid output out of mixtral 8x7b at 3-bit, so that isn't small enough to completely kill the model's quality.
If these assumptions port over to 8x22B, then 8x22B has, at 281GB, sz_expert ≈ 13.8B.
I agree on the first one: (46.3 - 7) / 7 = 5.61b.
The second one doesn't match up, (281 - 22) / 7 = 37b or (140.5 - 22) / 7 = 16.92b. Am I doing something wrong?
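The arithmetic in the two comments above can be reproduced directly. This sketch only mirrors the (total - base) / 7 formula as written there; it does not claim that this is the right way to split the model into shared and per-expert parameters. Note that 281 is a file size in GB, and halving it to 140.5 presumably assumes 2 bytes per parameter (16-bit weights), which is an assumption on my part. The function name is hypothetical.

```python
def per_expert(total: float, base: float, divisor: int = 7) -> float:
    """Mirror of the (total - base) / 7 arithmetic from the comment above."""
    return (total - base) / divisor


print(round(per_expert(46.3, 7), 2))       # 8x7B, total in billions of parameters
print(round(per_expert(281, 22), 2))       # 8x22B, using the 281GB file size directly
print(round(per_expert(281 / 2, 22), 2))   # 8x22B, assuming 2 bytes per parameter
```

Whichever formula is used, mixing GB and parameter counts in the same expression will throw the result off by the bytes-per-parameter factor.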
This is clearly an inferior model that they are willing to share for marketing purposes.
If it was an improvement over llama, sure, but it seems like just an ad for bad AI.
In fact I would go as far as saying llama2 isn’t that good compared to some of the most recent models.
I want to add Mistral support soon, probably via together.ai or a similar service.
https://twitter.com/MistralAILabs is their other Twitter account, which is very slightly more useful though still very low traffic.
It actually does what you tell it, and won't try to silently change your prompt to conform to a specific flavor of Californian hysterics, which is what OpenAI's products do.
Also, since it's a local model, your queries aren't being datamined nor can access to the service be revoked on a whim.
That person is a hero, super bummed!
magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=http%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
Lacking a godly GPU, I will probably hold off for a quantized version, which has the potential to run okay-ish on CPU or my modest GPU, but I really appreciate the info.
https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#qu...
It is licensed under CC-BY-NC-4.0. That license means you are free to use, modify and redistribute it, so long as you aren't doing so "commercially". What exactly counts as "commercial" use is a complex legal question, and the answer may vary from jurisdiction to jurisdiction (different courts may interpret the phrase differently). But, for example, if you are just using it at home for private experimentation on your own personal time, with no plans to make money from doing so (whether now or in the future), I think pretty much everyone will agree that counts as "non-commercial".
Other cases – e.g., if a government agency uses the software to provide some government function, is that "non-commercial"? – are far less clear. Those are really the kind of questions you need to ask a lawyer (which I am not).
There have been a handful of court decisions on what "non-commercial" use means – the Creative Commons legal case database [0] records three cases involving non-commercial CC licenses in the US, one in Belgium, one in Israel, plus I also know of one in Germany [1] which their database seems to be missing. I don't know if any of them addressed the contra proferentem rule which you mention.
The German and US cases on this topic appear contradictory – from what I understand, the German case assumed that all government use is commercial, interpreting "non-commercial" to basically mean "private home use", whereas two of the US cases (Great Minds v FedEx Office and Great Minds v Office Depot) were about use by commercial entities acting under contract to public school districts, and the holdings of those cases assume that government-operated schools are "non-commercial" (and furthermore, the commercial entities were engaging in "non-commercial" use, even though they were acting commercially, because they were doing so on behalf of a "non-commercial" customer). That said, all these cases have somewhat limited precedential value – the US cases are binding precedent in two federal judicial circuits (2nd and 9th) but have merely persuasive value in the remainder of the US; I don't know what the ultimate outcome of the German case was (Deutschlandradio said they were going to appeal but I don't know if they did and what the outcome was if they did), and German law doesn't view precedent as "binding" in quite the same sense that common law systems do anyway.
[0] https://legaldb.creativecommons.org/cases/?keywords=&tags%5B...
[1] https://www.techdirt.com/2014/03/27/german-court-says-creati... and if you can read German, here is the actual court judgement: https://netzpolitik.org/wp-upload/OLG-K%C3%B6ln-CC-NC-Entsch...
Morally I feel 100% fine, because the app would be just as appealing without it, and subscribing means you get sync; you could theoretically not pay me and use Command R.
That's typically synonymous with commercial.
Nous Hermes 2 Solar is the best model for Norwegian that I've tried so far. It's much better than NorskGPT Mistral/Llama. I actually got it to make fairly decent summaries of news articles, though it wouldn't follow any stricter commands like producing 5 keywords in a JSON list. It kept producing more than 5 keywords, and if I doubled down on the restriction on the number of keywords it would start messing up the JSON.
The best competitor to GPT-4 was Falcon 180B, but it's still terrible compared to GPT-4. Mixtral is my new favourite though; it's faster than Falcon and in general as good or better. Still, I would pick GPT-4 over Mixtral any day of the week, since it's leagues ahead of Mixtral.
Tigerbot has a very interesting trait. It tends to disagree when you try to convince it that it's wrong.
I haven't been able to test out the new Mixtral 8x22B or Command R Plus. These are the next ones on my list!
But, still, it's going to need 262GB for weights + a variable amount based on context without quantization, and 66GB+ at 4-bit quantization.
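Those two figures are consistent with a simple bits-per-weight estimate. A minimal sketch, assuming the 262GB figure refers to 16-bit weights (so roughly 131B parameters) and ignoring the KV cache and runtime overhead; the function name is hypothetical.

```python
def weight_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params_billion * bits_per_weight / 8


# Assumption: 262GB unquantized means 16 bits per weight, i.e. ~131B parameters.
n_params_b = 262 * 8 / 16

for bits in (16, 8, 5, 4):
    print(bits, round(weight_gb(n_params_b, bits), 1))
```

At a nominal 4 bits this gives about 65.5GB. Real 4-bit formats store some extra per-block scaling data, so actual files tend to come out somewhat larger, which fits the "66GB+" above.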
Mixtral 8x7B @ 5-bit takes up over 30GB on my M3 Max. That's over 90GB for this at the same quantization. Realistically you probably need a 128GB machine to run this with good results.
Can't reuse my old parts on a brand new Mac, or upgrade it later if I find I need more. Lock-in is rough.
https://www.ebay.com/itm/315029731825?itmmeta=01HV561YV4AJG5...
And on the Epyc system you can plug in much more memory for when you need larger memory, and PCI-E GPUs for when you need less but faster memory.
Threadripper PRO is only 8-channel, but with memory overclocking it might reach numbers similar to those too.
12x32=384 GB of RAM seems to be about $1400 right now. Going for less capacity doesn't save that much, unlike the insanely marked-up Apple memory. And then you need the CPU heatsink for $130.
Have you tested it yourself?
It seems to reach only a little above half the theoretical speed, and scale only up to 32 threads for some reason. Might be a temporary software limitation or something more fundamental.
Do you have a source for the GPU only having access to half the bandwidth of the memory?