Notes from the Mistral AI Now Summit in Paris (koenvangilst.nl)
But Mistral has fallen really far behind since 2025Q3. It seems they can't get good reasoning models working at even medium context sizes, which is necessary to be at the table right now.
Gemma4 and Qwen3.6 are currently best in the small size; Mistral's "small" model has ~4x the parameter count at 120B and isn't even competing with models a quarter its size.
A year ago, with Mistral Small 3.1, they were keeping up, but they've fallen into irrelevancy since.
If Mistral seriously wants to play the on-prem and small task-specific model game, a decent proxy would be to build models that get the r/LocalLLaMA crowd excited.
This is tangential, and forgive my ignorance here, but is there an inherent reason why there aren't smaller, focused models from the frontier model providers?
I'm thinking something like a software-specific subset of Opus that is the default for use in Claude Code. Smaller, cheaper to deploy and consume, maybe faster.
I am wondering what is holding them back, though: Money? Compute? Skills? Training data? My fear is that you only get really good models by training on very dubious data (outputs from the frontier models etc.) and that Mistral is too European and too enterprisey to take those risks.
Fully agree with your point though, Mistral in general is far behind where I'd expect and Qwen in particular is crushing it at the smaller sizes.
Personally, I'd consider anything 20B params and above a "medium" model. Small being <20B and large >100B. I think obviously we can get to the huge 1-2T param models, but frankly the margin of accuracy improvement for the speed hit is kinda insane (1-2% for many metrics).
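For what it's worth, those cut-offs can be written down as a tiny helper. This is only a sketch of the thresholds in the comment above: the function name `size_bucket` is made up for illustration, and treating exactly 100B as "medium" is an assumption, since the comment only says ">100B".

```python
def size_bucket(params_billion: float) -> str:
    """Bucket a model by parameter count (in billions).

    Thresholds follow the comment above: small < 20B, large > 100B,
    everything in between is medium. Counting exactly 100B as medium
    is an assumption.
    """
    if params_billion < 20:
        return "small"
    if params_billion <= 100:
        return "medium"
    return "large"


# Hypothetical sizes, chosen only to hit each bucket.
for size in (7, 24, 120):
    print(f"{size}B -> {size_bucket(size)}")
```

By this scheme a 120B model lands in "large", whatever its marketing name says.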
Foundation model labs should be building very large reasoning models, then leaving it to the community to distill them down.
You can't scale a small model up, but you can scale a large model down.
I'm convinced the only way we'll have a seat at the table in the future and avoid total runaway takeoff is if there are very large models within 80% of the capabilities of the frontier models. Tiny RTX models do diddly squat to remain competitive.
Build open weights models for running on H200s. I'll spin them up on RunPod or Lambda.
I have used Mistral models out of pure ideology for web agents and the like which aren't doing a lot of heavy lifting.
Mistral leaning into on-prem and European-hosted models is very smart.
Europe shot itself in the dick with this bullshit, hastily implemented at the height of mass hysteria, and now no sane company will build anything there. An AI startup in the US or China can be a boy and his computer. In Europe, the boy needs a dozen lawyers.
Mistral's sinking into irrelevancy despite the head start they had, the very promising early models they released, and the funding they receive, might very well be the consequence of trying to comply.
And yet again, in a few years they will be wondering aloud, "How did we end up fully dependent on the USA?"
Also interesting to note the number of partners they invited, going from Microsoft, Accenture and EY to startups like alpic.ai or lingo.dev. Seems like they are ramping up their M&A game too.
Maybe my perspective is skewed on what "huge scale" means, but 2 million users? That's like a few hundred megabytes of data? Or a couple GBs if there's a lot of per-user data?
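As a rough back-of-envelope check on those numbers, here is a minimal sketch. The per-user sizes (150 bytes and 1 KB) are assumptions picked to illustrate the "few hundred megabytes" and "couple GBs" cases; nothing in the thread says what is actually stored per user, and the helper names are hypothetical.

```python
USERS = 2_000_000  # the user count quoted in the comment above


def total_bytes(users: int, bytes_per_user: int) -> int:
    """Total storage if every user takes the same number of bytes."""
    return users * bytes_per_user


def human(n_bytes: float) -> str:
    """Format a byte count using decimal (1000-based) units."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1000 or unit == "TB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000


# Assumed per-user footprints: a lean profile row vs. a heavier record.
print(human(total_bytes(USERS, 150)))    # 300.0 MB
print(human(total_bytes(USERS, 1_000)))  # 2.0 GB
```

Either way the whole dataset fits in memory on a single commodity machine, which is the point the comment is making.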
OTOH such things can be quite defensible, they just rarely become anything like as profitable.
I really like the direction and the transparency of Mistral, among those players.
Who else will buy their AI?
and what other options do they have?
Or is this a case of the humans preparing the excuse that it was the AI's failure?
"BNP Paribas Sentenced for Conspiring to Violate the Trading with the Enemy Act" - https://www.justice.gov/archives/opa/pr/bnp-paribas-sentence...
"BNP Paribas caught up in French money laundering investigation" - https://www.reuters.com/business/finance/bnp-paribas-caught-...
"BNP Paribas faces $246m fine in currency scandal" - https://www.bbc.com/news/business-40635070
"BNP Paribas caught in a Cypriot money laundering investigation" - https://www.lemonde.fr/en/les-decodeurs/article/2023/12/26/b...
In money laundering, their track record is unmatched: https://violationtracker.goodjobsfirst.org/parent/bnp-pariba...
Assuming BNP Paribas leadership wants to stop the corruption, of course.
What counts as "weird training biases" to us might not be weird to them and vice versa. Just ask the Chinese what they think about LGBTQ+, Japanese, pride parades, Islam and colored minorities.
Every nation has its own biases injected in its domestic LLMs at this point. Otherwise they risk getting in trouble for hate speech/disinformation in the jurisdiction where they operate.
Same as how Google Maps cleverly biases the lines of disputed borders based on where you are viewing it from. Or how Google Maps switched 'Gulf of Mexico' to 'Gulf of America' in an instant when the orange man signed the paper. Google won't want to anger the US administration the same way Mistral won't want to anger France and the EU, so Mistral will have all the EU prime directives injected into its LLMs no matter if they're ludicrous or not. The law is the law whether you agree with it or not. Companies want to survive and will pander to whatever the whims of the regime they live under are at the current moment, regardless of what is right or wrong.
But if I'm using an LLM for personal projects or generating a photorealistic choreographed fight between Tom Cruise and Brad Pitt, I don't care what its political biases are. I care if it solves my problem better and cheaper than the competition, and here the Chinese models could end up winning the consumer market, which is why you see Mistral and other EU alternatives focusing exclusively on the B2B corporate market.
I agree. That's why I think European companies might prefer a European model.
Mistral is mostly French and tends to have mostly French-speaking customers, like BNP Paribas in Belgium. Germany will want its own domestic AI champions, maybe in partnership with Switzerland and Austria, similar to how Denmark has already invested in LLMs focused on the Nordic languages with money from Norway.
The biggest mistake is treating Europe like a single homogeneous country/market.
I for one would love to see more country-specific models. There was a story here the other day about Norway’s National Library developing an LLM specialized in Norwegian: https://news.ycombinator.com/item?id=48270770