Revealing optimizations similar to these would pretty much reduce their competitive position.
I suspect their tune will change if they ever take the lead.
What's with all the China glazing about this stuff? They release some open-source work and people act like they are suddenly the beacon of freedom and transparency.
Hopefully the experts here can offer insight. The above is just my hunch and I’m not a specialist in this field.
It's more a cultural thing. Sharing progress is just in their blood.
Multi-head Latent Attention (MLA), Multi-Token prediction, and the MoE architecture are some of the most famous examples.
https://yipzap.com/anthropic-accuses-alibaba-of-largest-ai-d...
They don't have TPUs or access to the latest Vera Rubin GPUs either to get performance gains for free. All of the optimizations DeepSeek has done are in software, and they go down to the PTX assembly level.
Compare that to Anthropic, who are celebrating fixing a flickering issue in a terminal app that took months to fix.
It's funny, because if you ran Claude Code on a slow terminal, the cause of the flicker was obvious: they kept dumping the entire history of the chat back into the terminal in a number of situations, and relied on the terminal to then end up in the correct state.
What became clear when DeepSeek came onto the scene was that China was seeking to commoditize LLMs. They consider it an issue of national security not to be beholden to US tech companies when it comes to AI. And I, for one, fully endorse this policy.
Another data point on this is the black market for Claude tokens in China [1]. The chat logs themselves are a commodity to train models.
I believe that OpenAI in particular is a bet on a trillion dollar pot of gold that doesn't exist. Google, Microsoft, Amazon and Meta will all be fine. Anthropic is in a far better position than OpenAI (IMHO) but if DeepSeek or some other Chinese open weight model gets as good at coding, they're in real trouble too.
anyone with IQ higher than 130 (thus qualified for actual AI R&D) would be questioning something obvious here -
if they are already doing such dodgy stuff with the aim to maximize profits, why would those resellers have large amount of logs with actual American model responses to sell to those AI labs in the first place. shouldn't they just post train & customize some leading Chinese open source models to pretend to be Opus or GPT for the vast majority of their users (as classified by some models) who don't know much about expected Opus behaviours & not skilled enough to tell the differences?
that is actually the interesting bit not covered in your censored version of the story line, it is also what happens on the ground. your censored version of the story implies that those dodgy resellers using stolen credit cards, pooling accounts with stolen IDs and illegally selling very personal logs would somehow be honest enough to spend extra $ to ensure their victims (aka paying users) can actually use real Opus and GPT. LOL
dude, you failed this IQ test miserably.
Flash: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash-DSpark
Pro: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
Excited to see if this makes it into DwarfStar for local inference, have been using the flash model extensively since the 2-bit quants were made available by antirez.
I second ccusage, it's nice
Guessing the timing isn't accidental. Demonstrated openness vs harsh regulation
> As with V4-Flash, we treat this point as an indication that DSpark sustains useful throughput under an interactivity target that the baseline cannot efficiently support. At matched system capacities, DSpark delivers 57% to 78% faster per-user generation.
Reminds me of the flawed 2017-era approach to scaling servers that ran memory-intensive technologies: adding even more servers to solve the problem. (It just increases costs.)
Rather than doing that, think about which critical parts of your app can be written in a more performant technology.
Fast forward to 2026, and now you can see who is just throwing more money at the problem to create even more problems, whereas DeepSeek is giving us optimized solutions.
I know exactly who I would pay attention to, and it is absolutely not Anthropic.
Can't sell their SOTA models, only slightly better than the open source models for the models they can sell, cost 20x to 50x for good models, a TAM that consists almost solely of developers, with no customer of theirs actually boasting increased profits as a result of AI...
I fear their time to IPO may have passed.
State-of-the-art nanometer process nodes are impossible to achieve, but if you have infinite solar energy during business hours, does it really matter? Every company has a parking spot, so this ASIC-like appliance could be as big as a shipping container.
If it could just run recent open models for a handful of users, it would be such a no-brainer to buy.
So, despite hiring the cream of the crop of math graduates, who could read the papers of free academia but whose own results the free world could not access, they fell behind.
I have a theory explaining why. I think it's because science is an interactive process. NSA cryptographers could read papers, but they couldn't talk openly with the authors of those papers because of secrecy demands; even asking a question might indicate what they were working on. You can easily imagine them spending months on something they could have avoided by going to the original authors and getting told "Oh, we tried that for a long time, it doesn't work".
Whether that theory is right or not, cryptography is a concrete example of a domain where public research with fewer resources beat private research with a lot more resources.
The American companies, from my impression, don’t involve themselves with such lowly “hacks” because they have so much money that they can just push forward with doing everything on big heavy models that run on the most cutting-edge Nvidia chips that they can, at the moment, kinda sorta get on demand (I say that in some degree of jest).
There is a meteor headed towards all this AI investment that I don't think has been properly accounted for: what happens to all the existing hardware investments when NVidia's next architecture comes out? Blackwell (B100/B200) is the current generation. Rubin (R100, presumably R200) is the next and arrives soon. A lot of the investment hasn't been spent yet, so it will likely be spent on Rubin, but at that point, what happens when the next iteration comes out and does 3-4x the compute for the same electricity input and same hardware cost?
Also, what happens when people can run way bigger models on consumer hardware in 5 years? The effective limit for useful local LLMs is currently ~31B parameter models because the RTX 5090 has 32GB of VRAM and Apple's shared memory architecture, which can keep bigger models in memory, just doesn't have the raw processing power.
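To make the VRAM arithmetic above concrete, here is a rough back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bits per parameter, and it ignores KV cache, activations, and runtime overhead, so real headroom is smaller. The function names `weights_gb` and `fits` are made up for illustration.

```python
# Rough weight-memory estimate. Assumption: size = params * bits / 8.
# Ignores KV cache, activations, and runtime overhead.


def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits_per_param: int, vram_gb: float) -> bool:
    """True if the weights alone fit in the given amount of VRAM."""
    return weights_gb(params_billion, bits_per_param) <= vram_gb


if __name__ == "__main__":
    # 31B parameters on a 32 GB card at common precisions.
    for bits in (16, 8, 4):
        size = weights_gb(31, bits)
        print(f"31B @ {bits}-bit: {size:.1f} GB, fits in 32 GB: {fits(31, bits, 32)}")
```

Under these assumptions a ~31B model only fits in 32GB at roughly 8 bits per parameter or less, which is about where the figure in the comment lands.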
Anyway, the reason I argue Anthropic is in a better position (than OpenAI) is that they seem to have captured a market that may well be profitable for them as a company, specifically Claude for coding. They also haven't burnt quite as much cash as OpenAI, so they aren't in as deep of a hole.
While I think local models are going to improve massively over the next few years, running them in a data center at scale is always going to be cheaper for a company. Why? Because they can amortize their costs by running 24/7, and powering and cooling them is simply cheaper at scale when you're talking about 1000+ engineers who otherwise might only be using their hardware ~40 hours a week.
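The amortization argument can be sketched with simple arithmetic. The hardware cost and lifetime below are hypothetical placeholders, not real prices, and the sketch ignores power, cooling, and the fact that data center and desktop hardware are priced differently. It only shows how utilization changes the cost per hour of actual use.

```python
# Amortization sketch. All dollar figures and lifetimes are hypothetical.

HOURS_PER_WEEK = 7 * 24  # 168


def utilization(active_hours_per_week: float) -> float:
    """Fraction of the week the hardware is actually in use."""
    return active_hours_per_week / HOURS_PER_WEEK


def cost_per_useful_hour(
    hardware_cost: float, lifetime_weeks: float, active_hours_per_week: float
) -> float:
    """Hardware cost spread over the hours it is actually used."""
    return hardware_cost / (lifetime_weeks * active_hours_per_week)


if __name__ == "__main__":
    cost, weeks = 10_000.0, 156.0  # hypothetical: $10k box, 3-year life
    desk = cost_per_useful_hour(cost, weeks, 40)  # used ~40 h/week
    dc = cost_per_useful_hour(cost, weeks, HOURS_PER_WEEK)  # used 24/7
    print(f"utilization at 40 h/week: {utilization(40):.1%}")
    print(f"cost per useful hour: desk ${desk:.2f} vs 24/7 ${dc:.2f}")
    print(f"ratio: {desk / dc:.1f}x")
```

With identical hardware, a machine used 40 hours a week costs 168/40 = 4.2 times as much per useful hour as one kept busy around the clock, before any economies of scale in power and cooling.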
IMHO Google is in the best position here of all the US companies, even though their models aren't the best, because their data centers are ruthlessly efficient, their homegrown TPUs will eventually catch up (and thus avoid the NVidia tax) and they simply haven't bet the farm on winning AI.
More likely, an AI-generated codebase is impossible for humans to fix, and SOTA was not able to figure it out until now.
https://www.oecd.org/en/data/dashboards/magic-database-indus...
It’s like with VPN providers. Is Mullvad actually collaborating with law enforcement? They very well could be. It is a calculated risk.
Is DeepInfra actually logging and training or selling the logs? They could be.
They have been raided multiple times, have had tons of audits, do bleeding-edge research on privacy-preserving tech, donate to GOS, etc. You don't see this kind of VPN company anywhere else because none exists.
Wikipedia is altruistic, and serves humanity quite well.
Markets don't run on altruism.
US labs at Google, Meta and SpaceX are not leading; none of them has managed to build something on par with GLM 5.2.
Care to explain to me why they still don't collaborate and still choose to do it in private?
I say this because we see the same thing used as an argument against China. "If they overtake us, they'll do imperialism (like us)." Again, it says more about us than them.
A better reading (IMHO) of the situation is that China believes that AI shouldn't be used simply to mint a few more trillionaires but that the benefits should be shared with society. Why do I say this? Because we now have 70+ years of China doing exactly that. The transformation in China all the way from rural villages to Tier 1 cities has been utterly astounding. China has lifted ~800M people out of extreme poverty.
In some ways we're at a similar point to the late 1990s and 2000s when Microsoft execs complained that Linux, being free, destroyed intellectual property value. Linux should be a perfect example of how people can and do act altruistically, or at least not in a way to bait-and-switch to enrich themselves.
[1]: https://www.reddit.com/r/AskHistory/comments/1d26grm/in_the_...
Meanwhile, Xi Jinping has published his 5th book on how governance in China works and what they're after. These are not books written for a western audience: they're compilations of speeches that he already gave to the Chinese party and state apparatus, so the contents are not sanitized for foreign audiences. But there are no English reviews or summaries of this 5th book at all by the usual China experts who shape what western audiences know about China.
This extends beyond the government. Even though "for the people but only against the government" is an often-heard mantra, nobody seems to listen to what Chinese AI companies themselves say about why they publish open models. DeepSeek and GLM have said multiple times publicly what their motivations are, yet people on HN still speculate like they usually do.
Truly mind-boggling. I get that a lot of people don't like China. But setting aside the question of whether their dislike is justified, it would at least be rational to properly understand China, even if it's to defeat it. And listening to what China says themselves is absolutely essential for proper understanding. But people don't bother to? And they seem mostly happy with sticking to speculations that match preconceived notions, even if that hurts their chances of defeating China.
If you simply take what the Chinese government says at face value, you will be correct way more often than 95% of Western policy wonks, media talking heads, "analysts" and so forth. Because, like you say, they tell you everything they're doing.
In the recent US-China summit, Xi Jinping just came out and used the Thucydides Trap metaphor, which tells you everything about where China thinks it is and where it sees the US going, which is to become increasingly belligerent as their power declines. Now whether or not you agree with that assessment (I do agree), it still tells you China wants to avoid open hostilities, it sees itself as continuing to rise and it fears what a declining US might do.
But western politicians keep raising this metaphor. So at some point they're like "okay we'll speak your language". They then used this metaphor to make the point "our rise isn't the threat, your fear of it is. If you resist it you're walking right into the trap Thucydides warned about". So your conclusion is still right, they don't want open hostilities, a stable world is in their interest.
Then western media ran away with this and were like "OMG Xi mentioned the Thucydides Trap", completely ignoring his point.