AI subscriptions are a ticking time bomb for enterprise (thestateofbrand.com)
The software world is, by and large, no longer about making products with a focus on the long-term, whether that's about the customer's well-being or even the company's own long-term functioning. It's about trapping people, siphoning their money, then running away after setting the building on fire. Founder McBuilder will throw away his entire userbase and tell them "lol idk good luck" about their usage needs if it means he can make an extra dollar.
This is as true for enterprise as it is for consumers. Look at all the lamenting when a liked name gets bought by venture capital or considers an IPO.
But also... is this shit AI-written? I'm so tired of this.
Colour me skeptical on that one. Unless the AI improves a lot so it makes sense to spend more.
Who said it was?
> Pull out the napkin. This matters.
The article wouldn't exist if you didn't think it mattered, just tell us why.
> the question is not whether they got a good deal. The question is
Who said that was the question?
> This Is Not One Company's Problem
Who said it was?
Stop telling us what things aren't, just speak like a normal human and convey your own thoughts. It's an insult to your audience to throw constant AI slop at them.
> thousands of companies have woven AI subscriptions deep into their operations. Marketing teams draft copy through ChatGPT Plus.
Yea I bet you do..
Did OpenAI switch from fixed prices per seat to usage based? This will surprise many companies I reckon.
Personally, I use Claude Code on the 200 euro plan, and I'm a heavy user. A few weeks ago I realized that CC shows the token usage in the CLI, in the bottom right. I never cared about it because I thought paying 200 euros a month would give me "unlimited" access.
But I guess the party is slowly coming to an end? Prices are going to increase slowly? And the flatrates will be removed eventually?
Too bad, it was nice while it lasted.
I will continue to use it as an assistant that does the menial stuff quicker than I ever could, but it's just too early to let it do stuff that would hurt if it disappeared. Enjoy it while it lasts.
And some parts of most publicly traded ones.
If it’s not a bootstrapped company with a single offering, it’s highly likely that something they’re doing runs at a loss in the name of growth (and even there, the loss might come in the form of deferred compensation).
Big difference between professional deployments and personal ones.
Not necessarily. Many factors go into what models are available at enterprise level. If you look around, not many companies (everywhere around the world) use DeepSeek models even though they are significantly cheaper.
Think what you want but even when hosted in the US, at the enterprise level going all in on that would be a legal and/or political death sentence.
We need better open-source or cheap but highly intelligent Western models that are proven to work well in agentic tooling and come with strong legal agreements before enterprises will even consider them.
There will be a repricing for sure, as there is whenever subsidies end, but the world will not end.
* People keep finding ways of cramming more intelligence into smaller models, meaning that a given hardware spec delivers more model capability over time. I remember not that long ago when cutting edge 70B parameter models could kinda-sorta-sometimes write code that worked. Versus today, when Qwen 27BA3B (1/23 of the active parameters!) is actually *fun* to vibe code with in a good harness. It’s not opus smart, but the point is you don’t need a trillion parameters to do useful things.
* Hardware will continue to improve and supply will catch up to demand, meaning that a dollar will deliver more hardware spec over time. Right now the industry is massively supply-constrained, but I don’t see any reason that has to continue forever. Every vendor knows that memory capacity and memory bandwidth are the new metrics of note, and I expect to start seeing products that reflect that in a few years.
I hope that one day we’ll look back on the current model of “accessing AI through provider APIs” the same way we now look back on “everyone connecting to the company mainframe.”
As the AI labs become more reliant on enterprise adoption, it makes sense to push capabilities at a cost that makes sense for businesses. Even if it prices out consumers or hobbyists.
Between: more efficient models - tuned for the task at hand, the ability to run those models in-house, or even at the edges, plus Google and Microsoft are well positioned to stay ambivalent as they’ve got lots of products to sell and whether or not LLMs are part of the portfolio mix is completely dependent on enterprise customer demand.
Anthropic/OpenAI have a number of aggressive downward pressures on their pricing.
Competitive pressure prevents a rug pull.
In a competitive race, each breakthrough gets copied or illicitly distilled or whatever. That means the frontier models are depreciating assets, and the markup on tokens should get smaller and smaller.
Now bigger models are more expensive to run inference on, but today's models, or equivalent ability and size models, shouldn't go up in price.
5.5 is 4x the price, but 5.4 still exists, so it's not a rug pull, just a bigger, more expensive to run, and hopefully more valuable model.
What? Anthropic's costs aren't the API rate. The article never attempts to estimate that cost, which renders its thesis tautology.
1. GenAI companies are making a loss in order to gain adoption and later lock-in
2. ???
3. They're going to cash-in soon and start milking you now that business critical systems rely on GenAI
The "???" denotes a complete failure to offer compelling arguments that link 1 and 3.
https://github.blog/news-insights/company-news/github-copilo...
This is just simply economics. Make an early loss to get a customer base, offer addictive features at irresistible prices, then yank them into your profit-making pricing.
Although yes, the ??? in the parent comment could be replaced with "A/B test some price gouging or ratty pricing methods to see how far we can go without being noticed"
Meanwhile datacenters put out more pollution and use more electricity than all the plane rides Bill Gates took with Epstein combined, for business meetings of course.
There's plenty of sand on the planet and clever people (and AI) figuring out how to do more work with less sand and power, so any argument that AI is going to cost so much that it won't be usable, seems just preposterous.
We all know every frontier AI lab is heavily subsidizing usage, and so do all of the VCs & CEOs funding them.
Based on what? The RAM requirements alone are extraordinary.
No, running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.
I take it you haven’t actually run any of the current gen local models?
They all fit on fairly accessible hardware, and their performance is at least on par with what I was paying for last year.
I have one of my agents running entirely from a local model running on a MBP and it has repeatedly shown it’s capable of non-trivial tasks.
Playing around with another, uncensored, local model on my 4090 desktop has me finally thinking about canceling my personal Anthropic subscription. Fully private, uncensored chat is a game changer.
For work it’s still all proprietary models, but largely because, at this stage, it’s worth paying a premium just to be sure you’re using the best, and it saves the time of managing our own physical servers. But if we got news tomorrow that Anthropic and OpenAI were shutting down, a reasonable setup could be figured out pretty quickly.
They’re still pricey, the world is still scaling up memory production, and a lot of code isn’t yet built for AMD, but we went from the Wright brothers’ first airplane to jet engines in 27 years.
I’m not sure “it’s only a few years away” but we are sure moving there fast.
I must say that the largest dedicated hosted hardware providers now, like Amazon or Google, to a large extent do not produce the software they are offering as a hosted solution (like Linux, Postgres, Redis, Python, Node, etc). Similarly I'm not sure if the producers of the frontier models are going to keep their lead as the service providers for the most widely used models. They would need to have quite a bit of an edge above open-weights models.
Also, models are given very sensitive data to process. For large organizations, the shared dedicated hardware may look like a few (dozens of) racks in a datacenter, rented by a particular company and not shared with any other tenants.
At the same time, $100 a month is A LOT of RAM.
No one can deny that right now these new compact models are not as good as frontier models, but for the first time we actually have competent local-first models. If I give you a local model that runs on your current hardware and performs at 75% of the ability of a frontier private paid model, would you still pay for frontier? More importantly, would you hand control of your processes and code to them knowing enshittification and price hikes are always lurking nearby?
For businesses, I get it you want to compete. But personally, it's over. Even if I considered for a second paying OpenAI/Claude, not gonna happen now.
That is only true right now because hundreds of billions of dollars are being burned by these AI companies to try to win market share. If you paid what it actually cost, your comment would likely be very different.
They have to keep getting better to stay ahead of each other and open weight.
Which means it's the opposite of a timebomb, the article has it completely backwards, tokens at current level of reasoning will continue to get cheaper.
I'm not sure 'local' will be the end state, as hardware needs are high. But certainly competitive forces tend to push profit margins toward zero.
Extended discussion on this topic:
https://corecursive.com/the-pre-training-wall-and-the-treadm...
Boss is happy, very happy. We're rolling it out more widely now.
But this is the future.
I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.
The big question I'd be asking if I was investing in one of the big players is if those changes are "it can do 99% instead of 97% of the tasks a user will throw at it" (at which point going local and taking back cost control/ownership makes a lot of sense, especially for companies) OR "it will fully replace a human with better output"?
I already don't need Opus for a lot of my tasks and choose instead faster/cheaper ones.
The former is a company that's gonna be trying to sell mainframes against the PC. The latter is a company that is in potentially huge demand, assuming the replaced humans end up with other ways of getting money to still be able to buy stuff in the first place. ;)
But even if scaling plateaus for the frontier models, maybe distillation will improve to the point where smaller more manageable models can reach the same plateau. That would be great for local.
We are only 2-4 years away from consumer grade immutable-weight ASICs.
Unless there's some important breakthrough in hardware production or in model architecture, it's quite the opposite: bigger, more expensive and more energy-intensive hardware is needed today compared to 1 or 2 years ago.
Local models never reach the utilization that cloud providers have (80%+), and cloud inference is always going to be much more cost-efficient than local inference for this reason.
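The utilization argument is easy to see with a toy cost model: hardware is paid for whether or not it is busy, so the cost per token scales inversely with utilization. The function name and every input value below (a $4,000 box, a 3-year life, $0.05/hour of power, 50 tokens/s peak) are hypothetical assumptions chosen only for illustration, and power is simplistically billed around the clock.

```python
# Illustrative sketch: why utilization dominates the cost of self-hosted
# inference. Every input value used below is a hypothetical assumption.

def cost_per_million_tokens(hardware_cost_usd: float,
                            lifetime_years: float,
                            power_cost_per_hour_usd: float,
                            peak_tokens_per_second: float,
                            utilization: float) -> float:
    """USD per 1M tokens actually generated.

    Hardware is amortized over its lifetime and power is (simplistically)
    billed around the clock, so both accrue whether or not the box is busy.
    Only the utilized fraction of peak throughput produces tokens.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    hours = lifetime_years * 365 * 24
    hourly_cost = hardware_cost_usd / hours + power_cost_per_hour_usd
    tokens_per_hour = peak_tokens_per_second * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000


# Same hypothetical machine, idle most of the day vs. kept busy.
mostly_idle = cost_per_million_tokens(4_000, 3, 0.05, 50, 0.05)
kept_busy = cost_per_million_tokens(4_000, 3, 0.05, 50, 0.80)
print(f"mostly idle costs {mostly_idle / kept_busy:.0f}x more per token")  # 16x
```

Under these assumptions the ratio is simply 0.80 / 0.05, whatever the hardware price; real deployments also differ in batching efficiency, which this sketch ignores.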
It’s not unreasonable to suppose that in 2 years’ time an Opus 5-quality model will be etched into silicon for high-performance local inference. Then you just upgrade your model every 2-3 years by upgrading your hardware.
Eventually, we'll see. Frontier models still need some pretty serious hardware which will slowly come down in cost. Smaller models are becoming more capable, which will presumably continue to improve.
I think there's still a pretty big gap, though. Claude estimates Opus 4.6 and GLM-5 need about 1.5 TiB of VRAM. It puts GPT-5.5 at around 3-6 TiB of VRAM.
That's 8x Nvidia H200 @ ~$30k USD each. Still need some big efficiency improvements and big hardware cost reduction.
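A quick capacity check of those figures, as a rough sketch: it divides the estimated model size by per-GPU memory and rounds up. It assumes 141 GB of HBM per H200 and the ~$30k unit price mentioned above, and deliberately ignores KV cache, activations, quantization choices, and the fact that GPUs are usually bought as 8-GPU nodes. By this raw count even the 1.5 TiB estimate needs more than one 8-GPU node, which only reinforces the point about hardware cost.

```python
import math

def gpus_needed(model_bytes: float, gpu_memory_bytes: float) -> int:
    # Pure capacity count: ignores KV cache, activations, quantization
    # and node topology (GPUs are usually sold in 8-GPU servers).
    return math.ceil(model_bytes / gpu_memory_bytes)

TIB = 1024 ** 4
H200_MEMORY = 141e9        # assumed: 141 GB of HBM3e per H200
H200_PRICE_USD = 30_000    # the rough per-unit figure from the comment

for tib in (1.5, 3, 6):
    n = gpus_needed(tib * TIB, H200_MEMORY)
    print(f"{tib} TiB -> {n} x H200, ~${n * H200_PRICE_USD:,}")
```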
same with models.
It would cost me $300 at normal, non-discounted DeepSeek V4 pricing PER DAY, but I get it all for $500 worth of subscriptions.
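To put that comparison on one timescale, a minimal sketch. It assumes the $500 in subscriptions is a monthly total (the comment doesn't say) and that the $300/day usage happens every day of a 30-day month; the function name is hypothetical.

```python
def subsidy_ratio(api_cost_per_day: float,
                  subscription_per_month: float,
                  days_per_month: int = 30) -> float:
    """How many times more the same usage would cost at metered API prices."""
    return api_cost_per_day * days_per_month / subscription_per_month


print(subsidy_ratio(300, 500))      # every day of the month: 18.0
print(subsidy_ratio(300, 500, 22))  # working days only
```

Even counting only ~22 working days, metered pricing would come to roughly 13x the subscription spend under these assumptions.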
And yet, less than 0.01% of the population (made up number, but I am more likely to be overestimating than underestimating) do so.
Running local models to do real work is likely to be another niche hobby.
Not even when that site calls itself a "market" to create plausible deniability.
AI is the future operating system of every computer everywhere
Subscriptions can be equally profitable depending on the total actual tokens used. Few subscriptions use 100% of their potential, but these memes that subscriptions are unsustainable always seem to assume it to be so.
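The point about partial usage can be sketched numerically: a flat-rate plan's profitability depends on the average usage across all subscribers, not on the heaviest user. Everything below is hypothetical, including the $200 price, the $1,000 cost to serve a subscriber who maxes out the plan, and the assumption that serving cost scales linearly with usage.

```python
def average_margin(price: float,
                   usage_fractions: list[float],
                   cost_at_full_usage: float) -> float:
    """Average profit per subscriber on a flat-rate plan.

    usage_fractions: each subscriber's share (0..1) of the plan's maximum usage.
    cost_at_full_usage: hypothetical cost to serve a subscriber at 100% usage.
    Assumes serving cost scales linearly with usage.
    """
    total_cost = sum(u * cost_at_full_usage for u in usage_fractions)
    return price - total_cost / len(usage_fractions)


# Hypothetical $200 plan whose maximal user would cost $1,000 to serve.
mostly_light = [0.05] * 8 + [0.5, 1.0]   # eight light users, two heavy ones
everyone_maxed = [1.0] * 10
print(average_margin(200, mostly_light, 1_000))    # about +10 per subscriber
print(average_margin(200, everyone_maxed, 1_000))  # -800.0 per subscriber
```

The same plan is slightly profitable or deeply unprofitable depending only on the usage mix, which is why "subscriptions are unsustainable" arguments hinge on assuming everyone uses 100%.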
They may be running at a loss after all the salaries and stock comp, but tokens are sold at a profit now.
Claude produces this kind of prose SO much. It's pretty annoying. I don't notice it happening on Gemini for the same prompts.
Punchy titles are also part of the marketing speak. Before Claude or ChatGPT, it would be a delicious read, understanding how they came up with the initial idea for an internal system. Since then, most of it reads like "It's not just X, ..." every other paragraph, making it dull...
Nobody writes like this.
I'd like to get in front of a whiteboard with someone who knows economics and the token providers businesses well enough to answer my "explain to me like I'm five" questions. But I'll start with these in here:
Is my observation correct that for the token providers this is a margins game, while for the consumers this is a quality of service/product game? If the quality:margin lines cross at some point on the x-axis, is the race to reach that point before running out of money? If yes: what historical examples are there where the delta between these two is huge?
I'm guessing LLMs are unique in a sense, since there's really no limit to how good a consumer of the product expects it to get? (Compared to, for example, email, which is much easier to scale in terms of compute.)
Also extreme noob at life question: Why would you want to IPO before having a sustainable business model? What's the upside?
Others have posted links here saying inference is about 30% of Anthropic's spend, the other 70% is R&D - things like developing the next model. If you take that into account, add a profit margin, and round to make the figures easy you end up with them selling tokens at $1 / M Token to make money.
Their API cost is currently $25 / M Token. There is no question that's profitable. Someone really pushing their $200/mo Max plan can use 10B Tokens per month, which works out at $0.02 / M Token, so they are eating a huge loss there. That is clearly going to go away at some stage. For the rest of us: look at your average monthly token usage. If you are using 100M Tokens a month with the Anthropic $100/mo plan, they are making money out of you.
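The per-token arithmetic in the two paragraphs above can be checked in a few lines. The $1 per 1M tokens break-even is the comment's own rounded estimate (inference plus R&D plus margin); the sketch only reproduces the division and doesn't validate that estimate. Names are hypothetical.

```python
def effective_price_per_million(monthly_fee_usd: float, tokens_per_month: float) -> float:
    """What a flat-rate subscriber effectively pays per 1M tokens."""
    return monthly_fee_usd / (tokens_per_month / 1_000_000)


BREAK_EVEN = 1.00  # the comment's rounded estimate, USD per 1M tokens

for plan, fee, tokens in [("$200 plan, 10B tokens/month", 200, 10e9),
                          ("$100 plan, 100M tokens/month", 100, 100e6)]:
    price = effective_price_per_million(fee, tokens)
    verdict = "below break-even" if price < BREAK_EVEN else "at or above break-even"
    print(f"{plan}: ${price:.2f}/M ({verdict})")
```

The heavy Max user lands at $0.02 per 1M tokens, fifty times under the estimated break-even, while the 100M-token user sits right at it.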
You have to be working very long hours, and be really, really proficient at using AI to achieve 10B Tokens a month. The only way to be that proficient is to have been using it for a long time, years in fact, so it was useful to you long before Opus came along. You would be very disappointed to lose Opus, of course - but you are just the sort of person who can make a less capable model sing. It's not so difficult to see those users moving to an in-house-hosted, open-source model in a few years, and it will cost them what they are paying now - $0.02 / M Token.
All that means I'm not convinced by the gloom and doom vibe of the article. Things will change, but it won't mean the end of AI usage.
You can also do everything metered. There are multiple ways to buy.
Perhaps OpenRouter can be used as a benchmark for commodity cost to serve AI. I keep hearing it's better value than Claude, which suggests to me that either Anthropic is especially inefficient for some reason, or they're turning a profit on inference. They could be losing money on training, but I suspect that's just part of the cost of staying a leading lab. If any single one goes under due to debt etc. then companies can just switch?
GitHub Copilot has been doing this with business and enterprise seats, but that will be coming to a head very soon. I expect a fast follow after June when they re-align consumer Pro and Pro+ accounts.
OpenAI seems to be trying to throw tokens at clients to get lock-in. So I'd be most worried about the rug pull that will come from OpenAI post-IPO. Anthropic is already acting responsibly in this area, and GitHub Copilot is attempting to remediate their insane subsidies in the next several months.
I was actually quite worried, because I've been using GHCP for large chunks of work, but the billing estimator they released shows I was only at about $150-200 a month in API priced tokens. Sure, that's a subsidy for my $20 subscription, but not insane.
Heavy use of agentic coding tools, in a responsible manner, probably lands somewhere around that $200/m mark at API pricing. Assuming that makes the provider money, I don't see that being hard to swallow for businesses employing developers in Western countries, given the hours it can save.
The real risk here is to personal project vibe coders. Building a huge app by abusing subsidized plans is ending.
1. Sell you X seats, expecting that internal network effects will get you to 3X soon enough.
2. Sell you X seats, noting that as soon you reach 2X seats, you need to move to the Enterprise plan, under which largely the same feature set plus a couple of gated SCIM controls will cost twice as much.
3. Sell you X seats for their core product, then announce that features B and C are actually an additional SKU that requires additional per-seat subscriptions that double the cost.
4. Sell you X seats for their core product, then immediately announce that you’re on a “legacy plan”, but they’re happy to move you now to their updated offering, which is the same product but now structured to cost twice as much.
1. Training is expensive. Not just compute, but getting the data, researchers' salaries, etc.
2. You have to keep producing new models to ensure people use your inference, and there seems to be no end to this. So they have to pour in more billions to keep the cycle going.
3. Salaries and other admin costs are not that high compared to 1 and 2.
The article's point is that if you're relying on flat fee subscriptions, a rude awakening may be coming. That seems plausible to me. Issues around token quotas are a frequent topic on HN.
Nobody is going to charge "inference price" for model usage.
--You lose control over their "salary"
--You lose control over their "schedule"
--Your company becomes reliant on another party that does not share your interests or values, and can stop working for you on a whim for any reason
But AI is definitely good and trade unions are definitely bad, apparently...
That's the same as human workers. In both cases there are contracts/money to help align interests
Many companies that use models deployed on Azure/Bedrock etc. are already paying based on usage (often with discounts).
Remember that enthusiasts leaning on API keys and large enterprises are the exception, not the norm, and even some large customers may lean on subscriptions for at-scale adoption and wait for teams to report hitting usage caps before buying more token buckets. Subscriptions are predictable, reliable, and above all else a contractable way to acquire service.
Truth be told, this has been my red flag in orgs and with peers elsewhere for several years, now. Those orgs leaning on subscriptions are in for a nasty surprise within a year or two (like the author, I predict sooner than later), especially if those subscriptions power internal processes instead of AI buckets.
Hell, this is why I think there’s a sudden focus on the “Forward Deployed Engineer” nonsense role: helping organizations migrate from subscriptions to token buckets for processes so the bill shock doesn’t send them running away screaming.
Uber raised prices some, but mostly squeezed drivers harder. When Uber started, driving for Uber was a well paid job. It isn't, now. AI companies are mostly capital cost, so they don't have the oppression option.
Hardware price/performance may not improve much near term. Graphics GPU price/performance hasn't improved much in the last decade. DRAM prices have gone up. Fabs are all booked up. NVidia says not to expect better price/performance before 2030.
More efficient, specialized models are a strong possibility. Dumping all of human knowledge into a coding tool may be unnecessary. Although this would work a lot better if the LLM crowd figured out how to get a reliable "I don't know" answer out of a small model, then call on a bigger one for help.
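The "small model says I don't know, then escalates" idea is usually called a model cascade. Below is a minimal sketch of the control flow only. Both "models" are hypothetical stand-ins (plain functions, not any real API), and `None` stands for an abstention. As the comment notes, the hard part in practice is getting a reliable abstention signal out of the small model, which the toy length rule here merely fakes.

```python
from typing import Callable, Optional

# A "model" is any callable returning an answer, or None for "I don't know".
# Both models below are hypothetical stand-ins, not real APIs.
Model = Callable[[str], Optional[str]]


def cascade(prompt: str, models: list[tuple[str, Model]]) -> tuple[str, str]:
    """Try models from cheapest to most capable; escalate when one abstains."""
    for name, model in models:
        answer = model(prompt)
        if answer is not None:
            return name, answer
    raise RuntimeError("every model abstained")


def small_local(prompt: str) -> Optional[str]:
    # Toy abstention rule: only "knows" how to handle short prompts.
    # A real system would need a trustworthy confidence signal here.
    return "short answer" if len(prompt) < 20 else None


def big_frontier(prompt: str) -> Optional[str]:
    return "detailed answer"


MODELS = [("small_local", small_local), ("big_frontier", big_frontier)]
print(cascade("rename x to y", MODELS))
print(cascade("refactor the billing module to use async", MODELS))
```

The cost savings come entirely from how often the cheap model answers correctly without escalating, so a small model that confidently answers wrong would defeat the scheme.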
Is there any good data on the true cost of inference for these companies?
If you increase the price, the value is still astronomical in comparison.
Companies need to find a way to leverage local models in tandem with frontier models to offset the costs.
It’s all about targeting specific workloads with the appropriate AI. These tools are not sentient beings they are tools that need to be properly configured to match the job at hand.
Those same companies are getting sweetheart deals with the frontier AI labs in the hope that infrastructure costs go down enough in the future to invert profitability, but it's still a risky position for them to be in. (Having their own infrastructure gives the bigcos huge leverage, even if it's only 80% as good as frontier.)
It's clearly LLM spew in its mannerisms, making me wonder whether there were any nuggets of wisdom at its core, or whether it is, in its entirety, part of some LLM-driven blog spam project.
The best course of action is to take advantage of the subsidy for a while, but not integrate it so deeply that you can’t retreat. You’ll still have full productivity; just be cognizant of the reality of the situation.
Hopefully the market eventually collapses to where companies are hosting their own inference, and you simply lease a model package to run on your own (or rented ) specialty hardware.
So the frontier model companies might have crazy valuations and they might never reach that. But that might not mean they are actually unprofitable.
Yes, sure, right now it is ... but that's NOT how it got here.
There are trillions invested to recoup and at most billions in sales. It doesn't add up to tokens making a profit any time soon.
But if all the AI companies stopped training new models, they would all instantly become profitable (and stick around)
The thing that makes them unprofitable, is having to compete (which means training models). If / when enough companies exit the market, the cost to compete goes down and you end up in an equilibrium
But this isn't "a ticking time bomb for enterprise." It's an issue for the AI companies' investors.
Obviously I, like basically everyone else here, don't have access to Open AI or Anthropic books so it's just guessing based on public available evidences, but "tokens aren't being sold at a loss" does not imply there is any profit.
And, even if there is some profit, it needs to be big enough to at least pay back the capex spendings and finance the next model iteration.
It’s unlikely that Claude is proportionally that bigger and more expensive to serve so profit margins on inference must be pretty decent
How many times bigger could Opus be than GLM or Kimi? It’s certainly not proportional to the price.
They might be sold at-compute-cost, but that of course ignores training, salaries, and everything else.
In the first few years after electric motors became a thing, one could have said the same thing. We would have just gone back to steam. If you tried to "do without them" now, society would collapse.
So the question is not if we can do without them now, it's if we can do without them in 5 to 10 years (or however long it takes for them to be fully integrated)
Just how "early stage" is that, and how much more integration does this "new technology" need to be?
Not true. Plenty go into the graveyard. At some point in time typewriters were everywhere. So were landline phones. Both were highly integrated into the system. They were replaced by much superior versions.
> In the first few years after electric motors became a thing, one could have said the same thing. We would have just gone back to steam. If you tried to "do without them" now, society would collapse.
Yes but there is nothing to state that the current version of LLMs is equivalent to electric motors. We could very well be in the typewriter/landline phones stage. You would need even more iterations to get something that is equivalent to electric motors.
Even electric motors themselves underwent multiple iterations to become economically viable. A lot of wasteful overhead needed to be eliminated and parts re-engineered to make them more efficient before they could be truly adopted.
> The subsidy era is not winding down gracefully. It is showing cracks everywhere. … the question is not whether they got a good deal. The question is how long that deal survives. … A developer running three or four concurrent coding agents is not consuming 3x or 4x the tokens of a chat conversation. It is an order of magnitude more … These are not experiments anymore. They are load-bearing workflows. … That is not a rounding error. That is a line item that needs its own budget code.
You're absolutely right to call that out. Let me restate that with the correct assumptions and that feedback, honestly and directly...
One can at least hope.
It's just "intellectual" botox.
GitHub Copilot moves to usage-based billing in two weeks.[1]
1. https://github.blog/news-insights/company-news/github-copilo...
It’s as jarring as getting halfway into a well written article, clicking a link to a source, and getting rickrolled.
It’s all you can do to not let it distract you from the fact that in 1998, The Undertaker threw Mankind off Hell In A Cell, and plummeted 16 ft through an announcer's table.
LLMs are just parroting relevant documents they've assimilated.
“Load-bearing” is a new one for me though, yuck.
GPT-4:
Input: $30 / 1M tokens
Output: $60 / 1M tokens
GPT-5.5:
Input: $5 / 1M tokens
Output: $30 / 1M tokens
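Because input got about 6x cheaper but output only 2x cheaper, the overall price drop depends on a workload's input:output mix. A small sketch using the two price sets listed above, taking the first one to be GPT-4 as the reply quoting "GPT-5.5 than GPT-4" suggests. It compares list price per token only; the number of tokens each model spends on the same task can differ (reasoning models may emit many more output tokens), which this ignores.

```python
PRICES = {  # USD per 1M tokens, as listed in the comment
    "GPT-4": {"input": 30.0, "output": 60.0},
    "GPT-5.5": {"input": 5.0, "output": 30.0},
}


def blended_price(model: str, input_share: float) -> float:
    """Price per 1M tokens for a workload with the given input:output mix."""
    p = PRICES[model]
    return input_share * p["input"] + (1 - input_share) * p["output"]


for share in (0.5, 0.75, 0.9):
    ratio = blended_price("GPT-4", share) / blended_price("GPT-5.5", share)
    print(f"{share:.0%} input tokens: GPT-4 costs {ratio:.1f}x GPT-5.5 per token")
```

Input-heavy workloads (e.g. agents re-reading large contexts) see the bigger drop; output-heavy ones see closer to 2x.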
Costs have been reducing by over 5x year over year. Inference cost concern is mostly performative.
https://simianwords.bearblog.dev/conclusive-proofs-that-llm-...
Edit: I can't reply, but companies aren't selling inference at a loss. In the blog post I point to third-party hosting of open models like DeepSeek, which are also going down in price. They are not VC-backed.
I also point to Gemma 31B which you can run on your laptop today that beats most models from 2024.
> it costs OpenAI less money to serve GPT-5.5 than GPT-4
> Ppl don't understand how much efficiency gains are being made
I guess "ppl" also don't understand then, with all the supposed "efficiency gains" and "tokens getting cheaper" how come MS GH Copilot is switching everyone to token-based billing? Must be because those tokens are so damn cheap, innit?
Previously they used "premium requests" which would allow you to make a request to one of the more expensive models. People abused the shit out of this because a request was disconnected from tokens.
You could make one request which used tens of dollars worth of tokens, obviously not the intended usage pattern and obviously unsustainable.
Tokens for a given intelligence level are becoming much cheaper very quickly, but everyone wants to use the smartest frontier models so tokens are not dirt cheap. Even frontier models are a bit cheaper in absolute terms than they previously were, and much cheaper in terms of intelligence.
> Open source models are 3-6 months behind.
On the benchmarks included in their training set yes, not in real life
Meanwhile there are layoffs everywhere, childcare costs keep rising, products shrinkflate.
Assertion assertion assertion wishful thinking assertion.
Show, don't tell. Show us that we're wrong and this isn't a VC black hole. The CEO of Enron as late as September 2001 could've called every critic a sad dark loser with nobody challenging him publicly. Jim Cramer famously yelled that anyone pulling their money out of Bear Stearns in 2008 was being "silly, do not be silly" exactly 8 days before their collapse and a -92% stock drop. In COVID, calling everyone paranoid and sensationalist about some mythical new flu was popular in December 2019 and gone by March 2020. How about Uber, the seeming go-to for how VCs can turn a money hole into a profitable business? The average price increase is now 18% per year and still going up, with an increase of over 60% in 5 years. Does anyone still talk about the "sad dark HN loser path" of those who doubted VR in 2018? How's your VR startup doing?
I highly doubt there will be consumer-grade HW to run it in 2 years either. And DeepSeek V4 Pro is not even close to OAI or Anthropic frontier models.
To run a DeepSeek V4-class model, you would need to spend $120k on GPUs alone.
It's already not possible to run DeepSeek V4 Pro locally today, and I cannot imagine that we will be able to in 2 years.
Inference compute is definitely scaling fast, but to scale RL, training and R&D compute also needs to scale hard. I don't think it's obvious that inference will overtake R&D/training, unless there's a reputable source that states that.
Having used it for > 4 years and having paid for it for > 2.5 years, I think I know full well how its previous billing worked.
> You could make one request which used tens of dollars worth of tokens, obviously not the intended usage pattern and obviously unsustainable.
Gee, thanks Mr. Obvious! It never occurred to me this was the reason Microsoft recently removed Opus 4.6 and added a 15x multiplier in front of the inferior, but less token-intensive Opus 4.7!
We also have to assume that these operators are correctly pricing GPU depreciation, and the market is so new there is no reason to believe they are.
It would be more surprising if the surrounding architecture hasn't significantly diverged. If it _hasn't_ significantly diverged, then given the performance difference it would imply that the frontier models have significantly greater param counts, which would result in a higher cost.
I'm not sure that free tier will necessarily continue forever though, unless there is a way to monetize it (presumably by advertising, or by selling data they've gleaned about the user), or perhaps if there is no privacy and the provider is treating you as a source of free data. Right now we're still in the market-share grabbing "never mind the profits, count the users" stage.
I agree that the market for local AI is basically limited to nerds at this point, but that's because nobody's really explained why local AI is a good thing and also because the vast majority of people need the $20 paid plan at most. How much time and money would it take to get something half as good as OpenAI's products running locally?
There are a lot of good things that need to be explained to people, but nobody ever managed to. I don't think this will be any different.
> because the vast majority of people need the $20 paid plan at most
Exactly, people are not gonna invest time and money when there's already something else that satisfies their need.
Local AI will need to be both better and more convenient in order to be adopted by the masses.
However, if you can deliver 90% of the value of AI for 90% less cost, that is a really big incentive. Companies will spring up to fill that kind of gap.
Nobody can undercut the big AI players right now because they are all over-funded by VC money. Once the frontier companies try to match cost to expense, suddenly they become very, very vulnerable.
But within that big pie, the "IT-related" investments grew 15.7% whereas non-IT actually shrank 2.0%.
It's like selling dope: once they're addicted, the dealer can turn the screws on them.
If things don't end up working out a lot of people have already been (and in the future will be) paid. It's the investors that will lose out, not the subscriber.
Steal from, you know, people who actually work.
Edit: ah I see the models mentioned in another comment of yours
I also have an agent using Kimi 2.6 as a backend (which is open, but not local) and for some coding tasks as well.
It’s highly unlikely OpenAI/Anthropic are not making decent amounts of money from inference.
Based on what? Why are we all whispering about how profitable all this is? It is the absolute last thing these firms would keep secret.
Nobody is whispering about anything. Everyone is loudly assuming what's convenient for their thesis. Even if you have access to the books, the accounting isn't straightforward; there are yet insufficient data for a meaningful answer.
> It is the absolute last thing these firms would keep secret
If you find an optimisation strategy that you don't think your competitors have, you absolutely keep your margins secret for as long as possible. Knowing something is possible is the first step to making it so.
Eh, the AI companies still have lots of datacentres. For the guys who funded with equity, they could collapse down to just running those as utilities. (For the guys who funded with debt, they'd have to restructure.)
From the customer's perspective, this situation shouldn't result in a cost spike. (Consolidation, on the other hand, would. But that's a separate argument from the one the article attempts to make.)
But if there's no more competition, there's no more incentive to keep prices low, which will also be reflected in pricing.
That will only happen when the bubble bursts and those companies exit by going bankrupt.
AI won't be "integrated" until something similar happens, and new businesses etc. are formed that take advantage of it in a way that can't simply be reversed to the old, pre-AI paradigm. I don't know what that will look like, but someone is going to figure it out and make successful companies with entirely new paradigms that are only made possible by AI.
At some point, every single factory was designed for electric motors, and going back became unthinkable.
-edit- also, the idea that a 5-year-old tech that is still rapidly changing and developing deserves quotation marks around "new technology" is hilarious to me.
Based on the way Claude has felt the last few weeks, I'd say we're about 3-6 months away from full AGI. At that point we can start truly replacing white collar workers in earnest and begin deep integration.
> At that point we can start truly replacing white collar workers in earnest and begin deep integration.
This is why AI is so deeply unpopular. Even in the "good" scenario proselytized by true believers, you still paint a bleak near-future where everyone loses their jobs.
Even if they are “profitable”, how many Uber drivers are “profitable” only because they aren’t correctly calculating asset depreciation? Maybe these guys are doing the same thing.
Maybe it’s a lot of people who already had GPUs for crypto mining, and they’ve moved over to this, so that if they need to grow and buy new GPUs the costs would dramatically grow.
To an extent maybe, but that market is almost entirely commoditized already. Besides Cerebras and maybe Groq (which already charge a slight premium), all the other providers are more or less interchangeable.
> Maybe it’s a lot of people who already had GPUs for crypto mining
I’m not sure the type of GPUs that were most popular for crypto are at all useful for LLMs?
If there’s a few providers subsidizing, that’s the price ceiling. Everyone who wants to compete has to subsidize.
Now if this market had been operating for years, I’d say that it’s likely all these companies are profitable or close to it. But the market is so new and there’s so much hype, I find it very plausible that none of these guys are making a profit and they all hope to just hang in until all the subsidies go away.
> I’m not sure the type of GPUs that were most popular for crypto are at all useful for LLMs?
There’s some overlap. I’ve definitely read about people repurposing.
Could be just ESL; it's hard to close the proficient-to-native gap.
Maybe it's different if you are doing technical/commercial writing, but for social media where you are writing for fun, and to express yourself, it'd be odd to let AI be your voice unless you realize your own writing is very poor.
A lot of people post for clout, so something that can skip the difficult process of becoming a good writer (and original thinker) is more than enough. They can churn out think pieces about any topic at an unlimited pace, basically.
It doesn’t add much to the world, but they get a lot of traction (which I cannot understand, given the quality of content.) And that’s what matters to them.
I think if you gave most people the choice between (a) being a thoughtful and original writer (b) being seen as a thoughtful and original writer, the vast majority choose (b). Especially when it is zero effort.
Now they write "competent" blog posts on LinkedIn that seem 100% AI slop. Some are employed at AWS, too.
I'm not a native English speaker as I'm sure my writing shows. My point is that I'd rather read genuine posts full of grammar errors instead of slop.
It's a measure of a very thin sort of "value/$" that excludes a lot of other things that could be of value to a business, like control, predictability, and availability.
Thin clients have been going away for a long time. The trend has been to continue to push higher levels of compute into ever-smaller and ever-more-portable devices.
But in many cases self hosted or dedicated boxes are cheaper than cloud.
What happened two months ago?
And once you can make a certain threshold of commit, the seats are free and all you pay is usage. They are definitely making a margin on the usage.
In other words, this article is, I wouldn’t say misinformed, but definitely underinformed.
An example startup in this area is claiming 16k tok/s on an ASIC for Llama 8B. Qwen has a 27B model at Opus 4.5 quality.
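For scale, a quick sketch of what a sustained 16k tok/s would mean per day. It assumes the figure is throughput held around the clock; the claim as quoted doesn't say whether it's per user, per chip, or aggregate, and `duty_cycle` is a hypothetical knob for idle time.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_day(tokens_per_second: float, duty_cycle: float = 1.0) -> float:
    """Tokens generated in a day at a sustained rate, scaled by the fraction of time busy."""
    return tokens_per_second * SECONDS_PER_DAY * duty_cycle


claimed = tokens_per_day(16_000)  # the 16k tok/s figure from the comment above
print(f"{claimed:,.0f} tokens/day")
```

That comes to roughly 1.4 billion tokens a day if the chip never idles; real duty cycles would bring it down proportionally.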
And how many tokens would that buy?
Nitpick: more like 36 years, from Wright Flyer in 1903 to Heinkel 178 in 1939. Still quite impressive.
Cynically: it’s become an executive-level GPU measuring contest. If you’re not making huge commitments on data centers, you can’t be a serious player.
Realistically: It’s a mix of the two. The recent Claude caps for agentic usage suggest that demand exceeded their immediate compute supply. That they can alleviate it with additional capacity from the existing and small-ish xAI facility suggests that either demand may not be rising quite as fast as anticipated, that they’re okay in the short term until more capacity comes online, or a mix of both.
Open questions:
1. At what price point does demand fall, and are the frontier providers overall profitable before that price point?
2. At what price/performance point do on-prem local models make more sense than cloud models?
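Question 2 can be framed, at least on the cost side, as a break-even calculation. A rough sketch, assuming hardware bought up front, a flat monthly cloud bill, and a flat monthly local running cost (power and the like). All numbers are hypothetical, and it deliberately ignores the quality gap between local and frontier models, which is the harder half of the question.

```python
import math


def breakeven_months(hardware_cost: float, monthly_cloud_spend: float,
                     monthly_local_running_cost: float) -> float:
    """Months until buying local hardware costs less than staying on a cloud plan."""
    monthly_saving = monthly_cloud_spend - monthly_local_running_cost
    if monthly_saving <= 0:
        return math.inf  # running locally never pays for itself
    return hardware_cost / monthly_saving


# Hypothetical: a $4,000 box vs a $200/month plan, with $30/month of electricity.
print(f"{breakeven_months(4_000, 200, 30):.1f} months")
```

If cloud prices rise, `monthly_cloud_spend` grows and the break-even point moves closer; if local hardware only covers part of the workload, the saving shrinks and it moves further out.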
The print shop can’t replicate the practicality of local printing and I can’t replicate their scale of investment. Both coexist perfectly.
You might be interested in the Tiny Tapeout project, which guides you through the process of getting your own design etched on silicon. If you only need larger features and not the next-gen single-digit-nanometer stuff, you may not be so supply constrained.
The issue is the huge amount of DRAM and memory bandwidth these models require.
Also, how many companies will just buy an M6/M7 MacBook Pro with 32GB+ of RAM in a couple of years and get “free” AI along with the workstation they were going to buy anyway?
They'll be controlling lights and temperature, they'll be adding calendar reminders that show up on your phone and your fridge. Your phone and devices might sync pictures and videos there instead of the large cloud providers. They'll also be a media server, able to stream and multiplex whatever content you want through the home. They'll also be a VPN endpoint, likely your home router, maybe also a wifi access point.
I think this makes quite a bit of sense. I don't think they'll be ubiquitous, but they could be.
This distributes the power demand to where local solar generation can supplement it, gives the home user a lot of control, and claims ownership of the user data from big tech.
Maybe I'm imagining things but this is what I think is coming.
It's the LLM/data heart of the home. A useful digital tool.
We'll have this massive machine to do "home automation", something that by all rights should be possible with less computing than is deployed in smartwatches today. Yuck...
I use a local LLM with it, but you can use a hosted LLM if you like.
The core home automation stuff can run on a potato. The LLM just writes new automations when I ask it, or acts as a natural language interface.
I use a pretty small 4B parameter local LLM, on a fairly modest mini PC. It doesn't take a frontier model to do that kind of work.
Digital sovereignty laws may restrict or remove access to other countries' LLMs on economic and national security grounds.
We will only know the actual situation once Anthropic goes public and we can look at their books.
GPT-4.1 Input: $2.00 / 1M Tokens Output: $8.00 / 1M Tokens
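Converting those list prices into a per-request cost, as a sketch: the 50,000-input / 2,000-output token split below is a hypothetical agent-style call. This only shows what the customer pays; it says nothing about the provider's margin.

```python
# Per-million-token list prices quoted above for GPT-4.1 (USD).
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Customer-side cost of one call at list price, in USD."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical agentic call: a large context, a modest answer.
print(f"${request_cost(50_000, 2_000):.3f}")
```

With a large context, the input side dominates even though output tokens are 4x more expensive per token.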
Pricing has no correlation with profit. It can be artificially lowered to kill competition, and artificially inflated to maximize profit.
That blog post is not very compelling either. Without knowing details of the architecture, comparing the various frontier models to open models doesn’t make sense.
Why do you need to know the architecture? Just compare Deepseek V4's performance with GPT 4 and treat internals as a blackbox. Deepseek is much cheaper and way more performant. If you can agree to two reasonable assumptions:
1. that closed source models are more efficient than open source
2. Deepseek is served at a profit and not a loss
Then it is pretty clear that prices have gone down. If prices have gone down by more than 20x-30x, then surely they are not _still_ subsidised, are they?
I think this amount of skepticism is not warranted here. Meeting every reasonable explanation or proxy with "but you don't know what they really do" is naive.
It is borderline conspiratorial to believe it this way.
Not a reasonable assumption for a variety of reasons.
> 2. Deepseek is served at a profit and not a loss
Not a reasonable assumption either.
> Why do you need to know the architecture? Just compare Deepseek V4's performance with GPT 4 and treat internals as a blackbox.
Because the internals are what actually matter and what drives inference cost.
It would be entirely reasonable to expect that GPT-5.5 has some sort of optimizations or changes to the architecture to make it easier to train, or to make runtime ablation easier, or to better handle large batches, or whatever.
Those changes, particularly if they are non-public, can easily result in worse inference performance than a comparably sized model without those changes.
> It is borderline conspiratorial to believe it this way.
It's not any sort of conspiracy. It's how land-grab tech companies have always worked. To presume otherwise is silly.
That's just one piece of the puzzle. If you're running the LLM there's no reason your family's mobile devices couldn't use said home LLM box to save battery life on their devices while maintaining control of their data, searches, photos, files, etc.
I wish this was true but it is not. And I am working on open source models so if anything, I would have a bias towards agreeing with you.
Frontier closed models (GPT/Claude) are pulling away from everybody else, even Google, once the king.
Your claim is a meme coming from benchmark results and sadly a lot of models are benchmaxxed. Llama 4, and most notably the Grok 3 drama with a lot of layoffs. And Chinese big tech... well they have some cultural issues.
"Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like "The" and "A":"
https://xcancel.com/N8Programs/status/2044408755790508113
---
But thank god at least we have DeepSeek. They keep releasing good models in spite of being so seriously resource constrained. Punching well above their weight. But they are not just 6 months behind, either.
Gemma 4 was a major improvement in self-hostable local models, and Qwen-3.6-A34B is a beast; it runs great on an MBP (and insanely well on a 4090).
The biggest lift is combining these models with a good agent harness (personally I prefer Hermes agent). But I’ve found in practice they’re really not benchmaxxing. I’ve had these agents successfully handle a few non-trivial research projects that I wouldn’t have been able to accomplish as successfully even last year.
When you add in the open-but-not local models, Kimi, GLM, Minimax, you have a lot of very nice options. For personal use anything I don’t use local models for I give to my Kimi 2.6 powered agent.
Over-promising is a very stupid thing. Nobody will value the intermediate steps. Nobody will value all the effort because they will always compare us with frontier models made with billions and we will become a running joke. So please stop.
[0] US AI firms team up in bid to counter Chinese 'distillation' (Apr 7) https://finance.yahoo.com/sectors/technology/articles/us-ai-...
I've got a 128GB Strix Halo staying warm at home; it has nothing on the top models with big budgets. It's a good supplement to low-end plans for offloading grunt work / initial triage.
Thanks for the suggestion tho; a tool by antirez is always going to pique my interest. I'll check it out when I'm finally home again.
Tho it says Metal / CUDA, so it doesn't seem friendly to a Linux AMD system.
At what tps? You can run the new Gemini Flash or 5.3 Codex Spark at 1000+ tps and run circles around "open" models. You can't run anything usable locally without at the very least a Blackwell 6000, if not two.
Sure, you can run Qwen 3.6 at 20 tps on a 128GB Mac, but let's not pretend this will get you anywhere.
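The tps gap is easier to feel as wall-clock time. A rough sketch, assuming a hypothetical 10,000-token response and counting only generation speed at the two rates mentioned above; it ignores prompt processing (prefill), which on local hardware with long contexts can add considerably more.

```python
def seconds_to_generate(output_tokens: int, tokens_per_second: float) -> float:
    """Decode time only; prompt processing (prefill) is not counted."""
    return output_tokens / tokens_per_second


# Hypothetical 10,000-token response at the two speeds mentioned above.
for label, tps in [("local, ~20 tps", 20), ("hosted, ~1000 tps", 1000)]:
    print(f"{label}: {seconds_to_generate(10_000, tps):.0f} s")
```

That is roughly eight minutes versus ten seconds per response, and agent loops chain many such responses.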
Case in point: North Korea, with far, far fewer resources.