AI subscriptions are a ticking time bomb for enterprise

AI subscriptions are a ticking time bomb for enterprise (thestateofbrand.com)

387 points by mooreds 18 hours ago | 380 comments

evo_9 15 hours ago |

Every AI subscription is a ticking time bomb for the frontier provider; within a few years we will be running local models as good as today’s frontier models with almost no cost burden. The floor will fall out of the enterprise market for all the frontier companies.

crazygringo 15 hours ago | |

> within a few years we will be running local models as good as today’s frontier models with almost no cost burden

Based on what? The RAM requirements alone are extraordinary.

No, running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.

crystal_revenge 12 hours ago | | |

> Based on what?

I take it you haven’t actually run any of the current gen local models?

They all fit on fairly accessible hardware, and their performance is at least on par with what I was paying for last year.

I have one of my agents running entirely from a local model running on a MBP and it has repeatedly shown it’s capable of non-trivial tasks.

Playing around with another, uncensored, local model on my 4090 desktop has me finally thinking about canceling my personal Anthropic subscription. Fully private, uncensored chat is a game changer.

For work it’s still all private models but largely because, at this stage, it’s worth paying a premium just to be sure you’re using the best and it saves the time of managing our own physical servers. But if we got news tomorrow that Anthropic and OpenAI were shutting down, a reasonable setup could be figured out pretty quickly.

alsetmusic 15 hours ago | | |

Local models are 6 months to 18 months behind frontier. Even if the performance of a cloud model is faster, it's clear that local is catching up.

harrall 13 hours ago | | |

You can now buy 128 GB unified memory computers from AMD as commodity.

They’re still pricey, the world is still scaling up memory production, and a lot of code isn’t yet built for AMD, but we went from the Wright brothers’ first airplane to jet engines in 27 years.

I’m not sure “it’s only a few years away” but we are sure moving there fast.

nine_k 13 hours ago | | |

> shared, dedicated hosted hardware at full utilization

I must say that the largest dedicated hosted hardware providers now, like Amazon or Google, to a large extent do not produce the software they are offering as a hosted solution (like Linux, Postgres, Redis, Python, Node, etc). Similarly I'm not sure if the producers of the frontier models are going to keep their lead as the service providers for the most widely used models. They would need to have quite a bit of an edge above open-weights models.

Also, models are given very sensitive data to process. For large organizations, the shared dedicated hardware may look like a few (dozens of) racks in a datacenter, rented by a particular company and not shared with any other tenants.

dandellion 12 hours ago | | |

> The RAM requirements alone are extraordinary.

At the same time, $100 a month is A LOT of RAM.

simooooo 12 hours ago | | |

Qwen 3.6 is virtually indistinguishable from Claude on my 5090

SonnyTark 5 hours ago | | |

I've been running Qwen3.6-35B-A3B on my 8GB VRAM GPU for 3 weeks now and it's been blowing my mind how good it is (coded multiple tools that I use daily, set up CI/build scripts for several projects, meaningfully contributed to a large personal project, etc.).

No one can deny that right now these new compact models are not as good as frontier models but for the first time we actually have competent local-first models. If I give you a local model that runs on your current hardware and performs at 75% of the ability of a frontier private paid model, would you still pay for frontier? More importantly, would you hand control of your processes and code to them knowing enshittification and price hikes are always lurking nearby?

For businesses, I get it you want to compete. But personally, it's over. Even if I considered for a second paying OpenAI/Claude, not gonna happen now.

iwontberude 14 hours ago | | |

I strongly disagree. Humans are so insanely well incentivized here with trillions in market share to make localized AI good enough and that’s the only benchmark they need.

leptons 14 hours ago | | |

>running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.

That is only true right now because hundreds of billions of dollars are being burned by these AI companies to try to win market share. If you paid what it actually cost, your comment would likely be very different.

claysmithr 11 hours ago | | |

Not really, I can run models on my 24GB mac.

adamgordonbell 15 hours ago | |

Or put another way, the frontier models are very quickly depreciating assets, because of the competition in the market.

They have to keep getting better to stay ahead of each other and open weight.

Which means it's the opposite of a time bomb. The article has it completely backwards: tokens at the current level of reasoning will continue to get cheaper.

I'm not sure 'local' will be the end state, as hardware needs are high. But certainly competitive forces tend to push profit margins toward zero.

Extended discussion on this topic:

https://corecursive.com/the-pre-training-wall-and-the-treadm...

airstrike 15 hours ago | | |

Well, it's a timebomb for the companies who get paid per token, so the parent is right and TFA is probably wrong

intothemild 14 hours ago | |

I've spent the last month bringing in a small demo of what the future could be like, running Qwen, Gemma, and Deepseek, behind LiteLLM so we can monitor token usage, and instead of some dumb ass "tokenmaxxing" we're actively trying to get the cost of inference both down, and in-house.

Boss is happy, very happy. We're rolling it out more widely now.

But this is the future.

slashdave 14 hours ago | |

> within a few years we will be running local models as good as today’s frontier models

I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.

majormajor 14 hours ago | | |

> I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.

The big question I'd be asking if I was investing in one of the big players is if those changes are "it can do 99% instead of 97% of the tasks a user will throw at it" (at which point going local and taking back cost control/ownership makes a lot of sense, especially for companies) OR "it will fully replace a human with better output"?

I already don't need Opus for a lot of my tasks and choose instead faster/cheaper ones.

The former is a company that's gonna be trying to sell mainframes against the PC. The latter is a company that is in potentially huge demand, assuming the replaced humans end up with other ways of getting money to still be able to buy stuff in the first place. ;)

comfysocks 13 hours ago | | |

> I seriously doubt it. Scaling is already strained (don't buy into the "exponential" hype). And, in any case, the competition will be against the frontier models that will exist in two years.

But even if scaling plateaus for the frontier models, maybe distillation will improve to the point where smaller more manageable models can reach the same plateau. That would be great for local.

christopherwxyz 14 hours ago | | |

I would readjust your convictions.

We are only 2-4 years away from consumer grade immutable-weight ASICs.

Ferret7446 35 minutes ago | |

That's why cloud died out and everyone is running their own servers right?

vb-8448 15 hours ago | |

> within a few years we will be running local models as good as today’s frontier

Unless there is some important breakthrough in hw production or in model architecture, it's quite the opposite: bigger, more expensive and more energy-intensive hw is needed today compared to 1 or 2 years ago.

evgen 14 hours ago | | |

On a four-year-old MacBook Pro I can run qwen3.6-27b, which dominates ChatGPT-4o (the frontier model from 2 years ago) and is competitive against early ChatGPT-5 versions. We are also getting a lot smarter about using and deploying these local models. Your entire AI stack from two years ago would be absolutely crushed by today's local LLM models and a high-end local inference system when combined with a good modern coding agent.

chermi 14 hours ago | | |

Per frontier token. You're not calculating the cost of a fixed quality asset here. Old hw running non-frontier models will be very valuable. In fact, we have two direct examples: older server GPUs actually appreciating and the very obvious fact that not everyone always uses MAX FULL EFFORT BEST MODEL no matter what.

ls612 14 hours ago | | |

As good as today’s frontier. Gemma 4 today is roughly equivalent to the frontier a year and a half ago at gpt 4o tier.

stingraycharles 14 hours ago | |

The economics of local AI just don’t make sense. A model like Opus is - supposedly - something like 5T parameters, which is likely something like 3TB of GPU memory.

Local deployments never reach the utilization that cloud providers have (80%+), and cloud is always going to be much better than local for this reason.
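For anyone who wants to check the memory figure above, here is a rough back-of-envelope sketch. It assumes the rumored 5T parameter count (unconfirmed) and counts only the weights, ignoring KV cache, activations, and runtime overhead. The function name is made up for illustration.

```python
def weight_memory_tb(params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal terabytes."""
    return params * bits_per_param / 8 / 1e12

# Rumored figure from the comment above; not a confirmed number.
PARAMS = 5e12

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_tb(PARAMS, bits):.1f} TB")
```

Under these assumptions, the quoted 3TB lands a little above 4-bit quantization (about 4.8 bits per parameter), so it is only plausible for a heavily quantized deployment.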

lumost 14 hours ago | | |

Capex, opex, quality, and volume are tricky things to balance. On balance, pc/mobile are cheaper to operate than equivalent cloud and on prem deployments.

It’s not unreasonable to suppose that in 2 years time an opus 5 quality model will be etched into silicon for high performance local inference. Then you just upgrade your model every 2-3 years by upgrading your hardware.

majormajor 14 hours ago | | |

Running local applications is less efficient than thin clients to the cloud generally, not just in LLMs. The trick is that you can get to the point where it's effective enough, and affordable enough, that the control and availability factors become dominant.

nijave 15 hours ago | |

>within a few years

Eventually, we'll see. Frontier models still need some pretty serious hardware which will slowly come down in cost. Smaller models are becoming more capable, which will presumably continue to improve.

I think there's still a pretty big gap, though. Claude estimates Opus 4.6 and GLM-5 need about 1.5Ti VRAM. It puts gpt-5.5 around 3-6Ti of VRAM.

That's 8x Nvidia H200 @ ~$30k USD each. Still need some big efficiency improvements and big hardware cost reduction.

snovv_crash 12 hours ago | | |

Qwen 3.6 27b is somewhere around Opus 4. It runs on a 5090, a $2k desktop GPU, at reasonable speeds.

throw1234567891 15 hours ago | | |

Or a single mlx cluster if one can find second hand machines somewhere. Difficult to get your hands on today, certainly, but not impossible.

czep 9 hours ago | |

The economic question is whether the average company will have the time or talent to roll their own models instead of eating the cost increases. The firms in question are exactly the same that have already decimated their teams. Can they so quickly pivot to self-hosted models if their AI workloads suddenly cost them 10x more? I bet most will simply start shoveling themselves deeper.

planb 14 hours ago | |

If that’s true, then it will be even cheaper to provide them as a subscription. Following your logic, every company would be running their own data centers instead of using cloud providers.

adrithmetiqa 14 hours ago | |

I disagree. No one will want to use second rate models when the frontier models reach a specific level of capability. Enterprise will keep paying.

upcoming-sesame 10 hours ago | | |

not every company can pay for the best engineers in the market, some can only afford to pay for cheaper engineers and it's fine

same with models.

malfist 14 hours ago | | |

No one? When free means I get 95% of the capabilities of something very very expensive, you bet your bottom dollar many many people will choose free.

aleqs 13 hours ago | |

Hard agree - the benefits of local/self-hosted models are not just hardware/cost (it might be more expensive at the moment), but what you get in exchange is unnerfed/unstupified models, full cost/usage transparency, optimized/specialized models, privacy/security, etc.

jmount 13 hours ago | |

I think this is a good under-represented point. Again and again things that could only run on a mainframe get ported to the personal device level. However it looks like the campaign to eliminate the PC (by pre-buying all RAM) is the counter-stroke.

wolttam 15 hours ago | |

There's still going to be plenty of use-case and demand for frontier models running across hundreds or thousands of GPUs. It's just not going to be in the current shape - certainly not accessed by the general public for rote business tasks.

YesBox 15 hours ago | |

You'd have a point if Cloud ^tm didn't take off into a multi-billion-dollar industry.

voxleone 9 hours ago | |

I can only hope that you'll be right someday. As of now, an RTX 3090 struggles to run most of the good local models.

WarmWash 12 hours ago | |

Linux in year 2000 vibes...still waiting to get off Windows 26 years later

himata4113 13 hours ago | |

This is wrong because local models are very expensive, just as expensive as the frontier.

It would cost me $300 in normal deepseek v4 pricing (non discounted) PER DAY, but I get it all for $500 worth of subscriptions.

nozzlegear 12 hours ago | | |

Why are you paying $300/day to run a local model? The whole point is that you run them on a machine you already own.

tsycho 7 hours ago | |

A lot of home computers are capable (by a large margin) of running a large number of self-hosted services (e.g. jellyfin, immich, minecraft, plex, karakeep, ... whatever people want to use).

And yet, less than 0.01% of the population (made up number, but I am more likely to be overestimating than underestimating) do so.

Running local models to do real work is likely to be another niche hobby.

otterley 14 hours ago | |

People who are this certain of their predictions should be forced to put real money on them on Kalshi or Polymarket instead of drive-by blowharding on HN.

watwut 13 hours ago | | |

Meh, having opinions shouldn't imply a necessity to gamble on a gambling site.

Not even when that site calls itself a "market" to create plausible deniability.

whackernews 13 hours ago | | |

Oooh. You’re hard.

claysmithr 11 hours ago | |

I agree. The AI bubble is going to pop, people will move to local models, and the datacenters will be abandoned

throaway198234 6 hours ago | |

^^^^^^^^^^^

AI is the future operating system of every computer everywhere

aryehof 35 minutes ago |

The assumption here is that the profitable price (variable costs) is only the API price. I question that.

Subscriptions can be equally profitable depending on the total actual tokens used. Few subscriptions use 100% of their potential, but these memes that subscriptions are unsustainable always seem to assume it to be so.

returnInfinity 17 hours ago |

Brad Gerstner confirmed that tokens aren't being sold at a loss. Whatever the formula, API + Subscription split, the companies are making a profit on net token sale.

They may be running at a loss after all the salaries and stock comp, but tokens are in profit now.

pvtmert 11 hours ago |

Although I agree with the sentiment in the article, it smells very LLM~y. Especially the sections and punchlines. Such as: `That is not a rounding error. That is a line item that needs its own budget code.`

okdood64 11 hours ago | |

> That is not a rounding error. That is a line item that needs its own budget code.

Claude produces this kind of prose SO much. It's pretty annoying. I don't notice it happening on Gemini for the same prompts.

pvtmert 11 hours ago | | |

I was working at Amazon until recently. The number of internal documents (PRFAQs, 1-pagers, etc.) with this sort of prose has boomed since 2024.

Punchy titles are also part of the marketing speak. Before Claude or ChatGPT, it would be a delicious read, understanding how they came up with the initial idea for an internal system. Since then, most of it reads like "It's not just X, ..." every other paragraph, making it dull...

metadat 10 hours ago | | |

It's called "contrastive negation", and both GPT and Claude tend towards it.

itsmek 3 hours ago | |

> The Math Your Finance Team Has Not Done

> Pull out the napkin. This matters.

Nobody writes like this.

lefty2 9 hours ago | |

Who is it written by anyway? The author's name is not given in the article

ashton314 9 hours ago | |

Pangram flags this as 100% AI generated with high confidence.

IshKebab 8 hours ago | | |

We've established that Pangram is bullshit. But it definitely has some strong LLM vibes.

ben8bit 14 hours ago |

The entire problem with "AI" is that it's easy to do without. The AI companies know it, the users know it - even the most pro AI agent manager knows it. Thought experiment: remove AI from the world right now, all of it - what do you have? Business as usual. This article doesn't do enough to underscore that - dreaded be the day I need to get an actual engineer to review a PR, right?

rakel_rakel 10 hours ago |

The hyperbolic nature of the articles in both AI camps is very exhausting to me.

I'd like to get in front of a whiteboard with someone who knows economics and the token providers businesses well enough to answer my "explain to me like I'm five" questions. But I'll start with these in here:

Is my observation correct that for the token providers this is a margins game, while for the consumers this is a quality of service/product game? If the quality:margin lines will cross at some point on the x-axis, is the race to reach this point before running out of money? If yes: What historical examples are there where the delta between these two is huge?

I'm guessing LLMs are unique in a sense, since there's really no limit to how good a consumer of the product expects it to get? (Compared to for example email which is much easier to scale in regards to compute.)

Also extreme noob at life question: Why would you want to IPO before having a sustainable business model? What's the upside?

rstuart4133 7 hours ago | |

You can ask an AI what it costs to produce an Opus token. The answer I got was $0.216/M Token. That takes into account hardware cost, power, cooling and hosting.

Others have posted links here saying inference is about 30% of Anthropic's spend, the other 70% is R&D - things like developing the next model. If you take that into account, add a profit margin, and round to make the figures easy you end up with them selling tokens at $1 / M Token to make money.

Their API cost is currently $25 / M Token. There is no question that's profitable. Someone really pushing their $200/mo Max plan can use 10B Tokens per month, which works out at $0.02 / M Token, so they are eating a huge loss there. That is clearly going to go away at some stage. For the rest of us: look at your average monthly token usage. If you are using 100M Tokens a month with the Anthropic $100/mo plan, they are making money out of you.

You have to be working very long hours, and be really, really proficient at using AI to achieve 10B Tokens a month. The only way to be that proficient is to have been using it for a long time, years in fact, so it was useful to you long before Opus came along. You would be very disappointed to lose Opus, of course - but you are just the sort of person who can make a less capable model sing. It's not so difficult to see those users moving to an in-house-hosted, open-source model in a few years, and it will cost them what they are paying now - $0.02 / M Token.

All that means I'm not convinced by the gloom and doom vibe of the article. Things will change, but it won't mean the end of AI usage.
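The arithmetic in this comment can be reproduced with a short sketch. All inputs are the commenter's own numbers (including the AI-generated $0.216 cost estimate), not verified figures, and the function and constant names are made up for illustration.

```python
def usd_per_million_tokens(monthly_fee_usd: float, tokens_per_month: float) -> float:
    """Effective price a subscriber pays per million tokens."""
    return monthly_fee_usd / (tokens_per_month / 1e6)

# Figures quoted in the comment above; treat them as assumptions.
INFERENCE_COST = 0.216   # USD per million tokens (hardware, power, cooling, hosting)
INFERENCE_SHARE = 0.30   # inference as a share of total spend

# Cost per million tokens once R&D is spread over the same tokens.
fully_loaded_cost = INFERENCE_COST / INFERENCE_SHARE

heavy_user = usd_per_million_tokens(200, 10e9)      # $200/mo plan, 10B tokens
typical_user = usd_per_million_tokens(100, 100e6)   # $100/mo plan, 100M tokens

print(f"fully loaded cost: ${fully_loaded_cost:.2f} / M tokens")
print(f"heavy user pays:   ${heavy_user:.2f} / M tokens")
print(f"typical user pays: ${typical_user:.2f} / M tokens")
```

On these numbers the fully loaded cost comes to $0.72 per million tokens, which is where the rounded $1 target comes from. The heavy user pays far below it and the 100M-token user pays slightly above it.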

rvcdbn 17 hours ago |

The article is mistaken: these subs are not available to businesses. Companies are paying much closer to API prices. The strategy is to get you accustomed to infinite tokens on your personal sub and bet that behavior transfers to work.

1123581321 16 hours ago | |

They are available. Seats for team or enterprise plans cost more than the retail prices, but they are fixed prices with resetting usage limits. You can assign seats to members that are the equivalent of $20/$100/$200/mo plans.

You can also do everything metered. There are multiple ways to buy.

pureliquidhw 16 hours ago | | |

Who is selling these with enterprise trappings? What you're describing evaporated 2+ months ago. Everything is metered for enterprise users now. If there happens to be a stray vendor offering this I'd wager 2 things. 1) it's about to be phased out. 2) model limits will be in place so even that $200 plan won't go very far.

Bnjoroge 15 hours ago | |

Subs are absolutely available to businesses. There are metered plans for the equivalent consumer plan.

photon_collider 16 hours ago | |

Yeah, I was confused about why it was talking about subscriptions for enterprise. The company I work at is billed on API usage.

plombe 16 hours ago | |

Looks more like AI slop with paragraphs like these:

> The pattern is identical across the board. Price for adoption, not for economics. Lock organizations in. Make AI a load-bearing part of every team's daily workflow. Worry about the bill later.

imsofuture 16 hours ago | |

Not only that, but the API rate amounts being pearl clutched over in the article are still relatively trivial. 10k a month is not nothing, but when 10k a month enables a team of ~10-20 engineers, that's pretty good leverage.

fwipsy 16 hours ago |

Disclaimer: didn't finish tfa, so obviously AI even I could tell.

Perhaps OpenRouter can be used as a benchmark for commodity cost to serve AI. I keep hearing it's better value than Claude, which suggests to me that either Anthropic is especially inefficient for some reason, or they're turning a profit on inference. They could be losing money on training, but I suspect that's just part of the cost of staying a leading lab. If any single one goes under due to debt etc. then companies can just switch?

Sharlin 17 hours ago |

I think I'm going to puke if I see one more "It's not X. It's Y." phrase or the word "load-bearing" used metaphorically.

leemoore 14 hours ago |

Enterprise customers aren't running on 20-bucks-a-month Claude Pro subscriptions. My company provides developers about 1k worth of usage limits a month and best I can tell they get maybe a 30% savings off of API cost tops. That's not an insane subsidy. Many other job titles are only allowed 50 a month and those folks are constantly running out.

GitHub Copilot has been doing this with business and enterprise seats, but that will be coming to a head very soon. I expect a fast follow after June when they re-align consumer Pro and Pro+ accounts.

OpenAI seems to be trying to throw tokens at clients to get lock-in. So I'd be most worried about the rug pull that will come from OpenAI post IPO. Anthropic is already acting responsibly in this area and GitHub Copilot is attempting to remediate their insane subsidies in the next several months.

briHass 12 hours ago | |

GitHub Copilot was the only one with absolutely insane subsidies, where they metered by 'request' instead of tokens. A request that costs 3 cents could end up burning $20 worth of tokens or more. That ends this month.

I was actually quite worried, because I've been using GHCP for large chunks of work, but the billing estimator they released shows I was only at about $150-200 a month in API priced tokens. Sure, that's a subsidy for my $20 subscription, but not insane.

Heavy use of agentic coding tools, in a responsible manner, probably lands somewhere around that $200/m mark at API pricing. Assuming that makes the provider money, I don't see that being hard to swallow for businesses employing developers in Western countries, given the hours it can save.

The real risk here is to personal project vibe coders. Building a huge app by abusing subsidized plans is ending.

yalogin 14 hours ago |

If it’s replacing developers it makes sense to cost more than 20 or 100 per month. The real issue for these LLM companies is that they are yet to show value in other areas. Without that they will be relegated to just coding. That is the rush right now for them: what other workflows can they automate? I guess all paperwork can be automated. Once the other areas are developed they will switch the pricing model.

niekkamer 13 hours ago | |

They are kinda trying to replace Figma. That could be valuable.

new_account_100 13 hours ago | | |

IMO the LLM technology is so poor when it comes to converting text descriptions to visual layout that I can't imagine it could possibly succeed as part of a paid design product.

carra 15 hours ago |

Not just AI. Every subscription in general can be a time bomb. You grow more dependent on it, and the provider can disappear or take it away at any moment.

aaronharnly 7 hours ago | |

This is exactly true and it’s what every SaaS company counts on. They have four basic moves:

1. Sell you X seats, expecting that internal network effects will get you to 3X soon enough.

2. Sell you X seats, noting that as soon as you reach 2X seats, you need to move to the Enterprise plan, under which largely the same feature set plus a couple of gated SCIM controls will cost twice as much.

3. Sell you X seats for their core product, then announce that features B and C are actually an additional SKU that requires additional per-seat subscriptions that double the cost.

4. Sell you X seats for their core product, then immediately announce that you’re on a “legacy plan”, but they’re happy to move you now to their updated offering, which is the same product but now structured to cost twice as much.

Ekaros 14 hours ago | |

I would expand this to any dumped product or service. Whenever the real cost isn't paid now, it will be paid sometime in the future, or the service will collapse. Just look at how extractive food delivery and taxis are. Start with dumping. Then be the last one to survive and fleece all the sides you can.

bilater 13 hours ago |

[flagged]

gizmodo59 17 hours ago |

Inference is profitable. Companies lose money because:

1. Training is expensive. Not just compute but getting the data, researchers' salaries, etc.

2. You have to keep producing new models to ensure people use your inference, and there seems to be no end to this. So they have to pour in more billions to keep the cycle going.

3. Salaries and other admin costs are not that high compared to 1 and 2.

atq2119 17 hours ago | |

Inference at per-token pricing is profitable.

The article's point is that if you're relying on flat fee subscriptions, a rude awakening may be coming. That seems plausible to me. Issues around token quotas are a frequent topic on HN.

fg137 17 hours ago | |

So? How does it change the equation?

Nobody is going to charge "inference price" for model usage.

hibikir 16 hours ago | | |

Given that it is not a monopoly, and changing providers is very easy, it's not going to be all that easy for anyone to charge a lot more than inference price. It's not like someone in cloud A facing huge costs to migrate to cloud provider B.

andyfilms1 16 hours ago |

Replacing your workers with AI:

--You lose control over their "salary"

--You lose control over their "schedule"

--Your company becomes reliant on another party that does not share your interests or values, and can stop working for you on a whim for any reason

But AI is definitely good and trade unions are definitely bad, apparently...

nijave 15 hours ago | |

>Your company becomes reliant on another party that does not share your interests or values, and can stop working for you on a whim for any reason

That's the same as human workers. In both cases there are contracts/money to help align interests

derektank 14 hours ago | | |

Exactly, the principal-agent problem applies to all agents, be they human, corporate, or robotic

chermi 14 hours ago | |

If only there was a way to think beyond direct substitution.

fg137 17 hours ago |

Why does the author assume that enterprises use subscriptions?

Many companies use models deployed on Azure/Bedrock etc. and are already paying based on usage (often with discounts).

stego-tech 17 hours ago | |

Not SMBs and SMEs. Big Enterprises would generally be using API buckets or Enterprise-specific consumption models via sales teams and contracts, but most companies would default to subscription tiers - either due to shadow IT paying out of pocket for subscriptions to duck corporate IT, or because they’re too small to negotiate rates and API buckets, or because their IT teams lack the skills needed for the same.

Remember that enthusiasts leaning on API keys and large enterprises are the exception, not the norm, and even some large customers may lean on subscriptions for at-scale adoption and wait for teams to report hitting usage caps before buying more token buckets. Subscriptions are predictable, reliable, and above all else a contractable way to acquire service.

Truth be told, this has been my red flag in orgs and with peers elsewhere for several years, now. Those orgs leaning on subscriptions are in for a nasty surprise within a year or two (like the author, I predict sooner than later), especially if those subscriptions power internal processes instead of AI buckets.

Hell, this is why I think there’s a sudden focus on the “Forward Deployed Engineer” nonsense role: helping organizations migrate from subscriptions to token buckets for processes so the bill shock doesn’t send them running away screaming.

Animats 11 hours ago |

So, will the AI companies raise prices? That's the article's main claim. Uber ran at a loss to build market share for over five years after the IPO. So it is not impossible for an overhyped IPO to run in that mode. The AI service industry might do that, too.

Uber raised prices some, but mostly squeezed drivers harder. When Uber started, driving for Uber was a well paid job. It isn't, now. AI companies are mostly capital cost, so they don't have the oppression option.

Hardware price/performance may not improve much near term. Graphics GPU price/performance hasn't improved much in the last decade. DRAM prices have gone up. Fabs are all booked up. NVidia says not to expect better price/performance before 2030.

More efficient, specialized models are a strong possibility. Dumping all of human knowledge into a coding tool may be unnecessary. Although this would work a lot better if the LLM crowd figured out how to get a reliable "I don't know" answer out of a small model, then call on a bigger one for help.
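The small-model-first idea in the last paragraph could look something like the sketch below. It assumes the hard part is already solved, namely that the small model can reliably abstain; here that is modeled as returning `None`. The `Model` interface and the toy models are hypothetical stand-ins, not any real inference API.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interface: a model takes a prompt and returns an answer,
# or None when it declines to answer ("I don't know").
Model = Callable[[str], Optional[str]]

def cascade(prompt: str, small: Model, large: Model) -> Tuple[str, str]:
    """Try the small model first; escalate to the large one only on abstention."""
    answer = small(prompt)
    if answer is not None:
        return "small", answer
    escalated = large(prompt)
    return "large", escalated if escalated is not None else "I don't know"

# Toy stand-ins for illustration; real models would call an inference server.
def toy_small(prompt: str) -> Optional[str]:
    known = {"2+2": "4"}
    return known.get(prompt)  # None means the small model abstains

def toy_large(prompt: str) -> Optional[str]:
    return f"(large model answer to: {prompt})"

print(cascade("2+2", toy_small, toy_large))
print(cascade("summarize this contract", toy_small, toy_large))
```

The cost saving depends entirely on how often the small model can answer and how trustworthy its abstentions are, which is exactly the open problem described above.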

r-johnv 4 hours ago |

The article repeatedly uses the API charge to the 'customer' as an estimate of the cost value to Anthropic/OpenAI.

Is there any good data on the true cost of inference for these companies?

babajabu 16 hours ago |

Even if they are momentarily losing money it’s important to note the value add they are providing.

If you increase the price, the value is still astronomical in comparison.

Companies need to find a way to leverage local models in tandem with frontier models to offset the costs.

It’s all about targeting specific workloads with the appropriate AI. These tools are not sentient beings they are tools that need to be properly configured to match the job at hand.

4aslk19 16 hours ago | |

You could use "git clone" or Wikipedia for free. If you mean the value of propagandizing gullible people, yes, there is "value".

derektank 14 hours ago | | |

Search costs aren’t trivial and, prior to LLMs, being able to find the piece of information on Wikipedia or software on GitHub that solved your problem took time, a lot of time if you weren’t an expert and unfamiliar with the jargon.

babajabu 12 hours ago | |

Just to clarify, value in this context is economic value through output. Not considering environmental impact etc.

jeswin 16 hours ago |

Since we can't reliably detect AI generated crap, I think it makes sense to penalize their submission. I say this as a generally pro-AI person.

nunez 13 hours ago |

Precisely why every bigco is spending $$$$$ buying/reusing GPUs to build their own inference serving stack based on open-source models (usually gpt-oss or one of the LLaMa variants; many bigcos in the US cannot run PRC models). That and having more control over data locality.

Those same companies are getting sweetheart deals with the frontier AI labs in the hope that infrastructure costs go down enough in the future to invert profitability, but it's still a risky position for them to be in. (Having their own infrastructure gives the bigcos huge leverage, even if it's only 80% as good as frontier.)

wegwerper 13 hours ago |

Does this article contain any original thought?

It's clearly LLM-spew in its mannerisms, making me wonder if there were any nuggets of wisdom in its core or if it is, in its entirety, part of some LLM-driven blog spam project?

wunderlotus 13 hours ago | |

No and yes.

exabrial 16 hours ago |

Eventually, after the seed funding is spent, you will have to pay the real cost of the coal used to power your queries.

The best course of action is to take advantage of the subsidy for a while, but not integrate it so deeply one can’t retreat. You’ll still have full productivity, just be cognizant of the reality of the situation.

Hopefully the market eventually collapses to where companies are hosting their own inference, and you simply lease a model package to run on your own (or rented) specialty hardware.

prash20026 13 hours ago |

Just as a counter example, Midjourney is completely self funded and profitable. But they are images, LLMs might be more expensive to train but their inference is cheaper.

So the frontier model companies might have crazy valuations and they might never reach that. But that might not mean they are actually unprofitable.

paoliniluis 16 hours ago |

The Fed will print to infinity as the US gov can’t stop spending, and most of that money will keep going to the only industry that’s growing and providing crazy returns for family offices and VCs right now, which is AI. I don’t agree with the author's opinion here, as the “time bomb” timer is simply the entire world buying US debt, and that won’t stop in the short/medium term.

AbstractH24 13 hours ago |

This is true of every vc backed company they rely on

And some parts of most publicly traded ones.

If it’s not a bootstrapped company with a single offering, it's highly likely that something they're doing is at a loss in the name of growth (and even there, the loss might come in the form of deferred compensation).

edwinjm 11 hours ago |

Does the writer understand that for every developer who burns all tokens, there are many people who subscribe just to join the AI revolution, but only ask a couple of questions a day?

gampleman 10 hours ago | |

No. At the large co I work at, everyone is running at least 3 concurrent Claude sessions all day, every day. Talking to friends at other companies, it seems the same.

Big difference between professional deployments and personal ones.

wan23 14 hours ago |

I tried out Gemini in Google Sheets the other day. I asked a pretty simple question and the agent ran for like two minutes trying to answer it until I stopped it. I can't imagine these agentic features are cheap to run for what they get you.

mojosmojo 10 hours ago |

They have all switched to usage-based pricing plus cheap seats for enterprise contracts. The seat costs are typically 20-35% of total spend.

einrealist 17 hours ago |

Those price increases will increase the pressure to use cheaper / free models (commoditization), thus cutting into the revenue projections of the frontier model vendors. It's going to be exciting to see what happens to these huge investments and valuations.

fg137 17 hours ago | |

> increase the pressure to use cheaper / free models

Not necessarily. Many factors go into what models are available at enterprise level. If you look around, not many companies (everywhere around the world) use DeepSeek models even though they are significantly cheaper.

Jcampuzano2 16 hours ago | | |

I think part of this is due to the fact that the closest competing models, cheap but of comparable intelligence, are almost all Chinese.

Think what you want but even when hosted in the US, at the enterprise level going all in on that would be a legal and/or political death sentence.

We need better open-source/cheap but high-intelligence Western models that are proven to work well in agentic tooling and have strong legal agreements for enterprises to even consider them.

Havoc 12 hours ago |

Bad attempt to estimate company costs using API sales prices.

There will be a repricing for sure, as there is whenever subsidies end, but the world will not end.

542458 17 hours ago |

I’ve said this before on HN, but there are two things that make me optimistic that we won’t see a big rug pull where price-to-capability ratio skyrockets relative to today:

* People keep finding ways of cramming more intelligence into smaller models, meaning that a given hardware spec delivers more model capability over time. I remember not that long ago when cutting edge 70B parameter models could kinda-sorta-sometimes write code that worked. Versus today, when Qwen 27BA3B (1/23 of the active parameters!) is actually *fun* to vibe code with in a good harness. It’s not opus smart, but the point is you don’t need a trillion parameters to do useful things.

* Hardware will continue to improve and supply will catch up to demand, meaning that a dollar will deliver more hardware spec over time. Right now the industry is massively supply constrained, but I don’t see any reason that has to continue forever. Every vendor knows that memory quality and memory bandwidth are the new metrics of note, and I expect to start seeing products that reflect that in a few years.

I hope that one day we’ll look back on the current model of “accessing AI through provider APIs” the same way we now look back on “everyone connecting to the company mainframe.”

ainch 17 hours ago | |

The price for a given level of capability will fall, but the frontier has recently been getting more expensive. If you compare GPT-5 to GPT-5.5 on the Artificial Analysis benchmark, it's ~4x more expensive, but achieves a higher score. Claude 4.7 is also more expensive than predecessors because of a tokenizer change.

As the AI labs become more reliant on enterprise adoption, it makes sense to push capabilities at a cost that makes sense for businesses. Even if it prices out consumers or hobbyists.

garrickvanburen 17 hours ago | |

I agree.

Between more efficient models tuned for the task at hand, the ability to run those models in-house or even at the edge, and Google and Microsoft being well positioned to stay ambivalent (they’ve got lots of products to sell, and whether or not LLMs are part of the portfolio mix is completely dependent on enterprise customer demand), Anthropic/OpenAI have a number of aggressive downward pressures on their pricing.

adamgordonbell 14 hours ago | |

Exactly.

Competitive pressure prevents a rug pull.

In a competitive race, each breakthrough gets copied or illicitly distilled or whatever. That means the frontier models are depreciating assets and the markup on tokens should get smaller and smaller.

Now bigger models are more expensive to run inference on, but today's models, or equivalent ability and size models, shouldn't go up in price.

5.5 is 4x the price, but 5.4 still exists, so it's not a rug pull, but a bit more expensive to run and hopefully more valuable model.

throwatdem12311 13 hours ago |

Not my problem I just burn the tokens they give me!

kaydub 11 hours ago |

I'm surprised at how many businesses are using subscriptions instead of paying per token.

smoghat 16 hours ago |

How do the owners of that site correlate this with their business model, which is to use AI to write articles like this one, so as to get clients in the news?

kamranjon 16 hours ago | |

It feels like they just pointed an AI model at Ed Zitron’s blog and asked it to make a super engaging and viral post.

Yhippa 13 hours ago |

Wasn't this the same thing when enterprises started using cloud computing? Did the bomb explode for them?

thomasingalls 12 hours ago | |

Yes actually. After zirp ended, cloud costs got materially more expensive for enough enterprises that there was a good year or so of celebrated "we're moving back on-prem" stories on hn, where companies were announcing savings in the several to tens of millions per year.

alxndr 10 hours ago |

Darkly funny that Pangram 3.3.1 thinks "100% of this text is AI generated"

clearstack 15 hours ago |

MSFT, GOOGL, META are spending $60-100B+ annually on AI infra partly to own the cost floor. The moat isn't the model, it's the infrastructure.

oldspleen 16 hours ago |

every infra wave starts with land-grab pricing and ends with metered billing, AI is just running the cycle in 18 months instead of 10 years

siliconc0w 14 hours ago |

Both OpenAI and Claude already charge Enterprise usage rates and they're still buying.

JumpCrisscross 16 hours ago |

> A knowledge worker running a few hours of Claude daily, uploading documents, drafting reports, analyzing data, can easily burn through several million tokens per week. At API rates, that same workload runs somewhere between $200 and $400 a month per seat. Some power users push well beyond that. But on a Pro subscription, the company is paying $20 per head. Anthropic is not the only one eating this cost.

What? Anthropic's costs aren't the API rate. The article never attempts to estimate that cost, which renders its thesis a tautology.
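To make the gap concrete, here is a back-of-the-envelope sketch. Every number in it is a made-up assumption for illustration (5M tokens per week, a blended $15 per million tokens at API list price, and two guesses at the provider's real serving cost); none of them come from the article or from any provider:

```python
WEEKS_PER_MONTH = 52 / 12  # about 4.33


def monthly_token_bill(tokens_per_week: float, usd_per_million: float) -> float:
    """Monthly dollar amount for a weekly token volume at a given rate."""
    return tokens_per_week * WEEKS_PER_MONTH / 1_000_000 * usd_per_million


def provider_margin(subscription_usd: float, tokens_per_week: float,
                    serving_cost_per_million: float) -> float:
    """Subscription revenue minus the provider's own serving cost.

    serving_cost_per_million is the unknown the article never estimates.
    """
    return subscription_usd - monthly_token_bill(
        tokens_per_week, serving_cost_per_million)


if __name__ == "__main__":
    tokens = 5_000_000  # hypothetical heavy user
    # What the customer would pay at a hypothetical $15/M list price:
    print(round(monthly_token_bill(tokens, 15.0), 2))
    # The provider's margin on a $20 seat under two guessed serving costs:
    for guess in (0.5, 2.0):
        print(guess, round(provider_margin(20.0, tokens, guess), 2))
```

With these invented inputs the list-price bill lands in the article's $200 to $400 range, yet the $20 seat is profitable at one guessed serving cost and loss-making at the other. The list price alone doesn't settle which.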

sunaookami 15 hours ago |

Isn't EVERY subscription and SaaS a ticking time bomb for enterprise?

happosai 11 hours ago | |

It is, but every enterprise is just looking at the next few quarter results. ROI looks so great when you don't invest in anything and just lease / subscribe / SaaS everything. Time bombs are just a concern for the future.

ghusto 17 hours ago |

TL;DR to save you time:

1. GenAI companies are making a loss in order to gain adoption and later lock-in

2. ???

3. They're going to cash-in soon and start milking you now that business critical systems rely on GenAI

The "???" denotes a complete failure to offer compelling arguments that link 1 and 3.

GolfPopper 13 hours ago | |

Again, GitHub Copilot is moving to usage-based billing June 1st.

https://github.blog/news-insights/company-news/github-copilo...

add-sub-mul-div 16 hours ago | |

We popularized the term "enshittification" so we wouldn't have to keep explaining this.

arnitdo 9 hours ago | | |

Not quite enshittification, that would be the eventual '''realignment and optimization of models to suit business demands and streamline $COMPANY's economic operations while focusing on serving the customers what they need best'''

This is just simply economics. Make an early loss to get a customer base, offer addictive features at irresistible prices, then yank them into your profit-making pricing.

Although yes, the ??? in the parent comment could be replaced with "A/B test some price gouging or ratty pricing methods to see how far we can go without being noticed"

crorella 13 hours ago |

Wouldn't a move to local models in the future remove part of that risk for companies?

wunderlotus 13 hours ago |

This is an (embarrassingly obvious) AI-generated “article” powered by a company whose business model seems to be AISaaS (AI slop as a service).

ninjahawk1 13 hours ago |

It’s a delicate balance currently. Local models are catching up at breakneck speed while OpenAI is publicly stating they want to sell AI like a “utility”, aka only through API pricing.

Meanwhile datacenters put out more pollution and use more electricity than all the plane rides Bill Gates took with Epstein combined, for business meetings of course.

lmeyerov 14 hours ago |

Not really. The Claude Code harness with the Sonnet 4.5 model showed you don't really need bigger GPU rollouts, and it's only a matter of time for OSS combos to hit that. Over time, this will only get better, and the set of enterprise tasks smaller deployments can handle will only go up.

blondie9x 10 hours ago |

I think one thing the author overlooked in the solutions/hedging section is using open weight models. Enterprises need to be ready to use their own servers for inference and build pipelines to utilize non proprietary models when possible.

dboreham 12 hours ago |

My own interest in LLMs increased exponentially when, around 18 months ago, I saw a post somewhere that had a guy who wrote his own inference engine in Rust and demonstrated it running with downloaded open weight models. I tried it out and was quite amazed that even on my laptop (no GPU) I could get an LLM to write Python programs and engage in discussions about Lewis Carroll poetry. It went from "magic thing that needs a data center of unobtanium GPUs to do questionably useful stuff" to "thing that does useful things even on a regular computer".

There's plenty of sand on the planet and clever people (and AI) figuring out how to do more work with less sand and power, so any argument that AI is going to cost so much that it won't be usable, seems just preposterous.

zephyreon 16 hours ago |

Aside from the obvious fact that this is AI slop, the author (prompter?) doesn’t consider the R&D of AI itself. Efficiency gains, more compute, etc.

We all know every frontier AI lab is heavily subsidizing usage, and so do all of the VCs & CEOs funding them.

phendrenad2 14 hours ago |

As inflation plays with 10-year highs, fuel prices go up permanently (thanks to the end of middle east oil), and NIMBYs chase datacenters out of their regions, I think it's inevitable that AI is going to go up in price. It's just a question of how much. Companies should have a fallback plan to either switch AI providers, or replace AI with a pool of new hires quickly.

48terry 13 hours ago |

Honestly, this isn't too different from any other software or technology nowadays. "What if the service provider pulls the rug on us and jacks up the price exponentially / begins the enshittification" is (and if you aren't doing it, you should be) a factor when procuring and using anything from a third party anymore.

The software world is, by and large, no longer about making products with a focus on the long-term, whether that's about the customer's well being or even the company's own long-term functioning. It's about trapping people, siphoning their money, then running away after setting the building on fire. Founder McBuilder will throw away his entire userbase and tell them "lol idk good luck" about their usage needs if it means he can make an extra dollar.

This is as true for enterprise as it is for consumers. Look at all the lamenting when a liked name gets bought by venture capital or considers an IPO.

dmazin 16 hours ago |

As a few commenters already pointed out, IME enterprises aren't paying for subscriptions. They're paying per token.

But also... is this shit AI written? I'm so tired of this.

tim333 14 hours ago |

> the gap between what your organization pays for AI today and what it will pay in 18 months is going to be one of the most disruptive line-item increases most companies have ever absorbed

Colour me skeptical on that one. Unless the AI improves a lot so it makes sense to spend more.

PKop 17 hours ago |

> is not a rounding error. It is

Who said it was?

> Pull out the napkin. This matters.

The article wouldn't exist if you didn't think it mattered, just tell us why.

> the question is not whether they got a good deal. The question is

Who said that was the question?

> This Is Not One Company's Problem

Who said it was?

Stop telling us what things aren't, just speak like a normal human and convey your own thoughts. It's an insult to your audience to throw constant AI slop at them.

> thousands of companies have woven AI subscriptions deep into their operations. Marketing teams draft copy through ChatGPT Plus.

Yea I bet you do..

megadopechos 16 hours ago | |

After reading the third "rounding error" phrase I quit.

jauntywundrkind 10 hours ago |

Good fucking luck DeepSeek. Thoughts & prayers to you with what's about to hit, shit.

lol8675309 12 hours ago |

LOL AND DUH

submeta 14 hours ago |

This is true. At our company they rolled out ChatGPT with Codex. After two months of happily using it, I got a call from the IT OPs telling me I burnt through four hundred million tokens, 200m a month. And created at least a thousand euro bill. That’s after I used all the credit, but I don’t have all details. The guy told me to „watch my usage.“ What does that even mean. He doesn’t use it himself and apparently he doesn’t know how value is created here and how he can monitor and limit usage.

Did OpenAI switch from fixed prices per seat to usage based? This will surprise many companies I reckon.

Personally I use Claude Code, the 200 euro plan. And am a heavy user. A few weeks ago I realized that CC shows the token usage in cli, in the bottom right. Something I never cared about because I thought paying 200 euro a month will give me „unlimited“ access.

But I guess the party is slowly coming to an end? Prices are going to increase slowly? And the flatrates will be removed eventually?

Too bad, it was nice while it lasted.

Lapsa 15 hours ago |

"In 1975, Dr. Joseph Sharp proved that correct modulation of microwave energy can result in wireless and receiverless transmission of audible speech."

jqpabc123 17 hours ago |

It is "bait and switch" --- done on an industrial scale.

nrawe 14 hours ago |

This mirrors my own thoughts. Additionally, for businesses looking to replace people (particularly developers) with agentic AI, this is arguably worse from an accounting perspective as the cost of using these services will likely be pure OpEx vs capitalised per my understanding of US/UK GAAP accounting.

alaudet 14 hours ago |

I had a conversation with Claude yesterday about this very topic. The AI was pretty candid about the issue and said many of the same things the author said. Now I am not sure if I went in with an unintended bias and it just went into full sycophant mode, I tried to be neutral in my prompts, along the lines of the implications of integrating AI into processes when the true cost is not being charged. But it was obvious that even moderate usage is a loss leader, so heavy users with agentic workloads are in a risky situation and should think long and hard about their business model if costs slowly trickle up in the triple, quadruple etc etc range.

I will continue to use it as an assistant that does the menial stuff quicker than I ever could, but it's just too early to let it do stuff that would hurt if it disappeared. Enjoy it while it lasts.

niekkamer 13 hours ago | |

I think a solution could be local hardware acceleration. The difficult thing to achieve is not leaking model data, since yeah, that is obviously a no-go for Anthropic, OpenAI, etc.