We went from simple chatbots to thinking models, which massively increased token usage.
We then went from simple thinking models to tool calls and agents. Agents, and particularly long-horizon agents, burn truly insane numbers of tokens, blowing thinking models well out of the water.
People are trying agentic swarms as the next step, but I don't think those make sense right now. In particular, they are just too expensive and not that useful.
Plus right now the models just aren't good at it. It's like early agents when they first started making tool calls.
Agents are really quite bad at using subagents. They don't really internalize how to deploy them, and they don't use them in the ways that make sense (producing planning documents, requiring verifiable artifacts, breaking down tasks in ways that minimize risk, recognizing model limitations in instruction following, iterating on results, etc.).
Your last paragraph is also striking in that it exemplifies how far away from general intelligence they still are.
Most of everything tends to suck. Most projects go nowhere, most companies fail, most scientific papers are garbage.
> how far away from general intelligence they still are
Economically, the real question is to what extent these systems can replace or augment human labour. And I think right now the extent is pretty shocking, even if it is not yet very well integrated.
Scientifically, the fact that they are bad at using subagents is sort of expected. How to use agents effectively is still a bit of an open question. A human from mid 2025 would be bad at it. Why should a model trained on data from 2025 be good at it?
If these things are to be generally intelligent, they need feedback and retraining, which presumably the labs will do once these sorts of questions start having good answers and we can create good benchmarks and measures for meta-orchestration.
Claude uses up its 6 hour or whatever quota in a couple of coding prompts. I bought extra credits for the same amount as a monthly subscription, and they were used up in 3 hours.
Kimi gives me about double what Claude does per window but uses up its entire weekly quota in the same time, for the same price as Claude. And I get worse results.
Gemini worked OK for a day or two and now is running one tool every 30 minutes and getting nothing done. Apparently they've been in constant outage status for nearly a month: https://aistudio.google.com/status
I haven't tried ChatGPT because of ethical issues but well, I'm not sure that makes any sense.
Four prompts a day isn't something where I go, wow, this has revolutionized my programming. I might very well be getting more done if I weren't fighting with the constant CLI bugs and with work left half finished for anywhere from 3 hours to 5 days when my quota is used up.
Anthropic is capturing exploding enterprise demand via their agentic tools, OpenAI is failing (relatively) to do so. They’re stuck trying to squeeze more $$ out of consumer chatbots that have reached the second knee of the S-curve.
The obvious story seems to be that OpenAI was reckless and got way ahead of their revenue, assuming it would keep hockey-sticking.
At some point in the next few years investors are going to want their returns. The only way I see that happening is through an IPO and then… I don’t know if they have a sustainable business model or one in sight.
Which doesn't mean an end to the AI race, since China is unlikely to care whether US companies secure financing.
Also, if this happens OpenAI will probably be bailed out with taxpayer money.
Umm, what's your point? We aren't spending 1.4T on other shitty things that are tipped to fail.
As to why: it's corruption. They would be bailed out as an act of corruption, using the public machine for private gain.
OpenAI is a young company, and if it collapses, that is an indictment of AI: perhaps it is not really all that valuable. Further, the technology and brand can be sold to a US company with plenty of AI expertise of its own. The loss of jobs, relative to the larger economy, is minuscule. Google, Microsoft, Amazon, and Apple would all be happy to buy at a discount.
They want to cut those programs anyway, it's a win/win.
The Iran war is probably going to cost $1T or more (it's ranging from $1B/day to $2B/day already), so it's going to be more costly than this hypothetical OpenAI bailout. The Republican base doesn't care.