John Jumper to join Anthropic (twitter.com)
Their newest model wasn’t really SOTA. And honestly Fable 5 was the most human-like model I’d ever tried. It was an incredible jump.
And recently lots of Claude users at r/ClaudeAI are noticing Opus 4.8 has really increased in capability. Not new things but maybe redirected compute. It just feels like one of the best models ever, maybe because the compute that was previously assigned to Fable has been redirected? It feels incredible.
https://artificialanalysis.ai/articles/glm-5-2-is-the-new-le...
The idea of "falling behind" when you can leapfrog each other every six months leads me to believe it has to be more than just "falling behind" for one cycle. It's a culture, process, red tape, focus, or mandate problem of some sort. Something not as easily correctable while preparing for the next launch.
I've definitely noticed it, at least for doing backend C#/dotnet. It's insanely good, I haven't had to babysit much at all this week.
When personal finance is not the bottleneck anymore, the new criteria become "vision" and "stacked talent".
That was when they realized the deep learning was largely unnecessary, and they could just use their massive compute resources to brute force the problem space.
Proving that we would greatly benefit from using our compute resources for science rather than showing ads, and then we just kept showing ads.
Thank God. I'd rather companies ship something when engineers say it's actually ready rather than when the suits want something to show on stage to pump their egos and career exposure but turn out to be a massive disappointment covered in fluff.
I guess it depends on what you're using it for. I use it almost daily as an alternative to google search and it's great for that, but I think it's absolute garbage for coding and reasoning.
For questions related to coding, solving Arch Linux and WINE Lutris issues, helping me with MXLinux issues, and wifi issues on an old rooted Huawei tablet running LineageOS, it was consistently wrong, constantly giving out confident but outdated information or misinformation, or hallucinating stuff while gaslighting me. Every time I pointed out it was wrong, it would re-check, apologize, and then give me more wrong answers, then apologize again, and so on. Same for asking it to write me a cover letter based on my resume and the job description I wanted to apply to. It massively sucked at that too and made up a bunch of fake-sounding BS.
Basic free tier ChatGPT 5.5 would blow it out of the water on all of those tasks. Hell, even Grok free is better at that, it one-shotted Arduino code for me that blew Gemini 3.5 Flash away.
3.5 Flash seems tuned to just eyeballing basic answers to general purpose questions that resemble Google searches like "give me a recipe" or "give me a workout plan", or "when did Yandex move to Netherlands", not to solving complex issues that require cognition and accuracy. That's what the 3.1 Pro is better for. It doesn't matter what prompts or jailbreaks you give it to get 3.5 Flash to chew longer on complex problems for better reasoning and accuracy, it just defaults to being lazy and giving you the quick and easy answer from its weights, which can be totally wrong. It also gaslights you by first telling you how amazing things from your question are.
I think Google just doesn't care about being the SOTA for coding, reasoning and accuracy, since they're in the ads and search business, not in the agentic coding business. So if the answers are hallucinations that sound "good enough" to its clueless search user base, but are at least dirt cheap to run on their datacenter hardware, then that's already more than enough for them and they can call it a day.
Meanwhile OpenAI and Anthropic don't have search and ads monopolies, so they need to perform well at certain tasks for people and businesses to give them their hard-earned money and for them to survive as companies. For them, nailing stuff like coding and writing accuracy is an existential matter, not a hobby side project like it is for Google.
Google seems more interested in fast models that can quickly turn around responses, which kind of fits with a company that needs to serve AI on a mass scale.