Expanding Project Glasswing (anthropic.com)
Step 2: offer to test it, but only for the biggest companies in the world
Step 3: onboard those big players on your tooling and product
Step 4: profit
This is genius.
Err... wait... that was already the hard part... hmm
It means that even if the value you offer is similar to your competitors', you are the one conquering the market.
That's the only way to avoid becoming a commodity.
But I think that downplays the importance of having a good product. If the product didn’t work, this would be a good way to lose trust with a lot of organizations in a hurry.
At this phase no company would risk its brand by calling the product ineffective. The big players are in it together and the small ones have no option but to play along.
Nevertheless, collecting the historical wisdom and running it at machine scale does have a lot of benefits. The only question is the signal-to-noise ratio: the machine is doing what humans did, just many times faster and with a lot more context than a normal human can hold.
A marketing move doesn't mean a scam. It describes the ability to sell people on a narrative and surpass your competitors in market share. And that's exactly what is happening.
My post is a "tribute" to the efficiency of Anthropic's communication. I never complained about anything, never called it a scam, and never said they should have released Mythos to the public instead of rolling it out to a selected cohort.
You tried to expand my words to make me say something I didn't, because my post wasn't giving you a clear conclusion of my opinion regarding their private release.
can't release it to the plebs
Will likely give them time to expand capacity as well. And make them harder to dislodge in these orgs.
- Valkey/Redis port here https://github.com/ianm199/valdr (passes ~99% of single node test suite, real prod features like replication/clustering/HA early or not implemented)
- Further along port of Lua 5.1-5.5 https://github.com/ianm199/lua-rs-port/tree/main
- I have a less developed nginx version that would be the north star
- These projects are very alpha at the moment
If anyone is interested in getting involved in this or has done similar experiments I'd love to collaborate! There is so much variation in how you can run these large scale agent fleets I don't think anyone has a perfect system yet.
If society can't trust banks and other institutions to safely control their data, what follows?
Do we collectively switch off the internet?
They seem pretty close, in both average and "best run" scores. And, in a highly verifiable domain, "best run" or pass@n is what you're looking for.
People and organizations can have mixed motivations. It’s often not “just” one thing.
They’re using security concerns to mask their inability to deliver the model at scale, while still trying to maintain their lead over OpenAI. As a result, they’ve chosen to release it privately under the banner of an “ethical” rollout.
I'm afraid that the usual mantra of "we just need more scale", which worked well for attracting investment, is not working anymore: bigger models provide marginal improvements while naturally getting much more expensive to run.
Is this why both Anthropic and OpenAI are rushing for IPOs this year?
Security wise, it's about being able to find and chain multiple vulnerabilities to actually create viable exploits.
So I would imagine that if you were using it for regular software development you may not feel that it's that different unless used in a particular way?
Nonetheless, running many of the open weights models over a codebase, with an appropriate harness, can provide about the same vulnerability coverage (i.e. each of the open weights models would find a subset of what Mythos or GPT 5.5 could find, but the subsets are not the same).
Despite needing more runs and more time, this may be significantly cheaper, especially if the models are self-hosted.
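To make the "different subsets" point concrete, here is a minimal sketch of how combined coverage could be measured. The model names and vulnerability IDs are made up for illustration; nothing here is real benchmark data.

```python
from typing import Dict, Set


def combined_coverage(findings: Dict[str, Set[str]]) -> Set[str]:
    """Return every vulnerability ID found by at least one model."""
    combined: Set[str] = set()
    for found in findings.values():
        combined |= found
    return combined


def coverage_ratio(findings: Dict[str, Set[str]], reference: Set[str]) -> float:
    """Fraction of the reference set that the models cover together."""
    if not reference:
        return 0.0
    return len(combined_coverage(findings) & reference) / len(reference)


# Hypothetical data: what a frontier model found (reference) versus
# what three open-weights models each found on their own.
reference = {"V1", "V2", "V3", "V4", "V5"}
findings = {
    "open_model_a": {"V1", "V2"},
    "open_model_b": {"V2", "V3", "V4"},
    "open_model_c": {"V4", "V5"},
}

for name, found in findings.items():
    print(name, coverage_ratio({name: found}, reference))
print("combined", coverage_ratio(findings, reference))
```

In this toy data no single model covers more than 60% of the reference set, but the union covers all of it. Whether real models overlap this favorably is an empirical question.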
Based on what Anthropic said about Mythos, they also use a quite elaborate harness for finding bugs and vulnerabilities, i.e. not a simple prompt like "find the bugs".
They run Mythos repeatedly on each file of the codebase, many times. They start with more generic prompts, used to determine whether a more thorough analysis of that file is worthwhile. Then they use more specific prompts to detect various classes of bugs. Once it becomes probable that a certain bug exists, they do a final run where the prompt requests a confirmation of the already known bug, perhaps together with a proposed patch or a PoC exploit.
Therefore the efficiency of finding vulnerabilities depends a lot on the harness, not only on the LLM. Also, searching for vulnerabilities in a big codebase when paying per token is very expensive, because it requires many runs of the LLM.
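A rough sketch of that kind of staged harness, as I understand the description. This is not Anthropic's actual harness: the prompts, the bug classes, and the `model` callable (which here just returns yes/no) are all assumptions, and `fake_model` is a keyword stub standing in for a real LLM call.

```python
from typing import Callable, Dict, List

# Hypothetical bug classes for the class-specific pass.
BUG_CLASSES = ["memory safety", "injection", "auth bypass"]

# Assumed interface: (prompt, source text) -> True/False answer.
Model = Callable[[str, str], bool]


def scan_file(source: str, model: Model, triage_runs: int = 3) -> List[str]:
    """Run the three stages on one file and return confirmed bug classes."""
    # Stage 1: generic triage, repeated several times.
    votes = sum(
        model("Is this file worth a deeper security review?", source)
        for _ in range(triage_runs)
    )
    if votes == 0:
        return []
    # Stage 2: one targeted prompt per bug class.
    suspected = [c for c in BUG_CLASSES if model(f"Look for {c} bugs.", source)]
    # Stage 3: ask for confirmation of each suspected bug.
    return [
        c
        for c in suspected
        if model(f"Confirm the suspected {c} bug and propose a patch.", source)
    ]


def scan_codebase(files: Dict[str, str], model: Model) -> Dict[str, List[str]]:
    """Scan every file and keep only those with confirmed findings."""
    results: Dict[str, List[str]] = {}
    for path, source in files.items():
        confirmed = scan_file(source, model)
        if confirmed:
            results[path] = confirmed
    return results


def fake_model(prompt: str, source: str) -> bool:
    """Keyword stub in place of a real LLM, for demonstration only."""
    if "worth" in prompt:
        return "strcpy" in source or "query" in source
    if "memory safety" in prompt:
        return "strcpy" in source
    if "injection" in prompt:
        return "query" in source
    return False


files = {"a.c": "strcpy(dst, src);", "b.py": "print('hi')"}
print(scan_codebase(files, fake_model))
```

Even in this toy version, a flagged file costs several model calls (triage runs, one per bug class, one per confirmation), which is where the per-token bill comes from.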
No comparison to human teams, and I’m sure that $1 million in tokens was used by humans, in a team. So like most AI, they’ve developed a tool that capable people can use to be better, but unlike most tools, they’re claiming this to be outright magic. The magic is the hype train.
The only trend Mythos continues is Anthropic’s trend of warning that disaster is always 6 to 12 months away.
I mean, most Nasdaq tech companies would be in 13+ countries. Why are they writing this like it's a big number when it's hilariously small?
I don't think they're trying to flex this as a large number. They don't want to give an exact number, as that may change etc / is fuzzy, but also want to give you an idea of the scale.
They say "In the future, we intend to expand our geographical reach much further". I imagine this commentary is somewhat related to the concerns that AI will create an even worse "global underclass". AI developments are first accessible to Americans, then allies, and then later the whole world.
Yes, Anthropic is compute constrained, even after the SpaceX Colossus deal.
But supply constraints are the normal operating mode of any market. Anthropic could choose to serve whatever models it pleases at whatever price points it chooses and let the market decide where the value is.
If Mythos at $X overwhelms their capacity, they could just charge $X+1. If still overwhelmed, there are larger prices as well.
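The "keep raising the price" argument can be sketched as a simple loop. The linear demand curve and all the numbers are invented for illustration; real demand for a model is unknown.

```python
from typing import Callable, Optional


def clearing_price(
    demand_at: Callable[[float], float],
    capacity: float,
    start_price: float,
    step: float = 1.0,
    max_price: float = 1000.0,
) -> Optional[float]:
    """Raise the price by `step` until demand fits within capacity."""
    price = start_price
    while price <= max_price:
        if demand_at(price) <= capacity:
            return price
        price += step
    return None  # no price up to max_price brings demand under capacity


def linear_demand(price: float) -> float:
    """Hypothetical demand curve: fewer buyers as the price rises."""
    return max(0.0, 1000.0 - 10.0 * price)


print(clearing_price(linear_demand, capacity=600.0, start_price=25.0))
```

With these made-up numbers, demand drops to capacity once the price reaches 40.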
So they have a whole lot more compute now than they did last month.
As an ordinary developer who relies on a $20–$200/month subscription, I feel disappointed by the release of a paper describing a model that I can’t actually use.
They did produce great value: Claude Code and Opus 4.5 are a singularity in software engineering.
The job we practiced for decades simply doesn't exist anymore.
To a lot of us it’s not clear that’s what’s happening. It’s speculation and one possibility.
It may also be a secondary consideration and not the primary gating factor.
Anthropic has had their missteps but it’s still plausible to take what they say at face value.
For all they know they'll find a new optimization that lets them serve Opus-class models for half the computing cost next month. Or someone will invent the next OpenClaw and demand will 10x overnight.
Two things of note: 5.5-Cyber is likely to be substantially cheaper than Mythos, given it is priced around Opus. Additionally: AISI has never tested OpenAI’s best public model and actual Mythos competitor: 5.5-Pro.
Neither publicly nor in my internal reasoning. I rarely conclude without proof or a very intense and clear intuition.
From a strategic PoV it makes sense to check whether their model is dangerous. I wouldn't want to have my brand name associated with "NK hacker team find zero day in all linux servers of the web and ..."
Yet they're still the predominant search engine. Sadly, the concerns of the few don't interest monopolistic profit seekers without forced regulation. Think of how airlines are legally required to give refunds for delayed flights: there's a reason it required legislation.
You don’t tie it to “your device”.
You tie it to your security key.
Which is treated like a credit card.
And your extended family, friends, or volunteers can act as social proof to allow you back into your accounts
if your key burns up, breaks, or you were too cool to provision a backup, etc.