Mythos Finds a Curl Vulnerability (daniel.haxx.se)

703 points by TangerineDream 34 days ago | 282 comments

rzmmm 34 days ago |

Quote:

"My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing."

It's a good reminder for us all that the competition in this space is rough and lots of more or less subtle marketing is involved.

therealpygon 34 days ago | |

Anthropic using marketing to convince people their models are more advanced, better built, or that AI is a threat that needs to be regulated because only they have the answer? I’m shocked.

More seriously, so far I haven’t seen much indication that Mythos is more than Opus with a security focused code analysis harness. That said, the fact it can find these bugs in an automated fashion is the more important takeaway outside of the hype.

I’m curious what the error rate is on the detections, because none of that means much if it is wrong 90% of the time and we are only hearing about the examples that are useful marketing.

johnbarron 34 days ago | | |

>> Anthropic using marketing to convince people their models are more advanced, better built, or that AI is a threat that needs to be regulated because only they have the answer? I’m shocked.

I remember when OpenAI was saying GPT-2 was too dangerous to release.

JeremyNT 34 days ago | |

This is roughly what I was assuming but of course the big caveat here is that they were already using the existing LLM driven tooling on an extensively audited codebase.

So while Anthropic's marketing may be hype, there just wasn't much left to find, a point he makes in the blog post.

Whether it's a big step forward for other kinds of projects is difficult to tell, but this highlights that everybody should be using AI code review tools to audit their existing code today, and not everybody is.

embedding-shape 34 days ago | | |

None of that other LLM tooling came with claims that it was too dangerous to be released and used, though, unlike what Anthropic did with Mythos.

What it highlights is that Mythos doesn't seem so much better than other LLM driven tooling at finding security issues, which was the strongest claim Anthropic made in the first place.

voxl 34 days ago | | |

Everyone should be using exclusively a proof assistant (Lean/Agda/Rocq/Isabelle) and proving their code correct, but they're not.

Do you see how ridiculous the zealotry sounds when it's not your personal kind of zealotry?

thombles 34 days ago | |

Curl simply isn't a good data point. It's one of the most picked-over codebases in existence with extensive security testing practices. All the researchers using not-quite-Mythos models have had plenty of time to report bugs up to this point. Daniel may be right that Mythos hasn't been a game changer for curl but the preconditions are different for virtually any other codebase. Perhaps the real marketing here is his own modesty about curl's maturity.

GuB-42 34 days ago | | |

To me, it is a very good data point.

Curl uses all sorts of tools, including AI tools, to find bugs. These tools, according to the article, found hundreds of bugs including a dozen CVEs.

Mythos found one vulnerability. It means Mythos is just another tool, not the revolution it claims to be.

It is common that when a new tool is introduced, a bunch of bugs are found, with diminishing returns afterwards. Mythos finding one vulnerability is consistent with what I would expect from a major update to an existing tool, which is what Mythos is relative to existing LLM-based solutions.

spongebobstoes 34 days ago | | |

that makes it a good data point, because it is better able to illustrate the incremental capabilities of Mythos compared to previous tooling

that helps us to understand how much of Mythos is hype and how much is real

20k 34 days ago | | |

We see this exact hypetrain every time a new model is released. Mythos simply hasn't lived up to the "we're all gunna die from the flood of vulnerabilities" hype even slightly. It's slightly better than previous models by all accounts, cool stuff

I've seen literally near word-for-word this exact chain of events multiple times previously

orblivion 34 days ago | |

Is Mozilla marketing on Anthropic's behalf?

    As part of our continued collaboration with Anthropic, we had the opportunity to apply an early version of Claude Mythos Preview to Firefox. This week’s release of Firefox 150 includes fixes for 271 vulnerabilities identified during this initial evaluation.
    
    As these capabilities reach the hands of more defenders, many other teams are now experiencing the same vertigo we did when the findings first came into focus. For a hardened target, just one such bug would have been red-alert in 2025, and so many at once makes you stop to wonder whether it’s even possible to keep up.

https://blog.mozilla.org/en/privacy-security/ai-security-zer...

wongarsu 34 days ago | | |

There are three things happening simultaneously: 1st a new model, codenamed "Mythos", 2nd a lightweight harness built for finding vulnerabilities, and 3rd a push by Anthropic to collaborate with various Open Source projects and companies to use 1 and 2 to find vulnerabilities

We know that the combination of all three results in finding lots of security vulnerabilities. That's what Mozilla is talking about. The quote from the curl story states that just 2 and 3, but with just regular SotA models, would have produced very similar results

Which is really the crux of all this hype around Mythos: would the results really be different if they used Claude Opus instead of Claude Mythos? How much is the model, how much the harness, and how much is just because Anthropic is running a big campaign systematically trying to find vulnerabilities?

pavon 34 days ago | | |

It is difficult to compare these two accounts since Daniel Stenberg didn't get access to Mythos himself, and we have no information about how it was run compared to the other AI models that have been used on curl. It is possible that Mythos is not much better than these other models, but it is also possible that the curl team simply made better use of the other models.

Part of what made Mythos so effective for Mozilla was the integrated agentic workflow where it not only looked for bugs, but then created an exploit to demonstrate them, and ran that exploit while dynamic analysis was enabled, verifying that invalid memory access occurred. In this case it is hard to know how much of their success was because they put more effort into the harness compared to previous tools (we know they did), or if Mythos was more suitable for this sort of workflow to begin with.

Not many apples-to-apples comparisons to be made with Mythos at this point.

spenczar5 34 days ago | | |

Yep! The industry term is "co-marketing" and it's hard to avoid seeing once you spot it.

dboreham 34 days ago | | |

I think it's more the cost to find a vulnerability that has significantly reduced, not the possibility that the vulnerability could have been found. But that cost mattered tremendously because someone has to fund the effort to find the bugs. This economics also applies to attackers.

Pay08 34 days ago | | |

I certainly wouldn't be surprised if they were.

skywhopper 34 days ago | | |

Absolutely 100%

vidarh 34 days ago | |

It may well be that the hype was primarily marketing.

The other alternative is that Curl is simply secure enough that there was far less to find than in other projects.

shakna 34 days ago | | |

Daniel found 30 CVEs in Curl, this year. I would not say that there is nothing to find, here. Just that it takes an actual expert.

teiferer 34 days ago | | |

Given how much money is on the line, it would be gross negligence if anything came publicly out of the CEO's mouth or is otherwise published by the company that's not marketing.

Joel_Mckay 34 days ago | | |

Not really, curl has slow anonymous memory leaks because of how the connection session caching was implemented. If you don't periodically restart a program, then people encounter strange, hard-to-diagnose issues sooner or later.

Also, looking at something that trips valgrind warnings already, may obfuscate a lot of problems in both your own code and the curl library itself.

One could report the issue as functioning as described in the API, but the developers do not accept direct community input into the project.

People use it out of convenience, but it is just as janky as most bloated projects. =3

bigcat12345678 34 days ago | |

My guess:

Marketing is not intentional.

Evidence: 10 years ago, when I interviewed Baidu AI with Andrew Ng and Dario, Dario was the kind of person who is pure-hearted to the point of being ideological. Given Dario's successful career so far, that essence has gradually grown into a conviction, surrounded by a purpose-built team which amplifies his ideology.

Humans are very convenient creatures; a small fraction of them are no doubt masters of convenience: they morph their mental manifold without a hint of contradiction in their own mental mechanisms.

stingraycharles 34 days ago | | |

These things are layered. They are great scientists, smart people, etc.

Things change when you’re running a business like Anthropic, especially as the CEO. You have a responsibility to shareholders, and you just need to play the game.

Anthropic chose a great angle: focus on professionals / enterprise, safety, etc. Those can both be done by a genuine desire to make great technology, and for business purposes require you to position yourself in a bit “better” way than reality.

Just look at what their strategy is with Mythos, it’s almost perfection: the “it’s not ready to be released to the public” angle hits all the marks: they care about responsibility / safety, they have “the best” model, and “LLMs are dangerous, but we, as the guardians, can be trusted”. This also helps the industry as a whole with regulation: if they’re being constrained, China will develop even more dangerous models.

This is a result of how smart people treat business, it’s PR perfection, especially given how much the whole industry is talking about it.

(Yes, they fail in other PR areas, but that’s a different discussion)

windexh8er 34 days ago | | |

Marketing is always intentional at this scale. If you think Anthropic didn't put a lot of time and effort into Glasswing as a marketing effort I think you're misunderstanding how these organizations work and how they win.

JumpCrisscross 34 days ago | | |

> Marketing is not intentional

Mythos put Anthropic back into the White House’s good graces. It also branded Anthropic as badass, something their softer image probably needed to win government contracts.

Maybe it wasn’t marketing. But the product’s configuration, and how Anthropic talked about and released it, sure as hell played beautifully. (The timing, while Musk and Altman are distracted with each other, also couldn’t have been better.)

OtherShrezzing 34 days ago | | |

I'm not sure if that distinction is important, since what you've described is, less charitably, synonymous with the phrase "Dario is delusional, and has surrounded himself with yes-men, so outlandish marketing gets published as a side effect".

Whether the person doing the marketing was sincere about it or not is immaterial, since marketing is experienced almost entirely by the people consuming it, and not the people communicating it. What matters is if the audience is sincerely concerned by the message, and it's transparently the case that they were sincerely concerned by it.

petesergeant 34 days ago | | |

All your evidence can be exactly true, and he genuinely believes that Anthropic "winning" the AI race is the best outcome for humanity even with a little subterfuge including marketing to the current administration. If I genuinely thought I needed to do something to secure humanity, there's little I wouldn't do to achieve it.

teiferer 34 days ago | | |

> Marketing is not intentional.

That's an odd definition of "intentional". Evolution has filtered for people with certain views and the marketing has just emerged from their actions. ... So?

A deadly virus (naturally occurring one let's say) wasn't created intentionally. Evolution selected for it. It's still bad and kills people. Doesn't make it nice because of lack of intention.

keybored 34 days ago | | |

This is marketing.

cvwright 34 days ago | |

Even that press release never claimed that Mythos was better than Opus at finding bugs.

They claim the huge advance is in exploiting the bugs.

smusamashah 34 days ago | |

He also said this [1] a few weeks ago about AI PRs.

> Over the last few months, we have stopped getting AI slop security reports in the #curl project. They're gone.

> Instead we get an ever-increasing amount of really good security reports, almost all done with the help of AI.

> They're submitted in a never-before seen frequency and put us under serious load.

> I hear similar witness reports from fellow maintainers in many other Open Source projects.

> Lots of these good reports are deemed "just bugs" and things we deem not having security properties.

[1]: https://www.linkedin.com/posts/danielstenberg_hackerone-shar...

jansan 34 days ago | |

Mythos marketing really leans into that "too powerful to be legal" vibe, much like how PS2s were allegedly banned from North Korea because their chips were basically missile-grade.

alwillis 34 days ago | |

> My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing.

I think the results say more about the great job the curl team has done maintaining their codebase.

This doesn’t mean Anthropic's Project Glasswing is a marketing stunt. Logically, it doesn’t make sense: when they announced Mythos Preview, Anthropic couldn’t meet customer demand; they didn’t have enough compute to go around. So they decide to hype an unreleased product to drive even more demand? All that would do is piss off their existing customers who are already experiencing rationing and frequent outages.

Many forums were already flooded with "I cancelled Claude Code" as it was.

On the contrary, it would be incredibly irresponsible and unethical for such a young company with billions of dollars of other people’s money invested in them.

Because the Mozilla team used Mythos and found 271 vulnerabilities [1], does that mean they're in on the so-called "marketing stunt"?

Of course, if Anthropic had released Mythos to the public and bad actors used it to hack a large number of banks, hospitals, government agencies, etc. in a matter of days, the HN crowd would be all over them for acting irresponsibly and criticizing them for not knowing better.

[1]: "Behind the Scenes Hardening Firefox with Claude Mythos Preview" — https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

asltp_ 34 days ago | | |

Other AI tools have found 300 bugs and this new sentient T1000 only found one. Stenberg himself found 30 this year.

Mozilla is the current poster child but 271 in such a large codebase with thousands of user options, most of them being TOCTOU isn't that much. Sorry. TOCTOU can happen in any language when people are simply exhausted by the sheer volume of case explosions.

There is a third option: Anthropic could simply have reported the issue without mentioning the new model at all. But they don't, since they want to sell to governments and military and the artificial scarcity just provides a veneer of exclusivity that their clients will appreciate.

63stack 34 days ago | |

I'm pretty sure mythos is just a new unreleased version of Opus + marketing + a different system prompt.

eskibars 34 days ago | | |

I suspect so as well.

I've been running my own security scanning software (disclaimer: now starting a company @ zeroquarry.com) for this, and from what I've seen there's a huge value in prompts + adversarial LLM review. Without adversarial review, you get garbage (as this blog points out: 4/5 basically are nonsense) and with a good prompt, you can use almost any "near frontier" model from my experience as long as the prompt helps with the guardrails or the model doesn't protect in such a strict way
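To put a number on the "4/5 basically are nonsense" point, here is a minimal sketch of the share of reported findings that survive review. The counts come from the blog post as quoted in this thread (five reported issues, one confirmed vulnerability); the function name `confirmed_share` is hypothetical and not part of any tool mentioned here.

```python
def confirmed_share(reported: int, confirmed: int) -> float:
    """Fraction of reported findings that were confirmed after review."""
    if reported <= 0:
        raise ValueError("reported must be positive")
    if not 0 <= confirmed <= reported:
        raise ValueError("confirmed must be between 0 and reported")
    return confirmed / reported


# Counts quoted in this thread: five issues reported, one confirmed.
share = confirmed_share(reported=5, confirmed=1)
print(f"confirmed: {share:.0%}, rejected: {1 - share:.0%}")
```

The same ratio computed before and after an adversarial review pass would be one rough way to see whether that pass is filtering out noise.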

h1fra 34 days ago | |

They might be biased by the fact that curl is significantly more secure than the average software

toraway 34 days ago | | |

I've seen this suggested a few times in this thread but it seems like it's exactly backwards.

Wouldn't that make it a better case for distinguishing whether Mythos is uniquely super powerful vs an incremental improvement from Opus etc that are routinely used as the basis for bug reports/fixes in cURL?

If Mythos found a hundred new show stopper bugs then it would have meant Opus missed them, and therefore be closer to a "step change". Otherwise it implies the difference in capability isn't nearly that stark. Mythos finding 100 low-hanging bugs in a less scrutinized/hardened project, on the other hand, wouldn't be as useful a signal to answer that.

khlapz 34 days ago | |

Yes, the governments fall for it for the time being:

https://www.politico.eu/article/anthropic-hacking-technology...

This is an advertising masterpiece: UK gets first access, the EU is jealous and wants it, too. Thousands of bureaucrats and parasites make money in the process writing (probably using AI) whitepapers and sitting in meetings. The open source authors whose works are being scanned make nothing.

We know how the money flows. Another unrelated example is that ex MI6 director Sir John Sawers is a Palantir consultant and sells out the UK to Palantir.

coldtea 34 days ago | |

>It's a good reminder for us all that the competition in this space is rough and lots of more or less subtle marketing is involved.

About as subtle as a personal injury lawyer's billboard

steve1977 34 days ago | | |

Better Call Dario

te_chris 34 days ago | | |

A thankfully American reference

red75prime 34 days ago | |

I have an impression that he expects something like the famous move 37 of AlphaGo, while it could be that the situation is like in chess where superhuman engines validated human findings.

hislaziness 33 days ago | |

Am I missing something here? Not finding a major bug/vulnerability just means that maybe the code is really good, not that the model is not what is claimed?

greendude29 34 days ago | |

I'd go out and say the marketing is not subtle. The hype and fanboys/girls are so in line with the marketing that any level of skepticism is seen as an act of defection, but if you look at the words, hyperbole and volume that is used, there is nothing subtle about it.

It's almost Trump-esque - "this model will change everything forever; we are doomed; we are saved; we will all be fired; we will all be rich", etc

xantronix 34 days ago | | |

That's a pretty good encapsulation of the parallels between the political and the technological: One necessarily thrives upon the other and they are inextricable. This moment is a culmination of all the disenfranchisement the body politic has suffered, looking for any possible means of escape or elevation. AI and Trumpism, for their own respective cohorts, are salvation, on offer by different frontmen but ultimately in service of the same system.

They need the hype to pay off way more than we do. So many of us who still write code directly stand to lose nothing of our capabilities if the marketing claims cannot hold water.

ehnto 34 days ago | | |

I seem to be totally outside the hype bubble, but I have to suspect there is a lot of imagineering and wild extrapolation in the less technical hype bubbles. I am curious but not enough to go looking.

bjourne 34 days ago | |

Eh... I think he puts the LLM down for his own ego's sake (as would I!). Curl may, next to the Linux kernel, be one of the most heavily audited codebases in existence. The LLM found something he and thousands of others missed. It's not unimpressive.

billyoneal 33 days ago | | |

The claim has never been that the new model could not do impressive things. The claim is that the new model is not the existential crisis Anthropic’s initial announcement post made it out to be.

wnevets 34 days ago | |

I commented this in another post but I'm going to repeat it because I believe it's important for this discussion.

> The worrying part about Mythos isn't the fact that it can find bugs. The worrying part is Mythos being able to find them on its own across an entire code base as vast as Firefox, then write exploits for what it's found with a very basic prompt.

> The skill required to find then create zero days is quickly approaching the floor.

colechristensen 34 days ago | | |

Opus can find bugs on its own in large codebases just fine with minimal prompting.

The great exaggeration is that this is a new capability.

apexalpha 34 days ago |

> An amazingly successful marketing stunt for sure.

This. Well done by Anthropic.

It even reached the CISO of my small semi-government org in the Netherlands, who slightly panicked at the announced 'tsunami' of vulnerabilities that was coming with Mythos.

Got us some more money and priority with the board, though.

Never waste a good marketing scare.

EMM_386 34 days ago |

If an AI agent finds zero bugs in a software utility, how can that be taken to mean the AI agent is not very good at finding bugs?

What if there are actually zero bugs?

> Five issues felt like nothing as we had expected an extensive list.

The expectation here may not match reality, but not necessarily because Mythos isn't as capable as claimed. curl may just happen to be a well-hardened tool that doesn't have too many security vulnerabilities in its present state.

zamadatix 34 days ago | |

The author considered the same w.r.t. remaining bugs:

> More to find

> These were absolutely not the last bugs to find or report. Just while I was writing the drafts for this blog post we have received more reports from security researchers about suspected problems. The AI tools will improve further and the researchers can find new and different ways to prompt the existing AIs to make them find more.

> We have not reached the end of this yet.

> I hope we can keep getting more curl scans done with Mythos and other AIs, over and over until they truly stop finding new problems.

And that makes sense, it'd be quite the argument of coincidence to say there was just 1 proper find remaining & it was only Mythos that managed to find it just at the point in time it released while the other projects have been hoovering up every other find quickly until that point. Possible, but not the safest assumption to start questioning with.

yjftsjthsd-h 34 days ago |

> Not particularly “dangerous”

I'm not sure that follows. As noted, curl was already analyzed to death with every tool available; most software isn't at that level.

srcreigh 34 days ago |

I can't help but think that curl is, by nature, a relatively simple and well-contained tool. Compare to an operating system or web browser or database or billion dollar company codebase.

It makes some sense that Mythos/ChatGPT 5.5 might be that much better with complexities that curl just doesn't have because it's a basic tool.

Like yeah curl is obviously extremely fully featured as an "anything client" but it's orders of magnitude less complex than other software we rely on.

sausagefeet 34 days ago | |

Curl is a lot more complicated than, I believe, you think. Most people know of it simply as a CLI to hit an HTTP(S) endpoint and write it out. But:

1. It supports basically any file transfer protocol.

2. It is a library that is designed for long running processes.

3. Because it's designed for long running processes, it makes use of every trick it can to pipeline and re-use connections and resources.

4. It has an asynchronous API so it can be integrated into any existing event loop.

Is a web browser or database more complicated? Most certainly, they solve really massive problems. But curl is certainly more complicated than probably most application code that uses it.

joelthelion 34 days ago | |

I agree it's rather basic but as stated in the article, its code is still longer than War and Peace. There are still plenty of opportunities for security vulnerabilities in something of that size.

breakpointalpha 34 days ago | |

From the post:

"curl is currently 176,000 lines of C code when we exclude blank lines. The source code consists of 660,000 words, which is 12% more words than the entire English edition of the novel War and Peace. ... curl is installed in over twenty billion instances. It runs on over 110 operating systems and 28 CPU architectures. It runs in every smart phone, tablet, car, TV, game console and server on earth."

I wouldn't call that simple or well contained...

Most OSes or web browsers don't run on cars or TVs.

andriy_koval 34 days ago | | |

Someone (Mythos?) should write a simple-curl in Rust that implements the 20% of features used by 98% of users.

bilekas 34 days ago |

> The single confirmed vulnerability is going to end up a severity low CVE planned to get published in sync with our pending next curl release 8.21.0 in late June

My mind still cannot understand the quality and refinement that's gone into cURL. It really is the perfect example of something done so right, that people barely think twice about.

pjmlp 34 days ago | |

Easy, it shows what is achievable if there is a high bar for quality in every single line of code that gets committed, reviewed and merged, regardless of the programming language.

However, in the days of the race to the bottom, offshoring for pennies, and now LLM powered code generation, this is a quality most companies won't care about unless there is liability in place.

bilekas 34 days ago | | |

> Easy, it shows what is achievable if there is a high bar for quality in every single line of code that gets committed

This is becoming a more and more overlooked/underrated feature. I genuinely believe it would be impossible in any company that depends on shareholder value. I am yet to convince any company I've worked in without bloody hands that we need to solve old tech debt and refactor certain things etc.

patrickmeenan 34 days ago |

As far as I can tell, the messaging around Mythos is that it takes the expertise of the top security experts and top-level language, protocol and code experts and makes that available to anyone with access. The danger was in giving that access to the world before the defenders had access to that level of expertise.

Curl HAS had security, protocol and language experts poking at it for years because of how central it is to everything. That Mythos found anything is interesting but not a sign that it's been marketing hype and isn't dangerous.

You can bet that 99.99% of projects aren't nearly as secure as curl, and it doesn't matter if they are open or closed source (LLMs will happily decompile closed-source projects and explore). Unless your project has been fuzzed and gone over with existing AI tooling and by experts, expect that it can already be hacked, even with the tooling that is out there now, and that something like Mythos makes this accessible to an even wider pool of people with less expertise.

2001zhaozhao 34 days ago | |

Take my upvote. Anthropic never claimed superhuman performance, only speed and scale. That it doesn't find much in terms of new vulnerabilities in a well-studied piece of software says nothing about its overall potential for dangerous misuse.

jrflo 34 days ago |

I know that the Mythos hype is part marketing by Anthropic, but isn't it possible that with a highly scrutinized codebase, there just aren't any notable security exploits in its current state? The fact that it found nothing isn't necessarily an indictment of it, especially when other tools had identified hundreds of exploits previously. Seems like it's been completely picked over (for now).

billyoneal 33 days ago | |

People lost their minds over the mythos announcement specifically because they found something in FreeBSD, which had a reputation as being one of those picked over code bases.

AntiUSAbah 34 days ago |

There is always marketing involved and people should be able to put marketing into perspective.

Also, curl in this regard is an open source project, relatively small but critical, well known and used everywhere. Besides image libraries, tools like curl or sudo, su, passwd, etc. would also be my first try.

It is still not known at all what Mythos can do. What does it mean from a cost and benchmark POV to have a 10 trillion parameter model?

Nonetheless, LLMs got significantly better at finding this, better than humans, starting about half a year ago? So at some point we need to address the elephant in the room and state that today you need to do security scanning with LLMs in addition. You need to take this seriously.

In the worst case, use Anthropic's marketing to state that it's a must now and that something changed.

ahofmann 34 days ago |

Putting on my tinfoil-hat: Sooo, the guy who runs the test and delivers the report could just have removed the more interesting bugs and delivered those to any three letter agency?

casey2 34 days ago | |

curl's source is public so what would be the gain in the rigmarole? Now if the prompt was "create a patch that inserts a zero-day while fixing a bug" that would be impressive.

NitpickLawyer 34 days ago |

What's going on in this thread? It's weird how prevalent the negativity towards mythos is, and I'm not sure if it's people throwing the baby out with the bathwater or some more tinfoil-adjacent coordinated campaign. I also noticed this on a thread a few days ago, before the mozilla post. There were dozens of comments saying basically "mythos is vaporware".

I get the idea that they're using it for marketing. Of course they are. But to reduce it to "just marketing" feels either ill informed or outright wrong. Unless you have reasons to not believe the dozens of credentialed, well respected people in the field that have already shared their opinions after working with mythos. Plenty of them on all the social media sites.

And then there's the team at mozilla. They wrote a blog about this, and they've worked with anthropic before, using opus 4.6 and found and fixed 22 vulnerabilities. Then they worked with mythos and found and fixed 271 vulnerabilities. Unless you're going to accuse them of being shills, these are unquestionable numbers. The model is quantitatively better at this thing. And it matches what everyone is saying.

I think there are better things to accuse anthropic of, than that they are simply lying for marketing purposes. Of course they'll use this as a marketing campaign, but there's plenty of evidence out there that there is something there, that the model is simply better than previous generations at this. Don't fall for the cheap reductionist stuff, just because you don't like them, or feel that this is marketing fluff. It doesn't feel like a gimmick, even if it gets used to push their agenda. Something, something, propaganda often uses true statements as well.

jedisct1 34 days ago |

Swival found many more vulnerabilities without Mythos https://github.com/swival/security-audits

nevi-me 34 days ago |

> These tools and the analyses they have done have triggered somewhere between two and three hundred bugfixes merged in curl through-out the recent 8-10 months or so.

If you've just gone through a lengthy analysis of your code with other AI tools, surely it's reasonable not to expect to see hundreds more from a new tool?

It should be possible, unless more bugs are introduced, to eventually get to a state where there are no more bugs in your code.

Process aside, it sounds like Daniel expected to find dozens/hundreds more bugs.

jaapz 34 days ago | |

Mythos was kind of hyped as the tool that would discover many more bugs than any currently available tool

pbmonster 34 days ago | |

curl had ~15 CVEs in 2026 so far. You surely don't think those (and the one Mythos found) were the last security bugs still left in the code base? There certainly will be more, in fact Daniel predicts ~50 CVEs for the entire year.

But Mythos found 1. After all that hype. 1.

knowaveragejoe 34 days ago | | |

Maybe curl is just... better hardened? Firefox posted hundreds in April.

tgtweak 34 days ago |

I feel like, if it were a codebase that hadn't been run through any security analysis tools, there would have been some more significant findings - perhaps they can re-run it on an 18-month-old commit and see how many it finds that were subsequently found and fixed?

Anyway, I think the case still stands that frontier and next-gen models will get increasingly adept at finding vulnerabilities, and that those on the receiving end of those vulnerabilities need to be on top of it.
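
The backtest idea above could be scored roughly like this. A minimal sketch, assuming each issue fixed after the old commit has some stable identifier and that the tool's findings have been manually matched to those identifiers; the function name and the identifiers below are hypothetical placeholders, not real curl issues.

```python
def backtest_recall(fixed_later: set[str], reported: set[str]) -> float:
    """Fraction of later-fixed issues that the tool also reported on the old commit."""
    if not fixed_later:
        return 0.0
    return len(fixed_later & reported) / len(fixed_later)


# Hypothetical identifiers, for illustration only.
fixed_later = {"bug-a", "bug-b", "bug-c", "bug-d"}
reported = {"bug-b", "bug-d", "bug-x"}  # bug-x is not among the later fixes

recall = backtest_recall(fixed_later, reported)
```

Findings outside the later-fixed set, like `bug-x` here, would still need manual triage to tell false positives from genuinely new bugs.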

ostif-derek 33 days ago | |

Unfortunately that doesn't help much. LLMs are really really good at digging up known vulns, so much so that they often falsely declare known vulns as new and novel ones.

They have the CVEs in their training data, know how to look up ossfuzz logs, etc.

hmokiguess 34 days ago |

> curl is one of the most fuzzed and audited C codebases in existence (OSS-Fuzz, Coverity, CodeQL, multiple paid audits). Finding anything in the hot paths (HTTP/1, TLS, URL parsing core) is unlikely.

The way this reads, it sounds more like the LLM dismissed trying than that it tried and failed. I've seen Claude do that often unless I prod it to challenge itself. Curious what actually happened here.

andromaton 34 days ago |

If priced like other Anthropic models, Mythos will make vulnerability discovery a lot more accessible.

The author compares it to AISLE, ZeroPath, and OpenAI’s Codex Security. AISLE and ZeroPath are much more expensive. OpenAI’s Codex Security is gated.

Most people don't care about the first two and don't complain about the latter's policy because they are all specialized models and/or harnesses.

Mythos will be available to all.

vibedev999 34 days ago | |

> AISLE and ZeroPath are much more expensive

AISLE is *cheaper* for sure

mohsen1 34 days ago |

I don't know about Mythos, but in recent weeks I've noticed Opus constantly failing to fix things in tsz[0], while GPT 5.5 can easily churn out fixes that are solid and pass tests. I've stopped paying for Claude for now and all my money is going to OpenAI at the moment. Either Opus is massively nerfed or GPT 5.5 is really head and shoulders above it on very difficult tasks. The last percent of conformance tests in tsz are really, really difficult and I've seen Opus bail again and again. So annoying to waste time and tokens only to finally get "this is too involved" or "this requires a multi-week sprint to fix".

[0] https://tsz.dev

_pdp_ 34 days ago | |

The new Opus feels like a step backwards. More expensive, thinks more, and it does not get the job done.

vincent_s 34 days ago | | |

From a user’s perspective 4.7 is a downgrade compared to 4.6. It’s intended to give Anthropic more control over their compute resources and profitability:

https://news.ycombinator.com/item?id=48072916

dyauspitr 34 days ago | |

Having never used Claude and only Codex, does Claude actually say “this is too involved” as a response to a prompt?

mohsen1 34 days ago | | |

Yes it does. Usually after hours of working and not getting results

jorisw 34 days ago |

Love Daniel's writing style here. Fact based, concise, easy to read

readthenotes1 34 days ago |

Kinda burying the lede: AI tools found over a dozen CVEs in curl last year, and hundreds of bugs.

"Primarily AISLE, Zeropath and OpenAI’s Codex Security have been used to scrutinize the code with AI. These tools and the analyses they have done have triggered somewhere between two and three hundred bugfixes merged in curl through-out the recent 8-10 months or so. A bunch of the findings these AI tools reported were confirmed vulnerabilities and have been published as CVEs. Probably a dozen or more."

toraway 34 days ago | |

Not exactly "burying the lede" since Daniel already posted an update about it months ago [1], with extensive discussion in numerous articles [2], including on this site [3].

[1] https://lists.haxx.se/pipermail/daniel/2025-September/000127...

[2] https://www.theregister.com/software/2025/10/02/curl-project...

[3] https://news.ycombinator.com/item?id=45449348

romaniv 34 days ago |

"I signed the contract for getting access, but then nothing happened. Weeks went past and I was told there was a hiccup somewhere and access was delayed.

Eventually, I was instead offered that someone else, who has access to the model, could run a scan and analysis on curl for me using Mythos and send me a report. To me, the distinction isn’t that important."

Really? We're talking about (essentially) a product demo from a trillion dollar industry fueled by debt. Clearly, blog posts like this have an immense influence on the perception of usefulness of the particular model and AI in general. With so much staked on this for the company, wouldn't you want to be sure that you're using the actual product without anyone messing with the results in any way?

AtNightWeCode 34 days ago |

Should have scanned an older code base with Mythos, from before all these other sec issues were resolved with other tools. Or used the other tools to introduce the same kinds of errors in other parts of the code base to see if Mythos would have found them.

A problem is that these tools seem smarter than they are because they've already seen the answer key.

Semkas 34 days ago |

I'm disinclined to be overly generous to Anthropic, but I have to say that regardless of whether the talk of Mythos being uniquely dangerous was mostly cynical: It would be great if this starts a trend of giving security-critical software a few months' head start with any new significantly improved model.

ilia-a 34 days ago |

IMHO Mythos was more of a marketing ploy.

When it comes to security and AI, all top tier publicly accessible models (GPT 5.5, Opus 4.7) and even near-top ones like Deepseek 4 PRO can do a very good job given a detailed harness on how to spot issues and cross-validate them to avoid false positives.

jongjong 34 days ago |

I'm looking forward to trying Mythos run against my 5000-line, instant-finality, quantum-resistant blockchain project and decentralized exchange (an additional 5000 lines). I already ran all the models up to Opus 4.6 and they couldn't find anything.

absynth 34 days ago |

I routinely used to compile C programs on other compilers to find defects that one or another didn't find. Compiling on Windows vs Linux. You could summarize / minimize it down to compiling with warnings as errors etc. but you'd be missing the point.

The point wasn't actual cross-platform portability even though that was a nice side effect. It was to flush out all the weird edge cases.

Edges like security flaws. Buffer overflows are usually platform specific. There are plenty of other ways to find these issues but simply recompiling for a different platform surfaces all sorts of issues.

vb-8448 34 days ago |

I guess we're missing a fundamental piece of information: how much time and token usage did it take the "middle guy" to create the report?

Next question: could it be that OP could use Mythos in a better way, since he knows the project better?

23aqsI 33 days ago |

Like clockwork, criticism of the Alpha Omega apparatchiks is flagged. They know how to protect their income streams while open source authors get nothing.

tuananh 34 days ago |

Interesting. curl team found that Mythos is mostly hype while Calif team found Mythos amazing.

I would think Calif (a security firm) is the team better placed to make good use of such a tool.

tedd4u 34 days ago |

It's also a convenient way to get press (and investor valuation) for a new model without releasing it (word is they don't have enough hardware to do so).

yjftsjthsd-h 34 days ago |

> The source code consists of 660,000 words, which is 12% more words than the entire English edition of the novel War and Piece.

Typo, or is there a spoof I should go read?

iso1631 34 days ago | |

War and Peace is about 590,000 words. Tiny compared to the full Harry Potter collection (about 1 million words over the 7 books), but long for a single book.

perching_aix 34 days ago | | |

They're referring to the typo in the title, "Piece" vs "Peace".

I also thought they were contending the word count before noticing. Even remarked how I find this a weird metric, given that code is not prose [0], but then I deleted that once I picked up on what's going on.

[0] comparing the output of `wc -w` with the word counts of books I'm reasonably sure will be super off

edit: ran a calc, substituting out symbols (but not underscores), digits, and comments yields a 390K word count compared to the 660K cited. not excluding the comments yields 600K, so more than a third of all words in the sources are comments.
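
A rough sketch of how a count like that could be reproduced. The function names, the regexes, and the choice to treat only letters and underscores as word characters are assumptions for illustration, not the exact commands used above; the comment stripping is also naive and will misfire on `/*` or `//` inside string literals.

```python
import re


def strip_c_comments(src: str) -> str:
    """Naively drop /* ... */ and // ... comments from C source text."""
    src = re.sub(r"/\*.*?\*/", " ", src, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", " ", src)


def count_words(src: str) -> int:
    """Count runs of letters/underscores; symbols and digits act as separators."""
    cleaned = re.sub(r"[^A-Za-z_]+", " ", src)
    return len(cleaned.split())


sample = "/* copy the buffer */\nint copy_buf(char *dst) { return 0; } // done\n"

with_comments = count_words(sample)
without_comments = count_words(strip_c_comments(sample))
```

Summing `count_words` over every `.c` and `.h` file, with and without `strip_c_comments`, gives the two totals to compare.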

Accacin 34 days ago | | |

The ten main Malazan books are 3.3 million words, apparently. No wonder it took me such a long time to get through them.

dotancohen 34 days ago | |

Perhaps he was dictating.

Does it say anything else? Just 'Aaaarggghhhh'?

Hamuko 34 days ago | | |

Doubt it considering that Daniel Stenberg is Swedish. English dictation when you speak English as a second language with an accent is quite annoying.

theaniketmaurya 34 days ago |

Who is using Mythos to find these things and where do they run it?

nottorp 34 days ago |

> (I am purposely leaving out the identity of the individual(s) involved in getting the curl analysis done as it is not the point of this blog post.)

I would very much like to know if they were independent or affiliated to Anthropic.

> My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing.

... because of this.

brunoborges 34 days ago |

AI not finding a security issue on cURL has more to do with lack of widespread security issues than the model's capacity of finding them.

selectedambient 31 days ago |

sooo tldr; curl is safe.

plexescor 34 days ago |

I personally believe it's a marketing stunt and they are just using actual humans to find the bugs/vulns

utopiah 34 days ago |

Won my bet "voted 10 [vulnerabilities] but in retrospect as you are familiar with Claude and such tooling if you already used any of recent model to done some kind of security review then I'd drop to 1 or even 0." https://mastodon.pirateparty.be/@utopiah/116537456780283420

perching_aix 34 days ago |

It's a shame he seems to reject the idea of actually diving in and using these tools interactively:

> It’s not that I would have a lot of time to explore lots of different prompts and doing deep dive adventures anyway.

His expertise I think would elevate the results quite a bit. Although if he never uses LLMs, which it reads like he doesn't, I guess it might backfire just as well. Prompting style (still?) does matter after all, certainly in my experience anyways.

jph00 34 days ago | |

He states in the article that they use LLMs for this purpose and find them extremely useful.

perching_aix 34 days ago | | |

Which can be true without this also being true:

> using these tools interactively

I did read the article. It seems to me they're using LLMs in a prepared manner instead, as mere scanners that produce reports.

OtherShrezzing 34 days ago | |

He posts about his use of language models a lot on Mastodon[0]. He does lots with language models, but doesn't buy all the way into the hype. I'd say he's one of the most reasonable & balanced voices on the subject of AI use in software today. Happy to use the technology, more than willing to push back on marketing bs.

[0] https://mastodon.social/@bagder

perching_aix 34 days ago | | |

I see, thanks.

I checked back two weeks worth of posts, reposts, and replies there, and do not see anything suggesting so, so I'll have to take your word for this.

What I do see is him responding to seemingly rather frequent harassment about AI use @ curl however. The stance he takes in those cases is very reasonable (even if you don't use AI for scanning the codebase and contributions, threat actors will), it's unfortunate this topic is so political that he has to deal with this to such an extent.