Anthropic's open-source framework for AI-powered vulnerability discovery (github.com)

181 points by binyu 2 hours ago | 64 comments

tptacek 2 hours ago |

The thing about things like this is that they're shop jigs. You can buy a crosscut sled if you really want to, but most woodworkers just make their own.

It was a different situation 2 years ago, when there was significant cost to building your own harness (but then: you probably weren't doing AI vuln research 2 years ago). Today, I think your best bet is to look at something like this for ideas, and then just ask for your own, to fit your own work style, with your own interface, your own notion of target and effort specification, and your own alerting.

redfloatplane 1 hour ago | |

"Shop jigs" is a great way to put it. I think a lot of software has gone from being made for general use to extremely individualised use. Before the Age of AI, it took so much human effort to write something that solved your problem that you might often go the extra mile so that others could re-use it. Now, it takes almost no effort, so the software stays ungeneralised. Some of the incentive has changed, I think. Most of the time I no longer share the things I've been building[0] because, for one thing, they simply couldn't possibly have any benefit for others, and if someone needs something like it, they can build exactly the thing they want instead of having to extend or modify my thing. Like a jig!

0: https://redfloatplane.lol/blog/17-why-share/ (and related posts, I guess)

colmmacc 30 minutes ago | | |

Unless it is very specific to a proprietary product, craftspeople take their jigs with them from job to job, building up a personal library over a career. As a software developer I've always had a well-tuned IDE and shell config in a safe place.

Something I think about a lot: what is the equivalent for today's software builders using AI tools? How do we make these harnesses exportable and portable? You might think employers would be against this, preferring to make it more costly to leave. But I actually think most will favor this because it makes people more productive more quickly. But we have to find ways to normalize it and show that there are no security leaks in the process (like what might make it into a set of personal steering prompts).

andhug 58 minutes ago | | |

That’s an interesting way to say “code quality in the age of AI has gone out the window”

jorl17 26 minutes ago | |

This is exactly it.

I've said many times that I believe "using the computer will transparently involve having it write and run code for you" (and if you're not technical you won't even know it!). What you're saying goes in that direction as well.

I feel that it's often better for us to create purpose-built tools for our lives, and with every model release, the complexity of those tools grows.

These are really personal tools: they solve a problem that other people might have, but are very tied to your own specific way of working, and would be hard to explain or adapt to someone else. So: shop jigs.

I have about 10 custom scripts and programs that are like this -- I haven't felt like this since college! Back then I had all the time in the world to customize my setup...now I have agents!

In a way, I want to show this to all my friends, but whenever I mentally trace how that would go, I realize they wouldn't really understand a bunch of the quirks they have, because they are _my_ quirks. They're reasonably complex pieces of tech that solve my problems very well, which are themselves particular versions of broader problems, and which I (at least for now) have no interest in supporting.

It's so clear we're heading in this direction, and yet so many people still believe code will be for the elites. Maybe production-code...As for the rest, I think soon your mom and dad are going to have their computer running code it wrote to serve them. Security-wise it's scary, but it's exciting to think about!

newaccount12344 2 minutes ago |

Let's see how much better it is in comparison to ZAP and Burp. I will test it on https://github.com/SasanLabs/VulnerableApp, which I built under SasanLabs.

simonw 2 hours ago |

I wonder how much this thing costs to run.

https://github.com/anthropics/defending-code-reference-harne... says:

> As a rough guideline, expect ~10K uncached input tokens/min and ~2K output tokens/min per agent. You can scale parallelism up to your account's ITPM limit (roughly 10 agents per 100K ITPM).

My guess would be hundreds of dollars with Opus and thousands of dollars with Mythos.
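For anyone who wants to plug in their own numbers, here is a rough back-of-the-envelope sketch of that guideline. The per-agent token rates come from the quoted README text; the function names and the per-million-token prices in the test call are hypothetical placeholders, not real rates, so substitute the current pricing for whichever model you run.

```python
# Back-of-the-envelope cost sketch based on the README's rough guideline.
# The two rates below are quoted from the README; everything else
# (function names, prices passed in by the caller) is an assumption.

INPUT_TOKENS_PER_MIN_PER_AGENT = 10_000   # ~10K uncached input tokens/min per agent
OUTPUT_TOKENS_PER_MIN_PER_AGENT = 2_000   # ~2K output tokens/min per agent


def max_agents(itpm_limit: int) -> int:
    """Agents that fit under an account's input-tokens-per-minute limit."""
    return itpm_limit // INPUT_TOKENS_PER_MIN_PER_AGENT


def estimate_cost(agents: int, hours: float,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimated spend in USD for `agents` running in parallel for `hours`.

    Prices are per million tokens and must be supplied by the caller.
    Caching discounts are ignored, matching the 'uncached' guideline.
    """
    minutes = hours * 60
    input_tokens = agents * minutes * INPUT_TOKENS_PER_MIN_PER_AGENT
    output_tokens = agents * minutes * OUTPUT_TOKENS_PER_MIN_PER_AGENT
    return (input_tokens / 1e6) * usd_per_m_input \
        + (output_tokens / 1e6) * usd_per_m_output
```

With a 100K ITPM limit, `max_agents(100_000)` gives 10 agents, which matches the "roughly 10 agents per 100K ITPM" figure. Total cost then scales linearly with agent count and wall-clock hours, so the final bill depends mostly on how long you let it run.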

majicDave 17 minutes ago |

It will always be easier to find a single hole than it will be to seal every one. The hackers have all the same tools, so this is an arms race that cannot be won.

dclavijo 1 hour ago |

Slightly off topic: it seems that someone is on a dead/flag rampage, killing all the good links to GitHub in this post. Why?

richardbarosky 2 hours ago |

To be sure, security is an amazing AI/LLM use case. A huge swath of the work is pattern matching known security issues against stuff that's very precise to analyze -- programming language text.

Something that stands out is that for the strongest use cases, AI companies will prefer to sell the technique as a service rather than its raw output. For use cases where the output is less valuable, tokens are sold. If AI tokens were so magical in creating new value in developing software applications generally, they wouldn't be selling tokens directly. They'd hoard the tokens and use them to dominate SaaS software in any industry they want.

The same way as someone selling an expensive course in the stock market is signaling that they have more to gain by selling the course rather than taking their knowledge and making money in the stock market directly.

lanyard-textile 2 hours ago |

>This repo is not maintained and is not accepting contributions.

Hm :)

Hamuko 1 hour ago | |

Why isn't Claude maintaining it?

skeledrew 1 hour ago | | |

They're pretty much saying the efficacy of the tool can be tested by anyone to determine if it's worth purchasing the more polished and up-to-date commercial offering.

bobkb 1 hour ago |

Very interesting.

I have been working on and using a similar tool for a while now:

https://github.com/bobinson/vulture

I have been struggling with false positives and using Claude + MCP as a poor man’s audit tool. Over the last few days I have found better results with NVIDIA-hosted models.

euroderf 47 minutes ago |

Is Anthropic still majority French-owned? It would explain a lot about their entire approach to the wider ecosystem.

bigmattystyles 2 hours ago |

I wonder how this sort of product is going over at Coverity and others like it. Proper SAST vendors I mean. Is it an existential threat?

rms2ds 1 hour ago | |

If I had to guess, they'll eventually just add it into their own product and hike the prices up to cover tokens lol.

crooked-v 1 hour ago |

I still find it so weird that they haven't bought out whoever controls the `anthropic` github username.

trilogic 2 hours ago |

https://github.com/Mainframework/Anthropic-Cybersecurity-Ski...

Be aware: the .py files will not pass antivirus checks, but they basically do the job.

extr 1 hour ago |

Interesting that it's in Python!

wslh 1 hour ago |

Looking forward to trying this tomorrow (it's late here). Has anyone run it on a real codebase yet? Curious about setup friction, cost, and signal/noise.

bartoszcki 1 hour ago |

> Anthropic engineers on average ship 8x as much code per quarter

Are they making 8x more features or the same amount just with more code?

crooked-v 1 hour ago | |

Going by the issues on their repos, it's 2x features and 6x regressions of bugs that were "already fixed".

zoobab 1 hour ago |

Open source crap to connect to an LLM blob.

zoobab 1 hour ago |

'open source' crap to connect to their LLM blob.