SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks (arxiv.org)
2 points by FiberBundle 51 days ago | 0 comments