SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks (arxiv.org)
2 points by FiberBundle 97 days ago | 0 comments