Beyond Benchmark Maxxing: Measuring Open Source Models as Real-World Agents | Dark Hacker News