Before You Score the Model, Score the Benchmark | Dark Hacker News