Measuring What Matters: Construct Validity in Large Language Model Benchmarks (oxrml.com)
3 points by Cynddl 195 days ago | 2 comments