The Evaluability Gap: Designing for Scalable Human Review of AI Output | Dark Hacker News