Bootstrapping AI Evals from Context (Why 'Just Asking Claude' Fails) | Dark Hacker News