The inefficiency of RL, and implications for RLVR progress | Dark Hacker News