Chain-of-Thought Reasoning Is a Policy Improvement Operator | Dark Hacker News