Chain-of-Thought Reasoning Is a Policy Improvement Operator (arxiv.org)
2 points by hughzhang 2 years ago | 0 comments