Beyond 80/20: High-Entropy Minority Tokens Drive Effective RL for LLM Reasoning (arxiv.org)
3 points by mdp2021 20 days ago | 0 comments