GRPO vs. GDPO: Building Intuition for RL Reward Policies | Dark Hacker News