DeepSeek R1 Theory Overview (GRPO and RL and SFT) | Dark Hacker News