GRPO explained: Group Relative Policy Optimization for LLM fine-tuning | Dark Hacker News