Understanding RL for model training, and future directions with GRAPE | Dark Hacker News