Experimenting with policy gradient methods in JAX | Dark Hacker News