Batched reward model inference and Best-of-N sampling | Dark Hacker News