Batched reward model inference and Best-of-N sampling (raw.sh)
34 points by rawsh 1 year ago | 0 comments