Generalized on-policy distillation with reward extrapolation (arxiv.org)
3 points by fzliu 93 days ago | 0 comments
No comments yet