Digital Agent outperforms o1 by 15% – trained with new RL-variant similar to R1 (arxiv.org)
11 points by let_tim_cook_ 1 year ago | 0 comments