LLM Speedrunner: Eval for frontier models to reproduce scientific findings (github.com)
2 points by zerojames 325 days ago | 0 comments
No comments yet