DatBench fixes VLM evals: 70% blindly solvable, 42% mislabeled, 35% prod gap (datologyai.com)
5 points by hurrycane 134 days ago | 0 comments