ZeroBench: An Impossible* Visual Benchmark for Contemporary Multimodal Models (zerobench.github.io)
Looking at the benchmark questions, they all seem to exploit the fact that LLMs suck at composing subtasks. The models might be able to solve each individual subproblem, but not the combination of them.
Even solving that would still be a far cry from AGI.