ZeroBench: An Impossible* Visual Benchmark for Contemporary Multimodal Models

ZeroBench: An Impossible* Visual Benchmark for Contemporary Multimodal Models (zerobench.github.io)

7 points by EliBullockPapa 1 year ago | 3 comments

casey2 1 year ago

I would consider any system that solves similar problems to be AGI. What I suspect will happen is that this benchmark will saturate long before any such system exists.

imtringued 1 year ago

Basically that's the reason why they built this benchmark. Posing challenging, seemingly unfair benchmark questions forces the system to have at least some generalisation ability that it previously did not possess.

When I look at the benchmark questions, they all look like they are exploiting the fact that LLMs suck at composition of subtasks. They might be able to solve each individual problem, but not the combination of them.

Solving that would be a far cry from AGI.

drakenot 1 year ago

Because of benchmark leakage / contamination?