Show HN: Vision AI Checkup, an Optometrist for VLMs (visioncheckup.com)

Evaluating the visual capabilities of language models is hard. At one end of the evaluation spectrum are vibe checks, which are useful for building intuition but time-consuming to run across a dozen or more models. At the other end are benchmarks so large that they are intractable for most users.

Vision AI Checkup is a new tool for evaluating VLMs. The site is made up of hand-crafted prompts focused on real-world problems: defect detection, understanding how the position of one object relates to another, colour understanding, and more. Our prompts are especially focused on industrial tasks (serial number reading, assembly line understanding, and more), although we're excited to add more general prompts.

The tool lets you see how models perform across categories of prompts, and how different models perform on a single prompt.

We have open-sourced the codebase, with instructions on how to add a prompt to the assessment: https://github.com/roboflow/vision-ai-checkup. You can also add new models.

We'd love feedback, and also ideas for areas where VLMs struggle that you'd like to see assessed!