Show HN: AI Browser Agent Leaderboard (leaderboard.steel.dev)
IMHO, if you're building AI products, building and running your own evals is, most of the time, the only right way to build something good.
BTW - Arch looks super cool! Just starred and looking forward to playing around with it :)
Since we started working on Steel, we've seen a ton of people have a hard time putting the browser agent space and its progress into perspective. It felt odd to us that there was no centralized leaderboard like there is for so many other agentic use cases.
So we launched this leaderboard to help! It's open-source, and we're open to contributions for anything we may be missing. We're committed to keeping it up to date as the space progresses (which it seems to be doing quite quickly).
Let us know if you have any feedback/thoughts :)