How are you running evals for AI agents? | Dark Hacker News