Agent-evals: Metacognitive scoring and boundary testing for LLM coding agents (thinkwright.ai)
2 points by oceanwaves 93 days ago | 0 comments