Agent-evals: Metacognitive scoring and boundary testing for LLM coding agents | Dark Hacker News