Benchmarking LLM Agents on Consequential Real World Tasks | Dark Hacker News