Vending-Bench: Testing long-term coherence in agents | Dark Hacker News