Open-world evaluations for measuring frontier AI capabilities [pdf] | Dark Hacker News