Open-source, sanitized evaluation datasets for models that reason and code | Dark Hacker News