What the HellaSwag? On the Validity of Common-Sense Reasoning Benchmarks | Dark Hacker News