Show HN: Pencil Puzzle Bench – LLM Benchmark for Multi-Step Verifiable Reasoning | Dark Hacker News