Show HN: Benchmarking Tangible Interface Understanding in Long-Horizon Tasks