Efficient Pre-Training with Token Superposition | Dark Hacker News