Show HN: Aion-Torch – Adaptive residual scaling for deep Transformers | Dark Hacker News