Towards Monosemanticity: Decomposing Language Models with Dictionary Learning | Dark Hacker News