We used sparse autoencoders to explain LLM moderation flags of violent threats | Dark Hacker News