AI interpretability tools fail to predict inner misalignment | Dark Hacker News