Wrapping your head around self-attention and multi-head attention | Dark Hacker News