Autoregressive next token prediction and KV Cache in transformers | Dark Hacker News