Speeding up LLM Inference with parallel decoding | Dark Hacker News