Why Can GPT Learn In-Context? Language Models Perform Gradient Descent | Dark Hacker News