Why Can GPT Learn In-Context? Language Models Perform Gradient Descent(arxiv.org)35 points by xaedes 3 years ago | 0 commentsNo comments yet