(Partly copied from https://news.ycombinator.com/item?id=34640251.)
On models: Obviously, almost everything is a Transformer nowadays (the "Attention Is All You Need" paper). However, to get into the field and get a good overview, I think you should also look a bit beyond the Transformer. E.g. RNNs/LSTMs are still a must-learn, even though Transformers might be better at many tasks. And then all those memory-augmented models, e.g. the Neural Turing Machine and its follow-ups, are important too.
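To make the LSTM point a bit more concrete, here is a minimal NumPy sketch of one step of a standard LSTM cell (the common variant with a forget gate). The gate ordering (i, f, o, g), the stacked weight shapes, and the random initialization in `lstm_forward` are my own illustrative choices, not taken from any particular library.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell (with forget gate).

    x: input (D,); h_prev, c_prev: previous hidden and cell state (H,)
    W: (4H, D), U: (4H, H), b: (4H,); gates stacked in the (assumed) order i, f, o, g.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # all four gate pre-activations at once
    i = sigmoid(z[0:H])             # input gate
    f = sigmoid(z[H:2 * H])         # forget gate
    o = sigmoid(z[2 * H:3 * H])     # output gate
    g = np.tanh(z[3 * H:4 * H])     # candidate cell update
    c = f * c_prev + i * g          # additive cell-state update
    h = o * np.tanh(c)              # exposed hidden state
    return h, c


def lstm_forward(xs, H, seed=0):
    """Run an LSTM over a sequence xs of shape (T, D) with small random weights."""
    rng = np.random.default_rng(seed)
    T, D = xs.shape
    W = rng.normal(scale=0.1, size=(4 * H, D))
    U = rng.normal(scale=0.1, size=(4 * H, H))
    b = np.zeros(4 * H)
    h, c = np.zeros(H), np.zeros(H)
    hs = []
    for t in range(T):
        h, c = lstm_step(xs[t], h, c, W, U, b)
        hs.append(h)
    return np.stack(hs)            # (T, H) hidden states
```

The additive update `c = f * c_prev + i * g` is the part worth studying: compared to a plain RNN, it is what lets information and gradients flow over many time steps.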
It also helps to know the different architectures, such as plain language models (GPT), attention-based encoder-decoder models (e.g. the original Transformer), but then also CTC, hybrid HMM-NN models, and transducers (RNN-T).
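For CTC, the core idea is that the model emits one distribution per input frame over the labels plus a special blank symbol, and p(labels | input) is the sum over all frame-level paths that collapse to the target (merge repeats, then drop blanks). A minimal sketch of that sum via the CTC forward algorithm, assuming blank index 0 and working in plain probabilities for readability (real implementations work in log space); the function names are my own:

```python
import numpy as np


def collapse(path, blank=0):
    """CTC many-to-one map: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for p in path:
        if p != prev and p != blank:
            out.append(p)
        prev = p
    return out


def ctc_prob(probs, labels, blank=0):
    """Total probability p(labels | input) under CTC via the forward algorithm.

    probs: (T, V) per-frame output distributions (rows sum to 1).
    labels: target label sequence without blanks, e.g. [1, 2].
    """
    T = probs.shape[0]
    # Extended sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]          # start with a blank ...
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]      # ... or with the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]             # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1, s - 1]    # advance by one position
            # Skipping the blank between two labels is only allowed if they differ
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    # Valid paths end in the last label or the trailing blank
    return alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
```

Applying `collapse` to the per-frame argmax is also the simplest (greedy, best-path) CTC decoder.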
Some self-promotion: I think my PhD thesis does a good job of giving an overview of this: https://www-i6.informatik.rwth-aachen.de/publications/downlo...
Diffusion models are another, more recent kind of model.
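A minimal sketch of the forward (noising) process of a diffusion model, assuming a DDPM-style setup with the linear beta schedule from the DDPM paper; the names `linear_beta_schedule` and `q_sample` are my own. The closed form lets you jump straight from x_0 to any step t, and a network is then trained to predict the added noise `eps` from `x_t` and `t`:

```python
import numpy as np


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (DDPM-style assumption)."""
    return np.linspace(beta_start, beta_end, T)


def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in closed form.

    t is a 0-based index into alpha_bar. Returns the noisy sample and the noise
    eps, which is the usual regression target for the denoising network.
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


T = 1000
betas = linear_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)  # abar_t = prod_{s<=t} (1 - beta_s), decreases toward 0
```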
Then, a separate topic is training. Most papers do supervised training, using a cross-entropy loss against the ground-truth targets. However, there are many other approaches:
There is CLIP, which combines the text and image modalities.
There is the whole field of unsupervised or self-supervised training methods. Language model training (next-label prediction) is one example, but there are others.
And then there is the big field of reinforcement learning, which is probably also quite relevant for AGI.
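Cross-entropy and CLIP are closely related: a CLIP-style contrastive loss is just cross-entropy over a batch similarity matrix, where the correct "class" for each image is its own caption and vice versa. A minimal NumPy sketch, assuming the encoders already produced the embeddings; the function names and the fixed `temperature=0.07` are assumptions for illustration (CLIP learns this value):

```python
import numpy as np


def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def cross_entropy(logits, targets):
    """Mean cross-entropy between rows of logits and integer class targets."""
    logp = log_softmax(logits, axis=1)
    return -logp[np.arange(len(targets)), targets].mean()


def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of N matching (image, text) pairs.

    Pair i is the positive for row/column i; the other N-1 entries act as negatives.
    The temperature is fixed here (an assumption); CLIP itself learns it.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (N, N) scaled cosine similarities
    targets = np.arange(len(img))                 # true pairs sit on the diagonal
    loss_i2t = cross_entropy(logits, targets)     # each image picks its caption
    loss_t2i = cross_entropy(logits.T, targets)   # each caption picks its image
    return (loss_i2t + loss_t2i) / 2
```

The same `cross_entropy` function also covers the plain supervised case; the contrastive part only changes where the logits and targets come from.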