For academia: I expect FAANG will keep producing larger and larger language models. To keep scaling model size, we'll probably see new improvements in multi-task learning, likely borrowing ideas from relational / contrastive learning. Few-shot / zero-shot learning has also come a long way over the last couple of years. There will probably be a bunch of secondary papers about language models as well: hypotheses on how they work, explanations of corner cases, and ways to deal with bias and fairness.
For industry: it feels like “making BERT / some other language model do things” is a common job nowadays. On the engineering side, I think we'll see more tools for quickly and efficiently fine-tuning language models, especially tools that keep a human in the loop.
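One common way human-in-the-loop tooling works is uncertainty sampling: the current model scores unlabeled examples, and the ones it is least sure about go to an annotator before the next fine-tuning round. Here is a minimal sketch, assuming the model's class probabilities are already available as a dict mapping example id to a probability list. `select_for_labeling` is a hypothetical helper, not any particular tool's API, and entropy is just one of several possible uncertainty scores.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher = less confident)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions: dict[str, list[float]], budget: int) -> list[str]:
    """Hypothetical helper: return the ids of the `budget` most uncertain examples.

    `predictions` maps an example id to the model's class probabilities.
    The returned ids would be routed to a human annotator.
    """
    ranked = sorted(predictions, key=lambda ex_id: entropy(predictions[ex_id]), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    preds = {"a": [0.98, 0.02], "b": [0.55, 0.45], "c": [0.7, 0.3]}
    print(select_for_labeling(preds, budget=2))
```

With these toy probabilities, "b" and "c" are picked over the confidently classified "a", so annotation effort goes where the model is weakest.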
Overall, it feels like we're getting to a point where there's a fairly standardized approach to simple NLP problems like text classification: no more real feature engineering, just throw BERT at the problem. I expect this trend to continue, with more and more focus on dataset creation and validation and less emphasis on model architecture.
I also think there will be a rise in multi-modal language models, for example combinations of language and vision models. But I think the more interesting application will be combining dense language model representations with sparser tabular data. Think of predicting a user's likelihood to buy a product given a review of another product (a dense text embedding) together with their clicks over the last 2 hours (sparser tabular data). This feels like a much more common problem in practice.
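The simplest version of that combination is "early fusion": concatenate the text embedding with the tabular features and feed the result to a single downstream model. The sketch below is illustrative only. `toy_text_embedding` is a hashed bag-of-words stand-in for a real language model embedding, and the log-scaled click count, `EMBED_DIM`, and the logistic-regression scorer are all hypothetical choices rather than a recommended setup.

```python
import math
import zlib

EMBED_DIM = 8  # hypothetical; real language model embeddings usually have hundreds of dims


def toy_text_embedding(text: str, dim: int = EMBED_DIM) -> list[float]:
    """Stand-in for a language model embedding: hashed bag of words, L2-normalized.

    zlib.crc32 is used instead of hash() so results are stable across runs.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def tabular_features(clicks_last_2h: int) -> list[float]:
    # Log-scale the raw count so heavy clickers don't dominate the dense part.
    return [math.log1p(clicks_last_2h)]


def combined_features(review: str, clicks_last_2h: int) -> list[float]:
    """Early fusion: dense text vector followed by the tabular features."""
    return toy_text_embedding(review) + tabular_features(clicks_last_2h)


def purchase_probability(features: list[float], weights: list[float], bias: float) -> float:
    """Logistic-regression score over the fused feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    feats = combined_features("great battery life, would buy again", clicks_last_2h=5)
    weights = [0.1] * len(feats)  # hypothetical weights; in practice these are learned
    print(len(feats), purchase_probability(feats, weights, bias=-1.0))
```

Concatenation is only one design choice. Swapping the logistic scorer for gradient-boosted trees or a small neural network, or scaling the two feature groups differently, are common alternatives when the dense and tabular parts behave very differently.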
To stay updated: read papers (arxiv-sanity.com is a lifesaver) and watch talks (usually on YouTube, and a lot of university reading groups are public on Zoom nowadays).