The Pile: An 800GB Dataset of Diverse Text for Language Modeling [pdf](pile.eleuther.ai)
1 point by nixtaken 5 years ago | 0 comments