3T Token Open Corpus for Language Model Pretraining (blog.allenai.org)
Reasonably speaking, nobody can use this dataset for anything of value. I really wonder who comes up with these "open-source" products with such licenses and why they even bother. I guess Marketing?