Slop Bucket Idea: a dataset of AI slop (train AI what not to do)

I just had this idea. You read about it all the time: AI slop is so prevalent that people are getting banned for a year for submitting science papers containing it to arXiv, developers are moaning in angst, Microsoft has even done its own study in which AI degrades the quality of simple documents, and then there is the beloved em-dash.

I don't really have the know-how or the time, but it occurred to me: if we created a public dataset that anyone could submit to, we could catalog and organize all the AI slop by type, with explanations of why each example is slop and why not to do it, and then train a large language model with this dataset included, to help it correct itself.

I don't really know the technical details of training a large language model. Is this even possible?
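
To make the "catalog and organize" part a bit more concrete, here is a rough sketch of what one submitted entry might look like. Everything in it is an assumption for illustration only: the field names (`text`, `category`, `explanation`, `better_version`), the category list, and the choice of one JSON object per line are all hypothetical, not an existing standard or anyone's actual dataset.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical slop categories, made up for this sketch.
CATEGORIES = {"filler_phrase", "em_dash_overuse", "empty_summary"}


@dataclass
class SlopEntry:
    """One public submission: the slop, its type, and why it is slop."""
    text: str                 # the offending passage
    category: str             # one of CATEGORIES
    explanation: str          # why this is slop and why not to do it
    better_version: str = ""  # optional rewrite showing what to do instead

    def validate(self) -> None:
        # Reject entries that could not be cataloged properly.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if not self.text.strip() or not self.explanation.strip():
            raise ValueError("text and explanation are required")


def to_jsonl(entries):
    """Validate each entry and serialize it as one JSON object per line."""
    lines = []
    for entry in entries:
        entry.validate()
        lines.append(json.dumps(asdict(entry), ensure_ascii=False))
    return "\n".join(lines)


example = SlopEntry(
    text="In today's fast-paced world, it is important to note that",
    category="filler_phrase",
    explanation="Opens with generic padding that carries no information.",
    better_version="State the point directly.",
)
print(to_jsonl([example]))
```

Keeping one JSON object per line would make public submissions easy to append and review one at a time, and the required `explanation` field forces each submitter to say why the passage counts as slop.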