What color are your bits? (2004) (ansuz.sooke.bc.ca)
> Copyright holders worry about how to exercise control over the use of "their" creative material for training models; but that begs the question of whether copyright holders ever had, or should have, a right to any such control. If a human can read a book and learn from it, and then write their own books, why shouldn't a computer?
There’s a small amount of irony in an article that discusses copyright and the invisible but critical context of information, then dismisses the context of copying when it comes to copyright and confuses what copyright protects. I’m certain the author knows that copyright does not protect ideas and does not protect “colour”; it deliberately protects only the “bits”. In US copyright law this is called the “fixation” of a work. The Berne Convention uses similar terminology, allowing member countries to provide that “works shall not be protected unless they have been fixed in some material form.”
AI’s “learning” has a different colour than human learning. This has been debated at length on HN and elsewhere, and in the courts, but it’s definitely wildly misleading to compare ChatGPT training on all books ever written and then being distributed (for a profit) to everyone, to one human reading one book and learning something from it.
IP courts will have some truly novel questions before them this century.
The flip side is that this is why the article’s discussion about randomness and monkeys on typewriters is irrelevant to copyright law. It’s a copyright violation to produce the same “fixation” no matter how you do it. If you generated a random sequence of characters and it happened to match a NYT best-selling book, you would violate the book author’s copyright, and claiming it was random isn’t a viable defense. Intent to copy can make it worse, but lack of intent does not absolve you. There is precedent for people independently coming up with the same song and one of them being successfully sued.
Do note that there are other laws that might cover plagiarism of ideas, trademarks, code, etc. Copyright isn’t the only consideration, but it seems to be often misunderstood. We definitely have some novel questions because of the scale of AI’s copying, the nature of training and the provenance of the training data, and because of AI’s growing ability to skirt copyright law while actually copying.
You’re right that in certain limited circumstances, copyright will protect fictional characters. To protect a character, the character must be “well delineated”, and this has proven to be a pretty high bar. https://en.wikipedia.org/wiki/Copyright_protection_for_ficti...
What if I did something similar, but rather than a simple Ctrl-H replacement, I asked an LLM to rewrite each paragraph in different words? What if I did the rewrite myself, by hand? Is there a difference? If so, why? If not, why not?
The LLM question is more complicated. If you ask AI for a rewrite of a specific work, that’s infringement on grounds of originality. It’s also infringement when you don’t have the rights to the work that you feed to the LLM. This is part of the debate over AI training, and is covered in the Copyright Office’s draft on generative AI under the Prima Facie Infringement section https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
AI companies aren’t arguing over derived works; they are trying to get approval to classify AI training as Fair Use, because they know they are infringing existing copyright law. The Copyright Office might end up allowing it and changing the law.