More writers sue OpenAI for copyright infringement over AI training (reuters.com)

54 points by kurhan 2 years ago | 66 comments

Random thought: my blog is licensed under a Creative Commons license [1] that allows you to use and transform my content as long as you give attribution and distribute your contributions under the same terms.

I found the OpenAI bot scraping my blog recently. Assuming they used that data, when will they attribute me?

[1] https://creativecommons.org/licenses/by-sa/4.0/

PaulKeeble 2 years ago | |

Because these AI companies don't comply with the licenses on code, I haven't contributed a single line of open source software or released any of my projects that way since Microsoft released their code generator. I removed a bunch a while ago and I will likely remove all of them when I get around to it. I have been fixing bugs and releasing open source projects for decades and I just stopped the moment they did that. Open source is dead to me if the licenses can't be enforced.

extra88 2 years ago | |

Your license doesn't override copyright law.

Given that Google successfully used a fair use defense in Authors Guild, Inc. v. Google, Inc., I think it's likely OpenAI and the others will also win in court.

I do think it's possible for specific uses of the output of LLMs to be copyright infringement. That's why it's interesting to see Microsoft indemnify customers of their commercial products in the event a case is brought against the customer. This is smart on Microsoft's part; the risk probably isn't very high and by making it a non-issue for their customers, many more will feel comfortable using their LLM-based features and services.

plagiarist 2 years ago | |

Well, it all comes down to whether training an LLM is fair use or not. I think it is likely that courts will rule it is transformative enough that training is allowed regardless of what terms you have for the use of content.

yk 2 years ago | |

Interesting question, continuing on this, since they probably used GPL-3 code with the Affero clause, do they have to open source GPT? (The Affero clause is I believe the more directly applicable license thingy, though CC by-sa should also work.)

https://www.gnu.org/licenses/agpl-3.0.en.html

guilhas 2 years ago | | |

I think the whole code license question does not matter much, because the code is data input, not a part of their actual program.

It's like GitHub's servers hosting AGPL code as data without having to be open source.

The perceived problem there is if their model generates an exact copy of some AGPL code, you use it in your project unknowingly, and then you can get sued.

belter 2 years ago | |

How recent? Because ChatGPT always repeats the same mantra: its training data goes back to September 2021 with no updates... Even for ChatGPT-4.

richardanaya 2 years ago | |

Where are your attributions for all the words written in your comment ;P You remixed the words and grammar patterns from other people's Creative Commons-licensed writings!

Note: i'm declaring my comment license as https://creativecommons.org/licenses/by-sa/4.0/

So if you remix or transform my comment by responding to it, please attribute your response to me.

edgyquant 2 years ago | | |

Humans are not LLMs trained and operated by a company for profit. Your argument is that LLMs hold all the same basic rights as humans but they hold (and should hold) exactly none.

dvngnt_ 2 years ago | | |

which works specifically?

profit vs non profit also makes a difference

politician 2 years ago | |

I believe that OpenAI is not required to attribute you if the output was produced by an OpenAI-operated AI model because the AI is not constrained by the Berne Convention treaty regime in the same way that people are.

I believe that this fact is and will be exploited to strip copyright and effectively transfer ownership using cleanroom/firewall techniques.

yieldcrv 2 years ago | |

training will be ruled fair use which doesn't require any license, while there is no lawsuit on the output

guiambros 2 years ago | |

> Assuming they used that data...

That's the key part. You haven't yet proved they have actually used your content for anything (other than, potentially, reading the license to decide whether to include it in or discard it from their training set).

But in practice we'll never know for sure if they are respecting the terms of licenses until 1) this is tested in court, or 2) there's some internal leak that points in either direction.

JamisonM 2 years ago | | |

I expect that OpenAI would concede that they used the data in any court case immediately to get that issue off the table, I really don't think they have a strong interest in foot-dragging on this stuff, right?

I would think OpenAI wants the thornier legal issues actually settled so that the whole ecosystem can grow within those terms & they can lobby for the legal changes they need/want?

mannyv 2 years ago |

All these lawsuits will die. Why?

Because people train on corpuses of data all the time, without a license or any attribution.

Every piece of text a writer reads is training that writer. Every image an artist sees helps to train that artist. Every sound a musician hears is training that musician.

That doesn't mean they can't exclude their works from training via a license going forward. But that becomes an enforcement problem.

plagiarist 2 years ago | |

IIRC courts have already ruled AI-generated works cannot have copyright. So there is already a legal distinction between a human and a model creating works.

I also doubt "humans are just a larger Markov chain than the LLM and they're allowed to" will hold up in court.

brookst 2 years ago | | |

I don’t see what eligibility to have works protected has to do with legality of learning.

I really hope “copyright can be used to prohibit reading and learning” does not hold up in court.

Copyright is, and should be, a protection from unauthorized reproduction. Extending it to protect the abstract ideas would be a disaster. And extending it to control stylistic learning would be even worse.

sensanaty 2 years ago | |

It's a good thing humans and computers are 2 wholly separate categories of things that have 0 things related to them other than computers being anthropomorphized by AI sycophants!

anonymousab 2 years ago | |

People can do a lot of things that we don't legally allow machines or automation to do.

Ukv 2 years ago | | |

True of some things, but not of Fair Use. Automatically generating thumbnails is generally Fair Use, for example.

gruez 2 years ago | | |

Like what? The only things I can think of relate to quality/safety (eg. drivers or lawyers).

edgyquant 2 years ago | |

LLMs are not humans and your anthropomorphizing argument is idiotic

buildbot 2 years ago |

Do CliffsNotes of books and plays infringe or need a license? If so, it seems like they’d have a possible case. Maybe. If not, well… maybe OpenAI infringed by not buying their original copy, but I'm not sure that feeding it into a bunch of math is going to be copyright infringement.

throwaway290 2 years ago | |

> the system can accurately summarize their works and generate text that mimics their styles

CliffsNotes are not what lets you replicate the style of the author etc.

And yeah, you can use "it's just feeding it into a bunch of math" to justify nearly anything that involves software, including good old piracy. What matters is what the math is used for. (Spoiler: to line Microsoft's pockets at the expense of actual writers in this case.)

rmbyrro 2 years ago | | |

Not at all like piracy.

When someone pirates a book, they're replacing the original without consent or remuneration to the copyright holders.

When you train an AI on the contents of a book, you're not replacing it. If someone is interested in the content, they still need to buy it. Using ChatGPT is not a substitute. If it is, they're gonna have to prove it in court, but I doubt they'll be able to.

extragood 2 years ago | | |

Copyright doesn't protect style or genre [1], so these suits seem destined to fail. That said, it seems like it is time to reexamine those laws in light of current technology before it kills off creative works.

[1] https://creativecommons.org/2023/03/23/the-complex-world-of-....

yieldcrv 2 years ago |

The fair use argument is quite strong

If you dissect the plaintiffs' claim, they are arbitrarily conflating training and regurgitating

Training is use for criticism and comparison purposes, hence fair use

And there is no lawsuit against what it regurgitates and the purpose of its output, whether someone asks it to give a list for comparison purposes, or specifically asks it for a story that has a plagiarized result

AlbertCory 2 years ago |

I'm miffed. I tried a couple characters from my books, and zilch:

=====

who is dan markunas

ChatGPT: I'm sorry, but I don't have any information on a person named Dan Markunas in my database ....

who is janet saunders

ChatGPT: I'm sorry, but I don't have any specific information about a person named Janet Saunders in my database,

===========

SanderNL 2 years ago | |

Your book was published somewhere mid-2021, right?

AlbertCory 2 years ago | | |

Two books. The second was after their cutoff date.

mistrial9 2 years ago |

this is a great lawsuit! if you read the complaint, they catch OpenAI dead-to-rights .. asking about plot details with names from the books, asking to write a paragraph in the style of the author in that book, and a diversity of authors that shows social awareness.. great support for this from California

tpmoney 2 years ago | |

If you go to Wikipedia and look up a book, you'll likely find plenty of plot details, including character names. Is this also infringing?

As far as style goes, copyright doesn't protect that. Trademark MIGHT if your style is distinctive enough to be a trademark (and is used as such), but the "style" of a writer is largely about tempo and word choices, none of which are subject to copyright protections.

mistrial9 2 years ago | | |

I think we are now reproducing multiple generations of debate on this topic, in a few go-rounds.. Let's note that among the four largest economies in the world, they each have different rules for this.

tsegratis 2 years ago |

I'm waiting for the day OpenAI sues humans for infringement of its prompt output

racked 2 years ago |

It's akin to suing a person for memorizing things from a book. Don't complain, go write something.

RecycledEle 2 years ago |

Is there a list of the critters suing AI companies so I can boycott them?

lamp987 2 years ago | |

Why would you want to do that?

RecycledEle 2 years ago | | |

AI is the next industrial revolution. It will greatly increase human productivity. Anyone standing against it is an enemy of all mankind.