Nightshade: An offensive tool for artists against AI art generators (nightshade.cs.uchicago.edu)
I think it's worthwhile for such discussion to happen in the open. If the tool can be defeated through simple means, it's better for everybody to know that, right?
Let me rephrase: Would AI-powered upscaling/downscaling (not a simple deterministic mathematical scaling) not defeat this at a conceptual level?
To train a model on the data.
If they don't, then any social network or other service where things can be shared or viewed by large groups, up to millions of people, and are posted publicly needs to label them: "We cannot verify the veracity of this content."
I want a real internet. This AI stuff is just tripling the fake crap on the Internet and, in turn and over time, eroding our trust in it!
Might this "flood the zone" approach also have -some- efficacy against human copycats?
If you ask me, this is 100% applicable in this case, so I wonder what a judge would rule.
This will work about as well...
Oh, I forget, fighting music pirating was considered an evil thing to do on HN. "pirating is not stealing, is copyright infringement", right? Unlike training neural nets on internet content which of course is "stealing".
Many people would in fact argue that training AI on people's art without permission is copyright infringement, since the thing it (according to detractors) does is infringe copyright by generating knockoffs of people's work.
You will see some people use the term "stealing" but they're usually referring to how these AIs are sold/operated by for-profit companies that want to make money off artists' work without compensating them. I think it's not unreasonable to call that "stealing" even if the legal definition doesn't necessarily fit 100%.
The music industry is also not really a very good comparison point for independent artists... there is no Big Art equivalent that has a stranglehold on the legislature and judiciary like the RIAA/MPAA do.
AI is sampling others' works.
Musicians can and do sample. They also obtain clearance for commercial works, pay royalties if required, AND credit the samples if required.
AI "art" does none of that.
Some projects against this behavior:
What we really need is clarification of the extent that copyright protection extends to similar works. Most likely from an AI analysis of case law.
> • can inject a small number of poison data (image/text pairs) to the model’s training dataset
I think those are bad assumptions; labelling is more and more done by some labelling AI.
This, of course, assumes that "poisoning" actually works. Glaze and Nightshade and similar are very much akin to the various documented attacks on facial recognition systems. The attack does not exploit some fundamental flaw in how the systems work, but specific characteristics in a given implementation and version.
This matters because it means that later versions and models will inevitably not have the same vulnerabilities. The result is that any given defensive transformation should be expected to be only narrowly effective.
If it doesn't matter, then neither does the poisoning matter.
You sure about that?
Entire legal firm empires have been built on the licensing, negotiations, and fees that make up the industry.
I ain't talking about some dude on YouTube or Soundcloud. Few people care about some rando on Soundcloud. Those moles aren't big enough to whack. Vanilla Ice and MC Hammer were. OpenAI is as well.
There's even a company that specializes in sample clearance: https://sampleclearance.com
More info: https://www.soundonsound.com/sound-advice/sample-clearance
Also:
>not turned into a part of a pattern used to generate new sounds like what AI generators do with images
This is demonstrably false. Multiple individuals have repeatedly been able to extract original images from AI generators.
Here's one-- Extracting Training Data from Diffusion Models https://arxiv.org/abs/2301.13188
Text, too: https://arxiv.org/abs/2311.17035
Causing car crashes isn't hard (https://xkcd.com/1958/). That doesn't mean Car Crash™ International®'s decision-makers know how to do it: they probably don't even know what considerations go into traffic engineering, or how anyone can just buy road paint from that shop over there.
It's everybody's responsibility to keep Car Crash™ International® from existing; but failing that, it's everybody's responsibility to not tell them how to cause car crashes.
Anyone arguing against this technology following the line of reasoning you present is operating adversely to the good of society. Especially if their only motive is economic viability.
> simply acquiring only training data you have permission to use
Currently it's generally infeasible to obtain licenses at the required scale.
When attempting to develop a model that can describe photos for visually impaired users, I had even tried to reach out to obtain a license from Getty. They repeatedly told me that they don't license images for machine learning[0].
I think it's easy to say "well too bad, it doesn't deserve to exist" if you're just thinking about DALL-E 3, but there's a huge number of positive and far less-controversial applications of machine learning that benefit from web-scale pretraining and foundation models - spam filtering, tumour segmentation, voice transcription, language translation, defect detection, etc.
However - and this is a big however - I don't believe it deserves the legal protection to be used for profit.
I am of the opinion that if you train your model on data that you do not hold the rights for, your usage should be handled similarly to most fair use laws. It's fine to use it for your personal projects, for research and education, etc. but it is not OK to use it for commercial endeavors.
I would believe there is enough content out there to get reasonably good results.
However much we might wish that it was not true, ideas are not rivalrous. If you share an idea with another person, they now have that idea too.
If you share words on paper, then someone with eyes and a brain might memorize them (or much more likely, just grasp and retain the ideas conveyed in the words).
If you let someone hear your music, then the ideas (phrasing, style, melody, etc) in that music are transferred.
If you let people see a visual work, then the stylistic and content elements of that work are potentially absorbed by the audience.
We have copyright to protect specific embodiments, but mostly if you try to share ideas with others without letting them use the ideas you shared, then you are in for a life of frustration and escalating arms race.
I completely sympathize with anyone who had a great idea and spent a lot of effort to realize it. If I invented/created something awesome I would be hurt and angry if someone “copied” it. But the hard cold reality is that you cannot “own” an idea.
The above comment is true about the properties of information, as explained via the lens of economics. [1]
However, one ignores ownership as defined by various systems (including the rule of law and social conventions) at one's own peril. Such systems can also present a "hard cold reality" that can bankrupt or ostracize you.
[1] Don't let the apparent confidence and technicality of the language of economists fool you. Economics isn't the only game in town. There are other ways to model and frame the world.
[2] Dangling footnote warning. I think it is instructive to recognize that the field of economics has historically shown a kind of inferiority complex w.r.t. physics. Some economists aspire to the level of rigor found in physics and that is well and good, but perhaps that effort should not be taken too seriously nor too far, since economics as a field operates at a different level. IMO, it would be wise for more in the field to eat a slice of humble pie.
[3] Ibid. It is well-known that economists can be "hired guns" used to "prove" a wide variety of things, many of which are subjective. My point: you can hire an economist to shore up one's political proposals. Is the same true of physicists? Hopefully not to the same degree. Perhaps there are some cases of hucksterism, but nothing like the history of economists-wagging-the-dog! At some point, the electron tunnels or it does not.
But whatever game gives the most predictive power is going to win.
> In economics, a good is said to be rivalrous or a rival if its consumption by one consumer prevents simultaneous consumption by other consumers, or if consumption by one party reduces the ability of another party to consume it. - Wikipedia: Rivalry (economics)
Also: we should recognize that stating something as rivalrous or not is descriptive (what exists) not normative (what should be).
If there was a training process that let us pick a minimal sample of examples and turn it into a general purpose art generator or text generator, I think people would have been fine with that. But that's not what any of these models do. They were trained on shittons of creative expression, and there's statistical evidence that the models retain that expression, in a way that is fundamentally different from how humans remember, misremember, adapt, remix, and/or "play around with" other people's creativity.
[0] You called these "embodiments", but I believe you're trying to invoke the idea/expression divide, so I'll run with that.
[1] Or at least it did. OpenAI now filters out conversations that trip the bug.
The closest parallel I can think of is that humans can ingest chocolate but dogs should not.
The way societies work is that the members of the society contribute and benefit in prescribed ways. Societies with lots of excess production may at times choose to allow creative works to be monetized. Societies without much surplus are extremely unlikely to do so, eg a society with not enough food for everyone to eat in the middle of a famine is extremely unlikely to feed people who only create art; those people will have to contribute in some other way.
I think it is a very modern western idea (less than a century old) that many artists can dedicate themselves solely to producing the art they want to produce. In all other times artists either had day jobs or worked on commission.
Now, if the original copyrighted work can be extracted or reproduced from the model, that’s obviously copyright infringement.
OpenAI etc should ensure they don’t do that.
Because this isn’t simple law. It feels like simple infringement, but there’s no actual copying going on. You can’t open up the database and find a given duplicate of a work. Instead you have some abstraction of what it takes to get to a given work.
Also it’s important to point out that nothing in the law is sure. A good lawyer, a sympathetic judge, a bored/interested/contrarian juror, etc can render “settled law” unsettled in an instant. The law is not a set of board game rules.
“One may well ask: ‘How can you advocate breaking some laws and obeying others?’ The answer lies in the fact that there are two types of laws: just and unjust. I would be the first to advocate obeying just laws. One has not only a legal but a moral responsibility to obey just laws. Conversely, one has a moral responsibility to disobey unjust laws. I would agree with St. Augustine that ‘an unjust law is no law at all.’”
I'm not convinced that most copyright infringements are immoral regardless of their legal status.
If you post your images for the world to see, and someone uses that image, you are not harmed.
The idea that the world owes you something after you deliberately shared it with others seems bizarre.
It obviously is not "simple law".
Let's talk about ownership in a broader sense. In practice, one cannot effectively own (retain possession of) something without some combination of physical capability or coercion (or threat of coercion). Meaning: maintaining ownership of anything (physical or otherwise) often depends on the rule of law.
You can't monopolize an idea.
Copyright law is a prescription, not a description. Copyright law demands that everyone play along with the lie that is intellectual monopoly. The effectiveness of that demand depends on how well it can be enforced.
Playing pretend during the age of the printing press may have been easy enough to coordinate, but it's practically impossible here in the digital age.
If we were to increase enforcement to the point of effectiveness, then what society would be left to participate? Surely not a society I am keen to be a part of.
I now declare that I own Fortnite.
Where’s my money, Epic?
Do I really have to explain this? You know I don't. Do better.
Also who said anything about selling?
Even when talking about various kinds of scientific and engineering fields, predictive power isn't the only criterion, much less the best. Sometimes the simpler, less accurate models work well enough with less informational and computational cost.
Even if we focus on prediction (as opposed to say statistical inference), often people want some kind of hybrid. Perhaps a blend of satisficing with limited information, scoped action spaces, and bounded computation; i.e. good enough given the information we have to make the decisions we can actuate with some computational budget.
Say I train a machine vision model that, after having pretrained on ImageNet or similar, detects deformities in a material for a small company that manufactures that material. Do you not think that would be fair use, despite being commercial?
To me it seems highly transformative (a defect detection model is entirely outside the original images' purposes) and does not at all impact the market of the works.
Moreover, you said it was "Baffling to see anyone argue against this technology" but it seems there are at least some models (like if my above detector was non-commercial) that you're ethically okay with and could be affected by this poisoning.
>Moreover, you said it was "Baffling to see anyone argue against this technology" but it seems there are at least some models (like if my above detector was non-commercial) that you're ethically okay with and could be affected by this poisoning.
Just because I think there are situations where it's not ethically wrong to use someone's work without permission does not mean I think it's ethically wrong for someone to protect their work any way they see fit.
To use an extreme example: I do not think it's wrong for a starving man to steal food. I also do not think it's wrong for someone to defend their food from being stolen, regardless of the morality of the thieves' motivation.
I'd argue that essentially all work is based off the work of others. I can only draw a "car" better than a random guess because I've seen many (individually copyrighted) car designs.
That's not to say we inherently have to treat use of statistical models the same, but rather that there does have to be a line somewhere to define where a new work, while dependent on previous works in aggregate, is sufficiently transformative - not carrying substantial similarity to any particular existing work that played a part in its creation - and can therefore be used by the author to make a living.
That line has to be placed in a way that prioritizes the progress of sciences and useful arts, rather than just enriching rightsholder megacorps like Getty Images/Universal Music. It should certainly allow training something like a tumor segmentation network, rather than rendering it infeasible.
Also, while whether it's morally okay is relevant and worth discussing, I think the question still stands of whether you believe my example would count as fair use, given the transformative nature and lack of impact on the market of the original work.
> Just because I think there are situations where it's not ethically wrong to use someone's work without permission does not mean I think it's ethically wrong for someone to protect their work any way they see fit.
> To use an extreme example: I do not think it's wrong for a starving man to steal food. I also do not think it's wrong for someone to defend their food from being stolen, regardless of the morality of the thieves' motivation.
I personally agree that they have a right to do so, but I don't think it'd be "baffling" that the [starving man/person training a tumor detector] would be against it, and it's likely not a "non-issue" for them to obtain sufficient [food/data] through other means.
Particularly since there are already means to opt-out that are respected by scrapers, and this is instead an attempt to do active damage. I guess in the analogy it'd be leaving out poisoned bread, although that's more extreme than I intend.
(BTW I forbid you from using my comment here in your future reasoning)
Let me define a few cases of 'uses that image' and see where your line in the sand drops
* If someone used that image as part of an advertising campaign for their product, they are profiting off your work. Are you not harmed?
* If someone used that image and pretended they created it. Are you not harmed?
* If someone used that image and sold it directly. Are you not harmed?
The others aren't harmful, unless you're defining harm to include the loss of something which someone believes they are entitled to, a concept which is fraught with problems.
Creating an image (or any non-physical creation) doesn't obligate the world to compensate you for your work. If you choose to give it away by posting it on the internet, that's your choice, but you are entitled to nothing.
You're right and wrong. You're right because most infringement is from people who can do minimal damage and in fact do more to help by raising awareness of your works by sharing. But this is only because copyright is working (most of the time) against corporate entities who don't want to leave any room for legalities to come in.
If copyright ended I'd bet my bottom dollar Disney and all the other billion-dollar companies would be spamming anything and everything that gets moderately popular. And Disney can out-advertise the original artist easily.
Your yard is your property. If I shit on your property then I've trespassed and damaged your property.
Digital images are nonrivalrous.
If you make an infinite number of photocopies and hand them out for free to all passing strangers (aka sharing an image online) then if a stranger uses one of those photocopies, they haven't trespassed or damaged your property.
As for the card example it is on you to keep secrets secret. It is clearly a fault with how credit cards work and for things like that you should get insurance. Keep your card safe, or better yet, use cash. We can call it theft, doesn't matter to my point.
There is no "taking" or "grabbing" anything, I visited your website and was served a copy of the data because you made it do so. You wanted it to happen, for the public to see it, that was your goal. You expected it to happen. Do you disagree that me acquiring a copy was consensual? If so it is very different from the hypotheticals you're posing don't you think?
Once a copy is in my possession you would need to initiate violence to stop me from training a model on it which puts you on the wrong side of the moral line.
> Also who said anything about selling?
All the corporations that are offering AI as a paid service?
Okay, extract the images from a Stable Diffusion checkpoint then. I'll wait.
It's not like lossy compression CAN'T be fair use or transformative. I'm sure you can imagine how that is possible given the many ways an image can be processed.
> All the corporations that are offering AI as a paid service?
Am I them?
But from there my point was that, in order to ask what "should be", you must ask what can be. I don't think there is a way for ideas to be rivalrous, period.
So sure, saying ideas are not rivalrous is descriptive, but there isn't really another option.
I'm open to the idea that copyright law might need to change, but it doesn't seem controversial to note that scraping actual creative works to extract elements for an algorithm to generate new works crosses a number of worrying lines.
So distributing a zip file of a copyrighted work subverts the copyright?
Can you list the principles across human civilizations?
Of course you can. Anyone can say anything.
Is “keeping the list of principles a secret” a principle like the rules of Fight Club? It is not unreasonable to ask for a link or summary of this immutable set of ground truths.
> What principle do you claim exists to contradict that claim?
I could not answer this question without being able to double check. The only principle that comes to mind is the principle of ligma
What does that have to do with rivalry? This doesn't dispute the idea that AI is indeed competing with artists. You're just saying artists don't deserve to get paid.
Regardless, some artists will give up but some will simply be more careful with where and how they post their art with tools like these. AI doesn't have a right to the artist's images either.
If OpenAI's output reproduces a copyrighted image with one pixel changed, is that valid in your view? Where does the line end?
Copyrighted material should never be used for nonacademic language models. "Garbage in, garbage out." All results are tainted.
"But being forced to use non-copyrighted works will only slow things down!"
Maybe that's a good thing, too. Copyright is something every industry has to accept and deal with -- LLMs don't get a "cool tech, do whatever" get-out-of-jail free card.
Whatever the standard is for humans doing the exact same thing.
Because it is "fuzzy" and open to interpretation, we're going to see lawsuits, and the resulting chilling effects of those lawsuits will deter US tech firms from ingesting large amounts of copyrighted material without a second thought. US tech firms will be giving it a second, third, fourth, etc. thought once the lawsuits start.
It's gonna be like submarine patents on steroids.
Like I said, I'm down for letting the courts decide. But AI supporters should probably avoid kicking the hornets' nests regarding copyright.
I think there's an important distinction to be made here - "can" be reproduced isn't infringement, only actual reproduction is (and degrees thereof not consisting of sufficiently transformative or fair use).
Trivially, a typewriter can reproduce a copyrighted book. Less trivially, Google Books, which iirc stores the full text of copyrighted works, has been judged to be legal.
It's the purpose of some of those bytes that turns it into DRM.
> Copyright law demands that everyone play along with the lie that is intellectual monopoly.
Saying "lie" suggests willful deception. Perhaps you mean "socially constructed"? Combined with "playing pretend", it reads a bit like a rant.
> Then let's use a more precise term that is also present in law: monopoly.
Ok, in law and economics, the core idea of monopoly has to do with dominant market power that crowds out the existence of others. But your other uses of "monopoly" don't match that. For example, you talk about ideas and "intellectual monopoly". What do you mean?
It seems like some of your uses of "monopoly" are not about markets but instead are closer to the idea of retaining sole ownership.
> If we were to increase enforcement to the point of effectiveness, then what society would be left to participate? Surely not a society I am keen to be a part of.
It appears you've already presupposed how things would play out, but I'm not convinced. What is your metric of effectiveness? A scale is better than some arbitrary threshold.
Have you compared copyright laws and enforcement of the U.S. versus others?
How far would you go: would you say that, e.g., society would be better off without copyright law? By what standard?
I most certainly do mean to call out copyright as willful, but it's not a deception, at least not a successful one: everyone knows it is false. That's why it's enforced by law! Instead of people being deceived, people must instead pretend to be so. Each of us must behave as if the very concept of Mickey Mouse is immortal and immutable; and if we don't, the law will punish accordingly.
Every film on Netflix, every song on Spotify, etc. can obviously be copied any number of times by any number of people at any place on Earth. We are all acutely aware of this fact, but copyright tells us, "Pretend you can't, or get prosecuted."
So is it truly effective? Millions of people are not playing along. Millions of artists are honestly trying to participate in this market, and the market is failing them. Is that because we need more people to play along? Rightsholders like the MPAA say that piracy is theft, and that every copy that isn't paid for is a direct cost to their business. How many of us are truly willing to pretend that far?
What if we all just stopped? Would art suddenly be unprofitable for everyone, including the lucky few who turn a profit today? I don't believe that for a second.
The only argument I have ever heard in favor of copyright is this: Every artist deserves a living. I have seen time and time again real living artists fail to earn a living from their copyright. I have seen time and time again real living artists share their work for free, choosing to make their living by more stable means.
Every person living deserves a living. Fix that, and we will fix the problem copyright pretends to solve, and more.
Unless I'm misunderstanding you, this is not even wrong. What about copyright law is empirically false? Such a question is nonsensical.
Your comment redefines the word "false" in a way that muddles understanding. You aren't alone -- some philosophers do this -- but it tends to confuse rather than clarify. I've developed antibodies for language abuse of this kind. Such language can even have the effect of making language charged and divisive.
Many people understand the value of using the words _true_ and _false_ to apply to the assessment of _facts_. This is a useful convention. (To be clear, I'm not opposed to bending language when it is useful.)
To give a usage example: a misguided law is not _false_. Such a statement is nonsensical. We have clear phrases for this kind of law, such as poorly designed, having unintended consequences, etc. We could go further and say that a law is e.g. immoral or pointless. You are likely making those kinds of claims. By using those phrases, we can have a high-bandwidth conversation much more quickly.
I see the dark arts of rhetoric used here, and it is shameful. The portion I quoted above is incredibly confused. I would almost call it a straw man, but it is worse than that.
Copyright law says no such thing. Of course you _could_ copy something. Copyright law exists precisely because you can do that. The law says, in effect, "if you break copyright law, you will be at risk of a sufficiently motivated prosecutor."
## "Rhetoric and Reality in Copyright Law" by Stewart E. Sterk
Benjamin N. Cardozo School of Law. https://repository.law.umich.edu/cgi/viewcontent.cgi?article...
> Why give authors an exclusive right to their writings? Copyright rhetoric generally offers two answers. The first is instrumental: copyright provides an incentive for authors to create and disseminate works of social value. By giving authors a monopoly over their works, copyright corrects for the underincentive to create that might result if free riders were permitted to share in the value created by an author's efforts. The second answer is desert: copyright rewards authors, who simply deserve recompense for their contributions whether or not recompense would induce them to engage in creative activity.
> The rhetoric evokes sympathetic images of the author at work. The instrumental justification for copyright paints a picture of an author struggling to avoid abandoning his calling in order to feed his family. By contrast, the desert justification conjures up a genius irrevocably committed to his work, resigned - or oblivious - to living conditions not commensurate with his social contributions. The two images have a common thread: extending the scope of copyright protection relieves the author's plight.
> Indeed, the same rhetoric - emphasizing both incentives and desert - consistently has been invoked to justify two centuries of copyright expansion. Unfortunately, however, the rhetoric captures only a small slice of contemporary copyright reality. Although some copyright protection indeed may be necessary to induce creative activity, copyright doctrine now extends well beyond the contours of the instrumental justification. ...
## "Copyright Nonconsequentialism" by David McGowan
Missouri Law Review. https://scholarship.law.missouri.edu/cgi/viewcontent.cgi?art...
> This Article explores the foundations of copyright law. It tries to explain why those who debate copyright often seem to talk past each other. I contend the problem is that copyright scholars pay too much attention to instrumental arguments, which are often indeterminate, and too little to the first principles that affect how one approaches copyright law.
> Most arguments about copyright law use instrumental language to make consequentialist arguments. It is common for scholars to contend one or another rule will advance or impede innovation, the efficient allocation and production of expression, personal autonomy, consumer welfare, the "robustness" of public debate, and so on. Most of these instrumental arguments, though not quite all of them, reduce to propositions that cannot be tested or rejected empirically. Such propositions therefore cannot explain existing doctrine or the positions taken in debate.
> These positions vary widely. Consumer advocates favor broad fair use rights and narrow liability standards for contributory infringement; producer advocates favor the reverse. Most of the arguments for both consumers and producers prove too much. It is easy to say that the right to exclude is needed to provide incentives for authors. It is hard to show that any particular rules provide optimal incentives. It is easy to point to deviations from the model of perfect competition. It is hard to show why these deviations imply particular rules.
> ...
Consider instead the term "legal fiction", it's not so derogatory.
i listened to one podcast on corporate personhood and intuited that intellectual property was similar but i see what you mean
>Nightshade's goal is not to break models, but to increase the cost of training on unlicensed data, such that licensing images from their creators becomes a viable alternative.
Which feels similar to DRM. To discourage extraction of assets.
I don't see how any of that is muddled. I'm being as direct as I can with my words here. I'm talking about the very plain fact that art can be copied freely.
For example, DRM software gives you encrypted content and the decryption key. Why bother? Because the end user is expected to pretend they are only able to use that decryption key once. This is patently false, but any user who decides not to play along is immediately labeled a "pirate". What vessel have they commandeered? The metaphorical right to copy. What will be enshrined in law next, the rights to hear and to see?
Copyright law is poorly designed. It does have unintended consequences. It is immoral and pointless. To back these claims, all I must do is show the absurdity that copyright is on the face of it.
No, copyright law does _not_ compel us to believe that things cannot be copied. Instead, it establishes consequences for illegal copying. I'm flabbergasted that you, so far, don't recognize a simple and foundational difference between "can" and "should".
the goals of artists and the developers of OP,
versus
the goals of AI engineers,
and how it seems to me similar to the is/ought distinction.
In my original analogy, it’s generally considered lawful to have a lock on your door, or not to do so, and the issue of a lock or lack thereof is moot when one is invited to enter, just as it is lawful to enter a premises when invited or during exigent circumstances, such as breaching entry to render lifesaving aid by emergency services or firefighters.
By that same token, no amount of locks or other barriers to entry will prevent ingress by a vampire once invited inside.
To me, much of the ballyhoo about OP seems like much ado about big cats eating faces, like a person publicly decrying the rise in vampire attacks after inviting that same vampire inside for dinner. It’s a nonstarter.
Copyright law is broken, because of the way that the law is written as much as the way that it’s enforced, and also broken because of the way that humans are. Ideas are not copyrightable, and while historically their implementations or representations were, going forward, neither implementations nor representations are likely to receive meaningful/effective protections from copyright itself, but only from legal enforcement of copyright law.
After the recent expiry of Disney’s copyright on Steamboat Willie, the outpouring of praise, support, excitement, and original work from creators shows me that copyright law in its current incarnation doesn’t perform its stated goals of promoting the creation of arts and sciences, and so should be changed, and in the meantime ignored and actively disobeyed, as any unjust law ought to be, regardless of what the law is or does.
In light of our obligation to disobey unjust laws, I applaud efforts like OP to advance the state of the art in computer science, while at the same time encouraging others working on AI to actively circumvent such efforts for the selfsame reason.
I similarly encourage artists of all kinds to make art for art’s sake while monetizing however they see fit, without appealing to red herrings like the legality or lack thereof of end users appreciating their art and incorporating it into their own artistic output, however they may choose to do so.
Like all art, code and its outputs are also First Amendment protected free speech.
This is effectively the equivalent of someone doing really crappy image processing. As other commenters have mentioned, it does alter how images look to humans as well as machines, and it can be “mitigated” through additional processing techniques.
Lots, I'm sure. / Besides, this is a straw man. And it misses the point. I am opposed to your choice of language, not because I'm trying to "control" how you frame or prioritize things, but because it suggests a deep misunderstanding of fundamental concepts. You claimed that copyright law forces people to believe that one cannot copy things. This is ridiculous. I hope you can see this.
Copyright law establishes probabilistic consequences for certain behaviors. Many people would view copyright law as being _normative_ (in that it defines right and wrong). I agree that it sets norms, even if I don't necessarily agree on the wisdom of the chosen norms. Laws are wise and just only to the extent they align to deeper principles.
You and I probably agree with many of the downsides of copyright law, even if our final takes are different, but that's not very interesting. In another comment, I linked to some articles on the topic by legal experts. I truly hope no one cares very much about my or your points of view about copyright law; I hope people go and read something by the experts instead. Of course, I'm not saying they are infallible. But there is a huge gap between armchair commenters on HN and someone with hundreds of hours of focused study and experience actively debating with others at a similar level.
Here is what is interesting to me: how have you ended up with these bizarre conceptualizations? Why are you sticking to them? Is it largely inertia and/or ego? I haven't ruled out a high degree of contrarianism as well. Perhaps even just wanting to get a reaction. Otherwise, you seem reasonable, so I'm puzzled.
Another point: Are you familiar with Orwell's ideas of thoughtcrime? That would be an example of law and society attempting to force people to believe something. Even so, as we know from the novel, that twisted goal cannot be fully achieved even in a totalitarian society.
There is a thing called metaphor, and it's totally legal.
I mentioned Orwell because I was trying to throw you a bone. I was trying to show that I was engaging with you: to find some conceptual connection to what you wrote.
Looking back, this feels a little like the motte-and-bailey fallacy: you started off by making a ridiculous claim ("Copyright law demands that everyone play along with the lie that is intellectual monopoly"). Much later you "retreat" to calling it a metaphor.
You might have known it was metaphor all along, but if so, why did you double down? Why persist? Again, I think there is a high chance that you like the reaction you get. It seems to me that people often seek attention in this way. There is a downside: many people seeing the kind of language you used will think "loony!" and stop engaging.
Attempting to convince people by exaggerated metaphor can of course work! But if it does, I don't think you really want to take credit for it. People that easily swayed are not a prize worth counting. Besides, if you only "win" someone's agreement by rhetoric, you can expect the next person will "win" that same mind by some subsequent emotional appeal. Careful choice of words might take longer but it has less blowback. Rhetoric is the fast food of persuasion. (Yes, I can use metaphor too!)
Nothing personal. For years, I've attempted to discuss things with people that seem puzzling and/or stubborn. Thanks for discussing.
Well you can call it a captcha if you want. The point here is to make it harder to access for bots (but not impossible) while inconveniencing honest actors in the process. It doesn't sound like there's a straightforward answer to "are captchas DRM" either.
The rumblings I'm hearing are that this a) barely works with last-gen training processes b) does not work at all with more modern training processes (GPT-4V, LLaVA, even BLIP2 labelling [1]) and c) would not be especially challenging to mitigate against even should it become more effective and popular. The authors' previous work, Glaze, also does not seem to be very effective despite dramatic proclamations to the contrary, so I think this might be a case of overhyping an academically interesting but real-world-impractical result.
[1]: Courtesy of /u/b3sn0w on Reddit: https://imgur.com/cI7RLAq https://imgur.com/eqe3Dyn https://imgur.com/1BMASL4
I don't know if anyone else is still scraping new images into the generators. I've heard somewhere that OpenAI stopped scraping around 2021 because they're worried about training on the output of their own models[1]. Adobe Firefly claims to have been trained on Adobe Stock images, but we don't know if Adobe has any particular cutoffs of their own[2].
If you want an image that screws up inference - i.e. one that GPT-4V or Stable Diffusion will choke on - you want an adversarial image. I don't know if you can adversarially train on a model you don't have weights for, though I've heard you can generalize adversarial training against multiple independent models to really screw shit up[3].
[0] All learning capability of text generators come from the fact that they have a context window; but that only provides a short term memory of 2048 tokens. They have no other memory capability.
[1] The scenario of what happens when you do this is fancifully called Habsburg AI. The model learns from its own biases, reinforcing them into stronger biases, while forgetting everything else.
[2] It'd be particularly ironic if the only thing Nightshade harms is the one AI generator that tried to be even slightly ethical.
[3] At the extremes, these adversarial images fool humans. Though, the study that did this intentionally only showed the images for a small period of time, the idea being that short exposures are akin to a feed-forward neural network with no recurrent computation pathways. If you look at them longer, it's obvious that it's a picture of one thing edited to look like another.
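To make the adversarial-image idea above concrete, here is a toy sketch. It assumes a made-up linear "classifier" over three pixel values; the weights, the input and the epsilon are all invented for illustration, and real attacks on something like GPT-4V or Stable Diffusion are far more involved. It nudges each pixel by at most epsilon against the sign of its weight (the idea behind the fast gradient sign method), which is enough here to flip the decision:

```python
def score(weights, pixels):
    """Linear classifier: positive score means class A, negative means class B."""
    return sum(w * p for w, p in zip(weights, pixels))


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def adversarial(weights, pixels, epsilon):
    """Shift every pixel by epsilon in the direction that lowers the score.

    For a linear model the gradient of the score with respect to the
    pixels is just the weight vector, so stepping against sign(weights)
    is the fast-gradient-sign step.
    """
    return [p - epsilon * sign(w) for w, p in zip(weights, pixels)]


# Hypothetical model and input, chosen only for illustration.
weights = [1.0, -2.0, 0.5]
image = [0.2, -0.1, 0.3]

perturbed = adversarial(weights, image, epsilon=0.2)
print(score(weights, image) > 0)      # True: original is scored as class A
print(score(weights, perturbed) > 0)  # False: perturbed copy is scored as class B
```

Note that this needs the weights, which is exactly the open question in the comment about attacking a model you don't have weights for.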
Generative models like text-to-image have an encoder part (explicit or not) that extracts the semantics from the noised image. If the auto-labelers can correctly label the samples, then an encoder trained on both actual and adversarial images will learn not to take the same shortcuts that the proxy model took, making the model more robust. I cannot see an argument for why this would be a negative thing for the model.
Denoising is probably a good preprocessing step anyway.
It's a bad tradeoff.
According to which authority?
The only real way for artists or anyone really to try to hold back models from training on human outputs is through the law, ie, leveraging state backed violence to deter the things they don’t want. This too won’t be a perfect solution, if anything it will just put more incentives for people to develop decentralized training networks that “launder” the copyright violations that would allow for prosecutions.
All in all it’s a losing battle at a minimum and a stupid battle at worst. We know these models can be created easily and so they will, eventually, since you can’t prevent a computer from observing images you want humans to be able to observe freely.
There's a nonzero chance that encouraging the creation of a large dataset of known tampered data can ironically improve generative AI art models by allowing the model to recognize tampered data and allow the training process to work around it.
In the future, my guess is that courts will generally be on the side of artists because of societal pressures, and artists will be able to challenge any image they find and have it sent to yet another ML model that can quickly adjudicate whether the generated image is "too similar" to the artist's style (which would also need to be dissimilar enough from everyone else's style to give a reasonable legal claim in the first place).
Or maybe artists will just give up on trying to monetize the images themselves and focus only on creating physical artifacts, similar to how independent musicians make most of their money nowadays from touring and selling merchandise at shows (plus Patreon). Who knows? It's hard to predict the future when there are such huge fundamental changes that happen so quickly!
It’s pretty exciting.
Being able to find a mix of styles you like and apply them to new subjects to make your own unique, personalized, artwork sounds like a wickedly cool power to give to billions of people.
I want a scaling license fee to apply (e.g. % pegged to revenue. This still has an indirect problem with different industries having different profit margins, but still seems the fairest).
And I want the world (or EU, then others to follow suit) to slowly reduce copyright to 0 years* after the artist's death if owned by a person, and 20-30 years max if owned by a corporation.
And I want the penalties for not declaring usage** / not paying fees, to be incredibly high for corporations... 50% gross (harder) / net (easier) profit margin for the year? Something that isn't a slap on the wrist and can't be wriggled out of quite so easily, and is actually an incentive not to steal in the first place.
[*]or whatever society deems appropriate.
[**]Until auto-detection (for better or worse) gets good enough.
IMO that would allow personal use, encourages new entrants to market, encourages innovation, incentivises better behaviour from OpenAI et al.
Why death at all?
It's icky to trigger soon after death, it's bad to have copyright vary so much based on author age, and it's bad for many works to still have huge copyright lengths.
It's perfectly fine to let copyright expire during the author's life. 20-30 years for everything.
I still feel it is absolutely wrong to roam around the internet and scrape images (without consent) in order to power one’s cash cow AI. I hope more methods to protect artworks (including audio and other formats) become more accessible.
Also... Maybe I am naive, but it seems rather trivial to work around with a quick prefilter? I don't know if traditional denoising would be enough, but worst case you could run img2img diffusion.
Doing that requires much less compute than training a large generative image model.
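For reference, the "traditional denoising" kind of prefilter could look like the sketch below: a plain 3x3 median filter over a grayscale image stored as a list of rows. This is only an illustration of the general idea, not a tested countermeasure. The toy input is an isolated outlier pixel, which is nothing like Nightshade's perturbation, and another comment in this thread argues that denoising shouldn't affect the attack at all.

```python
from statistics import median


def median_filter(image):
    """Return a copy of a grayscale image (list of rows) with each pixel
    replaced by the median of its 3x3 neighbourhood, clipped at the edges."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(median(neighbours))
        result.append(row)
    return result


# A flat grey patch with one outlier pixel standing in for a perturbation.
patch = [[100] * 5 for _ in range(5)]
patch[2][2] = 180
cleaned = median_filter(patch)
print(cleaned[2][2])  # the outlier is replaced by the surrounding value
```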
The poorest people have historically produced great art. Training a model, however? Expensive. Running it locally? Expensive. Paying the sub? Expensive.
Nothing is being democratized, the only thing this does is devaluing the blood and sweat people have put into their work so FAANG can sell it to lazy suckers.
sorta like what the laptop did for writing
You mean like OpenAI and Adobe ?
Only the free and open source models didn't license any content for the training data.
OpenAI has provided no such documentation or legal guarantees, and it is still quite possible they scraped all sorts of copyright materials.
In this case, the mechanism for how it would work is effectively useless. It doesn't affect OpenAI or other companies building foundation models. It only works on people fine-tuning these foundation models, and only if the image is glazed to affect the same foundation model.
EDIT: I have seen a few examples with GPT-4V and, as I imagined, it wasn't deceived. I doubt this technique can have any impact on the quality of the models; honestly, the only impact it could have is to make the training more robust.
Eventually I assume the poisoning artifacts introduced in the images will be very visible to humans as well.
It's still noticeably visible.
Enjoy the short term novelty while you can.
AI image exclusion standard, similar to "robots.txt" -- which would tell an AI data-gathering web crawler that a given image or set of images -- was off-limits for use as data?
For instance, if I set traps in my home which hurt an intruder we are both guilty of crimes (traps are illegal and are never considered self defense, B&E is illegal).
Would I be responsible for corrupting the AI operator's data if I intentionally include adversarial artifacts to corrupt models, or is that just DRM to legally protect my art from infringement?
edit:
I replied to someone else, but this is probably good context:
DRM is legally allowed to disable or even corrupt the software or media that it is protecting, if it detects misuse.
If an adversarial-AI tool attacks the model, it then becomes a question of whether the model, having now incorporated my protected art, is now "mine" to disable/corrupt, or whether it is in fact out of bounds of DRM.
So for instance, a court could say that the adversarial-AI methods could only actively prevent the training software from incorporating the protected media into a model, but could not corrupt the model itself.
We are born and then exposed to the torrent of data from the world around us, mostly fed to us by other humans, this is what models are trying to tap.
Unfortunately our learning process is completely organic and takes decades and decades and decades; there's no way to put a model through this easily.
Perhaps we need to seed the web with AI agents who converse and learn as much like regular human beings as possible and assemble the dataset that way. Although having an agent browse and find an image to learn to draw from is still gonna make people reee even if that's exactly what a young and aspiring human artist would be doing.
Don't talk about humans being sacred; we already voted to let corporations be people, for the 1% to exist and "lobby", breaking our democracy so that they can get tax breaks and make corrupt under the table deals. None of us stopped that from happening...
2. They don't need to keep it a secret; the goal is to remove these images from the training data, in a way that would be much more efficient than simply adding a "please don't include my art in your ai scraper" message next to your pictures.
A made up scenario¹ is that a person who is training an AI, goes to the local library and checks out 600 books on art. The person then lets the AI read all of them. After which they are returned to the library and another 600 books are borrowed
Then we can imagine the AI somehow visiting a lot of museums and galleries.
The AI will now have been trained on the style and looks of a lot of art from different artists
All the material has been obtained in a legal manner.
Is this an acceptable use?
Or can an artist still assert that the AI was trained with their IP without consent?
Clearly this is one of the ways a human would go about learning about styles, techniques etc..
¹ Yes, you probably cannot borrow 600 books at a time. How does the AI read the books? I don't know. The simplest would be that the researcher takes a photo of each page. This would be extremely slow, but for this hypothetical it is acceptable.
I feel like I’m taking crazy pills TBQH
The only explanation I can find for this backlash is that artists are actually worried just like the rest of us that pretty soon AI will produce higher quality more inventive work faster and more imaginatively than they can - which is very natural, but not a reason to inhibit an AI's creative education.
Furthermore there’s a sort of unavoidable “jitter” in human-produced art that varies between individuals that stems from vastly different ways of thinking, perception of the world, mental abstraction processes, life experiences, etc. This is why artists who start out imitating other artists almost always develop their imitations into a style all their own — the imitations were already appreciably different from the original due to the aforementioned biases and those distinctions only grow with time and experimentation.
There would be greatly reduced moral controversy surrounding ML models if they lacked that mincemeat/pink slime aspect.
There is another alternative to the law. Provide your art for private viewing only, and ensure your in person audience does not bring recording devices with them. That may sound absurd, but it's a common practice during activities like having sex.
I'm not defending it. Just acknowledging the reality. The next TMZ for private art gatherings is percolating in someone's garage at the moment.
On the other hand, the adversarial environment might push models towards a representation more aligned with human perception, which is neat.
This tool is free, and as far as I can tell it runs locally. If you're not selling anything, and there's no profit motive, then I don't think you can reasonably call it "snake oil".
At worst, it's a waste of time. But nobody's being deceived into purchasing it.
I don't think that's the intention of Nightshade, but I wouldn't put past someone to try it.
Snake oil for the sake of getting published is a very real problem that does exist.
The only way to be an artist now is to have a unique style of your own, and to never make it online.
So then of course, you also cannot sell your work, as the buyers might put it online. And you cannot show your art to big crowds, as some will take pictures and put them online. So ... you can become a literal underground artist, where only some may see your work. I think only some will like that.
But I actually disagree, there are plenty of ways to be an artist now - but most should probably think about including AI as a tool, if they still want to make money. But with the exception of some superstars, most artists are famously low on money - and AI did not introduce this. (all the professional artists I know, those who went to art school - do not make their income with their art)
We all know what a law is you don't need to clarify. It makes your prose less readable.
I just want to say: I really appreciate the stark terms in which you've put this.
The thing that has come to be called "intellectual property" is actually just a threat of violence against people who arrange bytes in a way that challenges power structures.
As is, art already isn't a sustainable career for most people who can't get a job in industry. The most common monetization is either commissions or hiding extra content behind a pay wall.
To be honest I can see more proverbial "Furry artists" sprouting up in a cynical timeline. I imagine like every other big tech that the 18+ side of this will be clamped down hard by the various powers that be. Which means NSFW stuff will be shielded a bit by the advancement and you either need to find underground training models or go back to an artist.
It's not particularly that hard. The furry nsfw models are already the most well developed and available models you can get right now. And they are spitting out stuff that is almost indistinguishable from regular art.
If there is any "point" to this, it's that it's going to push the AI models to become better at capturing how humans see things.
Be reminded that this is - and has always been - the mainstream model of the lineages of what have come to be called "traditional" and "Americana" and "Appalachian" music.
The Grateful Dead implemented this model with great finesse, sometimes going out of their way to eschew intellectual property claims over their work, in the belief that such claims only hindered their success (and of course, they eventually formalized this advocacy and named it "The Electronic Frontier Foundation" - it's no coincidence that EFF sprung from deadhead culture).
And that OpenArt on the analogy of OpenSource is a non-existing thing (I know, I know, different things, source code is not for the generic audience and can be hidden at will, unlike art, just having some generative thoughts artefact here ;) )
I think the population tends to value "looks pretty", and it's other artists, connoisseurs, and art critics who value origin and process. Exit Through the Gift Shop sums this up nicely.
I'm sure OpenAI's models can shit out an approximation of a new Terry Pratchett or Douglas Adams novel, but nobody with any level of literary appreciation would give a damn unless fraud was committed to trick readers into buying it. It's not the author's work, and there's no human message behind it.
According to Marx, value is only created with human labour. This is not just a Marxist theory, it is an observation.
There may be lots of over-priced junk that makes you want to question this idea. But let's not nit-pick on that.
In two years time people will not see any value in AI art, quite correctly because there is not much human labour in creating it.
And in the process, they will obviate the need for Nightshade and similar tools.
AI models ingesting AI generated content does the work of destroying the models all by itself. Have a look at "Model Collapse" in relation to generative AI.
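A toy way to see the effect, as a rough sketch rather than anything resembling a real image model: fit a Gaussian to some data, sample from the fit, refit on those samples, and repeat. The distribution, sample size and generation count below are arbitrary assumptions. With a small sample per generation the fitted spread tends to shrink, so later "generations" forget the tails of the original data.

```python
import random
from statistics import fmean, pstdev


def simulate(generations, samples_per_generation, seed=0):
    """Fit a Gaussian, sample from the fit, refit on those samples, repeat.

    Returns the fitted standard deviation after each generation,
    starting with the "real data" distribution (mean 0, std 1).
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0
    history = [std]
    for _ in range(generations):
        samples = [rng.gauss(mean, std) for _ in range(samples_per_generation)]
        # Each generation only ever sees the previous generation's output.
        mean, std = fmean(samples), pstdev(samples)
        history.append(std)
    return history


history = simulate(generations=200, samples_per_generation=20)
print(f"std at start: {history[0]:.3f}, after 200 generations: {history[-1]:.3f}")
```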
I have more access to information now than the most powerful people in the world did 40 years ago. I can learn about quantum field theory, about which pop star is allegedly fucking which other pop star, etc.
If I don't care about the law I can read any of 25 million books or 100 million scientific papers all available on Anna's Archive for free in seconds.
And I also agree that we shouldn’t build systems that alienate people from that accumulated equity.
Robots.txt survived because the use of it to gatekeep valuable goodies was never widespread. Most sites want to be indexed, most URLs excluded by the robots file are not of interest to the search engine anyway, and use of robots to prevent crawling actually interesting pages is marginal.
If there was ever genuine uptake in using robots to gatekeep the really good stuff search engines would've stopped respecting it pretty much immediately - it isn't legally binding after all.
Name two entities that were asked to stop using a given individuals' images that failed to stop using them after the stop request was issued.
>Robots.txt survived because the use of it to gatekeep valuable goodies was never widespread. Most sites want to be indexed, most URLs excluded by the robots file are not of interest to the search engine anyway, and use of robots to prevent crawling actually interesting pages is marginal.
Robots.txt survived because it was a "digital signpost" a "digital sign" -- sort of like the way you might put a "Private Property -- No Trespassing" sign in your yard.
Most moral/ethical/lawful people -- will obey that sign.
Some might not.
But the some that might not -- probably constitute about a 0.000001% minority of the population, whereas the majority that do -- probably constitute about 99.999999% of the population.
"Robots.txt" is a sign -- much like a road sign is.
People can obey them -- or they can ignore them -- but they can ignore them only at their own peril!
It's a sign which provides a hint for what the right thing to do is in a certain set of circumstances -- which is what the Law is; which is what the majority of Laws are.
People can obey them -- or they can choose to ignore them -- but only at their own peril!
Most will choose to obey them. Most will choose to "take the hint", proverbially speaking!
A few might not -- but that doesn't mean the majority won't!
>If there was ever genuine uptake in using robots to gatekeep the really good stuff search engines would've stopped respecting it pretty much immediately - it isn't legally binding after all.
Again, name two entities that were asked to stop using a given individuals' images that failed to stop using them after the stop request was issued.
Nevertheless, I hope that at some not-so-far point in the future there will be more legal guidance about this kind of stuff, i.e. it will be made clear that scraping violates copyright. This still won't solve the problem of detectability but it would at least increase the risk of scrapers, should they be caught.
Name two entities that were asked to stop using a given individuals' images that failed to stop using them after the stop request was issued.
>Currently I see no organisation who would be willing to do this or even just technologically able - as even just detecting such scrapers is an extremely hard task.
// Part of Image Web Scraper For AI Image Generator ingestion pseudocode:
if fileExists("no-ai.txt") {
// Abort image scraping for this site -- move on to the next site
} else {
// Continue image scraping for this site
}
See? Nice and simple!
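The same check as a runnable sketch, for anyone who wants to see it spelled out. Everything here is an assumption for illustration: "no-ai.txt" is the hypothetical file from the pseudocode above, not an existing standard, and the function names are made up. The existence check is passed in as a parameter so the logic can be exercised without touching the network.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def file_exists(url, timeout=10):
    """Return True if a HEAD request for the URL succeeds with HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, TimeoutError):
        return False


def may_scrape_images(site_url, exists=file_exists):
    """Skip the site when it publishes the hypothetical no-ai.txt marker."""
    return not exists(urljoin(site_url, "/no-ai.txt"))


def sites_to_scrape(site_urls, exists=file_exists):
    """Filter a crawl list down to sites that have not opted out."""
    return [site for site in site_urls if may_scrape_images(site, exists)]
```

One design choice worth flagging: as written, a network error counts as "no opt-out file", so the site gets scraped. A more cautious crawler would treat errors as an opt-out instead.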
Also -- let me ask you this -- what happens to the intellectual property (or just plain property) rights of Images on the web after the author dies? Or say, 50 years (or whatever the legal copyright timeout is) after the author dies?
Legal grey area perhaps?
Also -- what about Images that exist in other legal jurisdictions -- i.e., other countries?
How do we know what set of laws are to apply to a given image?
?
Point is: If you're going to endorse and/or construct a legal framework (and have it be binding -- keep in mind you're going to have to traverse the legal jurisdictions of many countries, many countries!) -- you might as well consider such issues.
Also -- at least in the United States, we have Juries that can override any Law (Separation of Powers) -- that is, that which is considered "legally binding" -- may not be quite so "legally binding" if/when properly explained to a proper jury in light of extenuating (or just plain other) circumstances!
So kindly think of these issues prior to making all-encompassing proposals as to what you think should be "legally binding" or not.
I comprehend that you are just trying to solve a problem; I comprehend and empathize; but the problem might be a bit greater than you think, and there might be one if not several unexplored partial/better (since no one solution, legal or otherwise, will be all-encompassing) solutions -- because the problem is so large in scope -- but all of these issues must be considered in parallel -- or errors, present or future will occur...
If you upload a picture of a dog to DeviantArt and you label it as a cat, and a model ingests that image and starts to think that cats look like dogs, would anybody claim that you are breaking a law? If you upload bad code to Github that has bugs, and an AI model consumes that code and then reproduces the bugs, would anyone argue that uploading badly written code to Github is a crime?
What if you uploaded some bad code to Github and then wrote a comment at the top of the code explaining what the error was, because you knew that the model would ignore that comment and would still look at the bad code. Then would you be committing a crime by putting that code on Github?
Even if it could be proven that your intention was for that code or that mistagged image to be unhelpful to training, it would still be a huge leap to say that either of those activities were criminal -- I would hope that the majority of HN would see that as a dangerous legal road to travel down.
DRM can, for instance, disable its own parent tool (e.g. a video game) if it detects misuse, but it can't attack the host computer or other software on that computer.
So is the model or its output, having been trained on my art, a byproduct of my art, in which case I have a legal right to 'disable' it, or is it separate software that I don't have a right to corrupt?
Of course I'm assuming it works to begin with. Sounds like a game of cat and mouse. And AI has a lot of rich cats.
Your understanding of the attack was the same as mine: it injects just the right kinds of pixels to throw off the auto-labellers and misdirect what they are detecting, causing the tags to get shuffled around.
Also on reddit today some of the Stable Diffusion users are already starting to train using Nightshade so they can implement it as a negative model, which might or might not work, will have to see.
Study on the Influence of Adversarial Images on Human Perception
“It won’t be any more wickedly cool than the internet” - saying something won’t be any more wickedly cool than the most profound and impactful pieces of infrastructure human civilization has erected is a pretty high compliment.
Neither. Nightshade is not DRM or malware, it's "lying" about the contents of an image.
Arguably, Nightshade does not corrupt or disable the model at all. It feeds it bad data that leads the model to generate incorrect conclusions or patterns about how to generate images. This is assuming it works, which we'll have to wait and see, I'm not taking that as a given.
But the only "corruption" happening here is that the model is being fed data that it "trusts" without verifying that what the data is "telling" it is correct. It's not disabling the model or crashing it, the model is forming incorrect conclusions and patterns about how to generate the image. If Google translate asked you to rate its performance on a task, and you gave it an incorrect rating from what you actually thought its performance was, is that DRM? Malware? Have you disabled Google translate by giving it bad feedback?
I don't think the framing of this as either DRM or malware is correct. This is bad training data. Assuming it works, it works because it's bad training data -- that's why ingesting one or two images doesn't affect models but ingesting a lot of images does, because training a model on bad data leads the model to perform worse if and only if there is enough of that bad data. And so what we're really talking about here is not a question of DRM or malware, it's a question of whether or not artists have a legal obligation to make their data useful for training -- and of course they don't. The implications of saying that they did would be enormous, it would imply that any time you knowingly lied about a question that was being fed into an AI training set that doing so was illegal.
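The "a couple of bad examples do nothing, a lot of them do" point can be shown with a deliberately tiny sketch. This is plain label poisoning on a nearest-centroid toy model with a single made-up feature, so it is not how Nightshade works and not how image generators are trained; all numbers are invented for illustration.

```python
def centroid(values):
    """Average of a list of feature values."""
    return sum(values) / len(values)


def train(examples):
    """Nearest-centroid "model": one average feature value per label."""
    by_label = {}
    for feature, label in examples:
        by_label.setdefault(label, []).append(feature)
    return {label: centroid(values) for label, values in by_label.items()}


def predict(model, feature):
    """Pick the label whose centroid is closest to the feature."""
    return min(model, key=lambda label: abs(model[label] - feature))


def dataset(mislabeled_dogs):
    """100 honest cats (feature 0), 100 honest dogs (feature 10), plus
    some dog images (feature 10) deliberately labeled "cat"."""
    clean = [(0.0, "cat")] * 100 + [(10.0, "dog")] * 100
    return clean + [(10.0, "cat")] * mislabeled_dogs


lightly_poisoned = train(dataset(mislabeled_dogs=2))
heavily_poisoned = train(dataset(mislabeled_dogs=300))

print(predict(lightly_poisoned, 8.0))  # dog: two bad labels barely move the "cat" centroid
print(predict(heavily_poisoned, 8.0))  # cat: the "cat" centroid has drifted to 7.5
```

Nothing in the second run is "disabled" or "corrupted"; the model simply learned what the labels told it.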
BTW, the right to prepare derivative works belongs to the copyright holder of the reference work.
I doubt that many AI works are in fact derivative works. Sure, some bear enough similarity, but a gross majority likely doesn't.
But it's still not clear why this is worse than the situation where not everyone can create perfectly-rendered pieces of whatever idea is in their head, and have to rely on others to do it for them, while being limited by what they can afford and what those others are willing to paint.
Why is this hippie nonsense so popular?
https://i.imgur.com/zUVn8rt.png
But now that I double-check, I was comparing with the images zoomed to 200%. On desktop the artifacts are also noticeable at 100%, but not nearly as bad as in my previous comment.
That's actually quite plausible.
Maybe they're ignorant of it right up until you announce it, but then they're no longer ignorant of it.
I am not trying to belittle foundational model trainers, but a lot goes on in ML land. Even groups can't track every development.
I'll repeat this point for clarity. After going over the paper again, denoising shouldn't affect this attack; what matters is the ability of plausible images to go undetected by human or AI discriminators (yet).
Even before the current AI boom, plenty of artists have wanted to showcase their work/prove that it exists without necessarily making the highest quality original file public.
Yes, and who is supposed to run that code?
> Name two entities that were asked to stop using a given individual's images that failed to stop using them after the stop request was issued.
Github? OpenAI?[1] Stable Diffusion?[2] LAION?[3] Why do you think there are currently multiple high-profile lawsuits ongoing about exactly that topic?
Besides, that's not how things work. Training a foundation model takes months and currently costs a fortune in hardware and power - and once the model is trained, there is, as of now, no way to remove individual images from the model without retraining. So in practical terms it's impossible to remove an image if it has already been trained on.
So the better question would be, name two entities who have ignored an artist's request to not include their image when they encountered it the first time. It's still a trick question though because the point is that scraping happens in private - we can't know which images were scraped without access to the training data. The one indication that it was probably scraped is if a model manages to reproduce it verbatim - which is the basis for some of the above lawsuits.
[1] https://www.theverge.com/2022/11/8/23446821/microsoft-openai...
[2] https://www.theverge.com/2023/2/6/23587393/ai-art-copyright-...
[3] https://www.heise.de/hintergrund/Stock-photographer-sues-AI-...
People that are honest and ethical?
And/or groups that don't want to risk getting sued? (Your: [1] [2] [3])?
>> Name two entities that were asked to stop using a given individual's images that failed to stop using them after the stop request was issued.
>Github? OpenAI?[1] Stable Diffusion?[2] LAION?[3] Why do you think there are currently multiple high-profile lawsuits ongoing about exactly that topic?
Because:
a) (Some) American Lawyers (AKA "Bar Association Members") -- are Sue Happy?
b) Because various Governments / Deep States (foreign and domestic) / Dark Money Groups / Paid (and highly biased) Political Activists -- want to see if they can get new draconian laws (whilst believing their actions to be super-patriotic to their respective countries!) -- or at least court precedents that move in that direction -- passed?
c) Because there's big money at stake, all the way around? (https://www.biblegateway.com/passage/?search=1%20Timothy%206...)
d) Because the alleged "victims" are "playing the victim card"?
(https://tvtropes.org/pmwiki/pmwiki.php/Main/PlayingTheVictim...) (Note that as a theory, this pairs well with (a)!)
(How much revenue will they be losing if their net income from artwork was $0? Also, wouldn't such high profile cases give the artists a ton of free advertising? The Defendant companies should counter-sue for giving the Plaintiff artists what amounts to free publicity for their artwork so great that they couldn't buy it with all of the Google advertising credits in the world!)
>Besides, that's not how things work. Training a foundation model takes months and currently costs a fortune in hardware and power - and once the model is trained, there is, as of now, no way to remove individual images from the model without retraining.
>"without retraining"...
Meditate on that one for a moment...
>So in practical terms it's impossible to remove an image if it has already been trained on.
In practical terms -- just retrain the model -- sans ("without") the encroaching images!
The models will need to be updated every couple of months anyway to include new public data from the web!
Create a list of images NOT to include in the next run (see above, "no-ai.txt" -- good suggestion incidentally!) -- and then don't include them on the next run!
It's not Rocket Science! :-)
(Also, arguably Elon Musk doesn't think that "Rocket Science" is in fact as hard as "Rocket Science" is purported to be -- but that's a separate debate! <g>)
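For what it's worth, the "keep a list and skip those images on the next run" idea could be sketched roughly like this. The "no-ai.txt" format here is purely an assumption for illustration (one URL per line, lines starting with '#' are comments); no such standard exists as far as this thread establishes, and the URLs are placeholders.

```python
def parse_exclusions(text):
    """Parse a hypothetical no-ai.txt: one URL per line, '#' lines are comments."""
    excluded = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            excluded.add(line)
    return excluded


def filter_training_set(candidates, excluded):
    """Keep only the candidate URLs that are not on the exclusion list."""
    return [url for url in candidates if url not in excluded]


no_ai_txt = """\
# images the artist does not want used for training
https://example.com/art/dragon.png
https://example.com/art/castle.png
"""

candidates = [
    "https://example.com/art/dragon.png",
    "https://example.com/photos/beach.jpg",
    "https://example.com/art/castle.png",
]

next_run = filter_training_set(candidates, parse_exclusions(no_ai_txt))
print(next_run)
```

Note that this only handles exact URL matches. Re-uploads of the same image under a different URL would slip through, so a real pipeline would presumably need some kind of content matching on top.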
>So the better question would be, name two entities who have ignored an artist's request to not include their image when they encountered it the first time. It's still a trick question though because the point is that scraping happens in private - we can't know which images were scraped without access to the training data. The one indication that it was probably scraped is if a model manages to reproduce it verbatim - which is the basis for some of the above lawsuits.
Explain to me, from the point of view of an AI company, how that AI company is to know ahead of time NOT to include an image from the web? (And thus not break the law, copyright law at least, and thus not incur the lawsuits and all the chaos that will apparently follow such an act?)
How is the AI company supposed to know, ahead of time, that a given image on the web is not to be included?
How please?
Because you see, that's the root of the problem you are trying to solve.
In fact, let me ask you a better question...
How can an arbitrary Internet User -- not a big, legally powerful AI company, but an arbitrary small-fry Internet User -- know ahead of time, that a given Image, exposed to the public via the public Internet; the Web -- that the artist who created that image (or the intellectual/artistic property holder) -- does NOT want their Image to be used for specific purposes?
?
Because well, I don't know of any easily parsable, easily understandable standard for that on the Web currently...
So, to recap, the question is:
How is everybody (humans and machines) to know the unambiguous, easily parsable, easily understandable uses that the artist (or intellectual/artistic property holder) of an image -- wishes/wills for that image?
And how to easily know the unintended uses?
That might be a better definition of the problem that is trying to be solved...
If you have to have both and instantly toggle between them to notice the difference, then it sounds like it’s doing its job well and the difference is hard to notice.
Possibly you instead meant that fair use is relevant, but people are wording remarks in a way that suggests the model itself is giving a fair use defence to copyright infringement, rather than the persons training or using it?
An ML model can neither have nor be in breach of copyright, so any discussion about how it works, and how that relates to how people work or “learn”, is beside the point.
What actually matters is firstly details about collation of source material, and later the particular legal details surrounding attribution. The last part involves breaking new ground legally speaking and IANAL so I will reserve judgement. The first part, collation of source material for training is emphatically not unexplored legal or moral territory. People are acting like none of the established processes apply in the case of LLMs and handwave about “learning” to defend it.
It is important (for the training and generation stages) to distinguish between whether the model copies the original works or merely infers information from them - as copyright does not protect against the latter.
> The first part, collation of source material for training is emphatically not unexplored legal or moral territory.
Similarly, in Authors Guild v. Google, Inc., Google internally made entire copies of millions of in-copyright books:
> > While Google makes an unauthorized digital copy of the entire book, it does not reveal that digital copy to the public. The copy is made to enable the search functions to reveal limited, important information about the books. With respect to the search function, Google satisfies the third factor test
Or in the ongoing Thomson Reuters v. Ross Intelligence case where the latter used the former's legal headnotes for training a language model:
> > verbatim intermediate copying has consistently been upheld as fair use if the copy is "not reveal[ed] to the public."
That it's an internal transient copy is not inherently a free pass, but it is something the courts take into consideration, as mentioned more explicitly in Sega v. Accolade:
> > Accolade, a commercial competitor of Sega, engaged in wholesale copying of Sega's copyrighted code as a preliminary step in the development of a competing product [yet] where the ultimate (as opposed to direct) use is as limited as it was here, the factor is of very little weight
And, given training a machine learning model is a considerably different purpose than what the images were originally intended for, it's likely to be considered transformative; as in Campbell v. Acuff-Rose Music:
> > The more transformative the new work, the less will be the significance of other factors
You have a legal right to scrape data and use it as input into a model, you don't have a right to good data. It's up to you to sanitize it before training your model on it.
Those millions/billions of people matter a great deal.
For what it’s worth I never use midjourney or dalle or any of the commercial closed systems that steal from artists but I know I can’t stop the masses from going there and inputting “give me pretty picture in style x”
But you pay that for any piece of art though? You appreciate it because you like what it looks like. The utility of it is in how good it looks, it's not how much effort was put into it.
If you need a ditch you're not going to value the ditch more if the worker dug it by hand instead of using an excavator. You value it based on the utility it provides you.
Again, I’m not saying buying mass-made AI art will be wrong. Just personally speaking, it will never evoke any feelings other than “looks neat” for me. So its inherent “art value” is close to 0, as I can guess its history is basically that someone put in a prompt and sent it to print (which I can do myself on my phone too!). It’s the same as looking at cool building pics on my phone (0 art value) versus actually seeing them in person (non-0), mostly because of the feelings I get from it. That being said, if it makes others happy, it’s not my place to judge.
And yet it's completely and absolutely wrong. Value is created by the subjective utility offered to the consumer, irrespective of what inputs created the thing conveying that utility.
Yes, I'm aware. This is precisely why I'm stating the prior comment to be "absolutely wrong". Marginal utility is a substantially valid model, LTV is not.
> In fact, there are also other value theories in economy. It's a mostly philosophical choice, and like other philosophical choices, it's not possible to accuse one of them of being wrong.
Sure it is. These aren't theories in a normative sense, they're models of causality for manifest phenomena. They're closer to scientific theories than they are to philosophical axioms. LTV simply doesn't bear out with observation.
Yeah? Well, you know, that's just like uh, your opinion, man
https://helpx.adobe.com/manage-account/using/machine-learnin...
"The insights obtained through content analysis will not be used to re-create your content or lead to identifying any personal information."
"For Adobe Firefly, the first model is trained on Adobe Stock images, openly licensed content, and public domain content where the copyright has expired."
(I work for Adobe)
OpenAI and Shutterstock publicly announced their collaboration; Shutterstock sells AI-generated images, generated with OpenAI models.
This is one where the AI companies should offer the olive branch, IMO: there must be a way to use steganography to transparently embed a "don't process for AI" code into an image or text or music or any other creative work that won't be noticeable by humans, but that the AI would see if it tried to process the content for training. I think it would be a very convenient answer and probably not be detrimental to the AI companies, but I also imagine that the AI companies would not be very eager to spend the resources implementing this. I do think they're the best source for such protections for artists though.
Ideally, without a previous written agreement for a dataset from the original creators, the AI companies probably shouldn't be using it for training at all, but I doubt that will happen -- the system I mention above should be _opt-in_, that is, you must tag such content that is free to be AI trained in order for AI to be trained on it, but I have 0 faith that the AI companies would agree to such a self-limitation.
edit: added mention to music and other creative works in second paragraph 1st sentence
edit 2: Added final paragraph as I do think this should be opt-in, but don't believe AI companies would ever accept this, even though they should by all means in my opinion.
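As a rough idea of what the embedded "don't process for AI" code could look like, here's a minimal least-significant-bit sketch over raw pixel values. The "NOAI" tag and both function names are made up for illustration, and plain LSB embedding is only one possible technique: it would not survive JPEG recompression or resizing, so treat it as a demonstration of the concept rather than something robust.

```python
MARKER = "NOAI"  # hypothetical tag, not an existing standard


def _bits(data):
    """Yield the bits of a bytes object, most significant bit first."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def embed(pixels, marker=MARKER):
    """Return a copy of pixels with the marker written into the lowest bits."""
    bits = list(_bits(marker.encode("ascii")))
    if len(bits) > len(pixels):
        raise ValueError("image too small for marker")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear the lowest bit, then set it
    return out


def has_marker(pixels, marker=MARKER):
    """Check whether the lowest bits of the first pixels spell out the marker."""
    payload = marker.encode("ascii")
    n = len(payload) * 8
    if len(pixels) < n:
        return False
    value = 0
    for p in pixels[:n]:
        value = (value << 1) | (p & 1)
    return value.to_bytes(len(payload), "big") == payload


original = list(range(64))  # stand-in for 64 grayscale pixel values
tagged = embed(original)
print(has_marker(original), has_marker(tagged))
```

Each pixel value changes by at most 1, which is why a human wouldn't notice it, but a scraper that checked for the tag before training could skip the image.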
Now for the second type, representing models such as Stable Diffusion and ChatGPT, it would be required to have their trained model freely available to anyone and any resulting output would not be copyrightable. It may be a fairer way of allowing anyone to harness the power of AI models that contain essentially the knowledge of all mankind, but without giving any party an unfair monopoly on it.
This should be easily enforceable for big corporations, else it would be too obvious if they are trying to pass one type model as another or even keep the truth about their model from leaking. It might not be as easy to keep small groups or individuals from breaking those rules, but hey, at least it evens the playing field.
Of course the workaround would be to have multiple accounts, but that in turn can be made unscalable with a "prove you're human" box.
- This is still vulnerable to stuff like mturk or even just normal users who did get past the anti-bot things pulling and re-uploading the content elsewhere that is easier for the AI companies to use
- The artists' main contention is that the AI companies shouldn't be allowed to just use whatever they find without confirming they have a license to use the content in this way
- If someone's content _does_ get into an AI model and it's determined somehow (I think there is a case with a newspaper and ChatGPT over this very issue?), the legal system doesn't really have a good framework for this situation right now -- is it copyright infringement? (arguably not? it's not clear) is it plagiarism? (arguably yes, but plagiarism in the US court system is very hard to prove/get action on) is it license violation? (for those who use licenses for their art, probably yes, but it's the same issue as plagiarism -- how to prove it effectively?)
Really what this comes down to is that the AI companies use the premise that they have a right to use someone else's works without consent for the AI training. While your suggestions are technically correct, they put the onus on the artists to do something different because the AI companies are allowed to train their models as they currently do without recourse for the original artist. Maybe that will be ruled true in the future, I don't know, but I can absolutely get why artists are upset about this premise shaping the discussion on AI training, as such a premise negates their rights as artists and many artists have 0 path for recourse. I'm pretty sure that OpenAI wouldn't think about scraping a Disney movie from a video upload site just because it's open access, since Disney likely can fight in a more meaningful way. I would agree with artists who are complaining that they shouldn't need to wait for a big corporation to decide that this behavior is undesirable before real action is taken, but it seems that is going to be what is needed. It might be reality, but it's a very sad reality that people want changed.
Standards differ. I saw glazed images in the wild and was wondering why they had so many JPEG artifacts, until I saw the artist's post about anti-AI + Glaze images on his profile.
I can't really imagine a case where if I had only seen the AI edited one I would have any different reaction or response to viewing the piece of art compared to having only seen the original one.
It's not random but it's not particularly curated either. Most of the time, any curation is done afterwards.
Given all that, the purpose of LLMs should be to create tailor made content to everyone's tastes. However, it seems the hardcore guardrails put into GPT4 and Claude prevent it from generating anything enjoyable. It seems, even the plot of the average Star Wars movie is too spicy for modern LLM sensibilities, never mind something like Stephen King.
Also, I'm sure that you can eventually just prompt the model with the message you want to put into the story, if you can't already do that.
But that wouldn't be helpful. It would've been memorable, because novels aren't just about the message.
AFAICT current text generation is something approaching bad mimicry at best and downright abysmal in general. I think you still need a very skilled author and meaty brain with a story to tell to make use of an LLM for storytelling. Sure, it’s a useful tool that will make authors more effective, but we are far from the point where you tell the LLM “write a story set in Pratchett’s Discworld” and something acceptable or even entertaining will be spit out - if such a thing can even be achieved.
There is a somewhat famous digital artist from Russia - Alexey Andreev. Google him; he has a very distinctive style of realistic technique and surrealistic situations, like a big manta ray landing on the deck of an aircraft carrier. Or you can see his old works in his LJ [1], which hasn't been updated in 5 years.
Now he uses generative AI as one of his tools. Like Photoshop, like different (unrealistic!) brushes in Photoshop, like other digital tools. His style is still 100% recognizable and his works haven't become worse or more "generic". Is he still an artist? I think so.
Where will you draw the line?
When everybody is an artist, then nobody will be one.
Am I no longer a software engineer?
Was it irony? :)
The Authors Guild lawsuit against Google Books ended in a 2015 ruling that Google Books is fair use and as such they don't have a right to be de-listed. It's not the case that they have a right to be de-listed but choose not to make use of it.
The same would apply if collation of data for machine learning datasets is found to be fair use.
> one has to argue they have _no_ such right to be “de-listed” in order to get any training data currently under copyright.
Datasets I'm aware of already have respected machine-readable opt-outs, so if that were to be legally enforced (as it is by the EU's DSM Directive for commercial data mining) I don't think it'd be the end of the world.
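As one illustration of what honoring a machine-readable opt-out can look like, here's a sketch using robots.txt rules and Python's standard `urllib.robotparser`. This is just one existing convention, not necessarily what any particular dataset uses; the "ExampleAIBot" user agent, the rules, and the URLs are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is kept out of the gallery,
# everyone else is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /gallery/

User-agent: *
Allow: /
"""


def may_collect(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_collect(ROBOTS_TXT, "ExampleAIBot", "https://example.com/gallery/cat.png"))
print(may_collect(ROBOTS_TXT, "ExampleAIBot", "https://example.com/about"))
print(may_collect(ROBOTS_TXT, "SomeBrowser", "https://example.com/gallery/cat.png"))
```

The obvious limitation is that this is per-site and per-crawler, so it only works if the site owner sets it up and the crawler chooses to check it.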
There's a lot of power in a default; the set of "everything minus opted-out content" will be significantly bigger than "nothing plus opted-in content" even with the same opinions.
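A back-of-the-envelope sketch of why the default matters so much. All numbers are invented: suppose 600 rightsholders would allow training and 400 would refuse, but only a quarter of each group ever bothers to state a preference.

```python
def included_counts(allowers, deniers, acting_share):
    """Works included under an opt-out default vs. an opt-in default.

    acting_share is the (assumed) fraction of each group that actually
    takes the step of declaring its preference.
    """
    acting_allowers = round(allowers * acting_share)
    acting_deniers = round(deniers * acting_share)
    # Opt-out default: everything is included unless the owner objected.
    opt_out_default = allowers + deniers - acting_deniers
    # Opt-in default: nothing is included unless the owner agreed.
    opt_in_default = acting_allowers
    return opt_out_default, opt_in_default


print(included_counts(allowers=600, deniers=400, acting_share=0.25))
```

Same opinions in both cases, but with these made-up numbers the opt-out default yields 900 works and the opt-in default yields 150.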
The (quite entertaining) saga of Nightshade tells a story about what is going to be content creators' “default position” going forward, and everyone else will follow. You would be a fool not to: the AI companies are trying to do an end run around you, using your own content, to make a profit without compensating you and leave you with no recourse.
I'm unclear on what stance I've supposedly retreated from. My position is that an opt-out is not necessary under current US law, but that it wouldn't be the worst-case outcome if new regulation were introduced to mandate it.
> The (quite entertaining) saga of Nightshade tells a story about what is going to be content creators “default position” going forward and everyone else will follow
By "default" I refer not to the most common choice, but to the outcome that results from inaction. There's a bias towards this default even if the majority of rightsholders do opt to use Nightshade (which I think is unlikely).
"A person who creates art (such as painting, sculpture, music, or writing) using conscious skill and creative imagination"