There's a certain arrogance to believing the timing "simply wasn't right". It looks really bad if you try it with any recent controversy:
* "The timing wasn't right to charge people for heated car seats"
* "The timing wasn't right to make Photoshop a subscription service"
* "The timing wasn't right to increase fees"
It's a way of talking yourself away from the fact that what you are making may, inherently, be disliked. The cited survey even seems to have been read as favourably as possible:
> Surveys consistently showed that consumers believed artists deserved payment when AI generated content in their style.
This doesn't mean people want artists' styles to be generated by AI. It could mean they think it's horrible, but if it happens they should at least be compensated for it. In fact, the quoted survey even says 43% believe companies should ban copying artists' styles. I could make the exact opposite argument with the same data:
"Many consumers believe companies should ban copying styles, and this may be a more common opinion than measured as most people have no experience with modern AI tools and therefore no chance to have made an opinion yet. What is known is that the majority believe that if artists were to be copied, they should at least be compensated"
edit: formatting, typo
I say this as someone who switched to Krita and canceled my CC subscription.
But sometimes timing is indeed wrong, not because of anything you did, but just because no one wanted it _yet_. Google Glass from a few years ago comes to mind. Now Meta has a similar idea and it does seem successful, much to society's dismay.
But sometimes it is worth asking, "does the idea actually suck and that's why no one likes it? Or is it actually a good idea that is muddied with other issues that no one likes?"
The article doesn't make it clear that they were that introspective.
Bad timing can be due to a number of factors; some can be good, some can be bad.
I get the emotional side of this argument - artists going hungry while someone else cashes in on their ideas. But compensation is a dangerous premise, because derivative art is an established type of artistic freedom. Artists routinely mimic styles, or work within the bounds of styles established by masters, but they've never been expected to compensate those styles' pioneers. Imagine it as a precedent:
"Your stuff borrows from Warhol? Guess what buddy, you owe the Warhol estate x% of your sale."
Perhaps you're arguing things change when commercial interests are involved? But again, this has never been the case for advertising companies (with their hired artistic guns) or any kind of graphic design leaning on established artistic styles for effect and making a killing in the process.
In the case of AI, even if it has a commercial master, it seems much closer to the borrowing of an ordinary artist. It's a trained entity, with deep understanding of styles, capable of making new works. On top of that, it works under the instruction of a user with their own ideas, whose guidance is crucial in deciding the work's final state. The user is the artist here - like one of the visionaries who delegate the nitty gritty of production to helpers. In this case the helper is leased from the AI company, which is more like an agency supplying those helpers.
All in all it's hard to see how any compensation model wouldn't end up constricting the artistic freedom most of these artists depend on.
I think a much more useful question is whether some arrogance is necessary to succeed. I personally think it is. But we are discussing a post mortem here, and the author is (in my opinion) clearly beating around the bush and using "the time wasn't right" to hide what may be uncomfortable truths.
Is a post mortem valuable if it doesn't address these face first? I am not the one with all the answers here, but what I am used to in mature tech teams is that the uncomfortable parts are usually the most important in any post mortem.
There are plenty of stories about companies that failed because the timing was wrong, and then saw another company succeed in their place later on. That doesn't mean failure simply means "the timing was wrong" - you are putting a lot of weight on society adjusting to your belief. Consider that venture capital often invests in hundreds of founders like this, betting that at least one of them wasn't wrong. That's not statistically in your favor.
It is OK (in fact it is valuable) to fail and conclude that your signals may have been wrong. There's a reason some venture capital funds prefer investing in people who have failed before.
I mean, if you keep ignoring stuff that undermines your beliefs that's the definition of arrogance.
It's interesting that "consumers" are generally for the expansion of IP laws. At the moment, I'm fairly certain that "style" is not something protected by copyright. I personally do not want this, and I'm sure there are many like me. Poorly thought out IP laws lead to chilling effects, DRM, stupid and unnecessary litigation, and ultimately a loss of digital freedoms.
> What 325 Cold Emails to Artists Taught Us
I'm surprised 1% didn't respond with "EAT HOT FLAMING DEATH SPAMMER" for sending them unsolicited commercial email. ;)
Then I tested out the image generation itself and I was unable to come up with prompts that achieved the kind of images I wanted. My only prior experience at the time was OpenAI API. With OpenAI I usually got what I wanted on the first or second try, but with Tess, I couldn’t get a usable result even after 20 tries.
So in addition to the limited number of artists, I think the quality of outputs vs. competing models was a huge factor. I needed to generate thousands of images, so I couldn’t afford to do dozens of attempts for each one.
Hopefully one day there will be a service that can match the quality of OpenAI Image API and Flux but with compensation for artists.
Would I pay extra to ensure that the artists that these models were trained on were compensated fairly? Absolutely! Would I pay extra for that but with degraded ergonomics? Given that this is just a silly hobby, probably not, if I'm being honest.
I think if that problem can be solved, and it's marketed to the correct group, a player in this space could certainly do well.
It's not hard when someone inputs "create in style of studio ghibli" to say that Studio Ghibli should get a cut. It's very different when you don't specify the source for the origin style.
And if you tried to identify the source material owner, the percentage of the output image that their work contributed to would be extremely - if not infinitely - small. You'd get minuscule payouts.
> A free Tess subscription to use their own model for brainstorming and scaling repetitive work (roughly 1 in 4 artists took advantage of this)
So based on the math I'm seeing... of the 21 artists in the system, only 5 ("1 in 4") opted to use the tool for their own productivity? That seems really low and makes me wonder what the user experience for creation feels like. I would assume if you decided to commit to this endeavor, you would want to see what derivative results will look like.
> …fine-tune a Stable Diffusion base model.
So your entire business proposition was a lie, as you literally used a base model trained on billions of images by other artists too!
Their solution basically just amounts to "Ethically sourced Styles", which still has all the red tape that a normal text2image model has because the majority of the data is still unapproved for use in an AI model.
Businesses didn't want to get wrapped up in a pseudolegal model that really has no better legality than base SD.
The demand to produce something in an artist's style is low. The volume required to make it interesting to artists isn't present.
Pushback against AI adoption is greatest among artists; you would be better off asking them for money to shut down AI.
The tech itself sounds interesting and would love that writeup.
“One engineer who left Kapwing in fall of 2025 said that the short-lived Tess investment contributed to burnout.”
Startups are not for the weak but the process detailed here is how we've gotten some of the most transformative and innovative products in technology. Props on attempting this unique idea; very sad that it didn't work out, but sometimes the market just can't support certain ideas!
Don’t take this personally.
Even if you told this person to work constantly and they believed in you and the business, it’s not totally your fault that they burned out. I say this as someone that has burned out twice, is currently burned out, and blames those that I currently and formerly worked for. I know the problem is as much me as them. Yes, employers have a responsibility to their employees not to burn them out. But, if they do, even if the employer is in a power position where the employee felt they had no other choice, and I felt that both times, the employee can choose not to work that much or care that much for almost whatever that means- if you’re literally holding a gun it’s different of course.
I know of a developer that committed suicide and the toll that took on the employer. But the employer can’t take on all of that themselves.
I’m sorry that your business failed, but I hope that something good comes out of this.
Also- I’m not saying that any part of your responsibility in burning out this person was ok. Just that not all of it is your fault.
No, they should take it personally. Leadership is 9/10 times the culprit.
Props for the postmortem.
Grammarist will tell us:
Despite being more popular than “lessons” in the corporate setting, “learnings” is still incorrect. It's an erroneous plural form of the colloquial term “learning.”
~ https://grammarist.com/usage/learnings/
As a business-speak buzzword it might fade, or it may end up with a greater global footprint outside of the Biz-speak Babel tower.
People missed the joke that it was poor English on purpose.
I wish artists would stop with the "it stole our work" bullshit and just be more honest about the "it can do what we do and we're terrified and scared for our future" part.
Because that I can 100% understand, and contrary to previous jobs just disappearing, we do live in "the future" and things like UBI or free cross-training should be available for this sort of thing.
What's more, their reasoning for abandoning the company was to build out another company with a suspiciously similar idea...
Maybe they're wrong but I tend to agree. Or even if it is possible to do it ethically, it still never will be done that way because there's just too much money in behaving unethically
i generate hundreds of images weekly for video content and the honest truth is i never think "i want this specific artist's style." i think "i need a documentary still that looks like 1970s film grain" or "i need a character that matches my last 50 frames." consistency and speed matter way more than provenance. the few times i tried artist-specific fine tunes the quality was noticeably worse than just prompting a good base model well.
the 6.5% artist signup rate buried in there is actually the real story. they cold emailed 325 high end editorial artists and got 21. those artists didn't want passive income from AI - they wanted AI to not exist in their market at all. paying someone royalties to automate away their livelihood is a weird value prop no matter how you frame it.
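For anyone who wants to check the numbers being thrown around in this thread, here is a minimal sketch. It uses only the figures quoted above (325 cold emails, 21 artists, "roughly 1 in 4" using the free subscription); the function names are made up for illustration.

```python
# Sanity check of the sign-up figures quoted in the thread.
# The constants come from the post; the helper names are hypothetical.
ARTISTS_EMAILED = 325
ARTISTS_SIGNED = 21


def signup_rate_percent(signed: int, emailed: int) -> float:
    """Share of emailed artists who signed up, as a percentage."""
    return 100 * signed / emailed


def one_in_n(total: int, n: int) -> float:
    """How many of `total` a '1 in n' ratio corresponds to."""
    return total / n


print(round(signup_rate_percent(ARTISTS_SIGNED, ARTISTS_EMAILED), 1))  # 6.5
print(one_in_n(ARTISTS_SIGNED, 4))  # 5.25, i.e. about 5 artists
```

So the 6.5% figure and the "only 5 artists" reading are consistent with each other, at least arithmetically.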
And I think the same could happen to LLMs. If it took all the fossil fuel on Earth just to barely be able to drive a car to a car wash, there's more wrong with the car than with the oil price.
To a degree it is protected, but not by copyright. Design patents are a thing and companies have sued each other over them (Apple vs Samsung during the "smartphone wars" comes to mind)
I'm staunchly against expansion of IP laws. But I personally think that when a corporate machine gobbles up an artist's works so that people like me who can't draw can generate silly memes for a few bucks a month, the artist should be compensated. The company is profiting off of other people's work! That's not right.
The mechanism by which compensation is calculated appears to be an unsolved problem currently though.
What's wrong with it?
We live in an interconnected world. Every company or individual who profits off anything does so, in very large part, thanks to work left behind by others that they don't directly compensate each other for.
Stated differently, if we look at the other side of the coin, it's one thing to create value, and another thing to capture value. If you are a business (and artists seeking profit are businesses), you create value then try to capture that value. Creating value and trying to capture (in the form of profit) is the entire name of the game. But no business captures 100% the value they create. If you make a product/artwork/service/whatever and release it to the public, lots of people may use it, view it, be inspired by it, learn from it, and ultimately profit off it in their own way without you necessarily being able to capture some part of it. And what's wrong with that?
Do we really want the entire world to be endlessly full of cookie-licking rent seekers who demand profit every time anyone does anything? Because they failed to capture the value they created, and thus demand a piece of the pie from those who are better at capturing value?
I like the way Thomas Jefferson put it:
> If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it. Its peculiar character, too, is that no one possesses the less, because every other possesses the whole of it. He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. That ideas should freely spread from one to another over the globe, for the moral and mutual instruction of man, and improvement of his condition, seems to have been peculiarly and benevolently designed by nature, when she made them, like fire, expansible over all space, without lessening their density in any point, and like the air in which we breathe, move, and have our physical being, incapable of confinement or exclusive appropriation. Inventions then cannot, in nature, be a subject of property. Society may give an exclusive right to the profits arising from them, as an encouragement to men to pursue ideas which may produce utility, but this may or may not be done, according to the will and convenience of the society, without claim or complaint from anybody. Accordingly, it is a fact, as far as I am informed, that England was, until we copied her, the only country on earth which ever, by a general law, gave a legal right to the exclusive use of an idea.
Kapwing is specifically designed for artists to share IP with other people in an IP-friendly and financially profitable way. A 'consumer' on Kapwing is not the same as an ordinary person browsing for AI generated art, and the fact that people who make money from selling their IP on there are in favour of expanding IP law shouldn't be a surprise.
All this really tells us is that Kapwing's artist community believe protecting their individual art style is more valuable to them than any money they'd earn from licensing it on a per-image basis to Kapwing's AI tool. I'd be willing to bet that if Kapwing changed the offer to a flat-fee-of-$50,000-a-year-plus-per-image-fee they'd find 99% of artists on there changed their minds. As with most things, people feel strongly about their rights all the way up until the price is right.
Since they didn't, they should go to jail. The same way I would have gone to jail if I built Sora in my basement and sold it to the public.
That is the gap in the legal landscape.
Here, for example, any comment is open to read and respond to. On ArXiv any paper can be downloaded, read and cited. Wikipedia contains text from many thousands of editors, building on each other. We like collaboration more than asserting our exclusivity rights. That is why these places provide better quality than work for direct profit or, God forbid, ad revenue, that is where the slop starts flowing.
But including your art in the training data is fair use (or otherwise exempt) by most standards, as no reproduction occurs. You are advocating for a change to IP law to make it more restrictive.
It's not. This totally depends on how you ask it.
Q: Do you think artists deserved payment?
A: YES.
Q: Will you pay for art?
A: MAYBE.
Q: Do you think people should go to jail for not paying for art?
A: NO.
I have rarely been as disheartened as I am by the transformation of Studio Ghibli's beautiful art style, painstakingly developed over decades, into a heap of slop-trash that actively erases the human connections so artfully depicted in Hayao Miyazaki's work.
All that sorrow and it's not even my style.
So, no - a human who's willing to draw an illustration in a particular style, perhaps one they love and admire, is not necessarily a hypocrite for seeing genAI's ability to produce billions of images in that style.
Don't forget how polling works. Change the wording of the question and you get a different answer.
Try asking them if they think Comcast or Sony should be able to sue individuals for posting memes that don't even contain any copyrighted material.
The Ghibli thing is a great example; who's still actually doing that? It was a passing fad.
But let's not pretend that that passing fad has changed the fact that Ghibli's films are absolute masterpieces that will continue being enjoyed by generations to come. Because they were and still are.
We throw plenty of smart people at plenty of hard ideas. If a company really wanted to, they would find a way to make this feasible.
But the problem of attribution is easily understandable to any human with a modicum of intelligence.
Imagine that you have a trillion input images, with every single one having a source associated. When training, they go through lots of processes and every single image contributes to a varying degree to a subset of 8 billion parameters. That alone would produce a dataset of size 1T * 8B just to say how much a particular image contributed to the output...
To mimic intelligence the output is also randomized - the association is not static and every single pixel in the output has its own lineage.
So as you can probably imagine, to calculate the actual source weights on the output you'd need to do at least 8e+21 calculations per output pixel... and you'd need double-precision floating point while you do it.
We know how to do it. It's just ridiculously expensive.
(The above example is for demonstrative purposes only)
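A rough sketch of that back-of-envelope estimate, for anyone who wants to play with the numbers. The image and parameter counts are the illustrative figures from the comment above, not measurements of any real model; the 1024x1024 output size and the helper names are my own assumptions.

```python
# Back-of-envelope attribution cost, using the comment's illustrative numbers.
# Nothing here reflects how any real model tracks provenance.
NUM_IMAGES = 10**12      # "a trillion input images"
NUM_PARAMS = 8 * 10**9   # "8 billion parameters"
BYTES_PER_DOUBLE = 8     # double-precision floating point


def attribution_entries(num_images: int, num_params: int) -> int:
    """One contribution weight per (image, parameter) pair."""
    return num_images * num_params


def storage_bytes(entries: int, bytes_per_entry: int = BYTES_PER_DOUBLE) -> int:
    """Bytes needed to store every weight at the given precision."""
    return entries * bytes_per_entry


def ops_per_output(entries: int, width: int, height: int) -> int:
    """Lower bound if every output pixel walks the whole table once."""
    return entries * width * height


entries = attribution_entries(NUM_IMAGES, NUM_PARAMS)
print(f"{entries:.1e} weights")                 # 8.0e+21 weights
print(f"{storage_bytes(entries):.1e} bytes")    # 6.4e+22 bytes
# 1024x1024 is a hypothetical output size, chosen only for illustration.
print(f"{ops_per_output(entries, 1024, 1024):.1e} ops per image")
```

Even before counting the randomness in sampling, the table alone is tens of zettabytes under these assumptions, which is the "ridiculously expensive" part.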
If it all was non-profit - then no one would raise the ethical issue.
I still think cutting artists' livelihoods from under them with tooling built on top of their work is unethical no matter how you cut it.
No one is stealing anything. It's not theft. There has been no crime. None of this is anywhere near criminal law.
I could make a more nuanced argument on copyright infringement. But to make that steelman, I'd need to accept too large an Overton window shift, so I'll decline to do so here.
The problem is that you're putting it in the wrong legal framing, and it just won't fly. Willing to engage, but not on these terms.
Music conglomerates have money and their lawsuits will probably settle the issue (unless they settle). That will be applied to all copyrighted works, regardless of the medium.
I believe going against the big guys is the reason why the big ones don't yet have music generation LLMs.
Just because you obfuscate what's happening by calling it "learning" and pretending your model is actually just looking at pictures the same as a human, doesn't make it true.
I can assure you, that you didn't grant a license with an exclusive list of operations that can be performed on your work of art. At best you may have had something like "no commercial use" clause and general broad terms.
I'd prefer looking at what (potential) consumers actually do rather than what they say. "Saying" is a really weak signal.
I am one of those people: 1. Absolutely despise Lightroom being a subscription and 2. Haven't switched yet.
There are moats and capabilities and friction. Not every vote with your wallet is a ringing endorsement. I have 15 years of Lightroom databases with over 100k photos, so switching is hard. At the same time, those are from the time I did a photography side gig; now I don't, so a monthly cost for no monthly gains really peeves me.
So it absolutely is a successful business decision and it absolutely is widely despised by customer base. Both are true :-)
Where did you get that idea? The global economy is ~200T/year PPP. 0.1% of that split across every artist you want the training data from would be insanely difficult for the vast majority of them to turn down. Which makes sense, as art isn’t that big a percentage of the global economy compared to, say, housing, food, medical care, infrastructure, military spending, etc.
Obviously the incentive to take without compensation is far more appealing, but that doesn’t mean it was impossible to make a reasonable offer.
However, ultimately nobody is going to pay them more than the value of their posts to the AI company which puts a severe cap on what that’s actually worth. People who post a great deal of online content might be worth compensating a few thousand dollars, but it would be hard for them to then turn that down.
A hard cap of 200B divided by 1M equals 200k, and that would sure be more reasonable, but we aren't hearing artists responding favorably to hypotheticals in that range, so I'm skeptical that "ain't nobody gonna turn that down".
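The arithmetic in this exchange can be sketched as below. The 200T economy, the 0.1% share and the 1M artist count are the figures from the comments above; the 10M row is a hypothetical extra data point, and the function names are made up.

```python
# Per-artist payout under the thread's hypothetical: 0.1% of a ~200T/year
# global economy, split evenly across the artists whose work is trained on.
GLOBAL_ECONOMY_USD = 200 * 10**12  # "~200T/year PPP", figure from the thread
SHARE_DIVISOR = 1000               # 0.1% == 1/1000


def compensation_pool(economy_usd: int, divisor: int = SHARE_DIVISOR) -> int:
    """Total pool available for artist compensation, in USD."""
    return economy_usd // divisor


def payout_per_artist(pool_usd: int, artist_count: int) -> int:
    """Even split of the pool across all artists, in USD."""
    return pool_usd // artist_count


pool = compensation_pool(GLOBAL_ECONOMY_USD)
print(pool)  # 200000000000, i.e. the "hard cap of 200B"

# 1M is the count used in the thread; 10M is a hypothetical, more generous one.
for artists in (1_000_000, 10_000_000):
    print(artists, payout_per_artist(pool, artists))
```

The payout scales inversely with how generously you define "artist", which is most of what the disagreement here comes down to.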
I think the vast majority would agree to let AI companies train on their art for 10k let alone 200k. Don’t forget the average global salary is way below what you see in the US.
Put another way, how many people would turn down 6+ months' salary? Of course the vanishingly tiny percentage that people care about would want more, but that’s a separate question and not particularly valuable to AI companies.
That's kind of an interesting concept: "since the scale of my transgression was so big, I should get away with it scot-free."
Didn't that exact social experiment take place in the US last year? I thought the result of that was disastrous, if media reports are to be believed.
OTOH I remember the creator of Wordle closed the "low few mil" deal instantly, so I do believe it's unlikely that people would turn down a few _hundred_ months' worth of salary. But those artists are not from regions with 50-100x lower median income and/or wider income distribution relative to the US - I think they're concentrated in relatively high-income, low-disparity regions - so I don't think there's some backwater with an abundant supply of artists where lifetime income is equivalent to no more than 6 months' worth in the US.
And IMO those artists are basically engaged in geo-scale dumping of media content. It's the same phenomenon as how moving consumer electronics manufacturing to the US instantly multiplies costs by small integers instead of just incurring premiums in percentages. If that phenomenon were to be quenched and those effects were integrated into the economy somehow, that would change the global balance of power to some statistically significant degree, like, we'd be seeing flying rocket amphibian McBoatfaces everywhere. That might be interesting, but I'm not sure if that's an interesting kind of interesting thing to see.
That’s really not a reasonable comparison to what is being sought.
As to global artists, I was suggesting the majority of artists globally make ~20k USD or less per year as artists. To get to millions of artists you need to use a generous definition. Hollywood is full of actors; how many of them made 20+k last year as an actor? If you disagree, fine, let’s double it: 6 months' salary is still only 20k, which I suspect would be a seriously tempting offer when you retain all rights to past and future works.
The starving artists I know would be extremely happy to get even one cookie to lick. I know an artist prodigy that paints on canvas and has work in a sizable gallery, at least one institution patron, and is constantly hosting paid workshop events. He architected and built his own custom 40ft ceiling pine art house covered in beautiful stained wood and arches, with large metal cuttings and engravings of wild horses on the railings. This artistic prodigy is still starving and works as a handyman/construction worker part-time. He is strongly opposed to AI, by the way.
Most artists are "starving artists"; there are extremely few artists that can support themselves by their creations alone. Many artists make no money at all, and many artists seem to work or create alone as individuals, meaning that they almost always lack the funds or community resources to protect their creative work.
But, sadly, art isn't an easy business. There is a tremendous supply of art, and not enough demand for it. In other words, it's very competitive.
So, just like every other competitive industry on earth, if you're going to be in the art business, then simply making a good product/service isn't enough. You have to think about marketing and sales and differentiation and distribution and strategy and the whole big picture.
There are plenty of starving artists who make incredible art that nobody pays for, just like there are plenty of starving startup founders who build well-coded apps that nobody pays for.
No, far better that we have four rent-seekers who gobble up everything that anyone is naive enough to share with the world, then turn around to demand profit in order to keep up with the new pace of the world that they’ve created.
PS: I categorically disagree that AI developers are rent-seekers, unless they require rent for the products their models generate
Imagine a genius person went out, trained himself by studying all the great art, and in doing so gained the ability to paint or draw in just about any style. That wouldn't be a copyright violation. In fact, plenty of people do this, or some version of it. But we say it should be illegal for AI... why, exactly? Because artists don't like it? Because it's "too good"?
If we're going to keep copyright around, then imo what should be illegal here is using the AI as a tool to create copies of people's work. Like, if I go out and use AI to generate a book of poetry and it's got a bunch of Beatles lyrics in it, fine, sue me.
AI companies want a license to train not ownership of a portfolio.
Edit: haven't followed the law in a while, but you could definitely copy, digitalize and scan documents for yourself and your friends (copia privada).
There is a reason why we call it styles, because it’s a recognizable pattern someone came up with maybe after decades of work.
You don't even need to have a legally acquired source material to produce work in a certain style.
The new reality allows for original creators to actually track the chain, so we're in this situation.
If one or two people take an apple from your tree it’s not a big deal, if a machine takes 10,000 it is.
So if you scan a book you are making a copy. In some copyright jurisdictions this is allowed for individuals under a private copying exception - a copyright opt out, if you like - but the important thing is private use. In some jurisdictions there is also a fair use exception, which allows you to exploit the rights protected by copyright under certain circumstances, but fair use is quite specific and one big issue with fair use is that the rights you are exploiting cannot result in something that competes with the original work.
Other acts restricted by copyright include distribution, adaptation, performance, communication and rental.
So if you copy a book, digitize it, and write a program to analyze the word frequencies it contains you may, in some jurisdictions but not all, be allowed to do this.
If you’re doing it locally on your own machine you are simply copying it. If you do it in the cloud you are copying it and communicating the copy. If you copy it, analyze it and train an AI model on it that could be considered fair use in certain jurisdictions. Whether the outputs are adaptations of the training data is a matter of debate in the copyright community.
But importantly if you commercialise that model and the resulting outputs compete with the copyright protected material you used to train, your fair use argument may fail.
So when you buy a book you are actually party to what is effectively a licence granted by the copyright holder, albeit to the publisher. But as the end user of the book you are still restricted in what you can do with that copyright protected work, through a universal end user licence encoded in law.
The four factors of fair use in the US:
> the purpose and character of your use
Commercial, for-profit. Not scholarship, not research, not commentary, not parody, etc.
> the nature of the copyrighted work
Absolutely everything. Artistic, creative, not purely factual.
> the amount and substantiality of the portion taken, and
All of it, from everyone.
> the effect of the use upon the potential market.
Directly competing with those whose data was copied.
An LLM doesn't compete with Art the same way that Photoshop doesn't compete with Art.
>All of it, from everyone.
With the result that anything produced by the LLM does not reproduce any single source in its entirety (and where, if compelled, they are able to do that, it is a bug, not a feature).
Fair use is too specific tbh, rather than ruling it fair use (which seems to be where things are going) it should just be ruled "use". There's nothing wrong with building a mathematical model using available data.
Yes, it does. Many people are using AI-generated works in places where they originally would have either paid an artist, programmer, or other creative professional, or done without. Many companies are claiming to reduce staff because of AI (whether that's true or an excuse). There is plenty of evidence that AI is directly competing with various individuals, businesses, and industries.
> With the result that anything produced by the LLM does not reproduce any single source in its entirety
You do not have to reproduce sources in their entirety to produce derivative works.
> All of it, from everyone.
Yea I'd like to see how drawing two circles violates the copyright of drawing one circle!
In addition, the idea that you need to pay rent on *your observation* of someone else's work is absurd. No one pays Newton's descendants for making lifts or hosting bungee jump sport activities.
So would the model work if it only trained on the top 10% of pixels in every image? Or do they in fact need the entire image before they begin processing it, and therefore use the entire image?
> In addition, the idea that you need to pay rent on your observation of someone else's work is absurd.
I agree that's absurd. But training a model is no more "observing images" than an F1 car is "walking" down a race track. Just because a race car uses kinetic energy, gravity, and friction to propel itself, the same way a human does, doesn't mean it's doing the same thing as a human. That comparison you're making is the real absurdity.
Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.
Like LLMs, it retains the produced index but not the original data.
The big concern is whether producing an LLM is competing with artists directly, but as artists don't make LLMs, this seems to be consistently ruled as non-competing.
People _do_ use LLMs to make art in someone else's style (knowingly or unknowingly) and claim it as their own creation.
Also, I wouldn't say the creators of LLMs are competing with artists. The users of LLMs are. Artists don't make LLMs, they make art, and people who use Midjourney and such make art.
But I'd argue that creators of LLMs are still liable for the harm people cause using their tools. Perhaps not legally, but certainly ethically.
It shouldn't be!
The model trains on the features humans can make sense of in the image they're presented with, provided the image and the observations of its features are clear/observable enough. Generation then makes use of those observations. I'm just using 10% as an arbitrary number to describe proportions. If a generation drew 100% of its observations from the same image, the model would be overfitting, and many would deem it to have produced a copy.
> Just because a race car uses kinetic energy, gravity, and friction to propel itself, the same way a human does, doesn't mean it's doing the same thing as a human.
WTF does this even mean? A race car uses concepts from Newton, just as a human uses gravity to train its muscles to move, be it knowingly or unknowingly. But you don't see them (car makers/humans) paying rent to Newton after he discovered gravity. Come on!
If I buy a book and use it to prop up a table, the author likewise does not own the table, or any works I undertake on that table.
If I buy a book and rip out the pages to make a collage, the US is the only legal jurisdiction where I run even a slight risk of civil penalties.
An LLM is downstream of a book. Using a book to make an LLM does not confer any rights or privileges towards the LLM on the original author, just as using a hammer or nails doesn't grant the hammer or nail manufacturers any royalties on what I make, even if I build a hammer-making machine with them. There's no right to the works of people who build on your work without reproducing your work, at least outside of strict copyleft.
It's like demanding a cut from people who learned how to use Photoshop by watching your Photoshop tutorial YouTube videos.
This is why the most successful cases against LLMs have been on the "Did they purchase the book" side of the fence, and not on the "What did they do with it" side. The one exception is the case where the legal company tried to use the LLM to 1:1 reproduce the content they had a limited license to, but that's obviously a no-go and they should have known better.
> Is it transformative if I take all the pages in Hanya Yanagihara's A Little Life and use a thesaurus to change every second word?
If you meant it literally... I'd think that such a version would be a sort of parody. It'd be up to lawyers doing their cross-examinations to prove the work was intended for such a purpose though.
> Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.
Probably a lawyer would answer this better than me, but the 'content' is the same and would violate copyright. There are also other factors, like whether it was translated/distributed for free.
Besides that, I regard LLMs as holding mathematical observations, in contrast to a translated work. So long as the user ensures the output isn't close to what's already available, imo it fits the transformative criteria.
I can just as well say that a translated work contains "linguistic observations". In fact a translator has to do a lot of transformative work in order to translate a text.
An LLM just takes a set of texts, looks at n-gram distributions, and generates similar text. It is quite literally a fuzzy way of copying. There aren't any mathematical observations in the output. Any math (statistics) is done in the copying process.
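For anyone who wants to see what "looks at n-gram distributions and generates similar text" means concretely, here is a minimal sketch of a bigram (2-gram) text generator. It is only an illustration of the n-gram idea as described in this comment; it makes no claim about how production LLMs are actually built, and the function names `build_bigram_table` and `generate` and the toy corpus are made up for the example.

```python
import random
from collections import defaultdict


def build_bigram_table(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table


def generate(table, start, length, seed=0):
    """Walk the table, picking each next word from the observed followers."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    out = [start]
    for _ in range(length - 1):
        followers = table.get(out[-1])
        if not followers:  # the last word never had a successor in the corpus
            break
        out.append(rng.choice(followers))
    return " ".join(out)


# Hypothetical toy corpus, purely for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug"
table = build_bigram_table(corpus)
print(generate(table, "the", 8))
```

Every adjacent word pair in the output also appears somewhere in the corpus, but the sequence as a whole need not.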
Oh even if it's not a parody it would look transformed enough that a first-time reader would be getting a completely different interpretation of the story* compared to the original source. And that's all that matters.
> There aren't any mathematical observations in the output. Any math (statistics) is done in the copying process.
Wrong. Weights, which these models consist of, are literally numbers in an extensive mathematical equation.
> It is quite literally a fuzzy way of copying.
And no one knows, and there is no consensus on, what a 'fuzzy way of copying' is. It is either copying or it is not. You could say that training an LLM is abstracting and integrating various text into its weights, thereby transforming the source material and again transforming it a second time via integrating it into its weights.
Even if it involved copying, that isn't immediately an issue. It's the distribution of a copy that's an issue. And if you look at the data side by side, you can see that while copying might be part of the process of creating an LLM, the LLM is not a copy of its source material.