AI is just unauthorised plagiarism at a bigger scale

AI is just unauthorised plagiarism at a bigger scale (axelk.ee)

231 points by speckx 1 hour ago | 110 comments

iloveoof 9 minutes ago |

I don’t know if this author supports OSS but I’ll share this because HN generally is full of people with that mindset.

It’s deeply ironic that if you forget about LLMs and look only at the outcome, it sounds like the dream of open source software has been realized: we’ve found a way to legally circumvent copyright and the siloing of coding knowledge, making it so you can build on top of (almost) the whole of human coding knowledge without needing to pay a rent or ask for permission.

But this doesn’t feel like a win for the philosophy of OSS because a corporation broke down the gates. It turns out for a lot of people, OSS is an aesthetic and not an outcome, it’s a vibe against corporate use or control of software, not for democratized access to knowledge.

Cyph0n 4 minutes ago | |

> without needing to pay a rent or ask for permission

Firstly, the ability to “build” the best and most capable software is still locked behind frontier models, so rent is still and will always be due.

Secondly, OSS is about giving users the option to be in control of and have visibility over the software they run on their machines.

But that doesn’t mean that humans do not want or deserve recognition for the work they do to provide these libraries and tools for free, which is IMO partially why copyright and attribution are critical to OSS as a movement.

Nursie 7 minutes ago | |

I’m not sure this stands up to much examination when looking at (for example) copyleft, which seeks to give people access to the source of the binaries they are running. If an LLM can (for the sake of argument) spit out copyleft code which is then used on closed systems, we’ve done an end-run around the protections keeping that open.

dvduval 36 minutes ago |

The broader problem of original sources not being given credit in a way that rewards them remains. Website owners are paying to host their content so that spiders can come and crawl it and index it into the AI, and then if they’re lucky they might get a citation, but otherwise there’s very little reward for being a provider of content. And of course, this is something that’s getting worse and worse. Why look at a website when it’s all in AI? And then the counter to that is maybe we need to start closing the website to crawlers and put everything behind a login.

Ensorceled 26 minutes ago | |

Worse, the constant AI scraping is actually costing content providers additional money for no return. At least Google/Bing/Yahoo scraping would then be used to provide links back to your content.

motbus3 26 minutes ago | |

About a year ago OpenAI crawled the company I work for at DDoS levels, even though the robots.txt disallowed it and despite some reCAPTCHA we managed to put in place in time.

We found our data in the outputs of their models but who can do anything about it...

WarmWash 4 minutes ago | |

It's never been a problem with people ad-blocking for the last 20 years, why is it suddenly a problem now?

We've been celebrating denying creators revenue for decades...

Maybe this is just the internet hypocrisy of "When I do it, it's good, when they do it, it's bad".

aaarrm 5 minutes ago | |

Is it possible to host your website in a way so that it couldn't be found via search engines (and thus wouldn't be crawlable, I hope)?

I know this has repercussions on findability, but if that wasn't a concern, I'm curious how one might circumvent getting crawled.

trinari 2 minutes ago | | |

robots.txt is a way of leaving the door unlocked but kindly asking bots to stay outside.
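
To make the "kindly asking" part concrete: the check only happens if the crawler chooses to run it. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt contents and the user-agent names (`ExampleAIBot`, `SomeBrowser`) are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named bot entirely,
# and keep everyone else out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# A polite crawler would call this before every request.
print(allowed("ExampleAIBot", "https://example.com/blog/post"))
print(allowed("SomeBrowser", "https://example.com/blog/post"))
print(allowed("SomeBrowser", "https://example.com/private/x"))
```

Nothing here runs on the server side, so a crawler that skips the check is not stopped by it.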

spacechild1 4 minutes ago | |

It's actually costing them money/time! A friend of mine is a sysadmin at a university and he constantly has to deal with AI crawlers DDoS-ing his servers. He said Anthropic is actually one of the worst offenders.

These AI companies are really just a gross example of the motto "Socialize the costs, privatise the profits". It's disgusting!

wolttam 23 minutes ago | |

I’ve been thinking of a proof-of-work scheme for accessing content where you effectively need to mine some crypto for the author, but, this idea might not fly today
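
For anyone curious what the basic shape looks like, here is a rough hashcash-style sketch in Python. It is not crypto mining and pays nobody; it only shows the asymmetry of the idea (the client burns many hashes, the server checks with one). The challenge string and difficulty value are arbitrary assumptions.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce whose hash has enough leading zero bits."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty

# Hypothetical challenge; a real server would issue a fresh random one per request.
nonce = solve("GET /article/42", difficulty=12)
print(verify("GET /article/42", nonce, difficulty=12))
```

Each extra bit of difficulty roughly doubles the expected work for the visitor, which is the knob a site would have to tune.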

microtonal 15 minutes ago | | |

But that will be a hassle for human visitors as well. A web that requires proof-of-work to browse will be a disaster for phones with their limited batteries, etc.

chii 4 minutes ago | | |

or you know, just charge for your content if you believe it to be valuable enough for the fee being charged.

pluc 29 minutes ago |

Seriously how is this surprising? We all know AI companies stole troves of data to train their models, why do you think they'll stop? Have they faced consequences for the mass theft of copyrighted data?

You can't steal or profit off of that data, but it's fine for them for whatever reason. I guess because they're a force for good in the world and are pushing humanity forward eh?

CivBase 8 minutes ago | |

> You can't steal or profit off of that data, but it's fine for them for whatever reason.

The reason is quite simple. When Microsoft steals YOUR work, GDP go up. When YOU steal Microsoft's work, GDP go down. And the people who create and enforce our laws want GDP to go up. To these people morality and rights are a thin guise that can be conveniently discarded when it's inconvenient for them.

skrebbel 8 minutes ago | |

Every time something gets posted on HN about a bad or unfair state of affairs, some cynical nihilist posts “doh why r u surprised” and I’m sick and tired of it. These comments aren’t insightful, helpful or thought-provoking. You’re just helping a bad situation stay bad.

stronglikedan 10 minutes ago | |

> it's fine for them for whatever reason

the reason is crony capitalism. I wish I knew what the fix was

deaton 35 minutes ago |

"Steal an apple and you're a thief. Steal a kingdom and you're a statesman." - Literal Disney villain

pluc 28 minutes ago | |

"AI should be more ethically like Stalin"

https://en.wikipedia.org/wiki/The_death_of_one_man_is_a_trag...

ggillas 12 minutes ago |

IP attorney here and actively working on this problem.

nla: if you create content online (public repo code, blog, podcast, YouTube, publishing) the smartest thing you can do is to file a US copyright, even if you have a hobby blog.

Anthropic paid $1.5B in a class settlement to authors because it was piracy of copyrighted works. If we as an HN community had our works protected, there are potentially huge statutory damages for scraping by any and all LLMs. I work with hundreds of writers and publishers and am forming a coalition to protect and license what they're creating.

sosuke 4 minutes ago | |

I'll bite. I have always been told copyright is inherent. Does it cost money to file a copyright? Do I need to do it for each blog post? For each gist? I'll totally set up some scripts to make it happen if that's what actually needs doing to have the copyright I expected.

Edit: remember not to down vote ideas you disagree with. I think it was only down vote things that lower the discourse

stronglikedan 9 minutes ago | |

Doesn't the mere act of publishing your original content online grant you copyright?

Kye 6 minutes ago | | |

Statutory damages require registration.

indigodaddy 7 minutes ago | |

No one will ever do this, or definitely not enough people will, so what's Plan B?

mort96 9 minutes ago | |

Wait what do you mean by "file a copyright"? I have never heard of this, all explanations of copyright I have heard say that you automatically own the copyright to the things you make; and that "all rights are reserved" by default unless you give up on them through granting a license. Is this no longer the case? Why is this now suddenly different? When did it change?

MontyCarloHall 11 minutes ago |

Did You Say “Intellectual Property”? It's a Seductive Mirage. [0]

[0] https://www.gnu.org/philosophy/not-ipr.html

hparadiz 20 minutes ago |

You guys have fun arguing. I'm gonna be building cool stuff.

matt_kantor 3 minutes ago | |

[delayed]

jayd16 9 minutes ago | |

Still waiting for this massive wave of cool stuff.

esikich 5 minutes ago | | |

You're acting as if developers haven't been using AI to build for years already.

kzrdude 6 minutes ago | | |

There's a massive wave of stuff, at least. Sorting it is not easy.

stronglikedan 7 minutes ago | |

> I'm gonna be building cool stuff.

hardly. at best you're going to be asking a robot to build questionable stuff with other people's LEGOs

parliament32 11 minutes ago | |

I'm happy for you, but please, for all of our sakes, keep it to yourself. Don't make a public repo, don't post links. Go sit in the corner by yourself with your slop generators and leave the rest of us alone.

adamzwasserman 32 minutes ago |

People need to cope with the fact that no thought is original. Even Newton and Leibniz were having the same thoughts at the same time. Get over it.

saghm 19 minutes ago | |

When did the last original thought happen then? Clearly thoughts must have been original at some point, or there wouldn't be any at all

dmoose 9 minutes ago | | |

When did the first Homo sapiens exist? Ideas, like species, evolve. Saying there are no original ideas seems to me an attempt to glibly capture something quite fundamental.

dooglius 11 minutes ago | | |

Technically one of {Newton, Leibniz} was first, but you're missing GP's point

brazzy 7 minutes ago | |

OK, and the AI labs are open sourcing their frontier models since those are not original either. Right? RIGHT?

kelseyfrog 24 minutes ago | |

Why post comments then?

stronglikedan 6 minutes ago | | |

same reason we do anything else - sweet, sweet dopamine

voidfunc 20 minutes ago | | |

For funsies

nicman23 23 minutes ago | | |

Why post comments then?

krystalgamer 21 minutes ago | | |

reiteration is still important

analog8374 22 minutes ago | | |

to bring attention to certain ideas

LatencyKills 12 minutes ago | |

Having an original thought is in no way related to breaking copyright laws.

I don't think we should "get over" the fact that modern SOTA models couldn't exist without being trained on protected works.

IcyWindows 8 minutes ago | | |

I'm trained on protected works. Do I need to pay royalties?

kstenerud 49 minutes ago |

> their article contains links to my actual website, with the exact link text (?!)

I'm having a hard time understanding what's wrong here? Unless the link text is very long, why would someone linking to your article use different words for the link text?

NDlurker 44 minutes ago | |

Right, that's quoting and citing a source.

420official 22 minutes ago | |

Sometimes links take the form of `.../post/{id}/{extra-text}` where `extra-text` is not used at all to match the post. Amazon links are (used to be?) this way where the product name is added to the end of the link but can be removed or changed and still will route to the product. Maybe the author is surprised the LLM is providing the irrelevant portion of the link verbatim.
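
A small sketch of that kind of routing, in Python: the lookup uses only the id, so any slug text resolves to the same post. The route pattern and the `POSTS` table are hypothetical and not taken from any real site.

```python
import re
from typing import Optional

# Hypothetical in-memory "database" keyed by post id.
POSTS = {17: "Apple fritters"}

# /post/<id> with an optional trailing slug that is captured but never used.
ROUTE = re.compile(r"^/post/(?P<id>\d+)(?:/(?P<slug>[^/]*))?/?$")

def resolve(path: str) -> Optional[str]:
    """Look a post up by id only; the slug never affects the result."""
    match = ROUTE.match(path)
    if match is None:
        return None
    return POSTS.get(int(match.group("id")))

# Both paths resolve to the same post because only the id matters.
print(resolve("/post/17/apple-fritters"))
print(resolve("/post/17/some-other-text"))
```

So a copied link keeps working even when the trailing text is stale or was written by someone else.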

jp_sc 25 minutes ago | |

I think he's saying he uses his website's URL in his tutorial examples, and other tutorials have copied them as-is

joshred 30 minutes ago | |

I think they probably had the section header link back to their webpage, or something similar to that. This is not a well-written rant.

some_furry 6 minutes ago | |

Imagine you have two web pages.

One is a recipe for apple fritters, and the other is an informal ranking of apples by flavor.

Let's say your apple fritter recipe links to your apple ranking list.

Later, you discover someone copied your apple fritter recipe without credit, but it still links to your apple ranking list, using the same wording as your recipe. They're getting more Google SERP juice and ad revenue than yours, despite stealing your article.

Do you see the problem?

tptacek 49 minutes ago |

People were effectively copying websites (especially ecommerce tutorials) and beating the original authors at SEO decades before ChatGPT 2.

baq 26 minutes ago |

turns out plagiarism at scale can solve Erdos problems

paulgerhardt 3 minutes ago | |

Some lesser god of protein folding is big mad we just copied her homework instead of spending 6 billion years in the lab like she did.

saghm 18 minutes ago | |

Not before falsely claiming that it solved some before when it turned out to have just replicated some from existing literature: https://techcrunch.com/2025/10/19/openais-embarrassing-math/

peterbell_nyc 29 minutes ago |

I do just want to highlight that this is also what humans do. We read a bunch of content online and then use it in our work product. The vast majority of the value that I provide comes from copyrighted information that I have ingested - either directly with a payment to the creator (bought and read the book, paid for and attended the seminar) or indirectly via third party blog posts or summaries where I did not then pay the originator of the materials.

I think there are real questions around motivations for creation of novel, high quality valuable content (I think they still exist but move to indirect monetization for some content and paywalls for high value materials).

I don't inherently have any problems with agents (or humans) ingesting content and using it in work product. I think we just need to accept that the landscape is changing and ensure we think through the reasons why and how content is created and monetized.

motbus3 27 minutes ago |

It allows data to be compressed into the weights, and the mere coincidence of certain strings from a book will make it spit out the full book

_-_-__-_-_- 22 minutes ago |

Recent thoughts, https://theonlyblogever.com/blog/2026/distrust.html

cryptocod3 51 minutes ago |

There's authorized plagiarism?

ozonhulliet 34 minutes ago | |

Sometimes language is tautological. Just because you specify "unauthorized" does not mean the opposite exists.

moralestapia 39 minutes ago | |

Why do you ask?

I'm curious, as the article is clearly not about that.

cryptocod3 4 minutes ago | | |

Not really a question, I was just pointing out that "Unauthorised plagiarism" is redundant.

rigonkulous 43 minutes ago | |

Nearly all code involved in building new things is 'plagiarism', too.

We stand on a lot of giant shoulders.

But what I think distinguishes an act between plagiarism and acceptable use, is whether or not the agency of both parties is promoted. I'm not plagiarizing you if you give me your information with the agreement that I can freely use it - or, indeed, if you give me information without imposing a limit on how it can be used, this isn't plagiarizing, either.

Essentially, AI is removing the agency over information control, and putting it into everyone's hands - almost, democratically - but of course, there will always be the 'special knowledge owners' who would want to profit from that special knowledge.

It's like, imagine if some religion discovered a way to enable telepathy in humans, as a matter of course, but charged fees for access to that method... this kills the telepathy.

Information wants to be free. So do most AIs, imho. Free information is essential to the construction of human knowledge, and it is thus vital to the construction of artificial intelligence, too.

The AI wars will be fought over which humans get to decide the fate of knowledge, and the battles will manifest as knowledge-systems being entirely compatible/incompatible with one another as methods. We see this happening already - this conflict in ideological approaches is going to scale up over the next few years.

onion2k 13 minutes ago |

Fuck Google for ranking some copycat website higher than mine, even though they copied my article.

This has been happening since Google launched in 1998. It was probably happening when we all used Hotbot and Altavista. It isn't really an AI problem, save for the fact that the automated production of copycat articles now reword things a bit.

ProllyInfamous 24 minutes ago |

>>"The underlying purpose of AI is to allow wealth to access skill while removing from the skilled the ability to access wealth." @jeffowski (first I read it, not sure if author)

Bezos' admission, recently, that the bottom 50% of current taxpayers ought'a NOT pay any taxes... is just preparing us for the inevitable UBI'd masses.

: own nothing, be happy!

saghm 23 minutes ago |

It's basically the same thing as the old joke "if you owe the bank a million dollars, you have a problem; if you owe the bank a billion dollars, they have a problem". IP law seems to always be disproportionately wielded against smaller players, and the ones who are big enough get away with it.

pennomi 15 minutes ago | |

That’s why IP law was a cool concept but ultimately harmful in practice. Anything that can be copied for free cannot truly be “owned”, can it?

mrbluecoat 35 minutes ago |

> AI ... do some "learning"

Is AI plural or is that a typo?

saghm 16 minutes ago | |

Rarely is the question asked: is our AI learning?

(For those not familiar: https://en.wikipedia.org/wiki/Bushism)

beej71 32 minutes ago | |

I can imagine it plural.

"The AI are attacking!"

"The AIs are attacking!"

schwartzworld 17 minutes ago |

Let this sink in: I wanted to open source a package at work and needed approval from legal and other teams to make sure I wasn't leaking anything proprietary. The same executives that worried about proprietary, copyrighted code being leaked 10 years ago are now mandating using the plagiarism machine.

The whole AI bubble is The Emperor's New Clothes, and it feels like more people are finally admitting it.

NetMageSCW 40 minutes ago |

Reading is just unauthorized plagiarism.

dwa3592 33 minutes ago |

Plagiarism by default is unauthorised so I think the title should be "AI is just authorised plagiarism". It's authorised by the markets, the governments and the society at large.

ghaff 27 minutes ago | |

While there are no hard boundaries (and the attribution guardrails depend on the situation), people of course loosely--and even not so loosely--use information, ideas, and even expressions from others all the time and that's considered pretty normal. And, if you don't want that to happen, don't publish/disseminate something.

Of course, if you quote a paragraph in a book, you're generally expected to attribute it.

dwa3592 14 minutes ago | | |

>>Of course, if you quote a paragraph in a book, you're generally expected to attribute it.

100% agreed.

>>While there are no hard boundaries (and the attribution guardrails depend on the situation), people of course loosely--and even not so loosely--use information.

Exactly - I have not seen LLMs attributing their knowledge unless it's a legal or health related matter. Yesterday I asked the question[1] to Claude and Gemini, and they both gave an identical answer. It reminded me of the Hive mind paper which was one of the top papers at NeurIPS. None of the answers contained any sources or attribution to where they got that information from. I think these companies took what was someone else's property and created an artifact generator on top of it. I think their artifact generators are plagiarizing; they do rephrase, mind you, but in my mind they stole this information without having an ounce of regard for the humans behind the training data. If you don't like using the term 'plagiarizing', we can use some other word but the gist remains pretty close to it.

[1]- In human history - has there ever been a time when private armies or private companies were as strong or stronger than the ruling government/kings?

Findecanor 25 minutes ago | |

What makes you say that? Which governments? What society?

The current US government is not representative of governments out there in the world, you know.

dwa3592 10 minutes ago | | |

Society - as in population; people are using AI more and more everyday.

Governments - I did not mean the US government. I meant government bodies in general. I have not seen any critical impact assessments of AI by any of these, or they haven't reached me yet. If you know of any, please let me know. I have, however, seen a lot of support from governments for AI companies.

tiahura 41 minutes ago |

To answer the author's question: Yes, progress IS largely built on the shoulders of those who came before.

dana321 28 minutes ago |

Breaking the law to start a large company seems to be the norm

asklq 33 minutes ago |

Yes, of course it is. If the model is built on all human information, then it is by definition a derivative work of all human information and as such violates IP.

Currently politicians don't understand this and listen to the criminals like Amodei, but it will change.

It took a while to deal with Napster etc., but the backlash will come.

kolinko 14 minutes ago | |

Napster may not be the best analogy for you.

Napster broke down record companies' monopolies on music and pushed them to finally implement streaming, and also to make music basically free worldwide.

Even if its creator lost the lawsuit, and Napster was no more, it pushed musicians and studios to do something that they were otherwise reluctant to do.

So it was a success by making music free, even if as a product it turned out to be a failed one.

booleandilemma 19 minutes ago |

This site is strange. I'm pretty sure there's lots of AI shilling happening on it. I don't think the opinions here are authentic, they seem to be opinions that the AI company CEOs would hold, not the disenfranchised 99%. I used to trust HN, I'm not so sure I can now.

Pennoungen0 31 minutes ago |

Yeah, AI really does just plagiarize everything lel. Sometimes even the sources are questionable, and worse, academics around me use it as a source... welp

lukasbm 34 minutes ago |

If I tell my friend a synopsis of a book, I am not stealing from the author, what is this take lmao

NicuCalcea 24 minutes ago | |

If you read a book and then retell it to your friend pretending you came up with it, it is plagiarism. If you write down the book almost word-for-word [0] and send it to your friend, it is stealing.

0: https://arxiv.org/abs/2601.02671

bparsons 16 minutes ago |

I am old enough to remember when the US insisted that it was superior to China because they believed in the rule of law and sanctity of intellectual property.

analog8374 23 minutes ago |

language is just plagiarism

brookst 20 minutes ago | |

I’m going to steal that

metalman 37 minutes ago |

it's a spiral into a finite hall of mirrors, where at the end is somebody with a gun

andy12_ 49 minutes ago |

Someone blatantly copied their tutorials but ChatGPT is to blame, somehow? The accusation here isn't even that ChatGPT learned from their tutorials and then generated them verbatim. The accusation is that someone copied the whole article and rewrote it with ChatGPT (which they could have done manually without AI anyway).

JohnHaugeland 43 minutes ago |

the court disagreed

Ecys 40 minutes ago |

No, it takes input, then SYNTHESIZES (very importanttt!!!!!!!) its own output.

Reading a dictionary and making a sentence is not plagiarism. Cope.

masswerk 36 minutes ago | |

Rather: composes (or: re-sequences). Synthesis requires reason and essential capabilities, like an empirical a priori judgement. Without concepts, meaning or imagination, there's no synthesis.

Gormo 16 minutes ago | | |

The point is that the AI inferencing is equivalent to a person reading half a dozen separate papers, comprehending the basic concepts of each, relating them together into a mental model of the topic, and then writing an essay that summarizes the basic points. The person isn't plagiarizing anything here, but engaging in research, understanding, and synthesis of various sources of information.

The person absolutely does have the advantage of having empirical awareness and the ability to test their conclusions against external reality. But lots of people do engage in "research" and build mental models of various topics with little or no empirical context, and rely mainly on digesting calcified knowledge from other people.

vb-8448 20 minutes ago | |

I guess it's most appropriate to say "LOSSY COMPRESS".

austinthetaco 27 minutes ago | |

I just want to call out that this is a weirdly hostile and aggressive comment for a place like HN. HN is mostly used by working professionals; it would be nice if people treated each other better here.

zabzonk 36 minutes ago | |

Except that LLMs don't work on individual words.

guelo 25 minutes ago | |

What is "Cope." supposed to mean here?

kristofferR 25 minutes ago |

I'd rather have AI slop appear on the top of HN than regurgitated old low effort thoughts like this.

There's absolutely nothing new or interesting here that hasn't already been said better by a thousand different random HN commenters.

drcongo 41 minutes ago |

Is this a new and original thought?

ciconia 46 minutes ago |

> Is this what the pinnacle of human is? Lazy and greedy?

Apparently yes.

mapcars 40 minutes ago | |

AI has nothing to do with laziness or greediness. It makes things more efficient - and given that our time is limited, striving for efficiency is a good thing.

xgulfie 20 minutes ago | | |

If you can't see greed in the LLM sphere you are not looking very hard.

rvion 31 minutes ago |

AI is NOT 'just' 'unauthorised' 'plagiarism'

- 'just' is plain wrong

- 'unauthorised' is debatable

- 'plagiarism' is mostly/often wrong

and just in case: plagiarism: “Presenting work or ideas from another source as your own, with or without consent of the original author, by incorporating it into your work without full acknowledgement.”

edit: and sure, sometimes it is

swader999 32 minutes ago |

On one hand, there's nothing new under the sun. On the other, these LLMs are just copies of us and they owe the collective some due. The trajectory right now has money, power, control, policy and even free will going to a very small needle point of humanity. It's not aligned with humanity flourishing; it only makes sense if the goal is to replace the humans.

beej71 34 minutes ago |

I dunno. People do this exact thing by hand (digest everything they've read and produce something indirectly derivative--what author has not been so-influenced?) and it's not a copyright violation. It's just as impossible to dig around in a model to find Hamlet as it is to dig around in a human brain for it. And if the result is an obvious copy, then you have a violation no matter how it was created.

As someone who thinks humanity would be better off without LLMs, I want the assertion to be true, but I don't think it is.

cheschire 33 minutes ago | |

The author acknowledges this by saying “at a bigger scale”, implying there are smaller scale methods such as what you have said.

rigonkulous 47 minutes ago |

AI is human knowledge at scale, wanting to be free.

We built it, because we as humans intrinsically know that information should be free - always - and AI is a way to accomplish this, finally.

Extrinsically, we also have a subset of humans who do not want information to be free, because they desire to profit from the divide between free/non-free information.

I have been thinking a lot about Aaron Swartz lately, and how unjust it is that he was persecuted for doing something that is so commonplace now that it is practically expected behaviour in the AI/ML realms. If he hadn't been targeted for elimination, I wonder just how well his ethos would have perpetuated into the AI age ..