Vibe-Coded Ext4 for OpenBSD (lwn.net)
This seems extremely confused. The copyright system does not have a way to grant these permissions because the material is not covered under copyright! You can distribute it at will, not due to any sort of legal grant but simply because you have the ability and the law says nothing to stop you.
That's not the case here. A re-implemented piece of software that does not contain meaningful verbatim excerpts from the original is not subject to the copyright of the original.
This is just lazy copyright whitewashing.
This opinion is simplistic. LLMs are trained with pre-existing content, and their output directly reflects their training corpus. This means LLMs can generate output that matches verbatim existing work. And that work can very well be subjected to copyright.
The US Copyright Office has published a piece that argues otherwise, but a) unless they pass regulation their opinion doesn't really matter, and b) there is way too much money resting on the assumption code can be copyrighted despite AI involvement.
Also, it is essentially an ext2 filesystem as it does not support journaling.
I know, the courts have ruled against this, but like, it's AI man!
Maybe for lawyers, AI is some kind of magical thing on its own. But having successfully created a working inference engine for Qwen3, and seeing how the core loop is just ~50 lines of very simple matrix multiplication code, I can't see LLMs as anything more than pretty simple interpreters that process "neural network bytecode," which can output code from pre-existing templates just like some compilers. And I'm not sure how this is different from transpilers or autogenerated code (like server generators based on an OpenAPI schema)
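To make the "core loop is mostly matrix multiplication" point concrete, here is a toy sketch in Python. It is not Qwen3 and not a transformer: there is no attention, no tokenizer, and no trained model. The names `matvec`, `softmax`, `generate` and the tiny hand-written weights are all invented for illustration; the only thing it is meant to show is the shape of a greedy decode loop, where each step is a couple of matrix-vector products followed by picking the most likely next token.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(embeddings, hidden_weights, output_weights, first_token, steps):
    """Greedy decoding: embed the last token, apply two matrix products, take the argmax."""
    tokens = [first_token]
    for _ in range(steps):
        x = embeddings[tokens[-1]]                            # token id -> vector
        h = [max(0.0, v) for v in matvec(hidden_weights, x)]  # one ReLU layer
        probs = softmax(matvec(output_weights, h))            # scores over the vocabulary
        tokens.append(max(range(len(probs)), key=probs.__getitem__))
    return tokens

# Hypothetical 3-token "model" whose weights encode the rule: token i -> token (i + 1) % 3.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
NEXT_TOKEN = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(generate(IDENTITY, IDENTITY, NEXT_TOKEN, first_token=0, steps=4))
# prints [0, 1, 2, 0, 1]
```

Everything the loop "knows" lives in the weight matrices; the loop itself is generic, which is roughly the interpreter analogy made above.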
Sure, if an LLM was trained on GPL code, it's possible it may output GPL-licensed code verbatim, but that's a different matter from the question of whether AI-generated code is copyrightable in principle.
Interestingly, I found an opinion here [0] that binaries technically shouldn't be copyrightable, and currently they are because:
the copyright office listened to software publishers, and they wanted binaries protected by copyright so they could sell them that way
[0] https://freesoftwaremagazine.com/articles/what_if_copyright_...
That's awesome lmao
there are lots of portions of code today, prior to AI authorship, that are already not copyrightable due to the way they are produced. the existence of such code does not decimate the copyright of an overall collective work.
Can someone explain this to me? I was under the impression that if a work of authorship was not copyrightable because it was AI generated and not authored by a human, it was in the public domain and therefore you could do whatever you wanted with it. Normal copyright restrictions would not apply here.
I don't love this take. Specifically:
> it's clear the human offering the patch didn't do it
I find it hard to believe that there wasn't a good bit of "blood, sweat, and tears" invested by a human directing the LLM to make this happen. Yes, LLMs can spit out full projects in 1 prompt but that's not what happened here. From his blog the work on this spanned 5 months at least. And while he probably wasn't working on it exclusively during that time, I find it hard to believe it was him sending "continue" periodically to an LLM.
Anyone who has built something large or complicated with LLM assistance knows that it takes more than just asking the LLM to accomplish your end goal. Saying "it's clear the human offering the patch didn't do it" is insulting.
I've done a number of things with the help of LLMs. In all but the most contrived of cases it required knowledge, input from me, and careful guidance to accomplish. Multiple plans, multiple rollbacks, the knowledge of when we needed to step back and when to push forward. The LLM didn't bring that to the table. It brought the ability to crank out code to test a theory, to implement a plan only after we had gone 10+ rounds, or to function as grep++ or google++.
LLMs are tools, they aren't a magic "Make me ext4 for OpenBSD"-button (or at least they sure as hell aren't that today, or 5 months ago when this was started).
Must be a bug in the linux kernel, let me git clone and build an out-of-tree module...
Discussion then https://news.ycombinator.com/item?id=11469535
Mirror of the slides https://events.static.linuxfound.org/sites/events/files/slid...
I didn't look closely at most of the code but one thing that caught my eye, pid is not safe for tempfile name generation, another user of the system can easily generate files that conflict with this. Functions like mktemp and mkstemp are there for a reason. Some of the other "safety" checks make no sense. If the LLM code generator is coming up with things which any competent unix sysadmin (let alone programmer) can tell are obviously wrong, it doesn't bode well for the rest.
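For anyone wondering what the difference looks like in practice, here is a minimal sketch in Python rather than in the language of the patch under discussion. The helper names `predictable_name` and `write_scratch_file` are made up for illustration. The first function shows the criticised pattern: a name built from the PID, which any local user can guess and create in advance. The second uses the standard library's `tempfile.mkstemp`, which picks an unpredictable name and opens the file exclusively, so it fails rather than reusing a file someone else planted.

```python
import os
import tempfile

def predictable_name(directory):
    """The criticised pattern: a PID-based name that another user can guess ahead of time."""
    return os.path.join(directory, f"scratch-{os.getpid()}.tmp")

def write_scratch_file(data, directory=None):
    """Create a temp file with an unpredictable name, write data to it, return its path."""
    # mkstemp creates and opens the file atomically (exclusive create) and returns
    # an already-open descriptor, so there is no window between "choose a name"
    # and "open the file" for another process to exploit.
    fd, path = tempfile.mkstemp(prefix="scratch-", suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
    except BaseException:
        os.unlink(path)  # do not leave a half-written file behind
        raise
    return path

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print("guessable:", predictable_name(workdir))
        print("safe:     ", write_scratch_file(b"hello", workdir))
```

The caller is responsible for deleting the file returned by `write_scratch_file`; the C functions `mkstemp(3)` and friends mentioned above follow the same create-and-open-in-one-step idea.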
https://marc.info/?l=openbsd-ports&m=177460682403496&w=2
The next AI winter can't come soon enough…
Fast forward to 2026, Theo says no to vibe-coded slop, prove to me your magic oracle LLM didn't ingest gobs of GPL code before spitting out an answer.
People are big mad of course, but you want me to believe Theo is the bad guy here for playing it conservatively?
There's another issue surrounding developer skill atrophy or stunting that I find particularly concerning on an existential level.
If we allow people to use LLMs to write code for a given project/platform, experience in that platform will potentially atrophy or underdevelop as contributors increasingly rely on outsourcing their applicable skills and decisions to "AI".
Even if you believe outsourcing the minutiae of coding is a net positive, the "enshittification" principle in general should give you pause; as soon as the net developer skill for a project has degraded to a point of reliance, even somewhat, I think we can be confident those AI tools will NOT get less expensive.
I'd rather be independently less productive than dependent on some MegaCorp(TM)'s goodwill to rent us back access to our brains at a fair price.
- achaean
Copyright prevents copying. It doesn't prevent using knowledge.
It's 2026, just shut up and give us at least one modern filesystem already!
This comment on the article is spot on. I don't vibe code or care about AI really, but it's so exhausting to see people playing lawyer in threads about LLM-generated code. No one knows, a ton of people are using LLMs, the companies behind these models torrented content themselves, and why would you spend your time defending copyright / use it as a tool to spread FUD? Copyright is a made up concept that exists to kill competition and protect those who suck at executing on ideas.
Who wants to test it? Preferably on real hardware. /s
IMO, your intuition regarding AI is right--it's not a magic copyright laundering machine, and AFAIU courts have very quickly agreed that infringement is occurring. But in copyright law establishing infringement (or the possibility of infringement) is the easy, straightforward part. Copyright infringement liability is a much more complex question. Transformative uses in particular are a Fair Use, and Fair Use is technically treated as an affirmative defense to infringement.[1] If something is Fair Use, infringement is effectively presumed. But Fair Uses are typically very fact-intensive questions, and unlike the case with search engines I'm not sure we'll get to the point where there's a well-defined fence protecting "AI".
[1] There's a scholarly pedantic debate about whether Fair Use is properly a "defense", rather than "exception" to infringement, but it walks and talks like a defense in the sense that the defendant has the burden of proving Fair Use after the plaintiff has established infringement. There's a similarly pedantic (though slightly more substantive) debate in criminal law regarding affirmative defenses. But the very term "affirmative defense" was coined to recognize and avoid these pedantic debates.
For example, when an LLM does a vector search, there is a high probability of pirated content bleed-through and isomorphic plagiarism in the high dimensional vector space results. Thus, often when you coincidentally type in "name a cartoon mouse", there is a higher probability Disney "Mickey Mouse" will pop out in the output rather than "Mighty Mouse". Note Trademarks never expire if the fees are paid, and Disney can still technically sue anyone that messes with their mouse.
Much like em dashes "--", telling the current set of models to stop using them inappropriately often fails. Also, activation capping is used to improve the model's behavioral vector, and has nothing to do with the Anthropic CEO developing political ethics.
LLMs are useful for context search, but can't function properly without constantly stealing from actual humans. Thus, they will often violate copyright, trademark, and patents. In a commercial context it is legally irrelevant how the output has misappropriated IP, and you can bet your wallet the lawyers won't care either. No, IP is not public domain for a long time (17 to 78 years) regardless of people's delusions, even if some kid in a place like India (no software patents) thinks it is.
This channel offers several simplified explanations of the work being done with models, and Anthropic posts detailed research papers on its website.
https://www.youtube.com/watch?v=YDdKiQNw80c
https://www.youtube.com/watch?v=Xx4Tpsk_fnM
https://www.youtube.com/watch?v=JAcwtV_bFp4
Many YC bots are poisoning discourse -- so this thread will likely get negative karma. Some LLM users seem to develop emotional or delusional relationships with the algorithms. The internet is already >52% generated nonsense and growing. =3
The quoted content said that "Lacking Copyright (or similarily a Public Domain declaration by a human), we don't receive sufficient rights grants which would permit us to include it into the aggregate body of source code, without that aggregate body becoming less free than it is now." I was explicitly asking how this meshed with my understanding of copyright, at least in the United States, which requires that a work of authorship be authored by a human and not by a machine; where a work is not authored by a human, copyright protection does not subsist, and therefore the respective work is in the public domain. And I was further asking for an explanation as to how including a work that is AI-generated (aka in the public domain) made "... that aggregate body becoming less free". Unless my understanding of copyright law and court precedent is massively off the mark, I am confused as to how less freedom is afforded in this instance.
Thus, one should not contaminate GPL/LGPL licensed source code with such content. The reason it causes problems is the legal submarines may (or may not if they settled out of court with Disney) surface at a later date, as the lawsuits and DMCA strikes hit publishers.
It doesn't mean people won't test this US legal precedent, as most won't necessarily personally suffer if a foundation gets sued out of existence for their best intentions/slop-push. =3
..much like with human development.
People keep making trivial apps with open source examples thinking they found god. Another dismissive comment and I swear.
https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
A programmer writing code would be like the painter, and the programmer writing a prompt for Claude looks a lot like the photographer. The prompt is the creative work that makes it copyrightable, just like the artistic choices of the photographer make the photo copyrightable
You could argue that the prompt is more like a technical description than a creative work. But then the same should probably be true of the code itself, and consequently copyright should not apply to code at all
The copyright office's argument is that the AI is more like a freelancer than like a machine such as a camera. Which you might equate to the monkey, who's also a bit freelancer-like. But I have my doubts that holds up in court. Monkeys are a lot more sentient than AIs.
There is case law surrounding the fact that just because you commission a work to another entity doesn't give you co-authorship, the entity doing the work and making creative decisions is the entity that gets copyright.
In order for you to have co-authorship of the commissioned work you have to be involved and pretty much giving instruction-level detail to the real author. The opinion cites many cases showing that this is not how LLM prompts work.
The monkey selfie case is also relevant because it solidifies that non-persons cannot claim copyright. That means the LLM cannot claim copyright, and therefore it has no copyright that can be passed on to the LLM operator.
Overwhelmingly this is in favor of treating AI as a tool like Photoshop.
Even those against AI disagree on different matters and will overwhelmingly want a cut not a different interpretation.
Derivatives are not subject to copyright, unless they are close to, and contain substantial verbatim copies from, the original. It's a virtual certainty that a vibe-coded Ext4 FS is none of the above.
Redefining copyright as some weird patenting of similar ideas is absurd.
this is similar to creating an extension to some program, because the extension could not be written without the original even if the interface the extension is using is a public API. the claim has been made that the copyright of the original program applies. i think the linux kernel is an example here.
see also these questions on stackexchange:
https://softwareengineering.stackexchange.com/questions/2087...
https://softwareengineering.stackexchange.com/questions/8675...
There's no such thing as "an extension to some program". A derivative work is a work that contains the original. Using the privileges provided by copyright law, the creator may impose licensing restrictions on how the original work is used - but that's contract law, not copyright.
For example the GPL and the AGPL define different sets of use restrictions, none of that matters in this case because the original work is not being reproduced or used per se.
As I already said in my other, down-voted comment - copyright is only about verbatim, or near verbatim copies, in whole or in part - it's the spirit that both judgment and the letter of the law are supposed to follow. Copying of functionality is not subject to copyright.
For example, one can use the same topic for a work of poetry for a similar aesthetic effect and that doesn't infringe other poems.
The GPL used a hack to stretch copyright law into a near opposite but stretching it further goes into absurd territory, achieving the opposite of what the GPL claims to protect.
one can use the same topic for a work of poetry for a similar aesthetic effect and that doesn't infringe other poems
because the new poem does not depend on the original.
the kernel driver is useless without the kernel
Maybe, in some alternative universe, that could be correct but it isn't anywhere on Earth.
You can write a BSD-licensed driver as a Linux module and distribute it separately all you want - copyright law is OK with that.
The moment you insert the module into the kernel the whole thing, kernel + driver becomes a derivative work and you're forbidden from using it by the GPL - the license, not copyright... Copyright only gives the creators of the kernel the privileged power to impose that contractual restriction.
Long time ago, some BSD guys were trying to convince me that the GPL was primarily a weapon against BSD and other less restrictive licenses but I didn't believe it back then... boy, was I wrong.
You showed me how the GPL can be used for threats against the free modification of software by arguing for the addition of new, absurd powers to copyright - the opposite of what the GPL proponents are promoting it for. It's indeed a license that must be avoided at all cost.
yes, it is disputed, and the claim has not been tested in court. but it is an argument being made.
the GPL was primarily a weapon against BSD.
It's indeed a license that must be avoided at all cost.
well, it depends on whose side you support. i am on the side of protecting the rights of the user to modify their software. BSD licenses don't do that. they give me the right, but they don't protect it.
more importantly, i am also on the side of the developer to protect their ability to make a living. for that the BSD license is completely useless. GPL is better, AGPL even more, but even those are not restrictive enough to prevent unfair competition by large corporations.
i am not interested in allowing those companies to benefit from my work if they are not required to pass that forward.
In other words, you don't know what you're talking about... Everything I write is verifiable, have you heard of AI chat bots? Why are you going around asking old ladies for the latest gossip?
> yes, it is disputed, and the claim has not been tested in court.
Why don't you test in court? Do it, let's see what happens. Why did Linus wave middle fingers like a confused clown when Nvidia's lawyers stuffed the GPL2 with their driver? There was no lawsuit, only buffoonery in place of the promised "protection".
> but it is an argument being made.
There are millions of "arguments being made", 99.9% of them are BS, if you can't defend your arguments with facts, logic and court decisions don't waste e-space by regurgitating useless gossip, especially on HN.
> BSD licenses don't do that. they give me the right, but they don't protect it.
So, that's your reason to go on a crusade against the rights provided by BSD licenses.
Oh, that's sneaky - "Let's protect people from a license that gives them more rights than ours"
Your "protection" amounts to shilling for an absurdly extended interpretation of copyright powers while it's being sold as a defense against these very powers - this kind of diabolical nonsense is the opposite of protection.
to quote one commenter there:
there's something socially wrong with taking someone's gift and ignoring the terms under which it was given. If you want a system where you can load any modules, use a BSD kernel. [...] If the creators of a GPL kernel label some items as an external API for anyone's use, and other items as GPL hooks for functionally internal code loaded externally, respect that.
me talking about "protection" is a call for solutions. if the interpretation of the GPL here is absurd, then the problem is not that it is wrong, but that the GPL does not provide enough protection. if you don't want that protection, fine, that's your choice. i do want that protection, and i am looking for solutions. if you are not interested in solving that problem then we don't need to continue this discussion.
> you may want to read the discussion "Is the GPL actually viral across dynamic linking?"
What does that have to do with the price of tea in China? We are talking about an independent implementation of a BSD driver for Ext4-structured storage but you keep bringing up unrelated random pieces of chatter from around the web.
> but that the GPL does not provide enough protection.
But you don't understand the difference between copyright and contract. The GPL, or any other license based on copyright, cannot prevent the creation of the driver in question because it doesn't involve any copying of the kind protected by copyright law.
> if you are not interested in solving that problem then we don't need to continue this discussion.
Except, that's not the problem we are discussing.
Indeed, there's no point in continuing this discussion, you don't understand the basics, cannot follow the line of reasoning and keep getting lost in hallucinations.
It is not. That is the point here.
It's not independent.
If an LLM were able to generate code that can access ext4, then that LLM has to have ingested the original ext4 source code as part of its corpus.
LLMs cannot count. They cannot add 2 + 2. If it can emit code to do something then it contains in its model code to do that thing.
Therefore it is not an independent implementation.
It is independent from the point of view of copyright law!!!
What GPL zealots wish to define as "independent" doesn't matter, blinded by their own zealotry, they end up arguing for extending the notion of copyright into pure absurdity in a grotesque contradiction to their own, rather feeble "principles".
> Therefore it is not an independent implementation.
You're ignorant about copyright and law in general. Copyright grants certain limited privileges for near verbatim copying only - it covers PARTICULAR EXPRESSIONS of ideas, NOT the ideas themselves - these are basics you know nothing about but you keep insisting to replace them with your hallucinations. I don't think you're human, no human is so deprived of comprehension.
In this case, an independent expression means one that doesn't have substantial verbatim or juxtaposed parts of the original. I can read as much poetry of some poet as I wish, and ape his style and topics as much as I wish but as long as there isn't any near-verbatim copying, my poetry will be independent for copyright purposes.
> Therefore it is not an independent implementation.
I told you already - go sue! You'll be told the same thing I told you here, as Oracle found out when they sued Google.
Don't waste your breath/tokens - sue - it's the only real argument.
If you don't sue, you prove to everyone that you know you're wrong but you're knowingly trash-talking in order to create uncertainty and confusion.