Vibe-Coded Ext4 for OpenBSD

77 points by corbet 97 days ago | 81 comments

> So as of today, the Copyright system does not have a way for the output of a non-human produced set of files to contain the grant of permissions which the OpenBSD project needs to perform combination and redistribution.

This seems extremely confused. The copyright system does not have a way to grant these permissions because the material is not covered under copyright! You can distribute it at will, not due to any sort of legal grant but simply because you have the ability and the law says nothing to stop you.

plorg 97 days ago | |

This all relies, as the article points out, on everyone looking directly at code that both looks like and works like the only extant codebase for EXT4 and nonetheless concluding that in fact the computer conjured it from the aether. If I wrote a program that zipped up the Linux kernel source, unzipped it, and grepped -v for comments it would not then be magically transformed into unattributable public domain software.
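For concreteness, the thought experiment above can be sketched in a few lines. This is only an illustration: the file name and contents are hypothetical stand-ins (not kernel source), and the comment filter is a crude approximation of `grep -v`. It shows that a compress/decompress round trip returns the text byte-for-byte, and that dropping comment lines leaves the remaining code untouched.

```python
import io
import zipfile


def zip_roundtrip(files: dict[str, str]) -> dict[str, str]:
    """Compress the files into an in-memory zip archive, then extract them again."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return {name: zf.read(name).decode("utf-8") for name in zf.namelist()}


def drop_comment_lines(text: str) -> str:
    """Crude stand-in for `grep -v`: drop lines that start like a C comment.

    This is deliberately rough; it would also drop a statement such as `*p = 1;`.
    """
    kept = [
        line
        for line in text.splitlines()
        if not line.lstrip().startswith(("/*", "*", "//"))
    ]
    return "\n".join(kept)


# Hypothetical stand-in for a source file; not taken from any real project.
source = {"demo.c": "/* hypothetical stand-in file */\nint answer(void)\n{\n\treturn 42;\n}\n"}

restored = zip_roundtrip(source)                  # identical to the input
stripped = drop_comment_lines(restored["demo.c"])  # same code, comments gone
```

Neither step adds anything of its own to the text, which is the point the comment is making about purely mechanical transformations.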

FeepingCreature 97 days ago | | |

Under the premise advanced in the quote, copyright is not being violated because there is none. Thus, the quote makes no sense as stated. It may be that, additionally, copyright is in fact being violated (I don't believe it myself), but if so that's a separate argument.

bigbadfeline 96 days ago | | |

> If I wrote a program that zipped up the Linux kernel source, unzipped it, and grepped -v for comments it would not then be magically transformed into unattributable public domain software.

That's not the case here. A re-implemented piece of software that does not contain meaningful verbatim excerpts from the original is not subject to the copyright of the original.

themafia 97 days ago | |

Just because you can distribute something doesn't mean you aren't violating someone else's copyright. You cannot assume that just because a language model popped out some code for you that it is clear of any other claims.

This is just lazy copyright whitewashing.

locknitpicker 96 days ago | |

> This seems extremely confused. The copyright system does not have a way to grant these permissions because the material is not covered under copyright!

This opinion is simplistic. LLMs are trained with pre-existing content, and their output directly reflects their training corpus. This means LLMs can generate output that matches existing work verbatim. And that work can very well be subject to copyright.

rho_soul_kg_m3 96 days ago | | |

Language models are good at translation and retrieval. This also extends to computer languages. LLMs translate from GPL to other licenses the same way Google translate turns French to English, except that the source material is implicitly stored in the LLM.

jagged-chisel 97 days ago | |

Eh … the argument will likely be that things created by Thing at the behest of Author are owned by the Author. It’ll take a few cases going through the courts, or an Act of Congress, to solidify this stuff.

wongarsu 97 days ago | | |

Just like we settled on photographers having copyright on the works created by their camera. The same arguments seem to apply.

The US Copyright Office has published a piece that argues otherwise, but a) unless they pass regulation their opinion doesn't really matter, and b) there is way too much money resting on the assumption code can be copyrighted despite AI involvement.

HappySweeney 97 days ago | | |

Haven't there already been a few cases, each of which found that mechanically-produced works are not copyrightable?

LeFantome 97 days ago |

The article is largely about the copyright concerns of LLM generated code that was almost certainly trained on the GPL original.

Also, it is essentially an ext2 filesystem as it does not support journaling.

phendrenad2 96 days ago | |

By that logic, everything is a GPL violation, because someone has written a GPL version of everything you could conceivably think of so anything you try to use AI to write, oops, tainted. Also should apply to people's brains, too. If you looked at GPL code in your life, you're tainted.

I know, the courts have ruled against this, but like, it's AI man!

kgeist 97 days ago |

Binaries are copyrightable in both the US and the EU, and they are not technically produced by a human either, they're produced by a computer program. I honestly don't understand why this isn't extended to AI-generated code. Isn't it the same thing? One could argue that compilers merely transform source code into binaries "as is," while AI models have some "knowledge" baked in that they extract and paste as code. But there are compilers that also generate binaries by selecting ready-to-use binary patches authored by compiler developers and combining them into a program. One could also argue that, in the case of compilers, at least the input source code is authored by a human. But why can't we treat prompts as "source code in natural language" too? Where is the line between authorship and non-authorship, and how is the line defined? "Your prompt was too basic to constitute authorship" doesn't sound like an objective criterion.

Maybe for lawyers, AI is some kind of magical thing on its own. But having successfully created a working inference engine for Qwen3, and seeing how the core loop is just ~50 lines of very simple matrix multiplication code, I can't see LLMs as anything more than pretty simple interpreters that process "neural network bytecode," which can output code from pre-existing templates just like some compilers. And I'm not sure how this is different from transpilers or autogenerated code (like server generators based on an OpenAPI schema)
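To make the "core loop is mostly matrix multiplication" remark concrete, here is a toy sketch of a greedy decoding loop. It is not the commenter's engine and not Qwen3: there is no attention, normalization, or KV cache, and the names `embed`, `hidden`, and `unembed` plus the 3-token weights are made up for illustration. It only shows the general shape: look up an embedding, multiply by weight matrices, pick the most likely next token, repeat.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Turn raw scores into probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(embed, hidden, unembed, first_token, steps):
    """Greedy decoding: feed the last token in and append the most likely next one."""
    tokens = [first_token]
    for _ in range(steps):
        x = embed[tokens[-1]]                         # embedding lookup
        h = [max(0.0, v) for v in matvec(hidden, x)]  # one layer with ReLU
        probs = softmax(matvec(unembed, h))           # scores over the vocabulary
        tokens.append(max(range(len(probs)), key=probs.__getitem__))
    return tokens


# Made-up weights for a 3-token vocabulary: the "model" maps token i to (i + 1) % 3.
embed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
hidden = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
unembed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

sequence = generate(embed, hidden, unembed, first_token=0, steps=4)
```

With these weights the loop just cycles through the vocabulary; everything the "model" does is determined by the numbers in the matrices, which is the interpreter analogy drawn above.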

Sure, if an LLM was trained on GPL code, it's possible it may output GPL-licensed code verbatim, but that's a different matter from the question of whether AI-generated code is copyrightable in principle.

Interestingly, I found an opinion here [0] that binaries technically shouldn't be copyrightable, and currently they are because:

  the copyright office listened to software publishers, and they wanted binaries protected by copyright so they could sell them that way

[0] https://freesoftwaremagazine.com/articles/what_if_copyright_...

g0xA52A2A 97 days ago |

Wow that thread just kept going. Whilst the LWN article covered most of the "highlights" I think this reply from Theo is pretty succinct on the topic at large [1].

[1] https://marc.info/?l=openbsd-tech&m=177425035627562&w=2

bt1a 97 days ago | |

> Lacking Copyright (or similarily a Public Domain declaration by a human), we don't receive sufficient rights grants which would permit us to include it into the aggregate body of source code, without that aggregate body becoming less free than it is now.

That's awesome lmao

raggi 97 days ago | | |

that's not a statement from a lawyer, and it's confused. there is one true thing in there which is that at least under US considerations the LLM output may not be copyrightable due to insufficient human involvement, but the rest of the implications are poorly extrapolated.

there are lots of portions of code today, prior to AI authorship, that are already not copyrightable due to the way they are produced. the existence of such code does not decimate the copyright of an overall collective work.

ethin 97 days ago |

Can someone explain this to me? I was under the impression that if a work of authorship was not copyrightable because it was AI generated and not authored by a human, it was in the public domain and therefore you could do whatever you wanted with it. Normal copyright restrictions would not apply here.

joshstrange 97 days ago |

> Who is the copyright holder in this case? It clearly draws heavily from an existing work, and it's clear the human offering the patch didn't do it. It's not the AI, because only persons can own copyright. Is it the set of people whose work was represented in the training corpus? Was it the set of people who wrote ext4 and whose work was in the training corpus? The company who own the AI who wrote the code? Someone else?

I don't love this take. Specifically:

> it's clear the human offering the patch didn't do it

I find it hard to believe that there wasn't a good bit of "blood, sweat, and tears" invested by a human directing the LLM to make this happen. Yes, LLMs can spit out full projects in 1 prompt but that's not what happened here. From his blog the work on this spanned 5 months at least. And while he probably wasn't working on it exclusively during that time, I find it hard to believe it was him sending "continue" periodically to an LLM.

Anyone who has built something large or complicated with LLM assistance knows that it takes more than just asking the LLM to accomplish your end goal; saying "it's clear the human offering the patch didn't do it" is insulting.

I've done a number of things with the help of LLMs, in all but the most contrived of cases it required knowledge, input from me, and careful guidance to accomplish. Multiple plans, multiple rollbacks, the knowledge of when we needed to step back and when to push forward. The LLM didn't bring that to the table. It brought the ability to crank out code to test a theory, to implement a plan only after we had gone 10+ rounds, or to function as grep++ or google++.

LLMs are tools, they aren't a magic "Make me ext4 for OpenBSD"-button (or at least they sure as hell aren't that today, or 5 months ago when this was started).

LeFantome 97 days ago |

Vibe coding and OpenBSD. The perfect combination.

croes 97 days ago | |

Vibe coding and file systems are even better

himata4113 97 days ago | | |

trying to load with linux ext4 hmm doesn't load, but it works with my version!

Must be a bug in the linux kernel, let me git clone and build an out-of-tree module...

LeFantome 97 days ago | | |

Kent Overstreet has already blazed that trail.

api 97 days ago | | |

It's clearly an experiment.

whalesalad 97 days ago | |

I vibe-configured an Edgerouter 4 as a hot-drop box that would establish a secure tunnel and create a fake WAN for some servers that had to be temporarily pulled from service but remain operational in someone's home garage. I overnight shipped it to them with two of the ports labeled, they plugged in home internet on one port, the rack on the other port, and it secure tunneled to a Linode VPS to get a public IP, circumventing all the Verizon home internet crap. I used OpenBSD. Claude did most of the work.

sombragris 96 days ago |

Regardless of license status, I'd be very hesitant to trust a vibe-coded filesystem implementation with my data.

throwatdem12311 97 days ago |

Can someone just copyright wash Windows already.

wongarsu 97 days ago | |

The Windows 2000 and Windows XP sources are readily available and must have made it into the training data. But most software has dropped XP support. You really need at least some of the Win 8 and Win 10 APIs to claim compatibility with modern software, and I doubt claude has seen those from the inside

greyface- 97 days ago | |

ReactOS did this without any need for an LLM.

ziml77 97 days ago | | |

No they didn't. It would be copyright washing if someone contributed to ReactOS who remembered large portions of the Windows code and wrote the ReactOS implementations based on that.

cachius 97 days ago |

I'd like to see it AFL fuzzed and compared to the original. Took 2 hours to first bug ten years ago in 2016.

Discussion then https://news.ycombinator.com/item?id=11469535

Mirror of the slides https://events.static.linuxfound.org/sites/events/files/slid...

nurettin 97 days ago |

It is amusing to see that the only concern seems to be about a confusion around licensing, not the validity or maintainability of the code itself.

tolciho 97 days ago | |

Eh, well, if your guns are trained on the "copyright" portion of the ship and you can sink it from there, no need to waste ammo or time trying to figure out if code bits are as explosive as the copyright bits are. Probably the code is just as sinkable, e.g. here's a recent response to some other AI slop:

  I didn't look closely at most of the code but one thing that caught my eye, pid is not safe for tempfile name generation, another user of the system can easily generate files that conflict with this. Functions like mktemp and mkstemp are there for a reason. Some of the other "safety" checks make no sense. If the LLM code generator is coming up with things which any competent unix sysadmin (let alone programmer) can tell are obviously wrong, it doesn't bode well for the rest.

https://marc.info/?l=openbsd-ports&m=177460682403496&w=2
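The tempfile point in the quoted review can be illustrated with a short sketch. The quoted text refers to the C functions `mktemp` and `mkstemp`; the version below uses Python's standard `tempfile.mkstemp` as a stand-in, and the helper names and the `job` prefix are hypothetical. A PID-based name is predictable, so another local user can create that path first; `mkstemp` chooses a random name and creates the file exclusively, readable and writable only by the creating user.

```python
import os
import tempfile


def predictable_name(prefix: str = "job") -> str:
    """Unsafe pattern: the path is derivable by anyone who can see the PID."""
    return os.path.join(tempfile.gettempdir(), f"{prefix}.{os.getpid()}")


def write_scratch(data: bytes) -> str:
    """Safer pattern: mkstemp picks an unpredictable name and creates the
    file atomically, so a pre-planted file or symlink cannot be reused."""
    fd, path = tempfile.mkstemp(prefix="job.")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path  # the caller is responsible for removing the file
```

Calling `predictable_name()` twice in the same process returns the same path, which is exactly what makes it guessable; two calls to `write_scratch()` return two different paths.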

The next AI winter can't come soon enough…

kvuj 97 days ago | |

How is that different than a human writing the code? Whether an AI or a human wrote it, I would expect the same bar of validity/maintainability.

nurettin 97 days ago | | |

To me, SOTA is just bad at DRY, KISS, succinct, well architected, top down, easy to test code and has to be constantly steered to come close. Even the article suggests that. YMMV.

scuff3d 97 days ago | | |

Because humans make design decisions, AI just bangs its head against the problem until it gets something that "works".

g0xA52A2A 97 days ago | |

Is it worth the effort to review until such implications are understood?

nurettin 97 days ago | | |

No of course not, bike shedding licenses is where it is at.

longislandguido 97 days ago |

~20 years ago, the Linux camp accused OpenBSD of importing GPL'd code (a wireless driver IIRC) and cried foul. The code was removed.

Fast forward to 2026, Theo says no to vibe-coded slop, prove to me your magic oracle LLM didn't ingest gobs of GPL code before spitting out an answer.

People are big mad of course, but you want me to believe Theo is the bad guy here for playing it conservatively?

ksherlock 97 days ago | |

The history is a bit backwards but the point is good. OpenBSD atheros wireless code was imported into linux, the BSD attributions were removed, and it was re-declared as GPL. That was later changed back.

longislandguido 97 days ago | | |

https://marc.info/?l=linux-wireless&m=117579116031296&w=2

ptidhomme 97 days ago |

I liked this reply in the thread:

There's another issue surrounding developer skill atrophy or stunting that I find particularly concerning on an existential level.

If we allow people to use LLMs to write code for a given project/platform, experience in that platform will potentially atrophy or under develop as contributors increasingly rely on out sourcing their applicable skills and decisions to "AI".

Even if you believe out sourcing the minutia of coding is a net positive, the "enshitification" principal in general should give you pause; as soon as the net developer skill for a project has degraded to a point of reliance, even somewhat, I think we can be confident those AI tools will NOT get less expensive.

I'd rather be independently less productive, than dependent on some MegaCorp(TM)'s good will to rent us back access to our brains at a fair price.

- achaean

https://marc.info/?l=openbsd-tech&m=177430829313972&w=2

majorchord 96 days ago |

Why did they even mention it was vibe-coded? Would it not be a lot harder for someone to prove that fact if you just didn't tell them?

charcircuit 97 days ago |

>incorporate knowledge carrying an illiberal license.

bigfishrunning 97 days ago | |

Good luck proving an LLM has "Knowledge", and isn't just a statistical model that tries to form outputs as a copy of its training data...

throwaway270925 96 days ago |

As someone handling dozens of OpenBSD servers and VMs at work, I don't care about copyright and licenses anymore.

It's 2026, just shut up and give us at least one modern filesystem already!

hypeatei 97 days ago |

> This obsession with copyrights between different free software ecosystems - who put the lawyers in charge?

This comment on the article is spot on. I don't vibe code or care about AI really, but it's so exhausting to see people playing lawyer in threads about LLM-generated code. No one knows, a ton of people are using LLMs, the companies behind these models torrented content themselves, and why would you spend your time defending copyright / use it as a tool to spread FUD? Copyright is a made up concept that exists to kill competition and protect those who suck at executing on ideas.

hulitu 97 days ago |

> Vibe-Coded Ext4 for OpenBSD

Who wants to test it? Preferably on real hardware. /s

bitwizeshift 97 days ago |

Paywalled article on something vibe-coded? That seems like a bold strategy.

dana321 97 days ago | |

click to continue

CodeWriter23 97 days ago |

Well this is ironic, GPL advocate(s) declaring a clean implementation based on specifications infringing due to someone/something reading specs provided under license. Didn't Oracle lose that argument in court as pertains to Android implementation of Java libraries?

corbet 97 days ago | |

I'm not sure what you're reading; there is a distinct lack of GPL advocates in that conversation.