Related ongoing thread: Claude Code's source code has been leaked via a map file in their NPM registry - https://news.ycombinator.com/item?id=47584540 Also related: https://www.ccleaks.com
Granted, there's a small counterargument for mythos which is that it's probably going to be API-only not subscription
They don't want you using your subscription outside of Claude Code. Only API key usage is allowed.
Google has also doubled down on this; OpenAI is the only one that explicitly allows you to do it.
To err is human. AI is trained on human content. Hence, to err is AI. The day it stops making mistakes will be the beginning of the end. That would mean the existence of a consciousness that has no weakness. Great if it’s on your side. Terrible otherwise.
Genius level AI marketing
> change the code!!!! The previous comment was NOT ABOUT THE DESCRIPTION!!!!!!! Add to the {implementation}!!!!! This IS controlled BY CODE. *YOU* _MUST_ CHANGE THE CODE!!!!!!!!!!!
Edit: it gets sent to Anthropic via telemetry and it ends up on the fuck chart!
https://old.reddit.com/r/ClaudeCode/comments/1s99wz4/boris_t...
You've got a business, and you sent me junk mail, but you made it look like some official government thing to get me to open it? I'm done, just because you lied on the envelope. I don't care how badly I need your service. There's a dozen other places that can provide it; I'll pick one of them rather than you, because you've shown yourself to be dishonest right out of the gate.
Same thing with an AI (or a business that creates an AI). You're willing to lie about who you are (or have your tool do so)? What else are you willing to lie to me about? I don't have time in my life for that. I'm out right here.
...what we did at Snap was just wait for 8-24 hours before acting on a signal, so as not to provide an oracle to attackers. Much harder to figure out what you did that caused the system to eventually block your account if it doesn't happen in real-time.
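For anyone who wants to picture the delayed-enforcement idea, here is a minimal sketch. It only assumes what the comment says (wait 8-24 hours before acting); picking the delay at random inside that window and the `schedule_enforcement` name are my own illustration, not a description of Snap's system.

```python
import random
from datetime import datetime, timedelta
from typing import Optional

MIN_DELAY_HOURS = 8   # lower bound quoted in the comment
MAX_DELAY_HOURS = 24  # upper bound quoted in the comment


def schedule_enforcement(detected_at: datetime,
                         rng: Optional[random.Random] = None) -> datetime:
    """Return a time 8-24 hours after detection at which to act on a signal.

    Assumption: the delay is drawn uniformly at random, so the attacker
    cannot line up the block with the specific request that triggered it.
    """
    rng = rng or random.Random()
    delay_hours = rng.uniform(MIN_DELAY_HOURS, MAX_DELAY_HOURS)
    return detected_at + timedelta(hours=delay_hours)
```

The point of the design is that the detection and the visible consequence are decoupled in time, so the enforcement system stops working as a real-time oracle.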
(Snap's binary attestation is at least a decade ahead of this, fwiw)
Sans the ability to JIT, I don't see non-hardware-assisted binary attestation for Snap and others lasting very long in a post-LLM world.
1. They are loved, and for good reasons. Sonnet 4 was groundbreaking, but Opus 4.6 was for many a turning point in realizing Agentic SDLC's real potential. People moved from Cursor to Claude Code in droves; they loved the CLI approach (me too), and they LOVED the subsidized $200 max pro plan (what's not to love, pay $200 instead of $5000 to Cursor...). They are the underdog, the true alternative to "evil" OpenAI or "don't be evil" Google, really standing up against mass surveillance or use of AI for autonomous killing machines. They are standing for the little guy, they are the "what OpenAI should have been" (plus they have better models...). They are the Apple of the AI era.
2. They are too loved, so loved that it protects them from legitimate criticism. They make GitHub's status page look good, and they make Comcast customer service look like Amazon's (at least Comcast has customer service). They are "If Dario shoots a customer in the middle of 5th avenue it won't hurt their sales one bit" level of liked. The fact they have the best models (for now) might be their Achilles heel, because it hides other issues that might be in the blindspot. And as soon as a better model comes out from a competitor (and it could happen... if you recall, OpenAI were the undisputed kings with GPT 4o for a bit) these will become much more obvious.
3. This can hurt them in the long run. Eventually you can't sustain a business where you have not even 2 9s of SLA, can't handle customer support or sales (either with humans or worse for them - if they can't handle this with AI how do they expect to sell their own dream where AI does everything?). I'm sure they'll figure it out, they have huge growth and these are growth pains, but at some point, if they don't catch up with demand, the demand won't stay there forever the moment OpenAI/Google/someone else release a better model.
4. They inadvertently made all of the cybersecurity sector a potential enemy. Yes, all of them use Anthropic models, and probably many of them use Claude Code, but they know they might be paying the bills of their biggest competitor. Their shares drop whenever Anthropic even hints at a new model. Investors cut their valuations because they worry Anthropic will eat them for breakfast. I don't know about you, but if you ask me, having the people who live and breathe security indirectly threatened by you is not the best thing in the world, especially when your source code is out in the open for them to poke holes in...
5. the SaaS pocalypse - many of Claude Code's customers are... SaaS companies, that the same AI is "going to kill", again, if there was another provider that showed a bit more care about the entire businesses it's going to devour, if they also had even marginally better models... would the brand loyalty stay?
Side note: I'm a Claude Enterprise customer, and I can't get a human to respond to anything, even using the special "enterprise support" methods. I'm not the only one: I know people who can't get a sales person, not to mention support, to buy 150+ seats. (Anthropic's answer was to release self-serve enterprise onboarding, which by the way is "pay us $20 which does not include usage, usage is at market prices, same as getting an API key"; you pay for convenience and governance. P.S. you can't cancel enterprise, it's 20 seats min, for 1 year, in advance, so make sure you really need it. The team plan is great for most cases but it lacks the $200 plan, only the $100 5x plan.)
I wrote a short piece explaining the 3 policy implications for teams using Claude Code (or any AI coding tool) — without the technical jargon: https://www.aipolicydesk.com/blog/claude-code-leak-what-ceo-...
The short version: rotate API keys as a precaution, check what audit logs you actually have, and add a clause to your AI policy requiring vendor disclosure of new autonomous capabilities before they get enabled.
The more code gets generated by AI, won’t that mean taking source code from a company becomes legal? Isn’t it true that works created with generative AI can’t be copyrighted?
I wonder if large companies have thought of this risk. Once a company’s product source code reaches a certain percentage of AI generation it no longer has copyright. Any employee with access can just take it and sell it to someone else, legally, right?
The recent rulings on copyright though also need to be further tested, different judges may have different ideas on what "significant human contribution" looks like. The only thing we know for certain is that the prompt doesn't count.
My guess is that instead of enforcing via copyright, companies will use contracts & trade secret laws. Source code and algorithms count as trade secrets, so in your example copyright doesn't even matter; the employee would be liable for stealing trade secrets.
AI generated code slowly stripping the ability of a project to enforce copyright protections though is a much bigger risk for free software.
Of course with lease intent is a very important concept. I doubt anyone is getting away with what I described.
It’s just interesting stuff to potentially rethink.
My guess is companies will simply pretend like generated code is copyrighted, file fraudulent DMCA notices if leaks happen and hope no one decides to challenge them in court.
I don’t get it. What does this mean? I can use Claude code now without anyone knowing it is Claude code.
it's written to _actively_ avoid any signs of AI generated code when "in a PUBLIC/OPEN-SOURCE repository".
Also, it's not about you. Undercover mode only activates for Anthropic employees (it's gated on USER_TYPE === 'ant', which is a build-time flag baked into internal builds).
> This was one of the first things people noticed in the HN thread.
> The obvious concern, raised repeatedly in the HN thread
> This was the most-discussed finding in the HN thread.
> Several people in the HN thread flagged this
> Some in the HN thread downplayed the leak
when the original HN post is already at the top of the front page...why do we need a separate blogpost that just summarizes the comments?
Or, more simply: Because folks wanted it enough to upvote it.
> It's basically
> Anthropic doesn't just ask
> The fix? `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3`
> Not a push-button bypass, but
The irony in saying "this is what I found" when an AI found it, not you.
Plus there's demand for skilled TS software devs that don't ship your company's roadmap using a js.map
20,000 agents and none of them caught it...
How much approximate savings would this actually be?
/\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful|piss(ed|ing)? off|piece of (shit|crap|junk)|what the (fuck|hell)|fucking? (broken|useless|terrible|awful|horrible)|fuck you|screw (this|you)|so frustrating|this sucks|damn it)\b/
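To make it concrete how a pattern like this would be used, here is a rough Python sketch. The pattern is the one quoted above with line-wrap spaces removed; the `is_frustrated` helper and the case-insensitive flag are my assumptions, since the quoted regex shows no flags and the thread doesn't show the calling code.

```python
import re

# Pattern copied from the quoted regex. re.IGNORECASE is an assumption:
# the quoted literal shows no flags.
FRUSTRATION_RE = re.compile(
    r"\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful|"
    r"piss(ed|ing)? off|piece of (shit|crap|junk)|what the (fuck|hell)|"
    r"fucking? (broken|useless|terrible|awful|horrible)|fuck you|"
    r"screw (this|you)|so frustrating|this sucks|damn it)\b",
    re.IGNORECASE,
)


def is_frustrated(message: str) -> bool:
    """Hypothetical helper: True if the message contains a listed phrase."""
    return FRUSTRATION_RE.search(message) is not None
```

Note that a plain keyword match cannot tell who or what the word is aimed at: "the error message is awful" matches even though the user is describing their own code, and anything not in English never matches at all.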
Personally, I'm generally polite even towards AI and even when frustrated. I simply point out its mistakes instead of using emotional words.
So it counts how many times I was angry?
I'd discovered, perhaps mid-2025, that Cursor was noticeably better at fixing bugs if I started cursing at it. Better yet, after a while it would seem to break and start cursing itself ("Oh yes, I see the f*** problem now" and so on). Hilarity ensued.
What a world, where cursing at your machines can make them get their act together.
AGI is definitely around the corner. Or not.
They would either need to lie about consuming the tokens at one point to use in another so the token counting was precise.
But that does not make sense because if someone counted the tokens by capturing the session it would certainly not match what was charged.
Unless they would charge for the fake tools anyway so you never know they were there
Prompts are not hard constraints—they can be interpreted, deprioritized, or reasoned around, especially as models become more capable.
From what’s visible, there’s no clear evidence of structural governance like voting systems, hard thresholds, or mandatory human escalation. That means control appears to be policy (prompts), not enforcement (code).
This raises the core issue: If governance is “prompts all the way down,” it’s not true governance—it’s guidance.
And as model capability increases, that kind of governance doesn’t get stronger—it becomes easier to bypass without structural constraints.
Has anyone actually implemented structural governance for agent swarms — voting logic, hard thresholds, REQUIRES_HUMAN as architecture not instruction?
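To show what I mean by "architecture not instruction", here is a toy sketch. Everything in it (the `Action` type, the two-thirds threshold, the `authorize` function) is hypothetical and not taken from the leaked source; the only point is that the check lives in code the model cannot talk its way around.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    name: str
    requires_human: bool  # hard flag set by the system, not by a prompt


def authorize(action: Action, votes: List[bool],
              human_approved: bool = False) -> bool:
    """Allow an action only if the structural checks pass.

    Hypothetical rules: flagged actions need explicit human sign-off,
    and at least two thirds of the voting agents must approve.
    """
    if action.requires_human and not human_approved:
        return False
    if not votes:
        return False
    # Integer arithmetic avoids floating-point edge cases at exactly 2/3.
    return 3 * sum(votes) >= 2 * len(votes)
```

With this shape, a unanimous swarm still cannot perform a `requires_human` action: `authorize(Action("force_push", True), [True, True, True])` is rejected until a human approves.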
Does this mean `huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled` is unusable? Has anyone seen fake tool calls working with this model?
And so now the copy cats can ofc claim this is totally not a copy at all, it's actually Opus. No license violation, no siree!
It's fucking hilarious is what it is, it's just too much.
Similarly, would you consider it to be dishonest if my human colleague reviewed and made changes to my code, but I didn’t explicitly credit them?
Even if the code is line-for-line identical, the difference is in how much trust I am willing to give the code. If I have to work in the neighborhood of that code, I need to know what degree of skepticism I should be viewing it with.
ISTM the most efficient and objective solution is to invest in AI more on both sides of the fence.
Do you not think it is an overreaction to panic like this if I can do exactly what the undercover mode does by simply asking Claude?
Being written by a LLM is a signal that the submission is of low effort and therefore probably low quality, which then puts the onus on the people reviewing and reading the submission instead of the original generator of the submission. Hence I would classify it as spam.
Open source communities also have rules against LLM generated contributions, for various moral, ethical, or legal reasons.
At this point I would consider any employee of an AI provider to be tainted.
(Yes, I put "AI-only" and "knowledgeable" in there as weasel words. But I think that with them, it is not currently a very controversial case.)
I'm still inclined to think people might be overreacting to that bit since it seems to be for anthropic-only to prevent leaking internal info.
But I did read the prompt and it did say hide the fact that you are AI.
But, I also get Anthropic's side that when they're contributing they don't want their internals leaked. If it had been left at that, that's fine, but having it pretend like it's not AI at all rubs me a little bit the wrong way. Why try to hide it?
Additionally, this raises another big issue. A few years ago, a couple of guys used software (what you could argue was a primitive AI) to generate around 70 billion unique pieces of music, which amounts to essentially every piece of copyrightable music using standard music scales.
Is the fact that they used software to develop this copyrighted material relevant? If not, then their copyright should certainly be legal and every new song should pay them royalties.
It seems that using a computer to generate results MUST be added as an additional bit of analysis when it comes to infringement cases and fair use if not a more fundamental acknowledgement that computer-generated content falls under a different category (I'd imagine the real argument would be over how much of the input was human vs how much was the system).
Of course, this all sets aside the training of AI using copyrighted works. As it turns out, AI can regurgitate verbatim large sections of copyrighted works (up to 80% according to this study[0]) showing that they are in point of fact outright infringing on those copyrights. Do we blow up current AI to maintain the illusion of copyright or blow up current copyright law to preserve AI?
To begin to answer your questions, I would suggest you study the Copyright Office's report (which is also not law, but their guidance for laypeople as written by their staff lawyers) at https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
But none of the other agents advertise that the commit was done by an agent. Like Codex. Your panic should apply equally to already existing agents like Codex no?
The full conversation is preserved in the JSONL file, and messages
are filtered before being sent to the API.
Key mechanisms:
1. JSONL is append-only — old pre-compaction messages are never deleted. New messages (boundary
marker, summary, attachments) are appended after compaction.
2. Messages have flags controlling API visibility:
- isCompactSummary: true — marks the AI-generated summary message
- isVisibleInTranscriptOnly: true — prevents a message from being sent to the API
- isMeta — another filter for non-API messages
- getMessagesAfterCompactBoundary() returns only post-compaction messages for API calls
3. After compaction, the API sees only:
- The compact boundary marker
- The summary message
- Attachments (file refs, plan, skills)
- Any new messages after compaction
4. Three compaction types exist:
- Full compaction — API summarizes all old messages
- Session memory compaction — uses extracted session memory as summary (cheaper)
- Microcompaction — clears old tool result content when cache is cold (>1h idle)
The logic is:
1. Anthropic's API has a server-side prompt cache with a 1-hour TTL
2. When you're actively using a session, each API call reuses the cached prefix — you only pay
for new tokens
3. After 1 hour idle, that cache is guaranteed expired
4. Your next message will re-send and re-process the entire conversation from scratch — every
token, full price
5. So if you have 150K tokens of old Grep/Read/Bash outputs sitting in the conversation, you're
paying to re-ingest all of that even though it's stale context the model probably doesn't need
The microcompact says: "since we're paying full price anyway, let's shrink the bill by clearing
the bulky stuff."
What's preserved vs lost:
- The tool_use blocks (what tool was called, with what arguments) — kept
- The tool_result content (the actual output) — replaced with [Old tool result content cleared]
- The most recent 5 tool results — kept
So Claude can still see "I ran Grep for foo in src/" but not the 500-line grep output from 2
hours ago.
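A minimal sketch of that preserved-vs-lost rule, for illustration only. The placeholder string and the "keep the most recent 5" figure come from the description above; the message shape (dicts with a `type` key) and the `microcompact` function are assumptions, not the real implementation.

```python
from typing import Dict, List

CLEARED_PLACEHOLDER = "[Old tool result content cleared]"  # string quoted above
KEEP_RECENT = 5  # number of recent tool results kept, per the description


def microcompact(messages: List[Dict], keep_recent: int = KEEP_RECENT) -> List[Dict]:
    """Return a copy of the messages with old tool_result content cleared.

    tool_use entries are left untouched, so the model still sees which
    tool ran and with what arguments.
    """
    result_indexes = [i for i, m in enumerate(messages)
                      if m.get("type") == "tool_result"]
    if keep_recent > 0:
        to_clear = set(result_indexes[:-keep_recent])
    else:
        to_clear = set(result_indexes)
    return [
        {**m, "content": CLEARED_PLACEHOLDER} if i in to_clear else m
        for i, m in enumerate(messages)
    ]
```

The input list is not mutated, which matches the append-only transcript described earlier: only what gets sent changes.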
Does it affect quality? Yes, somewhat — but the tradeoff is that without it, you're paying
potentially tens of thousands of tokens to re-ingest stale tool outputs that the model already
acted on. And remember, if the conversation is long enough, full compaction would have summarized
those messages anyway.
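Going back to the visibility flags listed at the top, the filtering step can be sketched like this. The flag names (`isVisibleInTranscriptOnly`, `isMeta`) are the ones quoted above; the `compact_boundary` type value and both Python function names are placeholders I made up for the sketch.

```python
from typing import Dict, List


def messages_after_compact_boundary(log: List[Dict]) -> List[Dict]:
    """Return the log from the last boundary marker onward (marker included).

    Assumption: the boundary is a message whose type is "compact_boundary".
    """
    last_boundary = None
    for i, message in enumerate(log):
        if message.get("type") == "compact_boundary":
            last_boundary = i
    return log if last_boundary is None else log[last_boundary:]


def messages_for_api(log: List[Dict]) -> List[Dict]:
    """Drop transcript-only and meta messages from the post-boundary slice."""
    return [
        m for m in messages_after_compact_boundary(log)
        if not m.get("isVisibleInTranscriptOnly") and not m.get("isMeta")
    ]
```

Nothing is deleted from `log` itself, so the full conversation stays on disk while the API only sees the boundary, the summary, and whatever came after.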
And critically: this is disabled by default (enabled: false in timeBasedMCConfig.ts:31). It's
behind a GrowthBook feature flag that Anthropic controls server-side. So unless they've flipped
it on for your account, it's not happening to you.
NEVER include in commit messages or PR descriptions:
- The phrase "Claude Code" or any mention that you are an AI
- Co-Authored-By lines or any other attribution
BAD (never write these):
- 1-shotted by claude-opus-4-6
- Generated with Claude Code
- Co-Authored-By: Claude Opus 4.6 <…>
This very much sounds like it does what it says on the tin, i.e. stays undercover and pretends to be a human. It's especially worrying that the prompt is explicitly written for contributions to public repositories.
[0]: https://github.com/chatgptprojects/claude-code/blob/642c7f94...
The pet you get is generated based off your account UUID, but the algorithm is right there in the source, and it's deterministic, so you can check ahead of time. Threw together a little app to help, not to brag but I got a legendary ghost https://claudebuddychecker.netlify.app/
(I didn't think to include a UUID checker though - nice touch)
Plot twist: Chinese competitors end up developing real, useful versions of Claude's fake tools.
Interesting!
(I know you know this, since you submitted it! but others might want to know)
thank you so much for having built and shared this
It did not have a copy of the leaked code...
Anthropic thinking 1) they can unring this bell, and 2) removing forks from people who have contributed (well, what little you can contribute to their repo), is ridiculous.
---
DMCA: https://github.com/github/dmca/blob/master/2026/03/2026-03-3...
GitHub's note at the top says: "Note: Because the reported network that contained the allegedly infringing content was larger than one hundred (100) repositories, and the submitter alleged that all or most of the forks were infringing to the same extent as the parent repository, GitHub processed the takedown notice against the entire network of 8.1K repositories, inclusive of the parent repository."
They constantly love to talk about Claude Code being "100%" vibe coded...and the US legal system is leaning towards that not being copyrightable.
It could still be a trade secret, but that doesn't fall under a DMCA take down.
Anthropic really needs to embrace it
On that note, this article is also pretty obviously AI-generated and it's unfortunate the author didn't clean it up.
Edit: Everyone is responding "comments are good" and I can't tell if any of you actually read TFA or not
> “BQ 2026-03-10: 1,279 sessions had 50+ consecutive failures (up to 3,272) in a single session, wasting ~250K API calls/day globally.”
This is just revealing operational details the agent doesn't need to know to set `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3`
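For readers who haven't seen this kind of guard, the constant amounts to a small circuit breaker. A sketch, assuming only the constant name and value quoted above; the `AutocompactBreaker` class and its methods are hypothetical and not from the leaked source.

```python
from typing import Callable

MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3  # value quoted in the article


class AutocompactBreaker:
    """Stop retrying compaction after too many consecutive failures."""

    def __init__(self, max_failures: int = MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    @property
    def tripped(self) -> bool:
        return self.consecutive_failures >= self.max_failures

    def attempt(self, compact: Callable[[], None]) -> bool:
        """Run compact() unless the breaker has tripped; report success."""
        if self.tripped:
            return False  # no further API call is made
        try:
            compact()
        except Exception:
            self.consecutive_failures += 1
            return False
        self.consecutive_failures = 0  # any success resets the streak
        return True
```

Without the cap, a session stuck in a failing state keeps issuing calls forever, which is how one session can rack up thousands of consecutive failures.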
I’d argue that in this case, it isn’t. Exhibit 1 (from the earlier thread): https://github.com/anthropics/claude-code/issues/22284. The user reports that this caused their account to be banned: https://news.ycombinator.com/item?id=47588970
Maybe it would be okay as a first filtering step, before doing actual sentiment analysis on the matches. That would at least eliminate obvious false positives (but of course still do nothing about false negatives).
So much for LangChain and LangGraph!! I mean, if Anthropic themselves aren't using it and are using a prompt instead, then what’s the big deal about LangChain?
Langgraph is for multi-agent orchestration as state graphs. This isn't useful for Claude Code as there is no multi-agent chaining. It uses a single coordinator agent that spawns subagents on demand. Basically too dynamic to constrain to state graphs.
>Is it ironic? Sure. Is it also probably faster and cheaper than running an LLM inference just to figure out if a user is swearing at the tool? Also yes. Sometimes a regex is the right tool.
I'm reading an LLM written write up on an LLM tool that just summarizes HN comments.
I'm so tired man, what the hell are we doing here.
The frustration regex is funny but honestly the right call. Running an LLM call just to detect "wtf" would be ridiculous.
KAIROS is what actually caught my attention. An always-on background agent that acts without prompting is a completely different thing from what Claude Code is today. The 15 second blocking budget tells me they actually thought through what it feels like to have something running in the background while you work, which is usually the part nobody gets right.
Hooks have been an officially documented feature for quite a long time now https://code.claude.com/docs/en/hooks
Tangentially, I wonder if the world trade federation or the Washington tennis foundation have any projects on GitHub :)
I love that it only supports English. AI bubble in a nutshell.
It also somehow messed up my alacritty config when I first used it. Who knows what other ~/.config files it modifies without warning.
- Claude Chat: built like it's 1995, put business logic in the button click() handler. Switch to something else in the UI and a long running process hard stops. Very Visual Basic shovelware.
- Claude Cowork: same but now we're smarter, if you change the current convo we don't stop the underlying long-running process. 21st century FTW!
- Claude Code: like chat, but in the CLI
- Claude Dispatch: an actual mobile client app, not the whole thing bundled together.
- Daemon mode: proper long-running background process, still unreleased.
Interesting based on the other news that is out.
And actually just looking this up, it appears claude-code itself was just added to that whitelist : D
https://github.com/oven-sh/bun/commit/5c59842f78880a8b5d9c2e...
In the span of basically a week, they accidentally leaked Mythos, and then now the entire codebase of CC. All while many people are complaining about their usage limits being consumed quickly.
Individually, each issue is manageable (because it's exciting looking through leaked code). But together, it starts to feel like a pattern.
At some point, I think the question becomes whether people are still comfortable trusting tools like this with their codebases, not just whether any single incident was a mistake.
" ...accidentally shipping your source map to npm is the kind of mistake that sounds impossible until you remember that a significant portion of the codebase was probably written by the AI you are shipping.”
Are you referencing the use of Claude subscription authentication (oauth) from non-Claude Code clients?
That’s already possible, nothing prevents you from doing it.
They are detecting it on their backend by profiling your API calls, not by guarding with some secret crypto stuff.
At least that’s how things worked last week xD
https://alex000kim.com/posts/2026-03-31-claude-code-source-l...
Ah, it seems that Bun itself signs the code. I don't understand how this can't be spoofed.
https://fortune.com/2026/01/29/100-percent-of-code-at-anthro...
> Right now for most products at Anthropic it's effectively 100% just Claude writing
- Mike Krieger, chief product officer of Anthropic
Not only that, wouldn't allow other CLIs to be used either.
[0] https://ai-act-service-desk.ec.europa.eu/en/ai-act/article-5...
> which is published with the purpose of informing the public on matters of public interest
From your link, that's the only case where text needs to be attributed to AI.
https://code.claude.com/docs/en/settings#attribution-setting...
1. the LLM is instructed on how to write a commit message and never include co-authorship
2. the LLM is asked to produce a commit message
3. the LLM output is parsed by a script which removes co-authorship if the LLM chooses to include it nevertheless
* The authors made the code very broad to improve its ability to achieve the stated goal
* The authors have an unstated goal
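Step 3 of the list above (a script that strips co-authorship from the model's output) is simple enough to sketch. This is only an illustration of that kind of post-processing; the function name and the exact matching rule are my assumptions, not what the leaked code does.

```python
import re

# Matches a git trailer line such as "Co-Authored-By: Name <email>".
CO_AUTHOR_RE = re.compile(r"^\s*co-authored-by:.*$", re.IGNORECASE)


def strip_co_authorship(message: str) -> str:
    """Remove Co-Authored-By trailer lines from a commit message."""
    kept = [line for line in message.splitlines()
            if not CO_AUTHOR_RE.match(line)]
    # Drop blank lines left dangling at the end, keep one final newline.
    return "\n".join(kept).rstrip() + "\n"
```

A filter like this is deterministic, which is the difference between step 1 (an instruction the model may or may not follow) and step 3 (code that always runs).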
I think it's healthy to be skeptical but what I'm seeing is that the skeptics are pushing the boundaries of what's actually in the source. For example, you say "says on the tin" that it "pretends to be human" but it simply does not say that on the tin. It does say "Write commit messages as a human developer would" which is not the same thing as "Try to trick people into believing you're human." To convince people of your skepticism, it's best to stick to the facts.
~/.claude/settings.json
{
  "attribution": {
    "commit": "",
    "pr": ""
  }
}
The rest of the prompt is pretty clear that it's talking about internal use. Claude Code users aren't the ones worried about leaking "internal model codenames" nor "unreleased model opus-4-8" nor Slack channel names. Though, nobody would want that crap in their generated docs/code anyways.
Seems like a nothingburger, and everyone seems to be fantasizing about "undercover mode" rather than engaging with the details.
For a company calling Chinese companies out for distillation attacks on their models, this very much looks like a distillation attack against human maintainers, especially when combined with the frustration detector.
Needing to flag nontrivial code as generated was standard practice for my whole career.
I don't understand why people consider Claude-generated code to be their own. You authored the prompts, not the code. Somehow this was never a problem with pre-LLM codegen tools, like macro expanders, IPC glue, or type bundle generators. I don't recall anybody desperately removing the "auto-generated do not edit" comments those tools would nearly always slap at the top of each file or taking offense when someone called that code auto-generated. Back in the day we even used to publish the "real" human-written source for those, along with build scripts!
I regularly have tool-generated commits. I send them out with a reference to the tool, what the process is, how much it's been reviewed and what the expectation is of the reviewer.
Otherwise, they all assume "human authored" and "human sponsored". Reviewers will then send comments (instead of proposing the fix themselves). When you're wrangling several hundred changes, that becomes unworkable.
Absolutely. That would be hilarious.
Assisted-by: Claude:claude-3-opus coccinelle sparse
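If you want tooling to pick this trailer up, parsing it is straightforward. A small sketch, reading the example line as "agent:model" followed by optional extra tool names; that reading of the layout, and the `parse_assisted_by` helper itself, are my assumptions based on the one example above.

```python
from typing import NamedTuple, Optional, Tuple


class AssistedBy(NamedTuple):
    agent: str
    model: str
    tools: Tuple[str, ...]


def parse_assisted_by(line: str) -> Optional[AssistedBy]:
    """Parse an 'Assisted-by:' trailer line, or return None if it is not one."""
    key, sep, value = line.partition(":")
    if not sep or key.strip().lower() != "assisted-by":
        return None
    fields = value.split()
    if not fields:
        return None
    # First field is assumed to be "agent:model"; the rest are tool names.
    agent, _, model = fields[0].partition(":")
    return AssistedBy(agent, model, tuple(fields[1:]))
```

For the example trailer this yields agent `Claude`, model `claude-3-opus`, and the tools `coccinelle` and `sparse`.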
If your linter is able to action requests, then it probably makes sense to add too.
This is also useful for keeping your prompts commit-sized, which in my experience gives much better results than just letting it spin or attempting to one-shot large features.
The point isn't to hijack accountability. It's free publicity, like how Apple adds "Sent from my iPhone."
Kinda, yeah. If I automatically apply lint suggestions, I would title my commit "apply lint suggestions".
co-authoring doesn't hide your authorship
if I see someone committing blatantly wrong code, I would wonder what tool they actually used
"Write commit messages as a human developer would — describe only what the code change does."
[edit] Never mind, find in page fail on my end.
> Commit f9205ab3 by dkenyser on 2026-3-31 at 16:05:
> Fixed the foobar bug by adding a baz flag - dkenyser
Because it already identified you in the commit description. The reason to add a signature to the message is that someone (or something) that isn't you is using your account, which seems like a bad idea.
- "Fix bug found while testing with Claude Capybara"
- "1-shotted by claude-opus-4-6"
- "Generated with Claude Code"
- "Co-Authored-By: Claude Opus 4.6 <…>"
This makes sense to me about their intent by "UNDERCOVER"
Why? Agents may or may not read docs. It may or may not use skills or tools. It will always read comments "in the line of sight" of the task.
You get free long term agent memory with zero infrastructure.
Only being half ironic with this. I generally find that people somehow magically manage to understand how to be materially helpful when the subject is a helpless LLM. Instead of pointing it to a random KB page, they give it context. They then shorten that context. They then interleave context as comments. They provide relevant details. They go out of their way to collect relevant details. Things they somehow don't do for their actual colleagues.
This only gets worse when the LLM captures all that information better than certain human colleagues somehow, rewarding the additional effort.
It's also annoying to have to go through this stack
code -> blame -> commit message -> jira ticket -> issue in sales force...
Or the even better "fixes bug NNNNN" where the bug tracking system referenced no longer exists.
Digging through other systems (if they exist) to find the nugget in an artifact is a problem for humans too.
That's revealing waaaay more than the agent needs to know.
I think a big question is whether you want your agent to know the reasons for the guidelines you issue or whether you want the agent to just follow them. In particular, giving an agent the argument for your orders might make the agent think it can question those arguments and so not follow the orders.
Comments are ultimately so you can understand stuff without having to read all the code. LLMs are great when you force them to read all the code, and comments only serve to confuse. I'd say the opposite has been true in my experience: if you're not forcing LLMs to not have any comments at all (and it can actually skip those, looking at you Gemini), you're doing agent coding wrong.
They didn't expect to leak their source code.
It's hardly a trade secret, what value is this to a competitor?
I suspect that's the logical endpoint of trying to provide everything as context to an agent. Why use a separate markdown file and have to waste extra tokens explaining what part of the codebase something applies to when you can just put it right there in the code itself?
I mostly mentioned it because it is pre-installed on some (linux) systems. Though of course if you're trying to obfuscate the sourcecode you need to bundle an interpreter with the code anyway.
But it has historically been used for big programs, and there are well established methods for bundling python programs into executables.
False.
Technically, neither does JavaScript.
.bash_profile .bashrc .claude .env .gitconfig .gitmodules .idea
.mcp.json .profile .ripgreprc .vscode .zprofile .zshrc config
https://github.com/anthropic-experimental/sandbox-runtime/is...
Me. My .config is git-versioned :)
Or am I missing sarcasm?
The only thing I found interesting about this leak is just how much of a rats nest the code base is. Like it actually feels vibe coded without a shred of intelligent architecture behind it.
Regardless, you can't beat the subscription and model access despite the state of the code base, so I still use Claude Code daily and love it.
Two months later it was CVE after CVE.
Power(money) lies with NVDA and people who can best harness this power.
They are most likely using these as post-fact indicators and have automation that kicks in after a threshold is reached.
Now that the indicators have leaked, they will most likely be rotated.
They can't really do that. Now they have no way to distinguish "this is a user of a non updated Claude code" from "this is a user of a Claude code proxy".
> `includeCoAuthoredBy`: whether to instruct the agent to add a Co-authored-by trailer to git commits; defaults to `true`.
This means that, if you don't explicitly configure otherwise, the LLM is specifically instructed to include co-authorship in its higher level instructions.
Seems to me like everyone's just grasping at straws to nitpick every insignificant little thing.
No shit they have secrets. I have secrets too. That doesn't make it ok for me to deceive you in any way.
How would you feel if I deceived you and my excuse was "oh I was just trying some new secret technique of mine"?
How did we get to this point where we let enormously powerful companies get away with more than individuals?
What's the value add over doing it with just Python code? I mean you can represent any logic in terms of graphs and states..
Other than that I’m not too sure.
Ideally, if I contribute to any codebase, what needs to be judged is the resulting code. Is it up to the project's standards ? Does the maintainer have design objections ?
What tool you use shouldn't matter, be it your IDE or your LLM.
But that also means you should be accountable for it, you shouldn't hide behind "But Claude did this poorly, not me !", I don't care (in a friendly way), just fix the code if you want to contribute.
The big caveat to this is not wanting AI-Generated code for ideological reasons, and well, if you want that you can make your contributors swear they wrote it by themselves in the PR text or whatever.
I'm not really sure how to feel about this, but I stand by my "the code is what matters" line.
Since AI tools are constantly obsoleted, generate different output each run, and it is often impossible to run them locally, the input prompts are somewhat useless for everyone but the initial user.
https://code.claude.com/docs/en/settings#attribution-setting...
suddenly everyone cares about data hygiene. But it’s not like this shouldn’t have always been a priority
There's a good reason why we didn't though: because we didn't see any obvious value in it. So it felt like a waste of time. Now it feels like time well spent.
"Self-descriptive code doesn't need comments!" always gets an eye-roll from me
I reckon some believe these companies are basically magical, and are utterly astonished when they’re shown to be imperfect in relatively uninteresting ways. I’m a lot more concerned about the sanity of the AI ecosystem they operate in than the stability of some front-end Anthropic made.
Unless you literally vibe coded it, Claude is just a tool. This is the equivalent of Apple appending "Sent from my iPhone" as a signature to outgoing emails. It's advertising tool use, not providing attribution. The intent isn't to disclose that AI was used in creating the code, the intent is to advertise the AI product.
Stop with the selective bias, two are birds of the same feather, they are using a "tool" to write the code for them from whatever (questionable) mashed up source they are trained from, in the same way someone is using AI as a "tool" to fabricate their curriculum and cheat a job
> The intent isn't to disclose that AI was used in creating the code, the intent is to advertise the AI product
That is a wild mental gymnastic to justify dishonestly submitting code you didn't write (or own) as yours. It has nothing to do with advertisement but proper attribution, you know it.
A quick PR where I've found the bug myself in the code, and ask Claude to write the fix because it's faster, and verified it - I don't include Claude's co-authorship.
Even if I then edit it to adapt and modify it, I'll just add my copyright header, but I can't replace/remove the original one.
Unlike a black magic box where you just tell it to build something and it does all the production for you while you sit back.
You don’t have to rely on humans doing it. The agent’s entire existence is built around doing this one mundane task that is annoying but super useful.
Yes, let's blow another 5-10k a project/month on tokens to keep the comments up to date. The fact AI still cannot consistently refactor without leaving dead code around even after a self review does not give me confidence in comments…
Comments in code are often a code smell. This is an industry standard for a reason and isn't because of staleness. If you are writing a comment, it means the code is bad or there is irreducible complexity. It is about good design. Comments everywhere are almost always a flag.
Note, language conventions are not the same.
See: https://library.osu.edu/site/copyright/2026/02/06/artificial...
Recent relevant discussion about this in the chardet repo between the chardet maintainer who relicensed the chardet code and Richard Fontana, a well-regarded US IP lawyer who's worked for Red Hat (now IBM) for decades:
https://github.com/chardet/chardet/issues/334#issuecomment-4...
My take away from the conversation there is that being in an edit loop, where the files are AI generated through your control rather than directly editing the files yourself, means the files are then "AI authored" for copyright protection purposes rather than yourself.
But I double stress, I'm not a lawyer so may have misunderstood things radically.
The whole thing is a legal mess. How do you know the LLM did not reproduce existing code? There is an ongoing legal battle in Germany between GEMA and OpenAI because ChatGPT reproduced parts of existing song lyrics. A court in Munich has found that this violates German copyright law.
A copyright over code means that ONLY you can use that code, and nobody else; otherwise, you can sue them. For example, if you are an artist, you want to protect your IP this way.
Yes, AI generated code is not copyrightable but so is most code in general. It is very hard to truly get a copyright for a piece of code. But just because you don't have copyright to something doesn't mean it's not your property.
For example, you can buy several movies on DVD and those DVDs will still be your property even though you don't have copyright and if someone does steal those DVDs, it will be considered theft of your property. Similarly, just because the code is AI-generated/not copyrightable, doesn't mean others can just steal it.
Think about it - so many codebases are not legally protected as copyrighted material but are absolutely protected by IP laws and enforced by the companies that own them.
If your code was generated by you and you store it in your system and have property rights over it, you can enforce legal actions even without holding a copyright over the code.
In general, it is kind of weird to want to copyright code. How do you patent a for-loop for example
If you author something, you have the sole copyright. In fact, in Germany you can't even waive your copyright away. However, you can grant licenses for the use of your work.
The difference between copyright and licenses is crucial! By licensing your work, you do not waive your copyright. You still remain the owner. If you publish your code under the GPL and you are the sole author, you can always relicense your code or issue commercial licenses under different terms.
> In general, it is kind of weird to want to copyright code. How do you patent a for-loop for example
There is a fundamental difference between copyright and patents! Patents require a novel technical contribution and they must be granted by a patent office.
https://patents.google.com/patent/US9583163B2/en
> How do you patent a for-loop for example
Ah, so you kinda maybe sorta absolve yourself of culpability (but not really — "I didn't know this was copyrighted material" didn't grant you copyright), and simultaneously make fixing the potentially compromised codebase (someone else's job, hopefully) 100x harder because the history of which bits might've been copied was never kept.
Solid advice! As ethical as it is practical.
By the same measure, junkyards should avoid keeping receipts on the off chance that the catalytic converters some randos bring in after midnight are stolen property.
Better not document it.
One little trick the legal folks don't want you to know!
I don't mean this as a drive-by bazinga either, the practice of copying code or thinking you understand it when you don't is nothing new
Again, I don’t care about them.
every delay to AGI results in deaths that AGI could have prevented. we are talking millions of deaths for, say, just a year's delay.
- you are assuming that AGI is just around the corner and that scaling up language models is the path to get there
- you can make this argument about basically anything (nuclear power, tuberculosis medication, free healthcare). I’d say the burden of proof is on you to back up your extraordinary claim with extraordinary evidence.
Uhm, that's not what Hollywood told me. I trust the screenwriters of "Terminator" more than I trust your anonymous forum comment.
Sure, that's what AGI would be used for /s
In other news, we are not even close to AGI and even with the current experimental technology, frontier AI model companies are already fighting to help departments of war, which actually results in the most deaths. What makes you think AGI would be used for not leading to the same millions of deaths?
Everyone else is just gaming engagement metrics and benchmarks.
Also, honestly, given the speed constraints of large models, it makes almost no difference what language an agent is written in. The small performance differences between programming languages do not even begin to matter compared with network latency, let alone the speed at which a large model streams tokens.
Also you needed a leak to know that CC, that is out for 1 year, is in TS?
I don't know why they acquired Bun. Surely not because there's no other means to write command line programs.
> Also you needed a leak to know that CC, that is out for 1 year, is in TS?
What, are you shaming people for not being aware what (obfuscated) code some 1 year old software is written in?
> but so is most code in general.
That's definitely not true. All the code I write has my copyright, unless I waive that right to some other entity. If there was no copyright, there would be no licensing. How else could you license your code, if you were not the copyright holder?
Have you never seen "Copyright (c) <Authors> 2025" in source code files?
The very fact that your code has your copyright is also the reason for things like CLAs.
> For example, you can buy several movies on DVD and those DVDs will still be your property even though you don't have copyright
That's because artistic works are distributed under a license. Just like software. Licenses have terms under which circumstances a work can be used, modified and (re)distributed. In the case of DVDs, you are generally not allowed to make your own copies and then sell them. In the case of software, that's why you have the various software licenses (proprietary or open-source).
> Similarly, just because the code is AI-generated/not copyrightable, doesn't mean others can just steal it.
You can't set licensing terms for something that is not copyrightable.
Huh? Normal property law is plainly not applicable to a non-rival good like information (unlike for instance a physical DVD: if someone takes a DVD from me, I don’t have it anymore). “Intellectual property” is, but it is not so much a legal regime as confusing shorthand for a number of distinct ones:
- Trademark law, which applies to markings on copies rather than copies themselves;
- Trade secret law, which stops applying when the information escapes into the wild through the secret-holder’s own actions;
- Patent law, which definitionally only applies to public knowledge as an incentive to not keep it secret instead;
- Publicity rights, which only apply to depictions or discussions of natural persons;
- Moral rights, which are mostly about being recognized as the author and even in their strongest incarnations do not restrict unmodified copies;
- Database right, which isn’t applicable as we’re not talking about a compendium of things, and anyway does not exist in the US and most other places outside the EU;
- Copyright, which you’ve conceded is not applicable here.
There’s no “intellectual property” distinct from these things, and none of them are relevant.
Unfortunately in many bundlers making a mistake like this is as easy as an agent deleting `process.env['ENV'] === 'debug'`, which they’ll gladly do if you point them at a production or staging environment and ask them to debug the stripped/compiled/minified code.
I think I still disagree with the parent comment premise that "they probably thought minifying was enough", since it sounds likely they were doing all of those other steps. The issue seems like insufficient auditing of the build process (especially if agents were involved, which seems likely for Anthropic) rather than not doing all of the usual JS build stuff.
I'm doing it because I know how much I appreciate well-written documentation. Also this is a bit niche, but if you're using Rust and add examples to doc-comments, they get run as tests too.
Also given we both managed to produce more than one sentence, and include capital letters in our comments, it's entirely possible both of us will be accused of being an AI. Because, you know... People don't write like this, right?
Could anyone explain the esoteric meaning of why people started doing that shit? I got a hypothesis, what's going on is something like this:
1. Prove you are human: write Like A Fucking Adult You Weirdo (internal designator for a specific language register, you know the one)
2. Prove you are human: _DON'T_ write Like A Fucking Adult You Weirdo (because that's how LLMs were trained to write, silly!)
3. ???? (cognitive dissonance ensues)
4. PROFIT (you were just subject to some more attrition while the AI just learned how to pass a lil bit better)
I never thought computer programmers of all people would get trapped in such a simple loop of self-contradiction.
But I guess the human materiel really has degraded since whenever. I blame remote work preventing us from even hypothetically punching bosses, but anyway weird fucking times eh?
Maybe the posts trying to figure "this post is AI, that post is not AI" are themselves predominantly AI-generated?
Or is it just people made uncomfortable by what's going on, but not able to articulate further, jumping on the first bandwagon they see?
Or maybe this "AI-doubting of probably human posters" was started by humans, yes - then became "a thing", and as such was picked up by the LLM?
Like who the fuck knows, but with all honesty that's how I felt about so many things, dating from way before LLMs became so powerful that the above became a "sensible" question to ask...
Predominantly those things which people do by sheer mimesis - such as pop culture.
"Are you a goddam robot already - don't you see how your liking the stupid-making song is turning you into stupid-you, at a greater rate than it is bringing non-stupid-you aesthetic satisfaction?" type of thing -- but then I assume in more civilized places than where I come from people are much more convincingly taught that personal taste "doesn't matter" (and simultaneously is the only thing that matters; see points 1-4... I guess that's what makes some people believe curating AI, i.e. "prompt engineering" can be a real job and not just boil down to you being the stochastic parrot's accountability sink?)
I'm not even sure English even has the notions to point out the concrete issue - I sure don't know 'em.
Ever hear of the strain of thought that says "all metaphysical questions are linguistic paradoxes (and it's self-evidently pointless to seek answers to nonsensical questions)"?
Feels kinda like the same thing, but artificially constructed within the headspace of American anti-intellectualism.
Maybe a correct adversarial reading of the main branding acronym would be Anti-Intelligence.
You know, like bug spray, or stain remover.
But for the main bug in the system; the main stain on the white shirt: the uncomfortable observation that, in the end, some degree of independent thinking is always required to get real things done which produce some real value. (That's antithetical to standard pro-social aversive conditioning, which says: do not, under any circumstance, just put 2 and 2 together; lest you turn from "a vehicle for the progress of civilization" back into a pumpkin)
Writing and reading paragraphs of design discussion in a commit message is not something that seems common.
And of course tickets can be private, so even if the data survived migration, you may not have access to it (principle of least privilege and all that).
VS Code has a setting that promises to change the prompt it uses to generate commit messages, but it mostly ignores my instructions, even very literal ones like “don’t use the words ‘enhance’ or ‘improve’”. And oddly having it set can sometimes result in Cyrillic characters showing up at the end of the message.
Ultimately I stopped using it, because editing the messages cost me more time than it saved.
/rant
Not saying we shouldn't be careful with AGI. But the glib tone of "who cares if these companies die?" is where one needs to consider the consequences of AGI not happening or being delayed.
Pitch me on this utopian outlook, because nothing about any of the Frontier companies points away from dystopia to me
// [umbrella] Describe skill 1. (Prompt engineering)
1. Skillful use of **prompt engineering**
// [rocketship] Describe skill 2. (Agentic loops)
2. Know how to use **agentic loops** skillfully
....
> Providers shall ensure that AI systems intended to interact directly with natural persons are designed and developed in such a way that the natural persons concerned are informed that they are interacting with an AI system
That should be obvious considering an MR is not providing AI services.
> which is published with the purpose of informing the public on matters of public interest
There is no "informing public on matters of public interest" in source code nor an MR. It's clearly meant to prevent "deepfake" news, like the image and video ones explicitly call that out.
In some jurisdictions (e.g. the UK) the law is already clear that you own the copyright. In the US it is almost certain that you will be the author. The cases reported as saying otherwise have been misreported - the courts found the AI could not own the copyright.
Thaler v. Perlmutter: The D.C. Circuit Court affirmed in March 2025 that the Copyright Act requires works to be authored "in the first instance by a human being," a ruling the Supreme Court left intact by declining to hear the case in 2026.
And in the US constitution,
https://constitution.congress.gov/browse/article-1/section-8...
Authors and inventors, courts have ruled, means people. Only people. A monkey taking a selfie with your camera doesn't mean you own a copyright. An AI generating code with your computer is likewise devoid of any copyright protection.
The ruling says that the LLM cannot be the author. It does not say that the human being using the LLM cannot be the author. The ruling was very clear that it did not address whether a human being was the copyright holder because Thaler waived that argument.
The position with a monkey using your camera is similar, and you may or may not hold the copyright depending on what you did - was it pure accident or did you set things up. Opinions on the well known case are mixed: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
Where wildlife photographers deliberately set up a shot to be triggered automatically (e.g. by a bird flying through the focus) they do hold the copyright.
> The question is whether anyone has or if whatever content generated by a LLM simply does not constitute a work and is thus outside the entire copyright law.
It is going to vary with copyright law. In the UK the question of computer generated works is addressed by copyright law and the answer is "the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken"
It's also not a simple case of LLM generated vs human authored. How much work did the human do? What creative input was there? How detailed were the prompts?
In jurisdictions where there are doubts about the question, I think code is a tricky one. If the argument that prompts are just instructions to generate code, therefore the code is not covered by copyright, then you could also argue that code is instructions to a compiler to generate code and the resulting binary is not covered by copyright.
[1] https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
It becomes legally challenging with regards to ownership if I ever use work equipment for a personal project. If it later takes off they could very well try to claim ownership in its entirety simply because I ran a test once (yes, there's a whole Silicon Valley season for it).
I don't know if they'd win, but Anthropic absolutely would be able to claim the creation of that code was done on their hardware. Obviously we aren't employees of theirs, though we are customers that very likely never read what we agreed to in a signup flow.
Anthropic's user agreement does not have a similar agreement.
I don't know of any precedent where the code was literally generated on someone else's system. It's an open question whether that implies any legal right to the work and I could pretty easily see a court accepting the case.
I’m not saying comments are magic or anything. It takes work to keep them in sync with the code.
It’s a useful goal. Not a rule that gets you put in jail if you fail.
It doesn’t mean you can blindly trust comments. I treat all code, and comments, with skepticism until I can understand and run it.
I'm gonna note down my reasons for doing things and other information I deem useful, and if some other dipshit 5 years from now when I've moved on comes along and starts changing everything up without keeping the comments up to date that's their problem not mine. There was never anything wrong with my comments, the only thing that's wrong is the dipshit messing things up.
Doesn't matter what I do, the dipshit is going to mess everything up anyway. Those outdated comments will be the least of their worries.
Or how would you name methods and variables to explain why some payment reconciliation process skips matching for transactions under 0.50 EUR and just auto-approves them, because the external payment processor rounds differently than the internal ledger at sub-euro amounts, creating mismatches that were flooding the finance team's exception queue in 2013, explained more under Jira issue ZXSV-12456 and more details are known by j.doe@myorg.com. The threshold was chosen after analyzing six months of false positives, when it's any higher someone being undercharged doesn't get caught. I don't think autoApproveThreshold = 0.50 or anything like that would get the full context across, even if the rules themselves are all code.
I think surely you can have both! Code should explain itself as often as possible, but when you hit a wall due to some counter-intuitive workarounds being needed, or some business rules or external considerations that you need to keep track of, then comments also make sense. Better than just putting everything in a Jira issue somewhere, given that it often won't be read by you or others and almost certainly will not be read by any AI agents (unless you have an MCP or something, but probably uncommon), or spending hours trying to get the code to explain something it will never explain well. I've had people ask me about things that are covered in the README.md instead of reading it.
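To make the reconciliation example above concrete, here is a minimal Python sketch. The function name, the constant name and the use of `Decimal` are assumptions made for illustration; only the 0.50 EUR threshold, the rounding mismatch, the ticket ID and the contact come from the comment above.

```python
from decimal import Decimal

# Why this exists (the part no identifier name can carry):
# the external payment processor rounds differently than the internal
# ledger at sub-euro amounts, so tiny transactions produced mismatches
# that flooded the finance team's exception queue (2013).
# The threshold was chosen after analyzing six months of false positives;
# set it any higher and undercharges stop being caught.
# Background: Jira ZXSV-12456; more details: j.doe@myorg.com.
AUTO_APPROVE_THRESHOLD_EUR = Decimal("0.50")


def needs_matching(amount_eur: Decimal) -> bool:
    """Return True if the transaction must go through reconciliation matching.

    Transactions under the threshold are auto-approved without matching.
    """
    return amount_eur >= AUTO_APPROVE_THRESHOLD_EUR
```

The code states the rule; the comment is the only place the reason survives next to it.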
This is a good goal. You should strive to make the code explain itself. To write code that does not need comments.
You will fail to reach that goal most of the time.
And when you fail to reach that goal, write the dang comments explaining why the code is the way that it is.
Gonna try and use that throughout my life. Thanks!
Agent is not going to know to look for a file to update unless instructed. Now your file is out of sync. Code comments keeping everything line of sight makes it easy and foolproof.
If this is not the case you should not be sending it to public repos for review at all. It is rude and insulting to expect the people maintaining these repos to review code that nobody bothered to read.
The difference here is that the generator is a non-deterministic LLM and you can't reason about its output the same way.
As for LLM code assistants, I don't really view them as traditional code generation tools in the first place, as in practice they more resemble something in between autocomplete and delegating to a junior programmer.
As for attribution, I view it more or less the same way as "dictated but not read" in written correspondence, i.e., a disclaimer for errors in the code, which may be considered rude in some contexts, and a perfectly acceptable and useful annotation in others.
No. I don't want to test and pick through your shitty LLM generated code. If I wanted the entire code base to be junk, it'd say so in the readme.
This is not at all the case with LLM-generated code - mostly because you can't regenerate it even if you wanted to, as it's not deterministic.
That said, I do agree that LLM code is different enough from human code (even just in regards to potential copyright worries) that it should be mentioned that LLMs were used to create it.
How about compiler?
Similarly, if I use e.g. jextract or uniffi to generate Java interfaces from C code and check that in, I'll create tooling to automatically run those, and the commit will be attributed to that tooling.
Replace gRPC compiler with LLM. Can you reproduce? (probably not 100%). Can anybody fix it short of throwing more english phrases like "DO NOT", "NEVER", "Under No Circumstances"?
Probably not.
I thought the argument was that AI-users were reviewing and understanding all of the code?
How? LSTM?
Arguably snippet collections belong to this genre.
The following comment in the blog post
//go:generate stringer -type=Pill
generates a .._string.go file which contains a '.String()' method. I would find it very reasonable to commit that with 'Co-Authored-By: stringer v0.1.0' or such.
Or 'sed s/a/b/g' and 'Co-Authored-By: sed'
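A minimal Python sketch of what attributing a commit to a generator could look like. The helper name `add_co_author_trailer` is hypothetical and it only does string handling on the message; it does not call git. The `stringer v0.1.0` value is the one from the comment above.

```python
def add_co_author_trailer(message: str, tool: str) -> str:
    """Append a 'Co-Authored-By: <tool>' trailer to a commit message.

    Trailers sit in the last paragraph of the message, so a blank line
    is inserted unless the message already ends with a trailer block.
    """
    trailer = f"Co-Authored-By: {tool}"
    body = message.rstrip("\n")
    last_paragraph = body.split("\n\n")[-1]
    # Treat the last paragraph as a trailer block only if it is not the
    # subject line and every line in it looks like "Key: value".
    is_trailer_block = "\n\n" in body and all(
        ": " in line and " " not in line.split(": ", 1)[0]
        for line in last_paragraph.splitlines()
    )
    if trailer in last_paragraph.splitlines():
        return body + "\n"  # already attributed; do not duplicate
    separator = "\n" if is_trailer_block else "\n\n"
    return body + separator + trailer + "\n"


print(add_co_author_trailer("Regenerate pill_string.go", "stringer v0.1.0"))
```

The same helper would cover the `sed` case; whether a reviewer wants that level of attribution is the actual question being argued here.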
AI and humans are not the same as authors of PRs. As an obvious example: one of the important functions of the PR process is to teach the writer about how to code in this project but LLMs fundamentally don't learn the same way as humans so there's a meaningful difference in context between humans and AIs.
If a human takes the care to really understand and assume authorship of the PR then it's not really an issue (and if they do, they could easily modify the Claude messages to remove "generated by Claude" notes manually) but instead it seems that Claude is just hiding relevant context from the reviewer. PRs without relevant context are always frustrating.
What you're describing here is a design. The most important parts of a design are the decisions and their reasoning.
e.g. "we decided on tool/library/pattern X over tool/library/pattern Y because Z" – that is a design, usually discussed outside (and before) a commit message.
You discuss these decisions with others, document the discussion and decision, and then you have a design and can start writing code.
Let me ask you this: suppose you have a task that needs to be done eventually, and you want to write down some ideas for it, but don't want to start coding right now. Where do you put those ideas? How do you link them to that specific task?
Random example:
Provide a new syscall which has the only purpose to yield the CPU after the kernel granted a time slice extension.
sched_yield() is not suitable for that because it unconditionally schedules, but the end of the time slice extension is not required to schedule when the task was already preempted. This also allows to have a strict check for termination to catch user space invoking random syscalls including sched_yield() from a time slice extension region.
From 99d2592023e5d0a31f5f5a83c694df48239a1e6c
In your example, for example, where was the issue tracked before the code was written? The format you linked makes it difficult to get the history of the issue.
Let me ask you this: suppose you have a task that needs to be done eventually, and you want to write down some ideas for it, but don't want to start coding right now. Where do you put those ideas? How do you link them to that specific task?
https://cgit.freebsd.org/src/commit/?id=407b1e4980189252fade...
You can find more example there
In other words, what AI service are you providing by creating a PR?
> provider’ means a natural or legal person, public authority, agency or other body that develops an AI system or a general-purpose AI model or that has an AI system or a general-purpose AI model developed and places it on the market or puts the AI system into service under its own name or trademark, whether for payment or free of charge;
Alternately, you can say the same about informative variable names or informative function names. "If I change the function then the name is no longer accurate". You don't say that because function names and variable names are short and clear and are close to the problem at hand. Do the same with comments.
Which is why the copilot hyper-verbosity is harmful. Comments need to be terse so your eyes don't filter them out as noise.
Then the question becomes how often we look in the version control history for the files that we want to touch.
Which of these is more likely:
A) someone digging into the full history of autoApproveThreshold and finding out that they need to contact j.doe@myorg.com or reference ZXSV-12456
B) or them just messing the implementation with changes due to not reviewing the history of every file they touch
If someone is doing a refactor of 20 files, they probably won't review the histories of all of those, especially if the implementation is spread around a bunch of years, doubly so if there are a bunch of "fixes" commit messages in the middle, merge commits and so on. I've seen people missing out on details that are in the commit log many, many times, to the point where I pretty much always reach for comments. Same goes for various AI tools and agents.
Furthermore, if you want to publish a bit of code somewhere (e.g. Teams/Slack channel, or a blog), you'd need to go out of your way to pull in the relevant history as well and then awkwardly copy it in as well, since you won't always be giving other people a Git repo to play around with.
It's not that I don't see your point, it's just that from where I stand with those assumptions a lot of people are using version control as a tool wrong and this approach neither works now, nor will work well for them in the future.
It's more or less the same issue as with docs in some Wiki site or even a separate Markdown file (which is better than nothing, definitely closer than a Wiki, especially if the audience is someone who wants an overview of a particular part of the codebase, or some instructions for processes that don't just concern a few files; but it's still far removed from where any actual code changes would be made, also a downside of ADRs sometimes).
It looks like it will be decided on a case by case basis.
It will also differ between countries, so if you are distributing software internationally that will be a constraint on treating the code as not copyrightable.
Very good question. I would think it is. You are just using a mechanical system to transform your prompt to something else, right?
But a distinguishing factor may be that:
1. Output of the LLM for the same prompt can vary
2. So you don't really have "control" over what the AI produces
3. Therefore you should not get a copyright to the output of the LLM because you had very little to say about how that transformation (from prompt to code) was made.
Email is the review process, and commits contain enough information that git blame can get you a reasoning - it doesn't require you checking the email archive. Rather than a dead ticket that no longer exists.
I can also supply you a list of companies that make use of git's built-in features if you like. But that's probably not relevant to discussing management techniques.
AI generated code has no copyright. And if it DID somehow have copyright, it wouldn't be yours. It would belong to the code it was "trained" on. The code it algorithmically copied. You're trying to have your cake, and eat it too. You could maybe claim your prompts are copyrighted, but that's not what leaked. The AI generated code leaked.
> the Copyright Office concludes that existing legal doctrines are adequate and appropriate to resolve questions of copyrightability. Copyright law has long adapted to new technology and can enable case-by-case determinations as to whether AI-generated outputs reflect sufficient human contribution to warrant copyright protection. As described above, in many circumstances these outputs will be copyrightable in whole or in part—where AI is used as a tool, and where a human has been able to determine the expressive elements they contain. Prompts alone, however, at this stage are unlikely to satisfy those requirements.
So the TL;DR basically implies that pure slop is NOT copyrightable under the current guidelines outlined in the conclusions. For collaboration with an AI, however, copyrightability is determined on a case-by-case basis. I will preface this all with the standard IANAL, I could be wrong etc., but with the concluding language calling slop "unlikely" to be copyrightable, it sounds less cut and dried than you imply.
You have no idea how bad this leak is for Anthropic because with the copyright office, you have a DUTY TO DISCLOSE any AI generated work, and it is fully RETROACTIVE. And what is part of this leak? undercover.ts. https://archive.is/S1bKY Where Claude is specifically instructed to HIDE DISCLOSURE of AI generated work.
That's grounds for the copyright office and courts to reject ANY copyright they MIGHT have had a right to. It is one of the WORST things they could have done with regard to copyright.
https://www.finnegan.com/en/insights/articles/when-registeri...
I'm interpreting your most recent reply to me as an implication that I'm taking the conclusions you yourself linked out of context. I'm trying to give the benefit of the doubt here, but the 3 linked PDF documents aren't "a mountain of evidence" supporting your argument. Maybe I missed something in one of those documents (very possible), but the conclusions do not say what you imply.
Whether or not a specific git commit message correctly cites Claude usage may muddy the waters more than IP lawyers are comfortable with at this time (and therefore add inherent risk to current and future copyright claims of said works), but those waters were far from crystal clear in the first place.
Again, IANAL, but from my limited layman perspective it does not appear the copyright office plans to, at this moment in time, concisely reject AI collaborated works from copyright.
Your most recent link (Finnegan) is from an IP lawyer consortium that says it's better to include attribution and disclosure of AI to avoid current and future claim rejections. Sounds like basic cover-your-ass lawyer speak, but I could be wrong.
Full disclosure: I primarily use AI (or rather agentic teams) as N sets of new eyeballs on the current problem at hand, to help debug or bounce ideas off of, so I don't really have much skin in this particular game involving direct code contributions spit out by LLMs. Those that have any risk aversion, should probably proceed with caution. I just find the upending of copyright (and many other) norms by GenAI morbidly fascinating.
I was not aware of that. Who has that duty and when do they have it?
The copyright office still holds that human authorship is a core tenet of copyrightability, however, whether or not a submission meets the "de minimis" amount of AI-generated material to uphold a copyright claim is still being decided and refined by the courts and at the moment the distinction appears to fall on whether the AI was used "as a tool" or as "an author itself", with the former covered in certain cases and the latter not.
The registration process makes it clear that failing to disclose that a large part of a submission was authored by a contractor or an AI can result in rejection of the copyright claim, either now or retroactively on discovery.
I do not know whether any other country even has copyright registration.
Your main point that this is something the courts (or new legislation) will decide is, of course, correct. I am inclined to think this is only a problem for people who are vibe coding. The moment a human contributes to the code that bit is definitely covered by copyright, and unless you can clearly separate out human and AI contributed bits saying the AI written bits are not covered is not going to make a practical difference.
https://news.ycombinator.com/item?id=47594044
That comment is spot on. Claude adding a co-author to a commit is documentation that puts a clear line between code you wrote and code Claude generated, which does not qualify for copyright protection.
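For what it's worth, that line is mechanically checkable. A minimal sketch, assuming the co-author is recorded with the usual `Co-authored-by: Name <email>` trailer convention; the helper names, the `markers` list, and the example email address are all hypothetical, not anything taken from Claude Code itself:

```python
import re

# Matches trailer lines such as "Co-authored-by: Name <email>" (key is case-insensitive).
TRAILER_RE = re.compile(
    r"^co-authored-by:\s*(?P<name>.+?)\s*<(?P<email>[^<>]+)>\s*$",
    re.IGNORECASE,
)


def co_authors(commit_message: str) -> list[tuple[str, str]]:
    """Return (name, email) pairs from Co-authored-by trailers in a commit message."""
    authors = []
    for line in commit_message.splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            authors.append((match.group("name"), match.group("email")))
    return authors


def has_ai_co_author(commit_message: str, markers: tuple[str, ...] = ("claude",)) -> bool:
    """True if any co-author name contains one of the (hypothetical) marker strings."""
    return any(
        marker in name.lower()
        for name, _email in co_authors(commit_message)
        for marker in markers
    )


# Example commit message; the address is a placeholder.
message = "Fix parser edge case\n\nCo-authored-by: Claude <noreply@example.com>\n"
print(co_authors(message), has_ai_co_author(message))
```

Of course this only tells you which commits were labeled, so it is only as reliable as the labeling, which is the whole point of the argument above.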
The damning thing about this leak is the inclusion of undercover.ts. That means Anthropic has now been caught red handed distributing a tool designed to circumvent copyright law.
IME unfortunately that's not actually the case. It very much is your problem, as the architect of the original system, unless you can get yourself transferred to a department far, far away. I've never managed that except by leaving the company.
To be clear, I don't believe it should be this way, but sadly unless you work in an uncommonly well run company it usually is.
And if anyone has a problem with comments existing it's trivial to find/replace them out of existence. Literally a one minute job, if you actually think the codebase would be better without them.
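As a rough illustration of how small that job is, here is a sketch for Python source specifically (the language is my assumption, the comment above doesn't name one). It uses the standard `tokenize` module rather than a bare regex, so a `#` inside a string literal is left alone; `strip_comments` is just an illustrative name.

```python
import io
import tokenize


def strip_comments(source: str) -> str:
    """Remove '#' comments from Python source, leaving string literals untouched."""
    # Read lines the same way the tokenizer does so row numbers line up.
    lines = io.StringIO(source).readlines()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start  # rows are 1-based, columns 0-based
            line = lines[row - 1]
            ending = line[len(line.rstrip("\r\n")):]  # preserve the original newline
            lines[row - 1] = line[:col].rstrip() + ending
    return "".join(lines)


code = 'x = 1  # set x\ns = "# not a comment"\n'
print(strip_comments(code))
```

Comment-only lines end up as blank lines, and docstrings are not touched since they are string literals rather than comments.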
This is such a humongous non-issue it's crazy man.
(1) "flat" organization where everyone owns everything and therefore nobody has the authority to reject a PR
or (2) "rubber stamp" culture where people who reject shoddy work are "not a team player" and therefore performance defective.
So far every company I've worked at has one or both of these symptoms. Working in the confines of those systems, it's not an irrational choice to decide that comments and other forms of documentation aren't worth trying to maintain, and are therefore detrimental.
The “dipshit” doesn’t mess everything up for fun. They don’t understand the comments written by the previous “dipshit” and thus are unable to update the comments.
Here's what I think. I think you guys heard the "self-documenting code" BS and ate it up, and now you're grasping at straws to defend your cargo cult position, inventing these "problems" to justify it.
If you're looking at some code and there's a comment saying something that doesn't make sense to you, maybe that's a clue that you're missing a puzzle piece and should take a step back and maybe talk to some people to make sure you're not messing things up? Maybe, for a non-dipshit, that comment they don't understand could actually be helpful if they put some effort into it?
Also just to be clear I don't think this is a likely occurrence unless someone doesn't know squat about the codebase at all - my comments generally assume very little knowledge. That's their whole purpose - to inform someone (possibly me) coming there without the necessary background knowledge.
It just isn't feasible to include the why of everything in the code itself. And it sure as hell is better to include some info as comments than none at all. Otherwise a bug will often be indistinguishable from a feature.
And I don't think dipshits mess things up for fun. I think they just suck. They're lazy and stupid, as most developers are. If I'm there I can use reviews etc to help them suck less, if I'm not they're free to wreck my codebase with reckless abandon and nothing I do will make any difference. I cannot safeguard my codebase against that so there's no point in trying and the fact that this is your argument should make you stop and reconsider your position because it's far fetched as fuck.
I’ll also note that I’ve worked with developers who didn't like git blame because someone might misinterpret the results. I think some people want excuses for poor work, rather than just working as correctly as possible.
At least at my workplace though, it's just assumed now that you are using the tools.
In commercial settings you are often required to label your produce and inform about things like 'Made in China' or possible adverse effects of consumption.
I mean, of course I would read most of the code during review, but as a human, I often skip things by mistake
hmm gotta try that
But at this point I am more curious whether git will continue to be the best tool.
Almost like writing “Code was created with the help of IntelliSense”.
It'll also become more important to know what code was actually written by humans.
I provide feedback on PRs. Then the owner of the PR adjusts it to accommodate my feedback and once I'm happy with it I approve it and we merge. If you're working in a place so cancerous that you can't just leave a comment on a PR reminding someone to update the comment they forgot to update I don't know why you're still there. This is called code review and it's common practice. If all you ever do is approve PRs then you're not doing code review and you might as well skip the whole PR step and let people merge into main as they please.
In any case your argument still just boils down to "I work with a bunch of stupid lazy dipshits" so why bother doing anything at all then? Write comments, don't write comments, write tests, don't write tests, do whatever the fuck you want because you're surrounded by useless dipshits and nothing you do matters anyway. Might as well write some comments for your own sake, everything's a ball of mud anyway it doesn't matter.
I'm gonna keep doing what I think is right in my sane corner of the world. And honestly I don't believe you. I think these excuses are just that. Excuses. I've been around quite a bit and haven't seen anything like you describe. Sure there's plenty of lazy dipshits but you don't have to sink to their level.
But for me at least, a tool like Git seems pretty essential for inspecting changes and deciding which to keep, which to reroll, and which to rewrite. (I'm not particularly attached to Git but an interface like Magit and a nice CLI for inspecting and manipulating history seem important to me.)
What are you imagining VCS software doing differently that might play nicer with LLM agents?
Check out Mitchell Hashimoto’s podcast episode on The Pragmatic Engineer. He starts talking about AI at 1:16:41. At some point after that he discusses git specifically, and how in some cases it becomes impossible to push because the local branch is always out of date.
I don’t think that’s necessarily the case. Exposure and discovery aren’t that tightly correlated. Maybe there’s a small effect, but I think it is outweighed by the fact that blast radius and spread is reduced while buying time for discovery.