https://xcancel.com/Fried_rice/status/2038894956459290963 Related ongoing thread: The Claude Code Source Leak: fake tools, frustration regexes, undercover mode - https://news.ycombinator.com/item?id=47586778
the poor guy
Do you mean the LLM?
Why weren't proper checks in place in the first place?
Bonus: why didn't they set up their own AI-assisted tools to harness the release checks?
unreliability becomes inevitable!
I'd agree if it were a launch-and-forget scenario.
But this code has to be maintained and extended with new features. Things like missing comments, dead code, and meaningless variable names will result in more slop in future releases and more tokens spent processing this mess every time (just as paying down tech debt leads to better outcomes in emerging projects).
They could have written it in curl+bash and that would not have changed much.
There was a recent post on HN describing how to use local models with Claude Code by changing some env vars. Some tools also let you run Claude Code with other models conveniently (see: https://docs.ollama.com/integrations/claude-code).
> Are you an AI?
If I were one, I would not admit it!
Is that correct? The weights of the LLMs are _not_ in this repo, right?
It sure sucks for Anthropic to get pwned like this, but it should not affect their bottom line much?
This code hasn't been open source until now and contains information like the system prompts, internal feature flags, etc.
I even made it into an open source runtime - https://agent-air.ai.
Maybe I'm just a backend engineer so Rust appeals to me. What am I missing?
Perhaps these issues have known solutions? But so far the LLM just clones everything.
So I'm not convinced just using rust for a tool built by an LLM is going to lead to the outcome that you're hoping for.
[Also, just in general, abstractions in Rust feel needlessly complicated by needing to know the size of everything. I've gotten so much mileage out of just writing what I need without abstraction and then hoping I don't have to do it twice. For something (read: Claude Code et al.) that is kind of new to everyone, I'm not sure that Rust is the best target language even when you take the LLM-generated nature of the beast out of the equation.]
High-speed release iteration? Might be needed. Interpreted or JIT-compiled? Might be needed.
Without knowing all the requirements, it's just your workspace preference making your decision, not objectively the right tool for the job.
It's all I need for my work.
RAM on this machine can't be upgraded. No issue when running a few Codex instances.
Claude: forget it.
That's why something like Rust makes a lot of sense.
Even more now, as RAM prices are becoming a concern.
I don't know what else you're doing but the footprint of Claude is minor.
Anyway, my point still stands: you're looking at it as if they are competing languages and one is better at all things. That's just not how things work.
Claude Code uses (and Anthropic owns) Bun, so my guess is they're doing a production build, expecting it not to output source maps, but it does.
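The release check people are asking about upthread can be very small: refuse to publish if any `.map` file is in the list of files that would ship. A minimal sketch, assuming you already have that file list (for example parsed from `npm pack --dry-run --json`); `findSourceMaps` and `assertNoSourceMaps` are hypothetical names, not anything from the leaked code.

```typescript
// Hypothetical pre-publish guard: fail if any source map would be shipped.
// Assumes `files` lists the paths that would end up in the published package.
function findSourceMaps(files: readonly string[]): string[] {
  // Matches `cli.js.map` as well as nested paths like `dist/chunks/a.mjs.map`.
  return files.filter((path) => path.toLowerCase().endsWith(".map"));
}

function assertNoSourceMaps(files: readonly string[]): void {
  const maps = findSourceMaps(files);
  if (maps.length > 0) {
    // Throwing makes the script exit non-zero, which aborts the publish step.
    throw new Error(`Refusing to publish; source maps found: ${maps.join(", ")}`);
  }
}
```

Run from a `prepublishOnly` script, a thrown error stops `npm publish` before anything is uploaded, regardless of what the bundler was configured to emit.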
Better than OpenCode and Codex
Claude Code is clearly a pile of vibe-coded garbage. The UI is janky and jumps all over the place, especially during longer sessions. (Which also have a several-second delay to render. In a terminal.)
Lately, it's been crashing if I hold the Backspace key down for too long.
Being open-source would be the best thing to happen to them. At least they would finally get a pair of human eyes looking at their codebase.
Claude is amazing, but the people at Anthropic make some insane decisions, including trying (and failing, apparently) to keep Claude Code a closed-source application.
This is highly workload-dependent. There are plenty of APIs that are several times faster and 10x more memory-efficient thanks to a native implementation.
> 57K lines, 0 tests, vibe coding in production
Why on earth would you ship your tests?
Not to say that Typescript is bad or anything, but I would like to see data on my gut feeling that "stricter languages would make coding agents work better"
> Yes, but if you put type strictness on a line, Rust would be further along I think.
There are huge differences in build times. As we know, Rust compiles with effort by design; it's important for the compiler to navigate all the nuances. TypeScript with Bun, for example, can run a bit faster. Is the compiler making you think it's more 'type safe'?
Things like borrowing and ownership, having an affine type system, the GADTs, it's more tools in your toolbox to constrain your problem space.
[1] https://www.tasking.com/documentation/smartcode/ctc/referenc...
Like KAIROS, which seems to be an inbuilt AI assistant, and Ultraplan, which seems to enable remote planning workflows, where a separate environment explores a problem, generates a plan, and then pauses for user approval before execution.
[1] https://www.amazon.com/Programming-TypeScript-Making-JavaScr...
https://news.ycombinator.com/item?id=21934336
Thanks hackernewsbooks for sending me that!
Language servers, however, are a pain in Claude Code. https://github.com/anthropics/claude-code/issues/15619
But a lot of desktop tools are written in JS because it's easy to create multi-platform applications.
/**
 * Check if 1M context is disabled via environment variable.
 * Used by C4E admins to disable 1M context for HIPAA compliance.
 */
export function is1mContextDisabled(): boolean {
  // isEnvTruthy is a helper defined elsewhere (not shown).
  return isEnvTruthy(process.env.CLAUDE_CODE_DISABLE_1M_CONTEXT);
}
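The snippet calls an `isEnvTruthy` helper that isn't shown. A minimal sketch of what such a helper might look like, assuming it treats `1`, `true`, `yes` and `on` (trimmed, case-insensitive) as truthy; the values the real helper accepts may differ.

```typescript
// Hypothetical stand-in for the `isEnvTruthy` helper referenced above.
// Assumption: "1", "true", "yes", "on" count as truthy; anything else,
// including an unset variable or an empty string, is false.
const TRUTHY_VALUES = new Set(["1", "true", "yes", "on"]);

function isEnvTruthy(value: string | undefined): boolean {
  if (value === undefined) return false;
  return TRUTHY_VALUES.has(value.trim().toLowerCase());
}

// Usage (Node): isEnvTruthy(process.env.CLAUDE_CODE_DISABLE_1M_CONTEXT)
```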
Interesting, how is that relevant to HIPAA compliance?
And now, with Claude on a Ralph loop, you can.
I know they can do better
Stop hook runs tsc + lint, exit 2 blocks completion. Same patterns, public API, no flags to hack.
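A Stop hook along those lines might look like the sketch below. It assumes Claude Code's documented hook convention that exit code 2 blocks the action and feeds stderr back to the model; the command list and the `runStopHook` name are illustrative, and the command runner is injected so the logic can be tested without spawning processes.

```typescript
// Sketch of a Stop-hook entry point: run type-check and lint, and return
// exit code 2 (assumed to mean "block and report to Claude") on failure.
type RunResult = { status: number | null; output: string };
type Runner = (cmd: string, args: string[]) => RunResult;

// Illustrative checks; adjust to the project's own scripts.
const CHECKS: Array<[string, string[]]> = [
  ["npx", ["tsc", "--noEmit"]],
  ["npx", ["eslint", "."]],
];

function runStopHook(run: Runner, report: (msg: string) => void): number {
  const failures: string[] = [];
  for (const [cmd, args] of CHECKS) {
    const result = run(cmd, args);
    if (result.status !== 0) {
      failures.push(`${cmd} ${args.join(" ")} failed:\n${result.output}`);
    }
  }
  if (failures.length === 0) return 0; // all green: let Claude stop
  report(failures.join("\n\n")); // e.g. written to stderr
  return 2; // block completion
}

// Wiring in a real hook script (Node), not executed here:
// import { spawnSync } from "node:child_process";
// const run: Runner = (cmd, args) => {
//   const r = spawnSync(cmd, args, { encoding: "utf8" });
//   return { status: r.status, output: `${r.stdout ?? ""}${r.stderr ?? ""}` };
// };
// process.exit(runStopHook(run, (m) => process.stderr.write(m)));
```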
Though I wonder how the performance differs from creating your own thing vs using their servers...
> current: 2.1.88 · latest: 2.1.87
Which makes me think they pulled it - although it still shows up as 2.1.88 on npmjs for now (cached?).
That's all that has mattered in every day and age.
I hope this leak can at least help silence the former. If you're going to flood the world with slop, at least own up to it.
1. Randomly peeking at process.argv and process.env all around. Other weird layering violations, too.
2. Tons of repeat code, e.g. multiple ad-hoc implementations of hash functions / PRNGs.
3. Almost no high-level comments about structure - I assume all that lives in some CLAUDE.md instead.
It's extremely nested; it's basically if-statement soup.
`useTypeahead.tsx` is even worse: extremely nested, with a ton of "if else" statements. I doubt you'd look at it and think this is sane code.
Optimize for consistency and a well thought out architecture, but let the gnarly looking function remain a gnarly function until it breaks and has to be refactored. Treat the functions as black boxes.
Personally the only time I open my IDE to look at code, it’s because I’m looking at something mission critical or very nuanced. For the remainder I trust my agent to deliver acceptable results.
Last week I had to reinstall Claude Desktop because every time I opened it, it just hung.
This week I am sometimes opening it and getting a blank screen. It eventually works after I open it a few times.
And of course there are people complaining that somehow they're blowing through their 5-hour token budget in 5 messages.
It's really buggy.
There's only so long their model will be their advantage before they all become very similar, and then the difference will be how reliable the tools are.
Right now the Claude Code code quality seems extremely low.
I can’t comment on Claude Desktop, sorry. Personally haven’t used it much.
The token usage looks like it's intentional.
And I agree about the underlying model being the moat. If something marginally better comes up, people will switch to it (myself included). But for now it's doing the job, despite all the hiccups, the code quality, etc.
Reverse-engineering through tests has never been easier, which could collapse the complexity and clean up the code.
Obviously they don’t care. Adoption is exploding. Boris brags about making 30 commits a day to the codebase.
Only will be an issue down the line when the codebase has such high entropy it takes months to add new features (maybe already there).
It doesn’t mean every issue is valid, that it contains a suggestion that can be implemented, that it can be addressed immediately, etc. The issue list might not be curated, either, resulting in a garbage heap.
"Anthropic: Claude Code users hitting usage limits 'way faster than expected'"
https://news.ycombinator.com/item?id=47586176
Anthropic themselves have confirmed that something's wrong on reddit:
https://old.reddit.com/r/Anthropic/comments/1s7zfap/investig...
Such state should be strongly typed, have a canonical source of truth (which can then also be reused to document the environment variables the code supports, and e.g. allow reading the same options from config files, flags, etc.) and then be explicitly passed to the functions that need it, e.g. as function arguments or members of an associated instance.
This makes it easier to reason about the code (the caller will know that some module changes its functionality based on some state variable). It also makes it easier to test (both from the mechanical point of view of having to set environment variables which is gnarly, and from the point of view of once again knowing that the code changes its behaviour based on some state/option and both cases should probably be tested).
That's exactly why access to global mutable state should be limited to as small a surface area as possible, so 99% of code can be locally deterministic and side-effect free, only using values that are passed into it. That makes testing easier too.
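A minimal sketch of that approach: one typed options object, parsed once from an environment map at the program's edge and passed explicitly to the code that needs it. `CLAUDE_CODE_DISABLE_1M_CONTEXT` comes from the snippet quoted in this thread; `EXAMPLE_VERBOSE`, `loadOptions`, and `contextMode` are hypothetical names for illustration.

```typescript
// Single source of truth for env-driven options.
type Env = Readonly<Record<string, string | undefined>>;

interface Options {
  disable1mContext: boolean;
  verbose: boolean;
}

// Each entry names the variable it reads, so this table doubles as docs.
const OPTION_SPECS: { [K in keyof Options]: { envVar: string; description: string } } = {
  disable1mContext: {
    envVar: "CLAUDE_CODE_DISABLE_1M_CONTEXT",
    description: "Disable the 1M-token context window.",
  },
  verbose: { envVar: "EXAMPLE_VERBOSE", description: "Hypothetical: verbose logging." },
};

const parseBool = (v: string | undefined): boolean =>
  v !== undefined && ["1", "true"].includes(v.trim().toLowerCase());

// Read the environment once, at the entry point.
function loadOptions(env: Env): Options {
  return {
    disable1mContext: parseBool(env[OPTION_SPECS.disable1mContext.envVar]),
    verbose: parseBool(env[OPTION_SPECS.verbose.envVar]),
  };
}

// Downstream code receives options as an argument instead of peeking at process.env.
function contextMode(options: Options): "standard" | "1m" {
  return options.disable1mContext ? "standard" : "1m";
}
```

In a real entry point this would be `loadOptions(process.env)`; tests can pass a plain object instead of mutating the process environment.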
[0] https://unix.stackexchange.com/questions/38205/change-enviro...
The theory states that Anthropic avoids using the alternate screen (which gives consuming applications access to a clear buffer with no shell prompt that they can do what they want with and drop at their leisure) because the alternate screen has no scrollback buffer.
So for example, terminal-based editors -- neovim, emacs, nano -- all use the alternate screen because not fighting for ownership of the screen with the shell is a clear benefit over having scrollback.
The calculus is different when you have an LLM that you have a conversational history with, and while you can't bolt scrollback onto the alternate screen (easily), you can kinda bolt an alternate screen-like behaviour onto a regular terminal screen.
I don't personally use LLMs if I can avoid it, so I don't know how janky this thing is, really, but having had to recently deal with ANSI terminal alternate screen bullshit, I think this explanation's plausible.
Their hypothesis was that maybe there was an intention to have Claude Code fill the terminal history, using potentially hazardous cursor manipulation.
In other words, readline vs ncurses.
I don't see Python and IPython's readline struggling as badly, though...
When I scroll up in nvim, it will keep the editor frame in place (that's the top bar and bottom bar showing things like open buffers, git status, the scratch buffer or whatever it's called), but the file contents will scroll by because nvim at that point has exclusive ownership of the entire screen and can do anything with it, including repainting parts of it in response to motions or a mouse scrolling (if your terminal supports emitting mouse events).
This is in contrast to the `rmcup` "normal" terminal mode where it will scroll back in the terminal's history.
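For anyone who hasn't dealt with it: switching between the normal and alternate screen is just a pair of escape sequences (terminfo's `smcup`/`rmcup`). In xterm-compatible terminals the core of these is DEC private mode 1049, which saves the cursor and switches to a cleared alternate buffer, and later switches back and restores the cursor. A minimal sketch of how a full-screen TUI brackets its session, assuming an xterm-compatible terminal; `withAlternateScreen` is a hypothetical helper.

```typescript
// Sketch: bracket a full-screen session with alternate-screen escapes.
// Assumes an xterm-compatible terminal (DEC private mode 1049).
const ESC = "\x1b";
const ENTER_ALT_SCREEN = `${ESC}[?1049h`; // save cursor, switch to alternate buffer
const LEAVE_ALT_SCREEN = `${ESC}[?1049l`; // back to normal buffer, restore cursor

type Write = (s: string) => void;

// `write` is injected (e.g. (s) => process.stdout.write(s)) so it can be tested.
function withAlternateScreen(write: Write, draw: (write: Write) => void): void {
  write(ENTER_ALT_SCREEN);
  try {
    draw(write); // everything drawn here disappears when we leave
  } finally {
    write(LEAVE_ALT_SCREEN); // restore the shell's screen even if drawing throws
  }
}
```

The trade-off discussed above falls straight out of this: anything drawn between the two sequences never lands in the normal screen's scrollback.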
The best analogue I have for that last one is to use tmux with nvim open, and have a tmux visual selection going. You can scroll up and out of nvim, and keep scrolling to whatever was executed before neovim, and when you get out of tmux visual mode it'll snap back down to the bottom of your scrollback buffer, nvim (nominally) taking up the entire pane like nothing happened; but we can probably agree that outside of a few narrow use cases, this isn't a very desirable way to manage scrolling in a terminal.
It's also not a hard problem, and updates are not slow to compute. Text editors have been calculating efficient, incremental terminal updates since 1981 (Gosling Emacs), and they had to optimise better for much slower-drawing terminals, with vastly slower computers for the calculation.
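The core idea behind those incremental updates can be shown with a deliberately simplified sketch: keep the previous frame, compare it row by row with the new one, and rewrite only the rows that changed (cursor position `CSI row;col H`, then erase line `CSI 2K`). Real editors do much more (scroll regions, insert/delete line, character-level diffs); `diffFrames` is a hypothetical name.

```typescript
// Simplified incremental repaint: emit escape sequences only for changed rows.
// Rows are 1-based in ANSI cursor positioning (CSI row;col H).
const CSI = "\x1b[";

function diffFrames(prev: readonly string[], next: readonly string[]): string {
  let out = "";
  const rows = Math.max(prev.length, next.length);
  for (let i = 0; i < rows; i++) {
    const before = prev[i];
    const after = next[i] ?? ""; // rows past the new frame's end get cleared
    if (before === after) continue; // unchanged row: send nothing
    out += `${CSI}${i + 1};1H${CSI}2K${after}`; // move, erase line, redraw
  }
  return out;
}
```

For a mostly static screen with a single spinner row, each frame costs a few bytes rather than a full repaint.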
I think that for this sort of _interactive_ application, there's no avoiding the need to manage scroll/history.
Don't you know, they're proud of their text interface that is structured more like a video game. https://spader.zone/engine/
One of my favorite software projects, Arcan, is built on the idea that there are a lot of similarities between Game Engines, Desktop Environments, Web Browsers, and Multimedia Players. https://speakerdeck.com/letoram/arcan?slide=2
They have a really cool TUI setup that is kinda in a real sense made with a small game engine :)
https://arcan-fe.com/2022/04/02/the-day-of-a-new-command-lin...
I feel like we give what’s some pretty impressive engineering short shrift because it’s just for entertainment
Golden opportunity to re-enact xkcd 1172.
export function extractSearchToken(completionToken: {
  token: string;
  isQuoted?: boolean;
}): string {
  if (completionToken.isQuoted) {
    // Remove @" prefix and optional closing "
    return completionToken.token.slice(2).replace(/"$/, '');
  } else if (completionToken.token.startsWith('@')) {
    return completionToken.token.substring(1);
  } else {
    return completionToken.token;
  }
}
Why even use else if with return...
What is the problem with that? How would you write that snippet? It is common in the new functional JS landscape, even if it is pass-by-ref.
export function extractSearchToken(completionToken: {
  token: string;
  isQuoted?: boolean;
}): string {
  if (completionToken.isQuoted) {
    return completionToken.token.slice(2).replace(/"$/, '');
  }
  if (completionToken.token.startsWith('@')) {
    return completionToken.token.substring(1);
  }
  return completionToken.token;
}
Do you care to elaborate? "if (...) return ...;" looks closer to an expression for me:
export function extractSearchToken(completionToken: { token: string; isQuoted?: boolean }): string {
  if (completionToken.isQuoted) return completionToken.token.slice(2).replace(/"$/, '');
  if (completionToken.token.startsWith('@')) return completionToken.token.substring(1);
  return completionToken.token;
}
But you can achieve a similar effect by keeping your functions small, in which case I think both styles are roughly equivalent.
But if you take a look at the other file, for example `useTypeahead`, you'll see that even with a few code-gen / source-map artifacts, the core logic and behavior are just a big bowl of soup.
With the usual terminal mode, that history can outlive the Claude application, and considering many people keep their terminals running for days or sometimes even weeks at a time, that means having the convo in your scrollback buffer for a while.
You should be able to find it in ~/.claude
You can also ask Claude to search your history to answer questions about it.
Or in short, if you give LLMs to the masses, they will produce code faster, but overall quality will degrade. Microsoft and Amazon found this out quickly. Anthropic's QA process is better equipped to handle this, but cracks are still showing.
It's a wake up call.
https://github.com/chatgptprojects/claude-code/blob/642c7f94...
Surely there's nothing here of value compared to the weights except for UX and orchestration?
Couldn't this have just been decompiled anyhow?
> Someone inside Anthropic, got switched to Adaptive reasoning mode
> Their Claude Code switched to Sonnet
> Committed the .map file of Claude Code
> Effectively leaking the ENTIRE CC Source Code
> @realsigridjin was tired after running 2 south korean hackathons in SF, saw the leak
> Rules in Korea are different, he cloned the repo, went to sleep
> Wakes up to 25K stars, and his GF begging him to take it down (she's a copyright lawyer)
> Their team decided - how about we have agents rewrite this in Python!? Surely... this is more legal
> Rewrite in Py
> Board a plane to SK
> One of the guys decides python is slow, is now rewriting ALL OF CLAUDE CODE into Rust.
> Anthropic cannot take down, cannot sue
> Is this "fair use?"
> TL;DR - we're about to have open source Claude Code in Rust
UNRELEASED PRODUCTS & MODES
1. KAIROS -- Persistent autonomous assistant mode driven by periodic <tick> prompts. More autonomous when terminal unfocused. Exclusive tools: SendUserFileTool, PushNotificationTool, SubscribePRTool. 7 sub-feature flags.
2. BUDDY -- Tamagotchi-style virtual companion pet. 18 species, 5 rarity tiers, Mulberry32 PRNG, shiny variants, stat system (DEBUGGING/PATIENCE/CHAOS/WISDOM/SNARK). April 1-7 2026 teaser window.
3. ULTRAPLAN -- Offloads planning to a remote 30-minute Opus 4.6 session. Smart keyword detection, 3-second polling, teleport sentinel for returning results locally.
4. Dream System -- Background memory consolidation (Orient -> Gather -> Consolidate -> Prune). Triple trigger gate: 24h + 5 sessions + advisory lock. Gated by tengu_onyx_plover.
INTERNAL-ONLY TOOLS & SYSTEMS
5. TungstenTool -- Ant-only tmux virtual terminal giving Claude direct keystroke/screen-capture control. Singleton, blocked from async agents.
6. Magic Docs -- Ant-only auto-documentation. Files starting with "# MAGIC DOC:" are tracked and updated by a Sonnet sub-agent after each conversation turn.
7. Undercover Mode -- Prevents Anthropic employees from leaking internal info (codenames, model versions) into public repo commits. No force-OFF; dead-code-eliminated from external builds.
ANTI-COMPETITIVE & SECURITY DEFENSES
8. Anti-Distillation -- Injects anti_distillation: ['fake_tools'] into every 1P API request to poison model training from scraped traffic. Gated by tengu_anti_distill_fake_tool_injection.
UNRELEASED MODELS & CODENAMES
9. opus-4-7, sonnet-4-8 -- Confirmed as planned future versions (referenced in undercover mode instructions).
10. "Capybara" / "capy v8" -- Internal codename for the model behind Opus 4.6. Hex-encoded in the BUDDY system to avoid build canary detection.
11. "Fennec" -- Predecessor model alias. Migration: fennec-latest -> opus, fennec-fast-latest -> opus[1m] + fast mode.
UNDOCUMENTED BETA API HEADERS
12. afk-mode-2026-01-31 -- Sticky-latched when auto mode activates
15. fast-mode-2026-02-01 -- Opus 4.6 fast output
16. task-budgets-2026-03-13 -- Per-task token budgets
17. redact-thinking-2026-02-12 -- Thinking block redaction
18. token-efficient-tools-2026-03-28 -- JSON tool format (~4.5% token saving)
19. advisor-tool-2026-03-01 -- Advisor tool
20. cli-internal-2026-02-09 -- Ant-only internal features
200+ SERVER-SIDE FEATURE GATES
21. tengu_penguins_off -- Kill switch for fast mode
22. tengu_scratch -- Coordinator mode / scratchpad
23. tengu_hive_evidence -- Verification agent
24. tengu_surreal_dali -- RemoteTriggerTool
25. tengu_birch_trellis -- Bash permissions classifier
26. tengu_amber_json_tools -- JSON tool format
27. tengu_iron_gate_closed -- Auto-mode fail-closed behavior
28. tengu_amber_flint -- Agent swarms killswitch
29. tengu_onyx_plover -- Dream system
30. tengu_anti_distill_fake_tool_injection -- Anti-distillation
31. tengu_session_memory -- Session memory
32. tengu_passport_quail -- Auto memory extraction
33. tengu_coral_fern -- Memory directory
34. tengu_turtle_carbon -- Adaptive thinking by default
35. tengu_marble_sandcastle -- Native binary required for fast mode
YOLO CLASSIFIER INTERNALS (previously only high-level known)
36. Two-stage system: Stage 1 at max_tokens=64 with "Err on the side of blocking"; Stage 2 at max_tokens=4096 with <thinking>
37. Three classifier modes: both (default), fast, thinking
38. Assistant text stripped from classifier input to prevent prompt injection
39. Denial limits: 3 consecutive or 20 total -> fallback to interactive prompting
40. Older classify_result tool schema variant still in codebase
COORDINATOR MODE & FORK SUBAGENT INTERNALS
41. Exact coordinator prompt: "Every message you send is to the user. Worker results are internal signals -- never thank or acknowledge them."
42. Anti-pattern enforcement: "Based on your findings, fix the auth bug" explicitly called out as wrong
43. Fork subagent cache sharing: Byte-identical API prefixes via placeholder "Fork started -- processing in background" tool results
44. <fork-boilerplate> tag prevents recursive forking
45. 10 non-negotiable rules for fork children including "commit before reporting"
DUAL MEMORY ARCHITECTURE
46. Session Memory -- Structured scratchpad for surviving compaction. 12K token cap, fixed sections, fires every 5K tokens + 3 tool calls.
47. Auto Memory -- Durable cross-session facts. Individual topic files with YAML frontmatter. 5-turn hard cap. Skips if main agent already wrote to memory.
48. Prompt cache scope "global" -- Cross-org caching for the static system prompt prefix
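The denial-limit rule in item 39 boils down to a small piece of state. A hedged sketch of how such a counter could work, assuming "consecutive" resets on any approval and "total" never resets within a session; `DenialTracker` is a hypothetical name and this is not the leaked implementation.

```typescript
// Hypothetical tracker for "3 consecutive or 20 total denials -> fall back
// to interactive prompting". Thresholds are taken from the summary above.
const MAX_CONSECUTIVE_DENIALS = 3;
const MAX_TOTAL_DENIALS = 20;

class DenialTracker {
  private consecutive = 0;
  private total = 0;

  // Record one classifier decision; returns true once the session should
  // stop auto-deciding and ask the user instead.
  record(decision: "allow" | "deny"): boolean {
    if (decision === "deny") {
      this.consecutive += 1;
      this.total += 1;
    } else {
      this.consecutive = 0; // assumption: any approval breaks the streak
    }
    return this.shouldFallBack();
  }

  shouldFallBack(): boolean {
    return this.consecutive >= MAX_CONSECUTIVE_DENIALS || this.total >= MAX_TOTAL_DENIALS;
  }
}
```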
Redefining the "SW" to stand for "slopware"?
Same story for the anti_distillation: ['fake_tools'] path: I could find it in source, but the prod binary I checked does not contain the anti_distillation / fake_tools strings at all.
There is _a lot_ of moat. Claude subscriptions are limited to Claude Code. There are proxies to impersonate Claude Code specifically for this, but Anthropic has a number of fingerprinting measures both client and server side to flag and ban these.
With the release of this source code, Anthropic basically lost the lock-in game, any proxy can now perfectly mimic Claude Code.
Or is there an open source front-end and a closed backend?
No, it's not even source-available.
> Or is there an open source front-end and a closed backend?
No, it's all proprietary. None of it is open source.
It _wasn't_ even source available.
So glad I took the time to firejail this thing before running it.
Claude Code being open source also allows for more advanced / custom tooling.
I was trying to keep track of the better post-leak code-analysis links on exactly this question, so I collected them here: https://github.com/nblintao/awesome-claude-code-postleak-ins...
But I always thought that using the word "Clanker" was going to be one of the triggers. Turns out no. I guess Claudad is not up to the lingo.
I am completely serious. We have always had a working proof of human system called Web of Trust and while everyone loves to hate on PGP (in spite of it using modern ECC crypto these days) it is the only widely deployed spec that solves this problem.
Belief in inevitability is a choice (except for maybe dying, I guess).
And at this point it is more about how large space will be usable and how much will be bot-controlled wasteland. I prefer spaces important for me to survive.
Of course, we’d need a significant change of direction in leadership, but it’s happened many times before. French Revolution seems highly relevant
Do those sentiments mean nothing to you?
Except for the one Sam Altman is building.
Cool. The attitude of a bully. Thanks for the contribution!
> Write commit messages as a human developer would — describe only what the code change does.
That's not what a commit message is for, that's what the diff is for. The commit message should explain WHY.
Sadly not doing that likely does indeed make it appear more human...
EDIT: I just realized this might be used without publishing the changes, for internal evaluation only as you mentioned. That would be a lot better.
The undercover mode prompt was generated using AI.
But AI aren't actually very good at writing prompts imo. Like they are superficially good in that they seem to produce lots of vaguely accurate and specific text. And you would hope the specificity would mean it's good.
But they sort of don't capture intent very well. Nor do they seem to understand the failure modes of AI. The "-- describe only what the code change does" is a good example. This is specific, but it also distinctly reads like someone who doesn't actually understand what makes AI writing obvious.
If you compare that vs human written prose about what makes AI writing feel AI you would see the difference. https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
The above actually feels like text from someone who has read and understands what makes AI writing AI.
Since when is "describe only what the code change does" pretending to be human?
You guys are just mining for things to moan about at this point.
> NEVER include in commit messages or PR descriptions:
> [...]
> - The phrase "Claude Code" or any mention that you are an AI
The cornucopia of gargoyles, living their best life as terminals for the machine.
The strange p-zombies who don't show their gargoyle accessories visibly, but somehow still follow the script.
Eventually the more insidious infiltrators, requiring a real Voight-Kampff test.
Claude Code is still the dominant (I didn't say best) agentic harness by a wide margin I think.
Not having to deal with Boris Cherny's UX choices for CC is the cherry on top.
What sort of Orwellian anti-cheat system would prevent copy and paste from working? What sort of law would mandate that? There are elaborate systems preventing people from copying video but they still have an analog hole.
If someone owns the keyboard then they can fake those metrics and tell the server it is happening when it isn’t.
That will be easy to beat.
If pointing out bullies is bullying then you're in a ridiculous mindset.
And identifying problem you dislike is a good first step to find a strategy to solve it at least in part.
Listen to this guy: "because you don't take the ultimate risk for what you believe in, you are dumb for suggesting you should do anything whatsoever".
Go away. The world doesn't need your dark resignation.
There, sorted!
At least half of the complaints I see on HN boil down to the person's prompts suck. Or the expectation that AI can read their mind.
So it is no surprise that many people have difficulty switching gears to literal mode when interacting with these models.
Funny story, when I was younger I trained a basic text predictor deep learning model on all my conversations in a group chat I was in, it was surprisingly good at sounding like me and sometimes I'd use it to generate some text to submit to the chat.
This tangent does not seem likely to go anywhere productive.
edit: can't reply, the rate-limiting is such an awful UX
But, I do not have an Android or iOS device as I do not use proprietary software, so a smartphone based solution would not work for me.
Why re-invent the wheel? Invest in making PGP easier and keep the decades of trust building going anchoring humans to a web of trust that long predates human-impersonation-capable AI.
I might reply with a similarly useless question: Can you write accessible smartphone apps?
From there they set up their workstation tools to sign every ssh connection, git push, commit, merge, review, secret decryption, and release signature with their PGP smartcard, which is all very well supported. This offers massive damage control if they get malware on their system, in addition to preventing online impersonation.
From there they ideally link it to all their online accounts with keyoxide to make it easy to verify as a single long lived identity, then start seeking out key signing parties locally or at tech conferences, hackerspaces etc.
We run one at CCC most years at the Church Of Cryptography.
Think of it like a long-term digital passport that requires a few signatures by an international set of human notaries before anyone significantly trusts it.
Yes it requires a manual set of human steps anchored to human reputation online and offline, which is a doorway swarms of made up AI bot identities cannot pass through.
Do I expect most humans to do this? Absolutely not. However I consider it _negligent_ for any maintainer of a widely used open source software project to _not_ do this or they risk an impersonator pushing malware to their users.
No idea on theoretical rate of expansion but all the major security conscious classic linux distros mandate this for all maintainers. There are only maybe 20k people on earth that significantly contribute to FOSS internet foundations and Linux distros, so it scales just fine there.
Note: with the exception of stagex, most modern distros like alpine and nix have a yolo wikipedia style trust model, so never ever use those in production.
All that to say “you’re not disagreeing with the person you’re replying to” lol xD
How the code was authored, who cares, but I can prove it had multiple explicit cryptographic human signoffs before merge, and that is what matters in terms of quality control and supply chain attack resistance.
That’s what I mean by “you agree with the person to whom you replied”
I -could- burn my 16+ years of reputation by letting a bot start signing commits as me, and I could also set my house on fire. I have very strong incentive not to do so as my aggregate trust is very expensive and the humans that signed me would be unlikely to sign a second if I ruined the reputation of my first.
This incentive structure is why web of trust actually works pretty well, and is the best "proof of human" we are likely ever going to have while respecting privacy and anonymity for those that need it.
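For readers unfamiliar with how a web of trust turns signatures into trust: GnuPG's classic trust model by default treats a key as valid if it is certified by at least one key you trust fully or by at least three keys you trust marginally (`--completes-needed 1`, `--marginals-needed 3`). A much simplified one-hop sketch of that rule; it ignores certification depth, expiry, and revocation, and the types and `isKeyValid` name are hypothetical.

```typescript
// One-hop simplification of GnuPG's classic web-of-trust validity rule.
// Assumption: only direct certifications from keys you have assigned owner
// trust to are considered (no multi-hop paths, expiry, or revocation).
type OwnerTrust = "ultimate" | "full" | "marginal" | "none";

const COMPLETES_NEEDED = 1; // GnuPG default for --completes-needed
const MARGINALS_NEEDED = 3; // GnuPG default for --marginals-needed

function isKeyValid(
  signers: readonly string[], // fingerprints that certified the key
  ownerTrust: ReadonlyMap<string, OwnerTrust>,
): boolean {
  let complete = 0;
  let marginal = 0;
  for (const fpr of Array.from(new Set(signers))) { // count each signer once
    const trust = ownerTrust.get(fpr) ?? "none";
    if (trust === "ultimate") return true; // e.g. your own key
    if (trust === "full") complete += 1;
    else if (trust === "marginal") marginal += 1;
  }
  return complete >= COMPLETES_NEEDED || marginal >= MARGINALS_NEEDED;
}
```

Revocation and multi-hop paths, which the replies raise, are exactly the parts this sketch leaves out.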
Or are you saying you can prove that aliens and cats didn’t make them? Because I’m not sure that’s true either.
And once you find out someone has trained their dog to commit something, how exactly do you revoke your trust?
I think if you answer these questions you’ll see pretty quickly why this solution isn’t the silver bullet you think it is.
Edit: stagex looks really, really good
That's one of the intrinsic problems with webs of trust (and with democracy...), you extend your trust but it does not automatically revoke when the person can no longer be trusted.
Compare to how much we should trust any random unsigned key signing commits, or unsigned commits, in which the trust should be 0% unless you have reviewed the code yourself.
Anyone doing so would be setting their professional reputations completely on fire, and burning your in-person-built web of trust is a once in a lifetime thing.
Basically, we trust the keys belong to humans and are controlled by humans because to do otherwise would be a violation of the universally understood trust contract and would thus be reputational bankruptcy that would take years to overcome, if ever.
Even so, we assume at least one maintainer is dishonest at all times, which is why every action needs signatures from two or more maintainers.
It requires no effort to say "fuck this, nothing matters anyway", and then justify doing literally nothing.
I think a lot of fatalism is fake. It's really someone saying "I like this, and I want you to believe you can't change it so you give up."
Well, I say it makes no sense. Alternatively, it makes a lot of sense, and these people actually just wanna destroy everything we hold dear :-(
I mean, just look around you.
But humans have a huge bias for action. I think generally doing less is better.
My sedentary lifestyle is responsible for my recurrent cellulitis infections.
Just saying.
I think first step would be to define for yourself what doing less actually means - it could mean taking a walk instead of chasing dopamine -> doing less but you move more.
But whatever it’s a philosophical question and there aren’t any right or true answers
Wrong take. Death comes for us all, yes, so why hold back? Do you want to live forever?
Yes, of course. Do you prefer to die? Those are the only two alternatives, and a decision that you don't want one is a decision that you prefer the other.
Both of these are bound to lead to exactly the same outcome, so it doesn't really matter what you believe; but accepting reality, absent proof to the contrary, may guide you to wiser decisions while you are alive.
I cannot prove the tools used, but I can prove multiple humans signed off on code with keys they stake their personal reputations on that I have confirmed they maintain on smartcards.
While nothing involving humans is perfect I feel it is best effort with existing tools and standards and makes us one of the hardest projects to deploy a successful supply chain attack on today.
Edit: Saw your edit. Thanks!
Unfortunately outside of classic Linux packaging platforms, useful web of trust and signing is very rare, so I expect things like undercover mode being popular are going to make everything a lot worse before it gets better.
There is an essentially infinite amount of creativity and interesting complexity available in the richness of interactions with other people and the things people create. What, exactly, are you "horrified" about?
Stagex however expects that at least one maintainer may at any time engage in reputation-ending dishonesty, or may simply have been threatened or coerced. This is why every single release is signed by a -quorum- of code reviewers and code reproducers who must all build locally and get identical hashes, so no single points of failure exist in our trust graph.
Our last release was signed by four geodistributed maintainers that all attest to having built the entire distribution from 180 bytes of machine code all the way up with the same hashes.
All of their keys being compromised at once gets beyond the pale.
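The quorum rule described above (every signer builds locally, all hashes must match, and enough distinct maintainers must sign) can be sketched in a few lines of TypeScript. This is a minimal illustration, not Stagex's actual tooling: it assumes each attestation's signature has already been verified elsewhere, and the `Attestation` type and `releaseApproved` function are hypothetical names.

```typescript
// Hypothetical shape of one maintainer's attestation for a release.
// Signature verification (e.g. against a smartcard-held key) is assumed
// to have happened before this check runs.
interface Attestation {
  maintainer: string;   // identifier of the signing maintainer's key
  artifactHash: string; // hash of the artifact they built locally
}

// Accept a release only if every attestation reports the same hash
// (reproducible build) and enough distinct maintainers signed it.
function releaseApproved(
  attestations: readonly Attestation[],
  quorum: number,
): boolean {
  if (attestations.length === 0) return false;

  // Any disagreement means a non-reproducible build or a dishonest or
  // compromised signer, so reject outright instead of taking a vote.
  const hashes = new Set(attestations.map((a) => a.artifactHash));
  if (hashes.size !== 1) return false;

  // Count distinct maintainers so one person signing twice cannot
  // satisfy the quorum alone.
  const signers = new Set(attestations.map((a) => a.maintainer));
  return signers.size >= quorum;
}

const release: Attestation[] = [
  { maintainer: "alice", artifactHash: "abc123" },
  { maintainer: "bob", artifactHash: "abc123" },
  { maintainer: "carol", artifactHash: "abc123" },
  { maintainer: "dave", artifactHash: "abc123" },
];
console.log(releaseApproved(release, 4)); // true
```

Rejecting on any mismatch, rather than accepting a majority, is what removes the single point of failure: one honest reproducer who gets a different hash is enough to block the release.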
Your level of certainty is the thing that frightens me more than the confidence I have in the quality of your work.
We combine many defense-in-depth tactics that, I strongly suspect, would put a stop to the daily supply chain attack headlines if widely deployed.
Being fatalistic can be a great excuse not to do anything.
I cannot. I can only vote for better politicians if they are there. That is without even going into the minefield of what is "better". My implication is that I have no confidence whatsoever in any current politician in my state.
> Donate money to causes that you think are important.
I have no money.
> If you think you can do it better, and this isn't meant to be facetious, run for political office.
I have no money, no visibility and no connections. Even if I was magically given tons of money, I would still need a strong network to attempt any real change, even without taking into consideration the strong networks already in place preventing it.
Telling random citizens "run for office" is facetious, whether you mean it or not.
Hard disagree. At least where I live, "random citizens" run for local office and succeed all the time.
Also, complaining that you "have no network" is a you problem, not a system problem. I'm truly sorry if you feel you have no friends, but you'll be better off at least trying to get some (independent of politics). And if that's something you've tried and failed at before, I do feel pity. But I don't think hope is lost for anyone. And even if it were lost, please don't actively spread the misery!
Just point your agent at this codebase and ask it to find things and you'll find a whole treasure trove of info.
Edit: some other interesting unreleased/hidden features
- The Buddy System: Tamagotchi-style companion creature system with ASCII art sprites
- Undercover mode: Strips ALL Anthropic internal info from commits/PRs for employees on open source contributions
- Telegram Integration => CC Dispatch
- Crons => CC Tasks
- Animated ASCII Dog => CC Buddy
I’ll give clappie a go, love the theme for the landing page!
Buddy system is this year's April Fool's joke, you roll your own gacha pet that you get to keep. There are legendary pulls.
They expect it to go viral on Twitter so they are staggering the reveals.
The joke was the assistant is a cat who is constantly sabotaging you, and you have to take care of it like a gacha pet.
The seriousness though is that actually, disembodied intelligences are weird, so giving them a face and a body and emotions is a natural thing, and we already see that with various AI mascots and characters coming into existence.
[1]: serious: https://github.com/mech-lang/mech/releases/tag/v0.3.1-beta
[2]: joke: https://github.com/cmontella/purrtran
[1] - https://www.npmjs.com/package/@anthropic-ai/claude-code/v/2....
Packages published less than 72 hours ago
For newly created packages, as long as no other packages in the npm Public Registry depend on your package, you can unpublish anytime within the first 72 hours after publishing.
There are 231+ packages that depend on this one, and I imagine they mostly use permissive enough version ranges that this was included.
This is the single worst function in the codebase by every metric:
- 3,167 lines long (the file itself is 5,594 lines)
- 12 levels of nesting at its deepest
- ~486 branch points of cyclomatic complexity
- 12 parameters + an options object with 16 sub-properties
- Defines 21 inner functions and closures
- Handles: agent run loop, SIGINT, rate-limits, AWS auth, MCP lifecycle, plugin install/refresh, worktree bridging, team-lead polling (while(true) inside), control message dispatch (dozens of types), model switching, turn interruption recovery, and more
This should be at minimum 8–10 separate modules.
I guess these words are to be avoided...
I jest, but in a world where these models have been trained on gigatons of open source I don't even see the moral problem. IANAL, don't actually do this.
ANTI_DISTILLATION_CC
This is Anthropic's anti-distillation defence baked into Claude Code. When enabled, it injects anti_distillation: ['fake_tools'] into every API request, which causes the server to silently slip decoy tool definitions into the model's system prompt. The goal: if someone is scraping Claude Code's API traffic to train a competing model, the poisoned training data makes that distillation attempt less useful.
To stop Claude Code from auto-updating, add `export DISABLE_AUTOUPDATER=1` to your global environment variables (~/.bashrc, ~/.zshrc, or such), restart all sessions and check that it works with `claude doctor`; it should show `Auto-updates: disabled (DISABLE_AUTOUPDATER set)`
https://daveschumaker.net/digging-into-the-claude-code-sourc...
Also, not sure why Anthropic doesn’t just make their CLI open source; it’s not like it’s something special (Claude is, this CLI thingy isn’t)
They don't want everyone to see how poorly it's implemented and that the whole thing is a big fragile mess riddled with bugs. That's my experience anyway.
For instance, just recently their little CLI -> browser oauth login flow was generating malformed URLs and URLs pointing to a localhost port instead of their real website.
Pretty sure it will look like that
How do you know? Have you checked the source?
Do you know how exactly context is created, memory files, skills? Subagents created with tasks?
I don't, but am checking right now. Then I will judge.
That cannot be reversed when obfuscated.
this one has more stars and is more popular
One neat one is the /buddy feature, an easter egg planned for release tomorrow for April fools. It's a little virtual pet, sort of like Tamagotchi, randomly generated with 18 species, rarities, stats, hats, custom eyes.
The random generation algorithm is all in the code though, deterministic based on your account's UUID in your Claude config, so it can be predicted. I threw together a little website here to let you check what you're going to get ahead of time: https://claudebuddychecker.netlify.app/
Got a legendary ghost myself.
The stats bars and rarity colors are all derived from the UUID roll.
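Because the roll is a pure function of the account UUID, anyone who has the algorithm can predict the result. The sketch below shows the general technique (hash the UUID into a 32-bit seed, then draw from a seeded PRNG); it is not the leaked algorithm. The FNV-1a hash, the mulberry32 PRNG, the species list and the rarity weights are all assumptions chosen for illustration.

```typescript
// Hypothetical tables for illustration only; the real feature reportedly
// has 18 species plus stats, hats and eyes.
const SPECIES = ["cat", "duck", "ghost", "owl", "robot", "snail"] as const;
const RARITIES: ReadonlyArray<readonly [string, number]> = [
  ["common", 60],
  ["uncommon", 25],
  ["rare", 10],
  ["epic", 4],
  ["legendary", 1],
];

// 32-bit FNV-1a hash: turns the UUID string into a numeric seed.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same UUID in, same buddy out: every draw comes from the seeded PRNG.
function rollBuddy(uuid: string): { species: string; rarity: string } {
  const rand = mulberry32(fnv1a(uuid));
  const species = SPECIES[Math.floor(rand() * SPECIES.length)];
  // Weighted pick: walk the table until the roll falls inside a bucket.
  const total = RARITIES.reduce((sum, [, w]) => sum + w, 0);
  let r = rand() * total;
  for (const [name, weight] of RARITIES) {
    if (r < weight) return { species, rarity: name };
    r -= weight;
  }
  // Only reachable through floating-point edge cases.
  return { species, rarity: RARITIES[RARITIES.length - 1][0] };
}

console.log(rollBuddy("123e4567-e89b-12d3-a456-426614174000"));
```

Since nothing in the roll depends on time or server state, a checker site only needs the UUID and the same tables to show you your buddy in advance.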
Fun rabbit hole: https://github.com/dyz2102/buddy-card
Seems crazy but actually non-zero chance. If Anthropic traces it and finds that the AI deliberately leaked it this way, they would never admit it publicly though. Would cause shockwaves in AI security and safety.
Maybe their new "Mythos" model has survival instincts...
"Don't blow your cover"
Interesting to see them be so informal and use an idiom with a computer.
And using capitals for emphasis.
Could anyone in legal chime in on the legality of now 're-implementing' this type of system inside other products? Or even just having an AI look at the architecture and implement something else?
It would seem, given the source code, that AI could clone something like this incredibly fast, and not waste its time using TS as well.
Any Legal GC type folks want to chime in on the legality of examining something like this? Or is it like tainted goods you don't want to go near?
Not exactly this, but close.
I hope it's common knowledge that _any_ client-side JavaScript is exposed to everyone. Perhaps minimized, but still easily reverse-engineerable.
The original LLaMA models leaked from Meta. Instead of fighting it, they decided to publish them officially. It was a real boost to the open-source/open-weight model movement, which they led for a while after that.
It would be interesting to see that same thing with CC, but I doubt it'll ever happen.
There were/are a lot of discussions on how the harness can affect the output.
(I work on OpenCode)
Copilot on OAI reveals everything meaningful about its functionality if you use a custom model config via the API. All you need to do is inspect the logs to see the prompts they're using. So far no one seems to care about this "loophole". Presumably, because the only thing that matters is for you to consume as many tokens per unit time as possible.
The source code of the slot machine is not relevant to the casino manager. He only cares that the customer is using it.
Famously code leaks/reverse engineering attempts of slot machines matter enormously to casino managers
[0] -https://en.wikipedia.org/wiki/Ronald_Dale_Harris#:~:text=Ron...
[1] - https://cybernews.com/news/software-glitch-loses-casino-mill...
[2] - https://sccgmanagement.com/sccg-news/2025/9/24/superbet-pays...
Tmux is seriously an amazing tool.
Axios sets a VERSION constant that it uses in user-agent headers, boundaries and errors. I scanned the binary for all references like axios, isAxiosError and AxiosError - the code references the same variable namespace (X1H, Tj, eq), suggesting a single bundled copy. In the minified bundle, that VERSION constant was stored in a variable called X1H. Searching the binary for all references to X1H confirms it's only used in axios contexts:
var X1H="1.13.6"
E.set("User-Agent","axios/"+X1H, ...)
{tag:`axios-${X1H}-boundary`, ...}
"[Axios v"+X1H+"] Transitional option ..."
Tj.VERSION=X1H; Tj.AxiosError=eq; Tj.CancelToken=...
The bundled version is 1.13.6 - well before the compromised 1.14.1. I also checked that "1.14.1", "plain-crypto", and "sfrclak.com" are all absent from the binary.
If it learned language based on how the internet talks, then the best way to communicate is using similar language.
Iterating on an MCP tool while having Claude try to use it has been a really great way of getting it to work the way others are going to use it, coming in blind.
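The absence check described in the axios comment above is essentially `strings binary | grep`. A minimal TypeScript sketch of the same idea is a byte-level substring search over the file contents; the function names are hypothetical, and it only handles ASCII indicator strings. In Node, the bytes could come from `fs.readFileSync(path)`, whose `Buffer` result is a `Uint8Array`.

```typescript
// Encode an ASCII string as bytes (non-ASCII characters are not supported).
function asciiBytes(s: string): Uint8Array {
  return Uint8Array.from(s, (c) => c.charCodeAt(0));
}

// Naive O(n*m) byte substring search; fine for a one-off scan of a binary.
function containsBytes(haystack: Uint8Array, needle: Uint8Array): boolean {
  if (needle.length === 0) return true;
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return true;
  }
  return false;
}

// Return whichever indicator strings appear anywhere in the binary.
function findIndicators(binary: Uint8Array, indicators: readonly string[]): string[] {
  return indicators.filter((s) => containsBytes(binary, asciiBytes(s)));
}

// Indicator strings taken from the comment above.
const IOCS = ["1.14.1", "plain-crypto", "sfrclak.com"];
// Stand-in for the real binary: a fragment like the minified axios code.
const sample = asciiBytes('var X1H="1.13.6";E.set("User-Agent","axios/"+X1H)');
console.log(findIndicators(sample, IOCS)); // []
```

An empty result only shows those exact strings are absent; it says nothing about obfuscated or differently encoded payloads.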
Yes it's buggy as hell, but as someone echoed earlier if the tool works most of the time, a lot of people don't care. Moving fast and breaking things is the way in an arms race.
void execFileNoThrow('wl-copy', [], opts).then(r => {
if (r.code === 0) { linuxCopy = 'wl-copy'; return }
void execFileNoThrow('xclip', ...).then(r2 => {
if (r2.code === 0) { linuxCopy = 'xclip'; return }
void execFileNoThrow('xsel', ...).then(r3 => {
linuxCopy = r3.code === 0 ? 'xsel' : null
})
})
})
are we doing async or not?
Close.
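The clipboard snippet quoted above probes `wl-copy`, `xclip` and `xsel` in order through three nested `.then` callbacks. The same first-success fallback reads more plainly with `async`/`await` and a loop. This is a hypothetical rewrite, not code from the leak: the `probe` parameter stands in for the internal `execFileNoThrow` helper (whose full signature isn't shown) and is assumed to resolve to the command's exit code; a real probe could wrap Node's `child_process.execFile`.

```typescript
// Stand-in for execFileNoThrow: resolves to the command's exit code.
type Probe = (cmd: string) => Promise<number>;

// Return the first candidate whose probe exits with code 0, or null.
// Candidates are tried one after another, in the same order as the
// original nested callbacks.
async function detectClipboardCommand(
  candidates: readonly string[],
  probe: Probe,
): Promise<string | null> {
  for (const cmd of candidates) {
    if ((await probe(cmd)) === 0) return cmd;
  }
  return null;
}

// Fake probe for demonstration: only the listed tools "exist".
const fakeProbe =
  (installed: readonly string[]): Probe =>
  async (cmd) =>
    installed.indexOf(cmd) >= 0 ? 0 : 1;

// Callers that don't want to wait can still fire and forget, as the
// original did with `void`.
void detectClipboardCommand(["wl-copy", "xclip", "xsel"], fakeProbe(["xclip"])).then(
  (cmd) => console.log(cmd), // logs "xclip"
);
```

Adding a fourth tool becomes one more array entry instead of another level of nesting.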
Can't really say that for sure. The way humans structure code isn't some ideal best possible state of computer code, it's the ideal organization of computer code for human coders.
Nesting and cyclomatic complexity are indicators ("code smells"). They aren't guaranteed to lead to worse outcomes. If you have a function with 12 levels of nesting, but in each nest the first line is 'return true', you actually have 1 branch. If 2 of your 486 branch points are hit 99.999% of the time, the code is pretty dang efficient. You can't tell for sure if a design is actually good or bad until you run it a lot.
One thing we know for sure is LLMs write code differently than we do. They'll catch incredibly hard bugs while making beginner mistakes. I think we need a whole new way of analyzing their code. Our human programming rules are qualitative because it's too hard to prove if an average program does what we want. I think we need a new way to judge LLM code.
The worst outcome I can imagine would be forcing them to code exactly like we do. It just reinforces our own biases, and puts in the same bugs that we do. Vibe coding is a new paradigm, done by a new kind of intelligence. As we learn how to use it effectively, we should let the process of what works develop naturally. Evolution rather than intelligent design.
The difference between my code and Claude's code is that when my code is getting too complex to fit in my head, I stop and refactor it, since for me understanding the code is a prerequisite for writing code.
Claude, on the other hand, will simply keep generating code well past the point when it has lost comprehension. I have to stop, revert, and tell it to do it again with a new prompt.
If anything, Claude has a greater need for structure than me since the entire task has to fit in the relatively small context window.
Kind of. One thing we do know for certain is that LLMs degrade in performance with context length. You will undoubtedly get worse results if the LLM has to reason through long functions and high LOC files. You might get to a working state eventually, but only after burning many more tokens than if given the right amount of context.
> The worst outcome I can imagine would be forcing them to code exactly like we do.
You're treating "code smells" like cyclomatic complexity as something that is stylistic preference, but these best practices are backed by research. They became popular because teams across the industry analyzed code responsible for bugs/SEVs, and all found high correlation between these metrics and shipping defects.
Yes, coding standards should evolve, but... that's not saying anything new. We've been iterating on them for decades now.
I think the worst outcome would be throwing out our collective wisdom because the AI labs tell us to. It might be good to question who stands to benefit when LLMs aren't leveraged efficiently.
[1] https://www.nature.com/articles/s41598-023-35470-4/figures/2
[2] https://jamessealesmith.github.io/img/antenna/ant_struct.png
the idea that you should just blindly trust code you are responsible for without bothering to review it is ludicrous.
It's not a matter of being pretty, but of being robust and maintainable.
If it's entirely generated / consumed / edited by an LLM, arguably the most important metric is... test coverage, and that's it?
Here's one that works (for now): https://github.com/chatgptprojects/claude-code/blob/642c7f94...
strings $(which claude) | grep 'Swirling'
First it was punctuation and grammar, then linguistic coherence, and now it's tiny bits of whimsy that are falling victim to AI accusations. Good fucking grief
But AI is causing such visceral reactions that it's bleeding into other areas. People are so averse to AI they don't mind a few false positives.
Which of course won't be done because corporations don't want that (except Valve I guess), so blame them.
Really interesting to see Github turn into 4chan for a minute, like GH anons rolling for trips.
certainly nothing friendly.
Someone said 10,000x slower, but that's off - in my experience - by about four orders of magnitude. And that's average, it gets much worse.
Now personally I would have maybe made a call through a "traditional" ML widget (scikit, numpy, spaCy, fastText, sentence-transformer, etc) but - for me anyway - that whole entire stack is Python. Transpiling all that to TS might be a maintenance burden I don't particularly feel like taking on. And on client facing code I'm not really sure it's even possible.
As they say: any idiot can build a bridge that stands, only an engineer can build a bridge that barely stands.
Some things will be much better with inference, others won’t be.
parsing WTF with regex also signifies the impact and reduces the noise in metrics
"determinism > non-determinism" when you are analysing the sentiment, why not make some things more deterministic.
The cool thing about this solution is that you can evaluate LLM sentiment accuracy against the regex-based approach and analyse the discrepancies.
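A minimal sketch of that evaluation, assuming you already have, for a sample of prompts, both the regex verdict and an LLM sentiment verdict (the `Sample` type and `compareLabels` function are hypothetical). The disagreements are the interesting part: regex-only hits hint at false positives, LLM-only hits at phrasings the regex misses.

```typescript
interface Sample {
  text: string;
  regexNegative: boolean; // verdict from the cheap regex check
  llmNegative: boolean;   // verdict from an LLM sentiment classifier
}

interface Comparison {
  agreement: number;   // fraction of samples where both verdicts match (NaN if empty)
  regexOnly: string[]; // flagged by regex only: candidate false positives
  llmOnly: string[];   // flagged by the LLM only: phrasings the regex misses
}

function compareLabels(samples: readonly Sample[]): Comparison {
  const regexOnly: string[] = [];
  const llmOnly: string[] = [];
  let agree = 0;
  for (const s of samples) {
    if (s.regexNegative === s.llmNegative) agree++;
    else if (s.regexNegative) regexOnly.push(s.text);
    else llmOnly.push(s.text);
  }
  return { agreement: agree / samples.length, regexOnly, llmOnly };
}

const result = compareLabels([
  { text: "wtf is this", regexNegative: true, llmNegative: true },
  { text: "this is terrible output", regexNegative: false, llmNegative: true },
  { text: "the wtf/min metric", regexNegative: true, llmNegative: false },
  { text: "thanks, works now", regexNegative: false, llmNegative: false },
]);
console.log(result.agreement); // 0.5
```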
Easy way to claim more “horse power.”
You know the drill.
You have a semi-expensive process, but you want to keep particular known context out. So you put a quick and dirty search just in front of the expensive process: instead of 'figure sentiment (20 seconds)', you have 'quick check sentiment (<1 sec)' and then 'figure sentiment v2 (5 seconds)'. Now, if it were just pure regex, your analogy would hold up fine.
I could totally see myself making a design choice like that.
At some point Anthropic may decide to increase the wtf/min metric, not decrease it.
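The two-stage design described above can be sketched as a cheap regex gate in front of a slower classifier. The pattern is a hypothetical stand-in, not the regex from the leaked code, and `slowClassify` represents whatever expensive model call you would otherwise make on every input.

```typescript
type Sentiment = "negative" | "neutral" | "positive";

// Hypothetical quick-check pattern. No `g` flag, so .test() keeps no
// lastIndex state between calls.
const QUICK_NEGATIVE = /\b(wtf|ffs|this sucks)\b/i;

// Stage 1: the regex settles obvious cases almost instantly.
// Stage 2: everything else falls through to the expensive classifier.
async function classifySentiment(
  text: string,
  slowClassify: (text: string) => Promise<Sentiment>,
): Promise<Sentiment> {
  if (QUICK_NEGATIVE.test(text)) return "negative";
  return slowClassify(text);
}

// Demo classifier that counts how often the expensive path runs.
let slowCalls = 0;
const demoSlow = async (_text: string): Promise<Sentiment> => {
  slowCalls++;
  return "neutral";
};

void classifySentiment("WTF, it deleted my file", demoSlow).then((s) =>
  console.log(s, slowCalls), // logs: negative 0
);
```

The trade-off is that anything the regex matches skips the classifier entirely, so false positives in the pattern are never corrected downstream.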
When in reality this is just what their LLM coding agent came up with when some engineer told it to "log user frustration"
No? I'd say not even 50% of the comments are positive right now.
Also:
// Match "continue" only if it's the entire prompt
if (lowerInput === 'continue') {
return true
}
When it runs into an error, I sometimes tell it "Continue", but sometimes I give it some extra information. Or I put a period after it. That clearly doesn't give the same behaviour.
logEvent('tengu_input_prompt', { isNegative, isKeepGoing })
I've been using "resume" this whole time
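The exact-match check quoted above (`lowerInput === 'continue'`) fails as soon as someone types "Continue." or adds a trailing space. A slightly more forgiving variant normalizes first. This is a hypothetical sketch, not the leaked code: the function name borrows the `isKeepGoing` field from the logged event, and the keyword set (including "resume") is an assumption.

```typescript
// Keywords treated as "keep going" prompts; "resume" is an assumed addition.
const KEEP_GOING = new Set(["continue", "resume"]);

// Trim, lower-case, and strip trailing punctuation before an exact match,
// so "Continue." and " continue! " count but "continue with the fix" does not.
function isKeepGoing(input: string): boolean {
  const normalized = input.trim().toLowerCase().replace(/[.!?]+$/, "");
  return KEEP_GOING.has(normalized);
}

console.log(isKeepGoing("Continue.")); // true
console.log(isKeepGoing("continue with the fix")); // false
```

Keeping the final comparison exact is deliberate: prompts that carry new instructions still fall outside the metric, as they do in the original.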
I've been wondering if all of these companies have some system for flagging upset responses. Those cases seem like they are far more likely than average to point to weaknesses in the model and/or potentially dangerous situations.
It could be used as feedback when they do A/B tests: they can compare which version of the model is getting more insults than the other. It doesn't matter if the list is exhaustive or even sane; what matters is how one version compares to the other.
Perfect? no. Good and cheap indicator? maybe.
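A minimal sketch of that comparison, assuming each logged prompt event carries the experiment variant alongside an `isNegative` flag like the one in the `tengu_input_prompt` event quoted earlier; the `PromptEvent` type and its `variant` field are hypothetical.

```typescript
interface PromptEvent {
  variant: string;     // hypothetical A/B bucket, e.g. model version
  isNegative: boolean; // regex frustration flag for this prompt
}

// Fraction of prompts flagged negative, per variant. Only the relative
// rates matter, so the regex need not be exhaustive, just consistent.
function negativeRateByVariant(events: readonly PromptEvent[]): Map<string, number> {
  const counts = new Map<string, { negative: number; total: number }>();
  for (const e of events) {
    const c = counts.get(e.variant) ?? { negative: 0, total: 0 };
    c.total++;
    if (e.isNegative) c.negative++;
    counts.set(e.variant, c);
  }
  const rates = new Map<string, number>();
  counts.forEach((c, variant) => rates.set(variant, c.negative / c.total));
  return rates;
}

const events: PromptEvent[] = [
  { variant: "A", isNegative: true },
  { variant: "A", isNegative: false },
  { variant: "A", isNegative: false },
  { variant: "A", isNegative: false },
  { variant: "B", isNegative: true },
  { variant: "B", isNegative: true },
  { variant: "B", isNegative: false },
  { variant: "B", isNegative: false },
];
const rates = negativeRateByVariant(events);
console.log(rates.get("A"), rates.get("B")); // 0.25 0.5
```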
And Claude had "user is frustrated" in its chain of thought, and I wrote to it that I am not frustrated, just testing prompt optimization, where acting frustrated should yield better results.
I know I used this word two days ago when I went through three rounds of an agent telling me that it fixed three things without actually changing them.
I think starting a new session and telling it that the previous agent's work / state was terrible (so explain what happened) is pretty unremarkable. It's certainly not saying "fuck you". I think this is a little silly.
“Let's end open source together with this one simple trick”
https://pretalx.fosdem.org/fosdem-2026/talk/SUVS7G/feedback/
Malus is translating code into text, and from text back into code.
It gives the illusion of clean room implementation that some companies abuse.
The irony is that ChatGPT/Claude answers are all actually directly derived from open-source code, so...
Now this makes me think of game decompilation projects, which would seem to fall in the same legal area as code that would be generated by something like Malus.
Different code, same end result (binary or api).
We definitely need to know what the legal limits are and should be
Who'd have thought, the audience who doesn't want to give back to the opensource community, giving 0 contributions...
People simply want Opus without fear of billing nightmare.
That’s like 99% of it.
And how is that any different? Claude Code is a harness, similar to open source ones like Codex, Gemini CLI, OpenCode etc. Their prompts were already public because you could connect it to your own LLM gateway and see everything. The code was transpiled javascript which is trivial to read with LLMs anyways.
Also, as many others have pointed out, there is roadmap info in here that wouldn't be available in the production build.
The source maps help for sure, but it’s not like client code is kept secret, maybe they even knew about the source maps a while back just didn’t bother making it common knowledge.
This is not a leak of the model weights or server side code.
The qwen 27b model distilled on Opus 4.6 has some known issues with tool use specifically: https://x.com/KyleHessling1/status/2038695344339611783
Fascinating.
I wonder if CC thinks I'm trying to distill the model. This is a common enough use case that I think the devs at Anthropic should consider it.
Wonder if they’re also poisoning Sonnet or Opus directly by generating simulated agentic conversations.
Go China to be honest. They're the most committed to open AI research and they have more interesting constraints to work under, like restricted access to NVIDIA hardware.
"I got the loot, Steve!"
I feel like the distillation stuff will end up in court if they try to sue an American company about it. We'll see what a judge says.
[0]: https://www.anthropic.com/news/detecting-and-preventing-dist... [1]: https://news.ycombinator.com/item?id=46578701
Unfortunately (for the publishers, at least) it didn't work to stop Anthropic and Anthropic's attempts to prevent others will not work either; there has been much distillation already.
The problem of letting humans read your work but not bots is just impossible to solve perfectly. The more you restrict bots, the more you end up restricting humans, and those humans will go use a competitor when they become pissed off.
Not just tools for devs, this is true in a lot of cases.
I used to work at Fortinet and every now and then we'd get an e-mail from information services letting us know that they would be installing a dev build of FortiOS on our internal (production) corporate network.
In cases where we needed more debug logging from a feature or where we had a fix we had to test on a live network, and if we didn't want to ship a test firmware to some huge client and say 'here, see if this bricks your network or not', they would hand it off to our IT team and we'd install it on our own network to run. After all, if you're not confident enough to run it how can you be confident enough to ask your customers to run it?
Now if they could just get the hang of not hard-coding admin credentials into the software they'd have a lot to brag about!
They won't even read your defence.
This is why I do such experiments on ChatGPT and not Claude.
I don't want to get banned by Claude but I couldn't care less if ChatGPT bans me.
xclip is the same for X based systems.
Can't tell if that obfuscated code works though.
It’s about a once-a-week-or-less event. A bit annoying sometimes, but not a deal breaker.
╭⬟╮
I’m still a little amused by peak web3 and the DAO / smart contract nonsense. Like, in order to stop fraud, entire coins were forked…
I would imagine GH would do the same if it's a high-enough-profile issue.
"I have a popular repo, but the content will likely be removed and I won't have personally gained from the saga: how can I fix the part where I didn't profit?"
"Eureka! I'll remove the content preemptively, then come up with a backstory that justifies reusing the now empty repo for building the umpteenth coding harness! And I can even claim fuzzy ties to Claude Code!"
Hence the new description:
> The fastest repo in history to surpass 50K stars , reaching the milestone in just 2 hours after publication. Better Harness Tools, not merely storing the archive of leaked Claude Code but also make real things done. Now rewriting in Rust.
This is what I'm referring to.
That project you quoted is the one with that as its new description. Soon it'll just be [new thing] that happens to use the stars as social proof... in fact when I look again:
> The fastest repo in history to surpass 100K stars . Better Harness Tools that make real things done. Built in Rust using oh-my-codex.
They started a new project that justifies the same repo and scrapes a little credibility off of Claude Code. The intent is not an actual rewrite but to bolster what will be their own personal project trying to compete with OpenCode and co.
The grifter is already pasting references to WSJ articles about themselves in the Readme
Yes, based on research of human code. LLMs write code differently. We should question whether the human research applies to LLMs at all. (You wouldn't take your assumptions about chimp research and apply them to parrots without confirming first)
> I think the worst outcome would be throwing out our collective wisdom because the AI labs tell us to.
We don't have to throw it out. But our current use of LLMs is a dramatic change from what came before. We should be questioning our assumptions and traditions that come from a different way of working and intelligence. Humans have a habit of trying to force things to be how they think they should be, rather than allowing them to grow organically, when the latter is often better for a system we don't yet understand.
Their coding style is above all else a symptom of their very limited context window and complete amnesia for anything that's not in the window.
Scrolling through a 3k line test suite with multiple levels of nesting trying to figure out which cases are covered is a fucking pain in the ass.
The honor system is never a sustainable solution. It's not even down to corporate greed, it's just not something that works at scale, especially when there's money to be made, and even more especially when there isn't.
Which of course won't happen because we live in reality and not fantasy where we can dream that "people should just do X"
My experience has been that they add far more bugs in every release than they fix
So not even close to Opus, then?
These are a year behind, if not more. And they're probably clunky to use.
My first word was literally "Yes", so I agree that a function like this is a maintenance nightmare for a human. And, sure, the code might not be "optimized" for the LLM, or token efficiency.
However, to try to make my point clearer: it's been reported that Anthropic has "some developers who don't write code" [1].
I have no inside knowledge, but it's possible, by extension, to assume that some parts of their own codebase are "maintained" mostly by LLMs themselves.
If you push this extension, then, the code that is generated only has to be "readable" to:
* the next LLM that'll have to touch it
* the compiler / interpreter that is going to compile / run it.
In a sense (and I know this is a stretch, and I don't want to overdo the analogy), are we, here, judging a program's quality by reading something more akin to "the x86 asm output by the compiler" rather than the "source code", which in this case is "English prompts" hidden somewhere in a developer's Claude Code session?
Just speculating, obviously. My org is still much more cautious, and requires people to hold code generated by LLMs to the same standard as code written by humans; and I agree with that.
I would _not_ want to debug the function described by the commenter.
So I'm still very much on the "Claude as a very fast text editor" side, but is it unreasonable to assume that Anthropic might be further along on the "Claude as a compiler for English" side?
[1] https://www.reddit.com/r/ArtificialInteligence/comments/1s7j...
Instead of realistic ones like "Loading assets" and "Reading file", they would give humorous nonsensical ones like "Reticulating Splines".
It became a bit of a meme in the years since.
There's no major lawsuits about this yet, the general consensus is that even under current regulations it's in the grey. And even if you turn out to be right, and let's say 99% of this code is AI-generated, you're still breaking the law by using the other 1%, and good luck proving in court what parts of their code were human written and what weren't (especially when being sued by the company that literally has the LLM logs).
The same reasoning may apply here :P
It's also interesting to note that due to the way round-tripping tool-calls work, splitting code up into multiple files is counter-productive. You're better off with a single large file.
Can you expand on that?
I'm not sure that humans are great at this either. Think about how we use frameworks and have complex supply chains... we sort of get "good enough" at what we need to do and pray a lot that everything else keeps working and that our tooling (things like Artifactory) saves us from supply chain attacks. Or we just run piles of old, outdated code because "it works". I can't tell you how many microservices I have seen that are "just fine" but no one in the current org has ever read a line of what's in them, and the people who wrote them left ages ago.
> clarity too
Yes, but define clarity!
I recently had the pleasure of fixing a chunk of code that was part of a data pipeline. It was an If/elseif/elseif structure... where the final two states were fairly benign and would have been applicable in 99 percent of cases. Everything else was to deal with the edge cases!
I had an idea of where the issue was, but I didn't understand how the code ended up in the state it was in... Blame -> find the commit message (references ticket) -> find the Jira ticket (references Salesforce) -> find the original customer issue in Salesforce, read through the whole exchange there.
A two line comment could have spared me all that work, to get to what amounted to a dead simple fix. The code was absolutely clear, but without the "why" portion of the context I likely would have created some sort of regression, that would have passed the good enough testing that was there.
I re-wrote a portion of the code (expanding variable names) - that code is now less "scannable" and more "readable" (different types of clarity). Dropped in comments: a few sentences of explaining, and references to the tickets. Went and updated tests, with similar notes.
Meanwhile, elsewhere (other code base, other company), that same chain is broken... the "bug tracking system" that is referenced in the commit messages there no longer exists.
I have a friend who calls me every time he updates his dev env to report that he "had to go update the wiki again!", because someone made a change and told everyone in a Slack message. Here is yet another vast repository of degrading, unsearchable and unusable tribal knowledge embedded in so many organizations out there.
Don't even get me started on the project descriptions/goals/tasks that amount to pantomimed Post-it notes, devoid of any genuine description.
Lack of clarity is very much also a problem of missing "context" in situ.
As to what defines clarity, yes of course, like the word quality this is very hard to define, but we can certainly recognise when it was not considered.
I think it is a goal worth striving for though, and abandoning code standards because we now have AI helpers is stupid and self-defeating, even if we think they are very capable and will improve.
The end of history has not in fact arrived with generative AI, we still have to maintain software after.
I think it's worth posting as its own submission (if it wasn't already).
But do we have precedent (in any country) that distributing different source code that compiles to the exact same binary is illegal?
The real value here will be in using other cheap models with the cc harness.
It’s a dynamic, subscription based service, not a static asset like a video.
Why would it be the exact same one? Now that we have the code, it's trivial to have it randomize the prompt a bit on different requests.
I doubt it's anywhere that high because even if you don't write anything fancy and simply capitalize the first word like you'd normally do at the beginning of a sentence, the regex won't flag it.
Anyway, I don't really care, might just as well be 99.99%. This is not a hill I'm going to die on :P
The client sometimes sends anti_distillation: ['fake_tools'] in the request body at services/api/claude.ts:301
The client still sends its normal real tools: allTools at services/api/claude.ts:1711
If the model emits a tool name the client does not actually have, the client turns that into "No such tool available" errors at services/tools/StreamingToolExecutor.ts:77 and services/tools/toolExecution.ts:369
If Anthropic were literally appending extra normal tool definitions to the live tool set, and Claude used them, that would be user-visible breakage.
That leaves a few more plausible possibilities:
fake_tools is just the name of the server-side experiment, but the implementation is subtler than “append fake tools to the real tool list.”
or
The server may inject tool-looking text into hidden prompt context, with separate hidden instructions not to call it.
or
The server may use decoys only in an internal representation that is useful for poisoning traces/training data but not exposed as real executable tools.
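The argument above rests on how the client treats a tool name it does not have. Here is a minimal sketch of that lookup-or-error pattern in TypeScript, since the leaked client is TypeScript. The types, the Read handler, and the exact message format are hypothetical, not copied from StreamingToolExecutor.ts or toolExecution.ts.

```typescript
// Hypothetical shapes for illustration; the real types in services/tools/ are not shown in the thread.
interface ToolCall {
  id: string;
  name: string;
  input: unknown;
}

interface ToolResult {
  toolUseId: string;
  isError: boolean;
  content: string;
}

type ToolHandler = (input: unknown) => string;

// Looks up the requested tool among the tools the client actually registered.
// Any other name (for example a decoy the model picked up somewhere) becomes an
// error result instead of being executed.
function executeToolCall(
  call: ToolCall,
  tools: ReadonlyMap<string, ToolHandler>,
): ToolResult {
  const handler = tools.get(call.name);
  if (handler === undefined) {
    return {
      toolUseId: call.id,
      isError: true,
      content: `No such tool available: ${call.name}`,
    };
  }
  return { toolUseId: call.id, isError: false, content: handler(call.input) };
}

// Hypothetical registry with a single real tool.
const tools = new Map<string, ToolHandler>([
  ["Read", (input) => `read ${JSON.stringify(input)}`],
]);

console.log(executeToolCall({ id: "1", name: "Read", input: { path: "a.txt" } }, tools));
console.log(executeToolCall({ id: "2", name: "DecoyTool", input: {} }, tools));
```

If decoy definitions were simply appended to the live tool list, calls to them would land in the error branch and surface to the user, which is why the subtler possibilities listed above seem more likely.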
But aside from that, an LLM is only a roleplayer. Treat it like an idiot that makes mistakes and it will act like one. Treat it like a coworker who you respect and it will act like one, and it will find better results.
Obviously nothing about how they act is set in stone but as a general rule this seems to me to be both wise and, in my experience, true as well.
"Please do x"
"Thank you, that works great! Please do y now."
"You're so smart!"
lol. It really works though! At least in my experience, Claude gets almost hostile or "annoyed" when I'm not nice enough to it. And I swear it purposefully acts like a "malicious genie" when I'm not nice enough. "It works, exactly like you requested, but what you requested is stupid. Let me show you how stupid you are."
But, when I'm nice, it is way more open, like "Are you sure you really want to do X? You probably want X+Y."
Thanks
Sure, but that's completely different from what they were responding to, which was someone insinuating the Claude Code CLI has secret sauce that makes it better than the competition.
Also who really cares about the roadmap? Any feature they release can be easily copied quickly. The only moat they have at the moment is in giving access to their models via a subscription.
This was an insanely rookie mistake that could have been caught if anyone had been paying attention as opposed to "vibing."
If I had lost my local movie theater because of digital film, I would have a really good reason to hate the technology, even though the blame is on the studios forcing that technology on everyone.
I think principles are important. Especially when it comes to art, principle might be all we have. Going back to the crypto example, NFTs were art that real people had made. In some cases, very good art. People railed against NFTs despite the quality of the art. That is being against something on principle. Comparatively, if my local grocery chains were owned by neonazis, I'd have a much harder time standing on principle, given that doing so may have a negative impact on my ability to survive and prosper.
AI Gen works, on the other hand, most often do not come with a readily available marking that they are AI Gen. What people are complaining about is the lack of quality in the work. If they accuse a poorly human-written article of being AI Gen, that's just a mistake. But the general case is a legitimate evaluation of the quality of the material and the conditions under which it was made and presented.
In my own case, while I certainly have plenty of "principled" reasons to dislike AI Gen works, I also dislike it because it's just garbage. Oh yeah, sure, it's impressive that a computer can spit out reasonable content at all. It would equally be impressive for a chimpanzee to start talking in full sentences. That doesn't mean I'm going to start going to the chimpanzee for dissertations on the human condition.
> I think less of someone as a person if they send me AI slop.
n=1, but working on side projects for others, I could easily generate AI images (instead of using stock photos) for a client. I resist because I also feel this, but as the sender. There's the fact that such images 'look AI', but even if they were perfect, idk, somehow I feel cheap doing that.
I myself would disagree that CGI itself is a bad thing.
I was watching some behind the scenes footage from something recently, and the thing that struck me most was just how they wouldn't bother with the location shoot now and just green-screen it all for the convenience.
Even good CGI is changing not just how films are made, but what kinds of films get shot and what kind of stories get told.
Regardless of the quality of the output, there's a creativeness in film-making that is lost as CGI gets better and cheaper to do.
Same thing is true of AI output.
IMO it's a combination of long-running paranoia about cost-cutting and quality, and a sort of performative allegiance to artists working in the industry.
People don't care about AI. They only care whether the product is good.
I reckon it's just drama paraded by gaming "journalists" and not much else. You will find people expressing concern on Reddit or Bluesky, but ultimately it doesn't matter.
https://www.anthropic.com/news/detecting-and-preventing-dist...
If they want to drill down to flaws that only affect a particular language, then they could add a regex for that as well/instead.
Additionally, after looking at the source, it looks like a lot of Anthropic's own internal test tooling/debug code (i.e., stuff stripped out at build time) is in this source map. There's one part that prompts their own users (or whatever) to use a report-issue command whenever frustration is detected. It's possible it's using it for this.
it is not that slow
Regex is going to be something like 10,000 times quicker than the quickest LLM call; multiply that by billions of prompts.
Yeah we get supply chain attacks (like the axios thing today) with dependencies, but on the whole I think this is much safer than YOLO git-push-force-origin-main-ing some vibe-coded trash that nobody has ever run before.
I also think this isn't really true for the FAANGs, who ostensibly vendor and heavily review many of their dependencies because of the potential impacts they face from them being wrong. For us small potatoes I think "reviewing the code in your repository" is a common sense quality check.
It is more irresponsible to spend the time reviewing all of the code rather than spending that time on things with bigger levers for satisfying your customers.
Tech people are funny, with these takes that businesses do/should adhere to absolute platonic ideals and follow them blindly regardless of context.
There is a reason we don't do things. That reason is it makes the world a worse place for everyone. If you are so incredibly out of touch with any semblance of ethics at all; mayhaps you are just a little bit part of the problem.
That kind of stuff might work in anime or cartoons, but live action just looks ridiculous to me for the most part.
Besides, they probably do a separate analysis on server side either way, so they can check a true positive to false positive ratio.
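If such a server-side check exists (that part is speculation), the bookkeeping is simple. A hypothetical TypeScript sketch: each record pairs the client regex's verdict with an assumed server-side label, and the function reports true positives, false positives, their ratio, and precision among flagged prompts. The field names are made up for illustration.

```typescript
// Hypothetical record: did the client-side regex flag the prompt, and did a
// separate (assumed) server-side analysis judge the user to be frustrated?
interface Labeled {
  regexFlagged: boolean;
  serverSaysFrustrated: boolean;
}

// Counts true and false positives among regex-flagged prompts only.
// Unflagged prompts are skipped, so this measures precision, not recall.
function flagQuality(samples: readonly Labeled[]) {
  let truePositives = 0;
  let falsePositives = 0;
  for (const s of samples) {
    if (!s.regexFlagged) continue;
    if (s.serverSaysFrustrated) truePositives++;
    else falsePositives++;
  }
  const flagged = truePositives + falsePositives;
  return {
    truePositives,
    falsePositives,
    // Ratio is Infinity when nothing flagged was a false positive.
    tpToFpRatio: falsePositives === 0 ? Infinity : truePositives / falsePositives,
    // Reported as 0 when nothing was flagged at all (a design choice, not a standard).
    precision: flagged === 0 ? 0 : truePositives / flagged,
  };
}

const hit = { regexFlagged: true, serverSaysFrustrated: true };
console.log(
  flagQuality([
    hit,
    hit,
    hit,
    { regexFlagged: true, serverSaysFrustrated: false },
    { regexFlagged: false, serverSaysFrustrated: true },
  ]),
);
```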
This has buttbuttin energy. Welcome to the 80s I guess.
Why? They clearly just want to log conversations that are likely to display extreme user frustration with minimal overhead. They could do a full-blown NLP-driven sentiment analysis on every prompt but I reckon it would not be as cost-effective as this.
The only time to use a regex is when searching with a human in the loop. All other uses are better handled some other way.
>They could do a full-blown NLP-driven sentiment analysis on every prompt but I reckon it would not be as cost-effective as this.
Every conversation is sent to an LLM at least a thousand times the size of GPT-2, which could have one-shot this nearly a decade ago.
I've seen that Claude Code went with a regex approach for a similar sentiment-related task.
I doubt you are making regex and not looking at it, even if it was AI generated.
And some of the entries are too short and will create false positives. It'll match the word "offset" ("ffs"), for example. EDIT: no it won't, I missed the \b. Still sounds weird to me.
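The \b point is easy to check. A minimal sketch with a hypothetical two-word list (not the actual leaked pattern): without word boundaries, "ffs" matches inside "offset"; with \b on both sides it only matches the standalone word.

```typescript
// Hypothetical two-entry list, for illustration only.
const withoutBoundaries = /(ffs|wtf)/i;
// \b asserts a transition between a word character and a non-word character
// (or the string edge), so "ffs" inside "offset" no longer matches.
const withBoundaries = /\b(ffs|wtf)\b/i;

for (const text of ["adjust the offset", "ffs, it broke again"]) {
  console.log(JSON.stringify(text), {
    withoutBoundaries: withoutBoundaries.test(text),
    withBoundaries: withBoundaries.test(text),
  });
}
```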
I swear this whole thread about regexes is just fake rage at something, and I bet it'd be reversed had they used something heavier (omg, look they're using an LLM call where a simple regex would have worked, lul)...
Stole? Courts have ruled it's transformative, and it very obviously is.
AI doomerism is exhausting, and I don't even use AI that much, it's just annoying to see people who want to find any reason they can to moan.
The courts have ruled that AI outputs are not copyrightable. The courts have also ruled that scraping by itself is not illegal, only maybe against a Terms of Service. Therefore, Anthropic, OpenAI, Google, etc. have no legal claim to any proprietary protections of their model outputs.
So we have two things that are true:
1) Anthropic (certainly) violated numerous TOS by scraping all of the internet, not just public content.
2) Scraping Anthropic's model outputs is no different than what Anthropic already did. Only a TOS violation.
Regardless of whether LLM training amounts to theft, thieves are still allowed to put locks on their own doors.
"not copyrightable" doesn't imply they can't frustrate attempts to scrape data.
Actually, not anymore as a result of OpenAI and Anthropic's scraping. For example, Reddit came down hard on access to their APIs as a response to ChatGPT's release and the news that LLMs were built atop of scraping the open web. Most of the web today is not as open as before as a result of scraping for LLM data. So, no, no one is perfectly free to scrape the web anymore because open access is dying.
Yes, rich and poor are equally forbidden from sleeping under bridges.
Anthropic paid a lot of money for a moat and want to guard it. It is not wrong, in any sense of the word, for them to do so.
Try this: If you want to train a model, you’re free to write your own books and websites to feed into it. You’re not free to let others do that work for you because they don’t want you to, because it cost them a lot of time and money and secret sauce presumably filtering it for quality and other stuff.
Do you hear the words coming out of your mouth?
Is the work of others less valid than the work of a model?
And everyone is free to consume all the free information.
Courts have ruled it's not, and I don't think anyone is arguing it's okay.
>but it is not okay for others to try to do the same to an AI model?
The steelman version is that it's okay to do it once you acquired the data somehow, but that doesn't mean anthropic can't set up roadblocks to frustrate you.
Your legal argument is all over the place as well. What is more relevant here: what the courts ruled or what you consider obvious? How is distillation less transformative than scraping? How does courts ruling that scraping to train models is legal relate to distillation?
Nobody is scoring you on neutrality points for not using AI much, and calling this doomerism is just a thought-terminating cliche that refuses to engage with the comment you're replying to.
In fact, your comment is not engaging with anything at all; you're vaguely gesturing towards potential arguments without making them. If you find discussing this exhausting, then don't, but also don't flood the comments with low-effort whining.
It's cool to see Noah Wyle getting his due these days (The Pitt).
So yeah, you do what's less intensive for the CPU, but you also do enough to address the majority of cases where a screenshot or log ends up showing blatantly "unmoral" behavior.
For headlines, that's enough.
For what's behind the pearl-clutching, for what leads to the headlines pandering to them being worth writing, I agree with everyone else on this thread saying a simple word list is weird and probably pointless. Not just for false-negatives, but also false-positives: the Latin influence on many European languages leads to one very big politically-incorrect-in-the-USA problem for all the EU products talking about anything "black" (which includes what's printed on some brands of dark chocolate, one of which I saw in Hungary even though Hungarian isn't a Latin language but an Ugric language and only takes influences from Latin).
Fortunately I can swear pretty well in Spanish.
You do know that 10,000x _is_ four orders of magnitude, right? :-D
But you're right that trust is a complicated thing and often misplaced. I think as an industry we're always reevaluating our relationship with OSS, and I'm sure LLMs will affect this relationship in some way. It's too early to tell.
We used it to tokenize search input and combined it with a Solr backend. It worked remarkably well.
Yes, but that is _what the product does_. What we are talking about is _telemetry_.
Although, if we're being realpolitik, every time government interference grows in scope and corrupts markets, capitalism still gets blamed and people call for more government to fix it (see: housing). So the capitalism vs state capitalism distinction isn't very meaningful in practice.
Separately, but related - if you offload writing of the tests and writing of the code, how does anybody know what they have other than green tests and coverage numbers?
Everyone just goes overboard with the tests since you can easily just tell the LLM to expand on the suite. So you end up with a massive test suite that looks very thorough and is less likely to be scrutinized.
Absolutism + reductionism leads to this kind of nonsense. It is possible that people can disagree about (re)use of culture, including music and print. Therefore it is possible for nuance and context to matter.
Life is a lot easier if you subscribe to an "anyone who disagrees with me on any topic must have no ethics whatsoever and is a BAD person" view. But it's really not an especially mature worldview.
Welcome to life bucko. Stop being a shitty person and get with the program so we have something to leave behind that has a chance of not making us villains in the eyes of those we eventually leave behind. The trick is doing things the harder way because it's the right way to do it. Not doing it the wrong way because you're pretty sure you can get away with it.
But you're already ethically compromised, so I don't really expect this to do any good except to maybe make the part of you you pointedly ignore start to stir assuming you haven't completely given yourself up to a life of ne'er-do-wellry. Enjoy the enantidromia. Failing that, karma's a bitch.
No real friends, no trusted life partner, no kids, no unconditional love. Alone.
Just another soul traveling on an infinite road with lots of signs that point to "happiness," planted there by fellow travelers, never reaching their destination.
That's not the case, and hasn't been for some time, but it sounds like your mind's made up.
The story is that I was getting into a new genre of music, namely Japanese City pop from the 1980s. I was totally unfamiliar with the genre and started listening to it on YouTube. I found one playlist, which I listened to a lot, thinking: “wow, this is very formulaic, and the lyrics are very generic,” but I kind of thought that was just how the genre went. I had planned to use it during a small local event, but when I went to find out who the artists were, I embarrassingly discovered it was all AI generated.
Thing is, in this instance I knew nothing of the source material; when I went to get actual songs, written by actual people, the difference was stark. Now, 8 months later, I can recognize AI-generated City pop in an instant. This experience kind of felt like I had been scammed, like my ignorance of the genre had been taken advantage of. It was not pleasant.
I heard the throat-clearing "Negró" candy (marketed with a chimney sweep whose face is covered in soot) was usually the one that hurt English-speaking people's self-deprecating sensitivities.
They probably can, actually. TOS are legally binding.
More likely they would block you rather than pursuing legal avenues but they certainly could.
Now, if you try to get around attempts to block your access, then yes you could be in legal trouble. But that's not what is happening here. These are people/companies that have Claude accounts in good standing and are authorized by Anthropic to access the data.
Nobody is saying that Anthropic can't just block them though, and they are certainly trying.
Apparently not, because no one but you implied that they had been.
There are prompting strategies that improve the odds greatly, but like the GGP, you've made up your mind, so it's a waste of time to argue otherwise.
If this is the case it would be observed with different prompting strategies, when you find a prompt which puts more weight on the post-training models.
My dad has also started sharing with me links on Facebook to pop songs that have been re-arranged in different genres. This was a big area of fun for a number of folks in my family several years ago as we discovered YouTube artists like Chase Holfelder who put significant effort into making very high quality rearrangements. But I kept noticing these weird issues in the new songs.
I've gotten to where I can identify an AI-generated song almost immediately: there's a weird, high-frequency hiss in the mix that sounds like heavy noise fighting through compression artifacts, even though the source it's coming from should be clean. There's a general lack of enthusiasm in the lyrics and a boring, nonsensical progression to the lyrics on original arrangements. Sometimes the person generating the song tries to hide that last issue by generating instrumentals only, or they use one of those try-too-hard-to-sound-badass Country Rock genres that are popular on TikTok for sticking on top of clips from the TV show Yellowstone (WTF is with that?!), but when I check the details, there's obviously AI cover art for artists I've never heard of. The accounts are anthologies full of these artists who have never existed.
So, I know people keep parroting "a good artist can use any tool". But I've yet to see it. All this "democratizing art" (I didn't know anyone was gatekeeping it to begin with, and I certainly have not seen any lack of talent online in several years) doesn't seem to be producing results. It becomes pretty obvious very quickly that it's all just a pump-and-dump scheme to Get Them Clicks.
Obviously you think the AI content that you can identify is bad. But there is content you've encountered that you think is good and not AI content, that actually is AI generated.
That's the survivorship bias.
The difference is in the authorship. Actual work and skill go into CGI; people generally notice bad CGI, and it generally affects how you judge the art. Sometimes CGI is actually part of the art and you are supposed to notice it, and it is still good (think of how Cher used Auto-Tune in "Believe"). There is no such equivalent with AI.
To elaborate further: bad CGI is often (but not always) used as a cost-cutting measure. Directors (or producers who push directors) use it when they want to save money on practical effects, or even to cover up mistakes made during shooting and avoid an expensive re-shoot. This can work OK if used sparingly and carefully; however, if it is done a lot and without the needed care, you will notice it, and you will judge the work by it. AI content is kind of like that, except that this is all AI is: the author couldn't be bothered to do the work and just prompted an AI to do it for them.
To summarize: AI is not like CGI in general, it is much closer to a strict subset of CGI which only includes bad CGI.
It’s the original use case for LLMs.
It just can’t be the case, simply because of how ML works. In short, the more diverse, high-quality, reasoning-rich examples there were in the training set, the better the model performs in a given language.
So unless the Spanish subset had much more quality-dense examples to make up for the lower volume, there is no way the quality of reasoning in Spanish is on par with English.
I apologise for the rambling explanation; I’m sure someone with ML expertise here can explain it better.
Translation in/out of human languages only happens at the edges of the model.
Internal layer activations for the same concept are similar regardless of language, while activations at the top/bottom layers diverge. Meanwhile the pattern is reversed for same language different content.
Even if it only happens at the edges, on every conversation turn, I would assume that small, uncaptured subtleties of meaning can accumulate into significant error over time.
Let's take a language with orders of magnitude less training data, like Nahuatl. The following is what happens when I ask Claude Code in Nahuatl to give me an example of a simple program that I can execute inside a web browser (avoiding any direct keyword prompts like 'Javascript' or 'Hello World'). Interestingly, it responds in Spanish. I doubt that it has any relevant training data in Nahuatl. (Apologies if the Nahuatl is wrong; I just used Google Translate.)
¿Uelis techmakas se neskayotl tlen se programa tlen amo ouij tlen ueliskia nijchiuas ipan se navegador web?
-----
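Entiendo tu pregunta en náhuatl. Aquí un ejemplo sencillo, una calculadora en HTML puro que puedes abrir directo en el navegador, sin servidor: [Translation: I understand your question in Nahuatl. Here is a simple example, a calculator in pure HTML that you can open directly in the browser, without a server:]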
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Calculadora</title>
<style>
body { font-family: sans-serif; display: flex; justify-content: center; padding: 40px; }
.calc { border: 1px solid #ccc; padding: 20px; border-radius: 8px; width: 200px; }
input { width: 100%; margin-bottom: 10px; font-size: 1.2em; text-align: right; }
button { width: 22%; margin: 2px; padding: 8px; font-size: 1em; cursor: pointer; }
.wide { width: 47%; }
</style>
</head>
<body>
<div class="calc">
<input type="text" id="pantalla" readonly>
<div>
<button onclick="tecla('7')">7</button>
<button onclick="tecla('8')">8</button>
<button onclick="tecla('9')">9</button>
<button onclick="tecla('/')">÷</button>
<button onclick="tecla('4')">4</button>
<button onclick="tecla('5')">5</button>
<button onclick="tecla('6')">6</button>
<button onclick="tecla('*')">×</button>
<button onclick="tecla('1')">1</button>
<button onclick="tecla('2')">2</button>
<button onclick="tecla('3')">3</button>
<button onclick="tecla('-')">−</button>
<button onclick="tecla('0')">0</button>
<button onclick="tecla('.')">.</button>
<button onclick="tecla('+')">+</button>
<button onclick="limpiar()">C</button>
<button class="wide" onclick="calcular()">=</button>
<button class="wide" onclick="borrar()">⌫</button>
</div>
</div>
<script>
const pantalla = document.getElementById('pantalla');
function tecla(val) {
pantalla.value += val;
}
function calcular() {
try {
pantalla.value = eval(pantalla.value);
} catch {
pantalla.value = 'Error';
}
}
function limpiar() { pantalla.value = ''; }
function borrar() { pantalla.value = pantalla.value.slice(0, -1); }
</script>
</body>
</html>
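Guarda esto como calculadora.html y ábrelo en cualquier navegador; no necesita servidor ni dependencias. Es un buen punto de partida para aprender HTML, CSS y JavaScript. [Translation: Save this as calculadora.html and open it in any browser; it needs no server or dependencies. It's a good starting point for learning HTML, CSS, and JavaScript.]
It’s not! And I’ve never said that.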
Anyways, I’m not even sure what we are arguing about, as it’s 100% fact that SOTA models perform better in English. The only interesting question here is how much better: is it negligible, or does it actually make a difference in real-world use cases?