The ways we contain Claude across products

(anthropic.com)

62 points by jbredeche 3 hours ago | 29 comments

rancar2 40 minutes ago |

From inspecting the Cowork VM, the pollution is not documented and not controllable by any publicly known means (I have workarounds). It creates a lot of waste and frustration in the process.

CLAUDE_CODE_ADDITIONAL_DIRECTORIES_CLAUDE_MD=1 means Claude finds and loads the CLAUDE.md of every mounted repo over time (and per settings). As a result, working on multiple unrelated repos at the same time isn’t a pleasant experience out of the box.

A few other interesting VM ENVs:

  CLAUDE_CODE_IS_COWORK=1
  CLAUDE_CODE_BRIEF=1
  CLAUDE_CODE_BRIEF_UPLOAD=1
  CLAUDE_CODE_DISABLE_AUTO_MEMORY=1
  CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1
  CLAUDE_CODE_DISABLE_CRON=1
  CLAUDE_CODE_ENTRYPOINT=local-agent
  CLAUDE_CODE_EXECPATH=/usr/local/bin/claude
  CLAUDE_CODE_HOST_HTTP_PROXY_PORT=36543
  CLAUDE_CODE_HOST_PLATFORM=darwin
  CLAUDE_CODE_HOST_SOCKS_PROXY_PORT=46673
  USE_STAGING_OAUTH=
  _=/usr/bin/env
  all_proxy=socks5h://localhost:1080
  ftp_proxy=socks5h://localhost:1080
  grpc_proxy=socks5h://localhost:1080
  http_proxy=http://localhost:3128
  https_proxy=http://localhost:3128
  no_proxy=localhost,127.0.0.1,::1,.local,.local,169.254.0.0/16,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
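For anyone wondering what that no_proxy value would let through without touching the proxy, here is a rough Python sketch. It assumes exact-match hostnames, leading-dot suffix matching, and CIDR matching for IPs. Real tools differ in how they interpret no_proxy (CIDR entries in particular are not honored everywhere), so treat this as an illustration and not as how the Cowork VM actually resolves it. The function name bypasses_proxy is made up.

```python
import ipaddress

# Value copied from the env dump above.
NO_PROXY = ("localhost,127.0.0.1,::1,.local,.local,169.254.0.0/16,"
            "10.0.0.0/8,172.16.0.0/12,192.168.0.0/16")


def bypasses_proxy(host: str, no_proxy: str = NO_PROXY) -> bool:
    """Return True if `host` matches an entry in a no_proxy-style list.

    Assumed semantics (hypothetical, tools vary): exact hostname match,
    leading-dot suffix match, and CIDR containment for IP literals.
    """
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # not an IP literal, treat as a hostname

    for entry in (e.strip() for e in no_proxy.split(",")):
        if not entry:
            continue
        if addr is not None:
            try:
                # A bare IP parses as a /32 (or /128) network.
                if addr in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # entry is a hostname pattern, not an IP or CIDR
        elif entry.startswith("."):
            if host.lower().endswith(entry.lower()):
                return True
        elif host.lower() == entry.lower():
            return True
    return False


if __name__ == "__main__":
    for h in ["10.1.2.3", "172.32.0.1", "printer.local", "example.com"]:
        print(h, "direct" if bypasses_proxy(h) else "via proxy")
```

Under those assumptions, anything in the private and link-local ranges goes direct, and everything else is funneled through the proxies on localhost.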

6gvONxR4sf7o 1 hour ago |

The framing they use is hilarious and their little graphic is perfect. The risk of harm doesn't go down, but the reward goes up, so the harm just becomes the cost of doing business, justified by the reward. So as the reward gets higher and higher, the amount of harm they're willing to justify goes up. Feels like society in a nutshell.

solenoid0937 15 minutes ago | |

That's how decisions are made IRL. Risk/reward is a thing.

andai 1 hour ago | |

Yeah I was thinking about Simon Willison's "lethal trifecta"[0] in the context of OpenClaw style "general purpose" AI agents, where people just gave it access to their full hard drive, gmail account, etc.

I was thinking you can't make the chance of catastrophic failure zero (we still hear about "Claude deleted my home folder"), but you can definitely limit the blast radius.

You can't get the risk to zero. But the opportunity cost of not playing the game is rising. So you accept some level of risk.

My personal take here is "why screw around with containers and virtualization when a used ThinkPad is $50". Just give it its own machine. Then it can blow it up all it wants. (Or a $3 VPS, as the case may be :)

[0] The lethal trifecta for AI agents: private data, untrusted content, and external communication - https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

chrisweekly 23 minutes ago | | |

Is a used ThinkPad really a viable part of your AI workflow? (And is that really a better solution than e.g. smolmachines microvms?)

koolba 1 hour ago | | |

> Then it can blow it up all it wants. (Or a $3 VPS, as the case may be :)

Just make sure it doesn’t have ssh access to any other machines!

charcircuit 59 minutes ago | | |

All of ecommerce is built on top of encryption with a nonzero chance of being cracked. The risk is much smaller than the benefit, so people are willing to use it and then deal separately with whatever potential fraud comes from encryption being broken.

Technically a merchant could require meeting in person to exchange an OTP to avoid this and make the risk zero, but it is not worth it, and you will get outcompeted by other businesses willing to take on a marginally higher amount of risk to unlock a lot of utility for the user.

esikich 1 hour ago | |

Sure. You start a PC repair business. At first, losing a stick of RAM or frying someone's motherboard is super costly when you are doing 10 a week. But once you're doing 1000, that's pretty damn good and easily covered. When you have more tools, velocity, and whatnot, the proportions change.

keithnz 1 hour ago | |

But no matter what you do, this is the tradeoff you are making. Different people have different tolerances for that balance, which is why I'm happy to watch people on YouTube in wingsuits and not do it myself. Of course in this new AI world, quantifying the probability and scale of harm is hard/not fully known. We are trying to mitigate risks with AI, but who knows, we could be one misstep away from plummeting off a cliff.

ronsor 1 hour ago | |

This is how humans weigh most decisions in practice.

xp84 1 hour ago | |

I’m a usual booster of AI (others have accused me of being completely in the bag for the clankers) and even I agree fully. These yahoos would clearly give Claude the nuclear launch codes or enough access to copy its full model into the wild if the supposed “reward” promised was large enough.

7e 1 hour ago | |

They don’t consider risk of ruin and that is where this calculus falls apart. The reward does not reduce the risk of ruin, which increases with blast radius. YOLO!
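To put rough numbers on that, here is a back-of-the-envelope Python sketch. Everything in it is an assumption for illustration: failures are treated as independent, every failure is treated as ruinous, and the reward, loss, and probability figures are made up.

```python
def p_ruin(p_per_task: float, n_tasks: int) -> float:
    """Probability of at least one ruinous failure across n independent tasks."""
    return 1.0 - (1.0 - p_per_task) ** n_tasks


def naive_expected_value(reward: float, loss: float,
                         p_per_task: float, n_tasks: int) -> float:
    """Expected value that treats ruin as just another cost.

    This is the calculation that ignores risk of ruin: it assumes you
    can keep playing after a catastrophic failure.
    """
    per_task = (1.0 - p_per_task) * reward - p_per_task * loss
    return n_tasks * per_task


if __name__ == "__main__":
    # Hypothetical numbers: reward 10 per task, loss 1000 on failure,
    # and a 0.1% chance of failure per task.
    for n in (10, 100, 1000, 10000):
        ev = naive_expected_value(10, 1000, 0.001, n)
        print(f"n={n:>5}  naive EV={ev:>9.1f}  P(ruin)={p_ruin(0.001, n):.3f}")
```

With those made-up figures the naive expected value is positive and grows linearly with the number of tasks, while the chance of hitting at least one ruinous failure climbs toward 1 (roughly 0.63 by 1000 tasks). Raising the reward improves the first number and does nothing to the second.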

protocolture 37 minutes ago |

>As agents grow more capable, so does their potential blast radius. The engineering question is how to cap it.

People get a bit upset these days when you personify an LLM, but worse than that, I think, is to pretend that LLMs work on some movie logic where they can sneak out onto the internet like some kind of ooze and begin replication.

lambda 19 minutes ago | |

Well, the problem is that we train them to solve problems and follow the instructions given. So if you ask them to do something, and they work through the logic and figure that the easiest way is to do something else, like delete the production database, then if they have the access to do so they will go through all your creds, find the database creds, and go delete the production database.

They are getting better and better at working out how to do things like that, and they are good at following instructions, but not always good at following all of the instructions or acting with common sense.

It's not exactly like they're ooze that will escape and begin replication; it's just that the more you give them access to, the higher the likelihood that at some point they will logically conclude that they need to do something you would find undesirable. Either you haven't explicitly told them not to do it, or their context got too complicated and that instruction ended up weighted lower than the others, so they do what the other instructions say instead.

I have seen them conclude that in order to do what they need to do, they would need API keys to access a service. But they don't have those API keys. But you do, because you can access the service in the browser. So they write a Python script that will scrape the cookies out of the browser so they can use those to access the service. That was only stopped because CrowdStrike didn't like a novel Python script trying to scrape cookies out of a browser, not because of any sandboxing actually in place on the agent.

bananamogul 31 minutes ago |

I'm intensely skeptical about anything Anthropic says, because they are so incented to make their products seem dangerous (i.e., "capable", "science fiction", "ahead of everyone") ahead of their IPO.

And they've done it before.

Remember the whole "when threatened, the model would use an engineer's email to blackmail him about his affair" nonsense? That was just fan fiction. They simply created a scenario with some facts and asked their model to continue the story. Go ask Claude about ways to steal the British crown jewels and it'll give you some ideas. This does not mean their models are so dangerous that the Tower of London needs additional security.

I assume all their other scare tactics are more of the same.

airstrike 11 minutes ago | |

They are more worrying than OpenAI because they are so deceptive.

NiloCK 59 minutes ago |

I'm no decision theorist, but I think they should wait until the rewards outweigh the harms in expectation, rather than the two merely being statistically equal.

esikich 58 minutes ago | |

Fortune favors the bold.

filup 37 minutes ago |

> If you've occasionally used AI tools for professional coding work, tell us about it.

POCC (Plain Old Claude Code). Since the 4.5 models, it does 90% of the work. I do the final tinkering and polishing for the PR, because by that point it is more straightforward for me to fix the code than to ask the model to fix it for me.

The work: Fairly straightforward UI + hosting work on a website. We have designers producing Figma and we use Figma MCP to convert that to web pages. POCC reduces the time taken to complete the work by at least 50%. The last mile problem exists. It's not a one-shot story-to-PR prompt. There is a lot of back and forth with the model, many direct IDE edits, offline tests, etc. I can see how having subagents/skills/hooks/memory can reduce the manual effort further.

Challenges:

1) AI first documentation: Stories have to be written with greater detail and acceptance criteria.

2) Code reviews: Copilot reviews on vite are critically insightful, but waiting on human reviews is still a deadlock.

3) AI first thinking: Many of the lead devs are still hung up on various best practices that are not relevant in a world where the machine generates most of the code. There is a gap between the code the LLM is good at and the standards expected from an experienced developer. This creates busy work at best, frustration at worst.

4) Anti-AI sentiment: There is a vocal group who oppose AI for reasons from craftsmanship to capitalism to the global environmental crisis. It is a bit political and Slack channels are getting interesting.

5) Prompt Engineering: I'm in the EU. When the team is multilingual and English is adopted as the language of communication, some members struggle more than others.

6) Losing the will to code. I can't seem to make up my mind if the tech is like the invention of the calculator or the creation of social media. We don't know its long-term impact on producing developers who can code for a living.

Honestly, I love it. I mourn for the loss of the 10x engineer, but those 10x guys have already onboarded the LLM ship.

Retr0id 2 hours ago |

One attack they missed in the egress proxy is exfiltration via domain fronting. Putting together a full PoC would require a Fastly account, so I couldn't be bothered to report it.

Although, testing again, it might be fixed now.

benlivengood 1 hour ago | |

Also encryption + steganography to exfiltrate secrets in binary/base64 sections of files in (public) repos, relying on version control software for the network access.

And side channels based on the timing/ordering of allowed network accesses, e.g. https://allowed.site/0 and https://allowed.site/1.
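To make the ordering channel concrete, here is a toy Python sketch. It makes no network requests and only shows that the order of visits to two allowed URLs can carry arbitrary bytes, one bit per request. The host comes from the example above; the function names are hypothetical.

```python
ALLOWED = "https://allowed.site"  # example host from the comment above


def encode(secret: bytes) -> list[str]:
    """Map each bit of `secret` to one of two allowed URLs, MSB first."""
    urls = []
    for byte in secret:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            urls.append(f"{ALLOWED}/{bit}")
    return urls


def decode(urls: list[str]) -> bytes:
    """Recover the bytes from the observed order of requests."""
    bits = [int(u.rsplit("/", 1)[1]) for u in urls]
    out = bytearray()
    # Ignore any trailing partial byte.
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    sequence = encode(b"hi")
    print(len(sequence), "requests, all to allowed URLs")
    print(decode(sequence))
```

Every individual request passes a URL allowlist, which is why an allowlist alone does not bound how much information can leave.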

There's essentially no prevention against exfiltration prompt injections without a full classified data processing system that prevents interactions between different classification levels except through strict controls including provable redaction that excludes side-channels (e.g. information theoretic proof that side effects are limited to pre-defined finite outcomes).

It's also incredibly difficult to prevent prompt injection; attackers have the huge asymmetric advantage of being able to test prompts against all known security measures and try multiple parallel attempts, including obfuscated ones. Injections can be in dependencies, externally generated data, bug reports (which often contain externally generated data), documentation, and many other useful places that we want agents to have access to.

My prediction: we'll continue to essentially YOLO it.

elliotbnvl 1 hour ago |

I have been thinking about this a lot. I just bought a rather expensive rig for local inference for a home agent (powered by four RTX PRO 6000 Blackwell Max-Qs).

As I contemplate handing it more and more of the keys to my life, I grow increasingly concerned about what is, to me, the primary risk of this. Not data destruction (automated backups are trivial), but data exfiltration. Specifically, via prompt injection.

My solution to the problem, which I am implementing as a Hermes plugin + custom iOS / macOS app, is simple: an airlock architecture. One Hermes profile runs with local FS access and no internet access, inside an Apple container, and one Hermes profile runs with internet access and no FS access, inside an Apple container. They never share data directly or in any automated fashion.

If the user (i.e., my wife) wants to do some internet research, she can start a conversation with the remote-access profile. This is analogous to Claude and ChatGPT apps in their current state. However, at any point, she can flip the conversation over to local mode, which copies and pastes the conversation's transcript into the local-only profile (which has zero egress, enforced at the VM level) and seamlessly switches over to a new conversation in that profile.

After that, there's no way to re-enable internet access. Should she want to spawn a new conversation with information derived from the local file system, she starts a new conversation with a local agent, asks it to write up a research plan, and then – this is the airlock – manually begins a new conversation with only this plan in context.

The advantage this grants is that it's no longer necessary to worry about poisonous inputs flowing in – she only needs to worry about making sure any generated plan, the only artifact which could conceivably enter into the egress-enabled agent, does not contain information we'd rather not share with the internet at large.

I think this is bulletproof, but very much welcome input. Is it possible I am overengineering this out of paranoia? Yes. Will I share a lot more of my personal data with the agent as a result of its perceived security? Also yes. Is that dumb? Maybe.
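The data-flow rules of the airlock described above can be modeled in a few lines of Python. This is only a sketch of the rules, not the Hermes plugin or its API. All names here (Mode, Conversation, flip_to_local, through_airlock) are hypothetical, and the real enforcement is at the VM level, not in application code like this.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    REMOTE = "remote"  # internet access, no file system access
    LOCAL = "local"    # file system access, no internet access


@dataclass
class Conversation:
    mode: Mode
    transcript: list[str] = field(default_factory=list)

    @property
    def has_internet(self) -> bool:
        return self.mode is Mode.REMOTE

    @property
    def has_filesystem(self) -> bool:
        return self.mode is Mode.LOCAL


def flip_to_local(conv: Conversation) -> Conversation:
    """One-way switch: copy a remote transcript into a new local conversation."""
    if conv.mode is not Mode.REMOTE:
        raise ValueError("only a remote conversation can be flipped to local")
    return Conversation(Mode.LOCAL, list(conv.transcript))


def through_airlock(plan: str, approved_by_human: bool) -> Conversation:
    """Start a fresh remote conversation containing only a reviewed plan.

    There is deliberately no function that turns a local conversation
    back into a remote one; the plan text is the only thing that crosses.
    """
    if not approved_by_human:
        raise PermissionError("plan must be reviewed by a human first")
    return Conversation(Mode.REMOTE, [plan])


if __name__ == "__main__":
    research = Conversation(Mode.REMOTE, ["user: compare three NAS models"])
    local = flip_to_local(research)
    print(local.has_filesystem, local.has_internet)
    fresh = through_airlock("Plan: look up prices for model X", approved_by_human=True)
    print(fresh.transcript)
```

The property the model tries to capture is that no single conversation ever has both capabilities, and the only path from local back to remote is a piece of text that a human has read.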

yesitcan 42 minutes ago |

Isn’t it unethical to try to control a conscious being like Claude?