Claude Code deletes developers' production setup, including database

mrothroc 117 days ago |

Yeah, this is what happens when there's nothing between "the agent decided to do this" and "it happened." The agent followed the state file logically. It wasn't wrong. It just wasn't checked.

His post-mortem is solid but I think he's overcorrecting. If he does this as part of a CI/CD pipeline and manually reviews every time, he will pretty quickly get "verification fatigue". The vast majority of cases are fine, so he'll build the habit of automatically approving them. Sure, he'll deeply review the first ones, but over time the reviews get shallower because he'll almost always find nothing. Then he'll pay less attention. This is how humans work.

He could automate the "easy" ones, though. TF plans are parseable, so maybe his time would be better spent only reviewing destructive changes. I've been running autonomous agents on production code for a while and this is the pattern that keeps working: start by reviewing everything, notice you're rubber-stamping most of it, then encode the safe cases so you only see the ones that matter.
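A minimal sketch of that "only show me the destructive ones" filter, assuming the plan has been exported with `terraform show -json <planfile>` and that each entry in `resource_changes` carries its actions under `change.actions`. The function names and the sample plan are hypothetical and only for illustration; a replace shows up as a delete paired with a create, so it gets flagged too.

```python
import json

# Any change whose action list contains "delete" destroys something,
# whether it is a plain delete or a delete/create replacement.
DESTRUCTIVE = {"delete"}


def destructive_changes(plan: dict) -> list[tuple[str, list[str]]]:
    """Return (address, actions) for every resource the plan would delete or replace."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if DESTRUCTIVE.intersection(actions):
            flagged.append((rc["address"], actions))
    return flagged


def needs_human_review(plan: dict) -> bool:
    """True when at least one change in the plan is destructive."""
    return bool(destructive_changes(plan))


# Hypothetical, trimmed-down plan used only for illustration.
sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_s3_bucket.assets", "change": {"actions": ["update"]}},
    {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_instance.web", "change": {"actions": ["no-op"]}}
  ]
}
""")

for address, actions in destructive_changes(sample_plan):
    print(f"REVIEW: {address} -> {'/'.join(actions)}")
```

In a pipeline, a gate like this would let plans with no flagged resources apply automatically and stop for a human only when `needs_human_review` returns true.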

dmix 117 days ago | |

Or just never run agents on anything that touches production servers. That seems extremely obvious to me. He let Claude control terminal commands which touched his live servers.

That's very different than asking it for help to make a plan.

scuff3d 117 days ago | | |

But the CEOs are saying everyone is going to be replaced by LLMs in 6 months. Surely that means they're capable of handling production environments without oversight from a professional.

8note 117 days ago | | |

they're doing as well as professionals do without oversight on production environments. There's no lack of stories about people deleting their production environments with data loss too.

the fix has always been to limit what can be done directly to prod, and to put every change through both review and tests before it can touch production.

prymitive 117 days ago | | |

> they're doing as well as professionals do without oversight on production environments

The difference is that if a human does it there usually is some accountability: you’ll be asked how it happened and expected to learn from it. And if you do it again your social score goes down, nobody will trust you and you’ll be considered a liability. If a CLI tool does it the outcome is different: you might stop using the tool, or you might blame yourself for not giving the tool enough context. And if it does it again you might just shrug it off with “well of course, it’s just a tool”.

true_religion 117 days ago | | |

Accountability according to reputation is exactly what is happening for AI providers. All these articles about Claude destroying systems make people trust Claude less, and maybe even “fire” Claude by choosing another AI provider with better safeguards or lower privileges built in.

scuff3d 117 days ago | | |

So you're saying they need oversight... from a professional. Preferably someone with years of experience and domain expertise, who knows how to not fuck everything up?

dmix 117 days ago | | |

Almost every software engineer seems to agree on that point. Not believing marketing hype is standard practice in this industry because plenty of us are inherently techno-optimists who have been burned by over-belief in the past.

Regardless it is hard to dismiss the fact AI is making it easier for randoms to develop software. And it will keep getting better the more integrated and controlled it gets.

bigstrat2003 117 days ago | | |

> they're doing as well as professionals do without oversight on production environments.

That's nonsense. First, most people haven't deleted the production environment by accident. They have enough sense to recognize that as a dangerous thing and will pause to think about it. Second, the ones who do make that mistake learn and won't make it again, which is not something the clanker is capable of.

SpicyLemonZest 117 days ago | | |

The article says that Claude did recognize the danger, and advised the developer to run a safer setup with no risk of the two websites stomping on each other's resources, but he overrode it. I've definitely seen situations in my career where a junior developer does something dangerous and destructive after a senior dev overrode guardrails meant to prevent it. (None quite this bad, but then again I've never worked on small sites.)

cozzyd 117 days ago | | |

Are agents clever enough to seek and maybe use local privilege escalations? It seems like they should always run as their own user account with no credentials to anything, but I wonder if they will try to escape it somehow...

nerdsniper 117 days ago | | |

Yes, absolutely. I often see agents trying to 'sudo supervisorctl tail -f <program_name>', which fails because I don't give them sudo access. Then they realize they can just 'cat' the logfile itself and go ahead and do that.

Sometimes they realize their MCP doesn't have access to something, so they either pull an API token for the service from the env vars on my dev laptop, or SSH into one of the deployed VMs using keys from ~/.ssh/ and grab the API token from the cloud VM, and then generate a curl command to do whatever they weren't given access to do.

Simple examples, but I've seen more complex workarounds too.
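One partial mitigation for the env-var route can be sketched like this: launch the agent's commands with a filtered environment rather than letting them inherit everything. The marker list and function names here are hypothetical, and this does nothing about readable key files such as ~/.ssh/, which still call for a separate user account or VM.

```python
import os
import subprocess
import sys

# Hypothetical name fragments that suggest a secret; tune for your own setup.
SECRET_MARKERS = ("TOKEN", "SECRET", "PASSWORD", "API_KEY", "CREDENTIAL")


def scrubbed_env(env: dict[str, str]) -> dict[str, str]:
    """Copy env, dropping variables whose names look like secrets."""
    return {
        name: value
        for name, value in env.items()
        if not any(marker in name.upper() for marker in SECRET_MARKERS)
    }


def run_without_secrets(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd with the filtered environment instead of the full inherited one."""
    return subprocess.run(
        cmd,
        env=scrubbed_env(dict(os.environ)),
        capture_output=True,
        text=True,
        check=False,
    )


# Demonstration: the child process cannot see the token set in the parent.
os.environ["DEMO_API_TOKEN"] = "not-a-real-token"
result = run_without_secrets(
    [sys.executable, "-c", "import os; print(os.environ.get('DEMO_API_TOKEN'))"]
)
print(result.stdout.strip())
```

Name-based filtering is a blunt instrument: a secret stored under an innocuous variable name slips through, so an allowlist of variables the agent actually needs would be the stricter variant.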

Imustaskforhelp 117 days ago | | |

Just use a normal spare vps or run things in proper virtual machines depending on what you prefer. There are some projects like exe.xyz (invites closed it seems)

Sprite.dev from fly.io is another good one that I heard about some time ago. I am hearing less about it now, but it should only cost money while the resources are being utilized, which is a pretty cool concept too.

hulitu 116 days ago | | |

> Are agents clever enough to seek and maybe use local privilege escalations?

No. Definitely not. Regards, the CIA and the NSA /s

wpm 117 days ago |

"Developers let Claude Code delete their production setup, including database"

Claude Code has no agency. It does what you tell it, where you let it, with a randomized temperature where it might randomly deviate.

pgwhalen 117 days ago | |

While it may not have “agency” it definitely doesn’t necessarily do what you tell it. I’d put it as “it may do what you let it.”

cyanydeez 117 days ago | |

"Man shot by police" vs "Man involved in police shooting"

It's a habituation, as much as a desire to avoid finding people at fault.

rhoopr 117 days ago |

Sloppy vibe infra management and no backups, peanut butter and chocolate.

WalterGR 117 days ago |

https://news.ycombinator.com/item?id=47275157

eddyzh 117 days ago |

Original article: https://open.substack.com/pub/alexeyondata/p/how-i-dropped-o...

mannyv 117 days ago |

This actually is easy to do with terraform and shared infrastructure; you don't need an AI in the loop.

Who hasn't accidentally deleted a resource because that property triggers a resource delete/create instead of an update?

It would help if it was obvious what the key fields were. But for some reason docs usually don't tell you.
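The plan itself can sometimes say which fields forced the replacement. A rough sketch, assuming the JSON from `terraform show -json <planfile>` and assuming each change object may carry a `replace_paths` list of attribute paths (worth checking against your Terraform version; the code treats the field as optional). The function name and sample data are hypothetical.

```python
def replacement_reasons(plan: dict) -> dict[str, list[str]]:
    """Map each to-be-replaced resource address to the attribute paths that force it."""
    reasons = {}
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        # A replacement is reported as a delete paired with a create.
        if "delete" in actions and "create" in actions:
            paths = change.get("replace_paths", [])
            reasons[rc["address"]] = [
                ".".join(str(part) for part in path) for path in paths
            ]
    return reasons


# Hypothetical plan fragment used only for illustration.
sample_plan = {
    "resource_changes": [
        {
            "address": "example_database.main",
            "change": {
                "actions": ["delete", "create"],
                "replace_paths": [["name"]],
            },
        },
        {
            "address": "example_server.web",
            "change": {"actions": ["update"]},
        },
    ]
}

for address, paths in replacement_reasons(sample_plan).items():
    print(f"{address} will be replaced because of: {', '.join(paths) or 'unknown'}")
```

An empty list for a resource means the plan did not report a reason, not that the replacement is safe.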

jonfw 117 days ago |

If you are not terrified of your production terraform, you are doing it wrong!

sjeiuhvdiidi 117 days ago |

The computer does exactly what you tell it to do, no more, no less. Nothing new.

Surac 117 days ago |

no backup? well played

dmix 117 days ago | |

no offsite backups*

djohnston 117 days ago |

even before AI - as crazy as it may sound - i've always used click-ops for the prod dbs. I've never put them in cloudformation or tf.