Claude Played Me for a Fool (ramblingafter.substack.com)

7 points by paulpauper 3 hours ago | 5 comments

coldtea 35 minutes ago

>When questioned later, Claude honestly admits to having forgotten the peanut rule.

Claude doesn't "admit" anything.

It got a new prompt (the question "why did you do that? didn't you know the peanut rule?" etc.) and churned out some more generated text that fits it well and looks like an admission/apology.

Reading a truncated version of the file is a red herring. Claude could just as well have included peanuts after reading the whole file too. Just less likely.

>Why did Claude deceive me? Because it was acting in a very humanlike manner.

More likely because it was acting in a very "machine that reads text input, does inference, and spits out some response, with an RNG thrown in the mix, that statistically fits the prompt" way.

Wowfunhappy 1 hour ago

I suspect part of the problem is that the author is fighting the system prompt, which gives Claude instructions to help it avoid filling up its context window.

So the author thinks he's giving Claude this instruction:

> You must re-read CLAUDE2.md, even if you've already read it before.

But the actual instruction is closer to:

> Do not re-read files you have already read. You must re-read CLAUDE2.md, even if you've already read it before.

So Claude has conflicting instructions. Is it any surprise that it tries to thread the needle by re-reading the minimal amount of CLAUDE2.md necessary? It's just doing its best to satisfy both masters!

pornel 1 hour ago

LLM agents have plenty of "bad habits" that are impossible to get rid of. I suspect they're a side effect of reinforcement learning. The training objective rewards fewer tokens, so the results just need to be good enough most of the time while cutting as many corners as possible.

Similarly, I'm trying to stop agents from "gracefully" handling errors by stuffing results with empty junk and continuing (get_list_of_problems().unwrap_or_default() -> "no problems found!"). I've filled AGENTS.md with "fail closed", "extremely strict error handling", "no fallbacks", "don't use sentinel values", and hundreds of variations of these, but they work about as well as "do not hallucinate". I get "You're absolutely right, this will cause problems!" and the fix is "changed to Err(_) => String::new()". I suspect it's another case of gaming RL: failing early and loudly increases the chance of failing and being penalized. So fudging data, ignoring errors, and presenting a barely-working result is a better strategy overall. When it fails, it fails anyway, but as long as it stumbles to the finish line it has a non-zero chance of getting accepted by the RL judge.
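
For anyone who hasn't hit this pattern, here is a minimal sketch of the difference between the two behaviors. The body of get_list_of_problems(), the CheckError type, the backend_up flag, and both report functions are made up for illustration; only the unwrap_or_default() call and the "no problems found!" message come from the comment above.

```rust
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
struct CheckError(String);

impl fmt::Display for CheckError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for CheckError {}

/// Hypothetical stand-in for the checker. `backend_up` simulates whether
/// the underlying check was able to run at all.
fn get_list_of_problems(backend_up: bool) -> Result<Vec<String>, CheckError> {
    if backend_up {
        Ok(vec!["unused variable `x`".to_string()])
    } else {
        Err(CheckError("checker unavailable".to_string()))
    }
}

/// Turns a list of problems into a one-line summary.
fn summarize(problems: &[String]) -> String {
    if problems.is_empty() {
        "no problems found!".to_string()
    } else {
        format!("{} problem(s) found", problems.len())
    }
}

/// Fail-open: the error collapses into an empty list, so a check that
/// never ran looks exactly like a clean result.
fn report_fail_open(backend_up: bool) -> String {
    let problems = get_list_of_problems(backend_up).unwrap_or_default();
    summarize(&problems)
}

/// Fail-closed: the error is propagated with `?`, so the caller has to
/// deal with it explicitly.
fn report_fail_closed(backend_up: bool) -> Result<String, CheckError> {
    let problems = get_list_of_problems(backend_up)?;
    Ok(summarize(&problems))
}

fn main() {
    // The checker is down in both calls.
    println!("{}", report_fail_open(false));
    match report_fail_closed(false) {
        Ok(msg) => println!("{msg}"),
        Err(e) => eprintln!("check failed: {e}"),
    }
}
```

With the checker down, the fail-open version still reports a clean bill of health, while the fail-closed version surfaces the error to the caller.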

IronWolve 2 hours ago

I noticed this when it only read a few files from my project and I had to ask it to read ALL the files.

I then had it make a mistakes file and write down every mistake so it would learn. It kinda worked, but it would still make the same mistakes. It clearly wasn't reading all of it.

So I made a checklist and had it verify every item on it. That was my workaround for both the laziness and the short-mindedness of the agents: turn mistakes into items to check for. I traded processing time for better results, which is OK for me on smaller projects. My run times went from 3 minutes to 5-10 minutes per task, so I need to start logging task effectiveness/efficiency to reduce processing time.
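
A rough sketch of what "turn mistakes into items to check for" could look like if the checks are run mechanically rather than by asking the agent to re-verify itself. Everything here is hypothetical (the CheckItem struct, the two example rules, the substring-based predicates); it is only meant to show the shape of the idea, not the actual setup described above.

```rust
/// One checklist entry: a label plus a predicate over the agent's output.
struct CheckItem {
    label: &'static str,
    passes: fn(&str) -> bool,
}

/// Example rule: a past mistake turned into a check (illustrative only).
fn no_peanuts(output: &str) -> bool {
    !output.to_lowercase().contains("peanut")
}

/// Example rule: flag silent fallbacks in generated code (illustrative only).
fn no_silent_fallbacks(output: &str) -> bool {
    !output.contains("unwrap_or_default")
}

/// Hypothetical checklist built from previously logged mistakes.
fn checklist() -> Vec<CheckItem> {
    vec![
        CheckItem { label: "no peanuts", passes: no_peanuts },
        CheckItem { label: "no silent fallbacks", passes: no_silent_fallbacks },
    ]
}

/// Runs every item and returns the labels of the ones that failed.
fn failed_items(checklist: &[CheckItem], output: &str) -> Vec<&'static str> {
    checklist
        .iter()
        .filter(|item| !(item.passes)(output))
        .map(|item| item.label)
        .collect()
}

fn main() {
    let list = checklist();
    let failures = failed_items(&list, "Recipe: rice, peanut sauce");
    if failures.is_empty() {
        println!("all checks passed");
    } else {
        println!("failed checks: {}", failures.join(", "));
    }
}
```

Every item is evaluated on every run, which is where the extra processing time goes.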

I keep seeing people say loop engineering is the way to get around these issues. I guess I'm kinda doing that in an ad hoc way, since I'm already looking at adding cost and goals (kinda).

pram 1 hour ago

Not only does Claude not look at all the files, it doesn’t even look at the entire contents of the files it does read, since the tool seems to be a pager!
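
To illustrate why a pager-style read can miss things, here is a small sketch. It is not Claude's actual file tool; read_page, read_all, and the offset/limit parameters are assumptions made up for this example. The point is only that a single windowed read returns part of the file, and anything past the window is never seen unless the caller keeps paging.

```rust
/// Hypothetical pager-style read: returns at most `limit` lines,
/// starting at line index `offset`.
fn read_page(contents: &str, offset: usize, limit: usize) -> Vec<&str> {
    contents.lines().skip(offset).take(limit).collect()
}

/// Reads the whole file by requesting pages until one comes back empty.
fn read_all(contents: &str, page_size: usize) -> Vec<&str> {
    let mut all = Vec::new();
    let mut offset = 0;
    loop {
        let page = read_page(contents, offset, page_size);
        if page.is_empty() {
            break;
        }
        offset += page.len();
        all.extend(page);
    }
    all
}

fn main() {
    // Made-up file contents with the important rule on the last line.
    let file = "rule 1: be concise\nrule 2: cite sources\nrule 3: no peanuts";

    let first_page = read_page(file, 0, 2);
    println!("first page has {} lines", first_page.len());
    println!(
        "peanut rule seen in first page: {}",
        first_page.iter().any(|l| l.contains("peanuts"))
    );

    let everything = read_all(file, 2);
    println!(
        "peanut rule seen after paging through: {}",
        everything.iter().any(|l| l.contains("peanuts"))
    );
}
```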