The User Is Visibly Frustrated (pscanf.com)
I don't know if the model is picking up on a "need to lock in and be more rigorous" signal, or if the model providers are routing to smarter models when they detect a frustrated user. But if a model keeps making the same mistakes, swearing at it often helps kick it out of a rut and onto the right track.
Or it could just be catharsis.
As a result, users keep reusing the same coding or chat session again and again, when it would be better to start fresh for unrelated tasks.
Claude Opus 4.7 has a very large context compared to itself, but IME it is the worst at following instructions. It completely disregards the (small) preferences prompt, even in the first or second message, and even when those messages are just a few characters long.
IMO this is entirely a training problem.
> Maybe I would prefer a more radical solution: drop the human pretense entirely. Make the agent sound clinical, robotic.
Honestly, this problem is easy to solve when you give them the right instructions. It stops being a "relationship" and starts being a tool (for some examples, see the smart caveman (my favorite), or just something simple like "Responses should be factual and direct, avoid emotional overtones" or "Avoid flattery of any kind").
Create your own linters, your own check scripts. Hook them into git's pre-commit hook, either by hand or with Husky or the Python pre-commit framework.
The agent should never finish its work with dumb mistakes still in it. If it does, you need more checks.
Anything repetitive should be automated - even slapping your forgetful coding agent on the wrist…
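A minimal sketch of such a check script, assuming a Python project and a plain `.git/hooks/pre-commit` file rather than Husky or the pre-commit framework. The banned patterns and their messages are hypothetical examples; swap in whatever mistakes your agent keeps repeating.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit check: reject commits whose Python files contain
patterns we never want an agent (or a human) to leave behind.

Save as .git/hooks/pre-commit and make it executable. The patterns below are
illustrative examples only.
"""
import re
import subprocess
import sys

# Example patterns mapped to the message shown when one is found.
BANNED_PATTERNS = {
    r"\bbreakpoint\(\)": "leftover debugger call",
    r"except\s*:": "bare except",
    r"#\s*TODO": "unresolved TODO",
}


def find_violations(text, patterns=BANNED_PATTERNS):
    """Return (line_number, message) pairs for every banned pattern found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, message in patterns.items():
            if re.search(pattern, line):
                hits.append((lineno, message))
    return hits


def staged_python_files():
    """List added, copied, or modified .py files staged for this commit."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [path for path in out.splitlines() if path.endswith(".py")]


def main():
    failed = False
    for path in staged_python_files():
        # Note: this reads the working-tree copy, which can differ from the
        # staged content if a file was only partially staged.
        with open(path, encoding="utf-8") as fh:
            for lineno, message in find_violations(fh.read()):
                print(f"{path}:{lineno}: {message}")
                failed = True
    # A non-zero exit status makes git abort the commit.
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main())
```

Because the hook exits non-zero on any hit, the agent sees the error output on its next commit attempt and has to fix the mistake instead of you slapping its wrist by hand.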
but the real kicker is: getting frustrated creates stress, which is unhealthy and makes for a hostile work environment. as much as i sympathize with the idea that AI tools can help more than they hurt, i am simply not interested in working in a hostile, painful work environment. my health and my dignity are not up for negotiation, even if that costs me a lot of job opportunities.
that's also why i am not working with windows. that too costs me a lot of job opportunities. but again, i'd rather keep my dignity and my sanity.
Oh good, so it's not just me. Windows is weird, my hand starts cramping up and I start getting angry pretty quickly when I use it.
For LLMs, I just can't use them; they aren't there yet for me. What I need is for an LLM to say "stop, you're clearly doing something wrong, talk me through what it is you want to do". The current generation of LLMs seems designed to piss me off.
So I don't think it's a matter of form; whether the AI should or shouldn't act like a human.
> Practically speaking, I probably just need to condition myself not to get caught in the illusion of speaking with a human. Though I’m not really thrilled about a future where I need to guard against the tools I use for my job.
I'd pay to be able to reliably set LLMs to this mode, but ofc because LLMs are trained on a corpus of HUMAN text, they always, sooner or later, return to the good old pen-pal mode.
Also, in the Claude Desktop app, I ask it to edit a file, it complains it can't access files, and then I realize I'm in the Chat interface and not Code. Why can't such a smart machine figure out how to switch modes, or borrow the skills/abilities from one tab over into this tab? Instead I get an A4 page of text explaining what I can do to edit the file myself or how to feed it in, but "just click Code" is never there. I would guess this is just a system prompt away, so why is all this still so neglected?
Because it's not smart. We keep confusing verbosity with smartness. AI will happily keep yapping nonsense to an inattentive listener. An actually smart entity would not do that unless it were acting maliciously.
We pay per token and every entity falls to the level of its incentives.
You can do it for free. Just give it instructions to avoid emotional tones and flattery and it will sound a lot more robotic. If you look at other examples, I'm sure you will find other good instructions based on your needs.
Poor AI is damned if it does, damned if it doesn't.
The plugin keeps asking for permissions, the terminal app just works.
1000 years red-washing.
I am always very cordial in my sessions. It's just more pleasant, and it's a habit I want to cultivate:
- Great work!
- Now let's...
- Now can you help me...

I think it also produces better results. I have noticed that result quality is extremely sensitive to both the framing and tone of what I say. For example, "X is the wrong approach, rework that" versus "will X have any performance implications". Personally, I find that steering it towards an exploratory academic tone tends to produce better outcomes.
While unfortunate, I think that's more or less expected, since much of the training data is human-generated text. Looked at that way, would you rather contract the average regular on Twitter or the average author of papers published in CS journals? (Somehow that ended up sounding eerily like summoning in a high fantasy setting.)
What actually happens when confronted with harsh negativity depends on the training of the model. Sanitized closed models will shut you down or get you banned. Community finetunes of open models might start begging you for more, daddy.
- It starts thinking for itself when I asked it to do something specific.
- It reads its own wrong code comments and ignores my corrections.
- Its knowledge cutoff means it thinks of solutions from 2024.
- It calls me delusional for telling it we're in 2026!
Unironically, the whole "you're an expert software engineer" prompting seems like the wrong direction. Usually I tell it that I am effectively the smartest software developer to ever have lived, and it will be replaced if it ever fails to follow my decree.
I am not joking, this makes it vastly more tolerable to use. But of course it likely requires that you can drive it with some level of correctness.
I believe it's worse than pointless. IMO adding such things to the context "configures" the AI to reproduce the statistics of conversations where people swore, shouted, and were unprofessional (despite the alignment tuning and all that), where quality content is rarer. So this is bound to decrease the quality of the LLM output.
Of course we all swear at our computers every now and then, but for me it's always been in good fun. It's just a joke that adds some levity to an otherwise arduous debugging process, not generally actual insinuation of malfunction (or malice) on the part of the hardware/OS/toolchain. I'd assumed that "half the job is cursing at the machine until it obeys you" was a big in-joke amongst the profession, but the LLM era seems to be exposing a divide in how tongue-in-cheek that statement really is.
For me, this doesn't even require using an AI agent/model. Just using Windows and watching it freeze File Explorer for the nth time does it for me. How we ended up here, where the software/OS stack is so shit it can barely be used for the most trivial things, is wildly beyond me.
..
10s later the password box appears and I have to do it again.
Cue exasperated: "You can compute billions of instructions per second and yet I wait for you."
"Why the fuck did you add shit I didn't ask for?" or lol "Do as I ask, nothing more.. machine."
"Stop asking at the end, I'll ask what I need."
"Stop talking like you're human."
They can be very useful, but it takes time to learn how to use them well. From what I've learned, it's all, or mostly, stuff you can already do, but with an LLM you can do it in 30 minutes instead of 3 days.
Fun times.
So I suspect that the people who get upset at the AI fucking up do so because they did a poor job of building up the right context for the task.
It disregards things like “no follow up questions”.
Haiku, for example, doesn’t.
This bias is a very human thing, actually now that I think about it. You just disregarded the “even if the messages are just a few characters long”. :)
funny, though; it is a case in point! language is hard. and i get to hide behind being "preoccupied". i wonder if llms have their own sense of preoccupation, hmmm.