Various LLM Smells (shvbsle.in)
A general pattern for LLMs is that they look really good at things you are bad at. What that means is that if you find yourself thinking of its output as significantly better than yours in a particular domain, there's a high chance that you are not equipped to judge that quality effectively.
This is true for coding, too, which I think, to a large degree, might explain the polarized differences in opinions on HN about the quality of LLM-produced code. You have the 1. "AI produces code better than I could possibly write, one shots things it would take me days to do, and has made me 10X more productive!" camp, and you have the 2. "AI constantly hallucinates, makes mistakes, has to be babysat, and ultimately costs me time!" camp, with a spectrum in between those. How could the output of the same product be seen so differently? Well, I have bad news for camp 1...
The language that I picked for the game runtime is Python. Claude really thought that the best way to validate user-submitted Python was to bypass the WASM sandbox and execute it within the application container using shell exec, essentially opening up an RCE vulnerability.
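A minimal sketch of a safer first step, assuming the goal is only to check that a submission is syntactically valid Python: parsing with the standard-library `ast` module never runs the submitted code. The function name `is_valid_python` is hypothetical, and this checks syntax only; actually running the code would still belong inside the sandbox.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python, without executing it."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # SyntaxError: malformed code.
        # ValueError: raised by some Python versions for null bytes in source.
        return False
    return True
```

Note that a submission such as `import os; os.system(...)` still parses fine, so this says nothing about whether the code is safe to run, only that it is well-formed.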
I also find that the quality of Claude Code degrades substantially. Claude really wants to implement every feature in as bespoke a way as possible. This is fine when you first generate the project, but over time you'll find that every web modal is implemented differently. Every button is different. Business logic is disconnected. It's why agentically produced codebases are MUCH larger than they should be; every feature is developed in a vacuum.
Then I'm trying to shove stuff in my AGENTS.md or CLAUDE.md files like "ALWAYS look for existing patterns within the codebase to keep it consistent." But the harness doesn't always work and it'll generate useless, verbose code anyways.
In some cases it's useful - like if I am shaky on the DSA knowledge needed for a specific operation or optimization then Claude can replace Stackoverflow. But, man, I'm so frustrated with it.
I am slowly writing more of my own code and cutting the LLM out of the loop in the unfamiliar territory I am working in.
My main concern is not so much productivity but understanding the code I have written and feeling agency over it.
The LLM is a very good teacher.
It's bad if they work in a part of the industry where code quality or efficiency matters. That's maybe 10% of the total though.
The industry largely has selected for camp 1 long ago.
If you don't get immediate negative feedback, camp 1 can go quite a ways before problems surface.
The advantage of writing over images is that text takes longer to absorb as a whole, so it's less apparent that the whole thing doesn't quite come together.
My problem with Claude's prose and ideas was that it kept recycling the same tropes and phrases after a while. It has been observed that these models have very strong statistical biases: when asked for a random number, for example, LLMs are far more predictable than even humans. This shows up in unguided writing exercises.
But as for actually crafting text that is both terse and to the point - such as oneliner explanations, or writing summaries - these models are quite bad. The best I have seen is they could turn a given length of prose into an even longer version - with generally some loss in the tonal accuracy or the points made in there.
As such they are a terrible tool for professional communication, but unfortunately, lots of people have started using them for exactly that.
This makes me think you're only exposing yourself to high quality writing online and from an intelligent circle of friends and coworkers. The average person's reading and writing abilities are _atrocious_ and only getting worse. We're almost at the point where kids are communicating through abbreviations and emojis exclusively. LLM prose is significantly better than what the average person can produce.
The LLM sameness in web design is good. Most sites shouldn't try to be idiosyncratic. The best design for a site with real utility is legibility, and LLMs are better at that than the median developer. Always laying out the same buttons? Always using the same type scales? Good! If it looks good to you, you weren't going to do better on your own, and you were very likely to do worse.
- “(The) honest answer:” (again, with colon)
- “The thing to internalize:”
- “The smoking gun:”
(really, sentences that start with “The <tag suggesting the next clause is the key point>:” are a strong tell, but those four are the most prolific)
- “load bearing” (when not talking about architecture)
- “blast radius” (when not talking about actual explosives, but rather the effect of an event/action)
- “smoke test” (esp. when “sanity check” is more apropos)
- Lists of three clauses/adjectives where the third is really just a combination of the first two
- Referring to the “shape” of things figuratively
- Social media posts that end with “Curious if anyone…”
- Stories or anecdotes using “Oh. Oh.” (where the second “oh” is italicized)
Edit: Yes, some of those last ones are terms that we often use as devs...but I would argue about the actual frequency of their use. Plus, these tells live on in prose generated by the latest models.
Assuming you mean load bearing & blast radius, I'd see those used and use them myself very frequently pre-LLM, mostly in online discussions though, so it's telling where they got their training data. Load bearing itself is/was a pretty normal phrase in the ops world in daily discussion.
Smoke test though, I can't say I've ever seen irl usage.
Everything is an escape hatch, try catch is an escape hatch, a cli flag is an escape hatch. It makes no sense, and quickly ended up in my “banned words and phrases” md file
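A "banned words and phrases" file like that can also be enforced mechanically. Here is a rough sketch, assuming a plain substring match is good enough; the phrase list is taken from this thread, and the names `BANNED_PHRASES` and `find_banned_phrases` are made up for illustration.

```python
import re

# Phrases mentioned in this thread; extend the list to taste.
BANNED_PHRASES = [
    "honest answer:",
    "the thing to internalize:",
    "the smoking gun:",
    "load bearing",
    "blast radius",
    "escape hatch",
]


def find_banned_phrases(text: str) -> list[tuple[int, str]]:
    """Return (line_number, phrase) pairs for each banned phrase found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Lowercase and treat hyphens as spaces so "load-bearing" matches.
        normalised = re.sub(r"[-\s]+", " ", line.lower())
        for phrase in BANNED_PHRASES:
            if phrase in normalised:
                hits.append((lineno, phrase))
    return hits
```

Running it over generated prose before accepting it gives line numbers to fix, though it will of course also flag legitimate uses such as an actual ops discussion of blast radius.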
- And a variant of the above is omitting the subject, "happy to" instead of "I am happy to"
- Codex refers to "the spine" of something
- Claude often says some decision is "locked" (i.e. decided on)
Will be interesting if that holds in other areas when chasing super intelligence.
> Contrastive negation is a rhetorical structure that denies a specific idea in the first half of a sentence and asserts an alternative in the second half.
> It typically follows an "It’s not X, it’s Y" or "not just X, but Y" formula.
Wikipedia also has a great resource which covers many of the common LLM patterns: https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
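The two formulas quoted above ("It's not X, it's Y" and "not just X, but Y") can be spotted roughly with regular expressions. This is only a sketch under the assumption that the pattern stays within one sentence; the names `CONTRASTIVE_PATTERNS` and `has_contrastive_negation` are hypothetical.

```python
import re

# Rough patterns for the two quoted formulas. Apostrophes may be
# straight (') or curly (’).
CONTRASTIVE_PATTERNS = [
    # "It's not X, it's Y" / "It is not X; it is Y"
    re.compile(
        r"\bit(?:['’]s| is) not\b[^.!?]*?[,;]\s*it(?:['’]s| is)\b",
        re.IGNORECASE,
    ),
    # "not just X, but Y" / "not only X but Y"
    re.compile(r"\bnot (?:just|only)\b[^.!?]*?\bbut\b", re.IGNORECASE),
]


def has_contrastive_negation(sentence: str) -> bool:
    """Return True if the sentence matches either rough formula."""
    return any(p.search(sentence) for p in CONTRASTIVE_PATTERNS)
```

Variants that split the denial and the assertion across two sentences are not caught by this; they would need something smarter than a regex.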
No ___, no ____. Just _____
or using "honest" to describe an approach. "Smooth. Effortless. A perfect fit for your needs".
In any style of informal or persuasive writing this shows up, as if it has to drive the point home.
I kind of wish we'd stop talking openly about what the tells are. It's nice to be able to determine with fair accuracy - but it couldn't last forever.
Least this way it’s out in the open perhaps. Since enough users have training enabled, labs will naturally learn what annoys us.
Had the same thought though
Also the false not X it's Y is used in a similar way for faux distinctions like a sov cit claiming "it's not driving, it's traveling in a car"
My favourite one from today:
“The tax isn't the problem. The mindset is.”
For those curious https://github.com/ryanthedev/oberskills/blob/main/commands/...
Thought for sure we'd get a critique of Inter overuse. JetBrains Mono is a lovely font, though.
Like corporate manager-type emails, of which I get AI generated ones frequently from company ownership. They think LLMs are the best thing since sliced bread.
It's taken corpo-speak to an entirely new level. On the plus side I no longer have to read them, and can just have AI reply on my behalf with more fast food.
In coding, I've noticed a few tropes as well: everything is a "contract" or an "artifact" (clearly trained on like three decades of Java lol), everything is constantly "backwards-compatible" or "versioned" (even if working on a brand new greenfield project), and a few others.
It is strange to read as the topic A has often not been introduced and introducing it by saying what it is not makes very little sense to a new reader.
> "belt and suspenders"
Did Anthropic and/or OpenAI deliberately train their models to produce websites with a specific design language, or did these stylistic preferences emerge naturally as some kind of LLM-selected optimum?
To give an example I'm personally frequently annoyed by, Google's Antigravity will consistently use the word "anthropomorphic" while "thinking" and the end result will consistently have obnoxiously large border radius (kind of like Android's design language).
Codex on the other hand likes to make websites with blue elements on a black background and likes to use emojis for icons for some reason, which is a terrible idea accessibility-wise.
When you bring your own ideas you can get AI to dev pretty nice looking non-generic stuff.
Instead of this:

    def add_three_ints(x: int, y: int, z: int) -> int:
        return x + y + z

it will write:

    def add_three_ints(
        x: int,
        y: int,
        z: int,
    ) -> int:
        return x + y + z
While it's always preferable to do this when you get either long or complex function signatures, Opus 4.7 and GPT 5.5 do this everywhere. When you combine it with their penchant for writing helper functions for everything, you get a ton of vertical padding that messes up the readability imo, because Python really relies on your eye seeing indents for scope.
If you have to add arguments, when they're each on their own line like that, the diff is cleaner, so the reviewer has an easier time understanding what's going on. That is, if you still have a human reviewing code.
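A rough way to check the diff argument, assuming it refers to one argument per line with a trailing comma: count the changed lines when a fourth argument is added to each style. This sketch uses the standard-library `difflib`; the extra `w` parameter and the `changed_lines` helper are made up for illustration.

```python
import difflib

one_line_before = ["def add_three_ints(x: int, y: int, z: int) -> int:"]
one_line_after = ["def add_three_ints(x: int, y: int, z: int, w: int) -> int:"]

multi_line_before = [
    "def add_three_ints(",
    "    x: int,",
    "    y: int,",
    "    z: int,",
    ") -> int:",
]
multi_line_after = [
    "def add_three_ints(",
    "    x: int,",
    "    y: int,",
    "    z: int,",
    "    w: int,",
    ") -> int:",
]


def changed_lines(before: list[str], after: list[str]) -> int:
    """Count added/removed lines in a unified diff, ignoring the file headers."""
    diff = difflib.unified_diff(before, after, lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )
```

With these inputs the single-line signature shows up as one removed plus one added line, while the one-per-line signature shows up as a single added line. Whether that is worth the extra vertical space on a three-argument function is the actual disagreement here.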
See, I disagree. Having seen plenty of Claude generated websites and slide decks, to me it just screams "no effort whatsoever". AI sloppypasta for content, if you will.
If I can see within a few seconds that your website or slide was obviously AI generated, I will doubt its content, how much effort (if any) you've put into it, if it won't have hallucinations, and (especially for websites) if it's even real or a scam farm.
I'm not saying every website has to be unique, but at least tell your prompt to use a font or colour scheme or something specific to you that will make it seem like you've put in some effort and make the result stand out from the slop.
It's been used in an ops context for a long time, pre-LLM even. Same with "blast radius", which has been a cybersecurity term for as long as I can remember.
If a repo is bare of CLAUDE.md but mentions a smoke test in a commit in the last year I assume it to be LLM written.
FWIW, I'm with GP. It's quite easy to get just mind-numbingly tired reading beyond the first two sentences of a typical LLM output, let alone on something I'm familiar with.
How do you think the author of the page would read this? That sounded pretty asshole-like to me. If it's not for you I'm really sorry for you, you must have to endure really screwed up people.
Same to you though, have a nice day
Way back in the past (around 30 years ago) I remember reading an article on "how to read a book" or a similar subject. It argued that you should not skip the acknowledgments, preface and other "personal" sections of a book, because that was where you got a glimpse of the person writing the book. The idea being that you should keep in mind that the person writing was explaining something to you.
Carl Sagan even has a video where he argues Books/Writing is some sort of communication through time.
Now, this has been the case historically: A person writes some text (even in botched language like my writing, as English is not my first language) thinking that someone else in the future will read the ideas and reason about them.
But what about text written by an LLM? Does it have inherent intention? When reading LLM text, it feels like looking at those "this is not a person" photos. Yeah, they are words, yeah they form sentences and paragraphs but... they lack "soul".
If so, this seems to be a trivial (still worthy) assertion.
For example, I intend to, say, construct a shed. I make mistakes that I only see because I actually constructed it. I revise future endeavours involving sheds.
I admit to not having read this piece, and am merely reacting to the title.
--
Okay, I got through the first paragraph of Walter’s writings. While I nod to the bitterness (I assent to the existence of it), I do not bow.
At some point you're just making bad excuses for false scarcity.
I write code for a living mostly by hand. In the odd case where I need help I still use Google like I always have. I spend more of my time in meetings or staring at the ceiling than writing code. This was also true a decade ago before LLMs. It was also true several decades ago when someone else's ass was in my seat.
At least in the USA: 21% of adults in the US are illiterate in 2024. 54% of adults have a literacy below a 6th-grade level [1].
1: https://www.thenationalliteracyinstitute.com/2024-2025-liter...
And sorry about your blog :/ didn't know it was hacked. Looking at the comment section of the hello world though it gets pretty obvious LOL. You should consider removing it from your HN about though.
I think this is open to debate. To me, the code has always been the goal, and the fact that writing it sometimes serves to produce a product is important to others (and what brings the paychecks in), but ultimately not something I've ever been excited about or interested in throughout my career. So I judge a developer based on the beauty and quality of the code he produces, just as I judge an LLM by the same sorts of things.
The fact that AI can one-shot a working CRUD app is not really that interesting to me. If it could make the code beautiful, concise, maintainable, extensible, minimal, performant, readable, and bug-free: a work of art and love that a craftsman would be proud of... that would impress me.
I mean that's certainly one way of looking at it, and both can be impressive technical feats. But most people judge carpenters and artists on their end products, their overall vision, their motifs, their philosophy, and so on. On the other hand, as a trained logician, I definitely see proofs (which, by the Curry–Howard isomorphism, are computer programs) as having some degree of beauty-within-themselves, but that's quite hard to achieve. Not everyone is a Gödel, after all.
I also think programming languages, despite being Turing complete (which is frankly not saying much), are far too limiting to truly construct magnificent things with.