Various LLM Smells

137 points by speckx 4 hours ago | 99 comments

Planktonne 2 hours ago |

> The LLM generated writing obviously felt significantly better than my own writing.

A general pattern for LLMs is that they look really good at things you are bad at. What that means is that if you find yourself thinking of its output as significantly better than yours in a particular domain, there's a high chance that you are not equipped to judge that quality effectively.

ryandrake 58 minutes ago | |

> A general pattern for LLMs is that they look really good at things you are bad at.

This is true for coding, too, which I think, to a large degree, might explain the polarized differences in opinions on HN about the quality of LLM-produced code. You have the 1. "AI produces code better than I could possibly write, one shots things it would take me days to do, and has made me 10X more productive!" camp, and you have the 2. "AI constantly hallucinates, makes mistakes, has to be babysat, and ultimately costs me time!" camp, with a spectrum in between those. How could the output of the same product be seen so differently? Well, I have bad news for camp 1...

fluidcruft 1 minute ago | | |

There's a third camp between these extremes who is like "goddamn it just type this shit out for me so I don't have to do it myself".

OhSoHumble 9 minutes ago | | |

I've caught Claude Code generating some pretty egregious security vulnerabilities. I'm using it to build an AI RPG site and the goal is to use web assembly as a bridge between author submitted code and LLMs in order to help shore up state management at the game level.

The language that I picked for the game runtime is Python. Claude really thought that the best way to validate user submitted Python was to bypass the WASM sandbox and execute it within the application container using shell exec - essentially opening up an RCE vulnerability.

I also find that the quality of Claude Code degrades substantially. Claude really wants to implement every feature in as bespoke a way as possible. This is fine when you first generate the project but over time you'll find that every web modal is implemented differently. Every button is different. Business logic is disconnected. It's why agentically produced codebases are MUCH larger than they should be; every feature is developed in a vacuum.

Then I'm trying to shove stuff in my AGENTS.md or CLAUDE.md files like "ALWAYS look for existing patterns within the codebase to keep it consistent." But the harness doesn't always work and it'll generate useless, verbose code anyways.

In some cases it's useful - like if I am shaky on the DSA knowledge needed for a specific operation or optimization then Claude can replace Stackoverflow. But, man, I'm so frustrated with it.
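A minimal sketch of the alternative to the shell-exec approach described above: checking user-submitted Python without running it, using the standard library's `ast` module. The function name `validate_python_source` and the blocked-module policy are assumptions made for illustration, not the commenter's actual code. A static check like this is not a sandbox; execution would still need to stay inside the WASM runtime.

```python
import ast


def validate_python_source(source: str) -> list[str]:
    """Return a list of problems found in `source` without executing it."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    # Hypothetical policy: flag imports of modules that can spawn processes.
    blocked = {"os", "subprocess"}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            continue
        for name in names:
            if name in blocked:
                problems.append(
                    f"line {node.lineno}: import of '{name}' is not allowed"
                )
    return problems
```

Parsing only builds a syntax tree, so nothing in the submission runs inside the application container. An import blocklist is easy to evade, which is why it can only complement the sandbox, never replace it.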

kybernetikos 50 minutes ago | | |

I think there are some factors beyond just skill too - the kinds of tasks you're giving the AI, and how involved you are in ensuring the output is good (via either extensive planning guidance, extensive review/testing, or a combination).

kiba 27 minutes ago | | |

I used an LLM to teach me how to code and to get through obstacles that would otherwise have had me spending a lot of time doing ???. Typically, I just write code that, a lot of the time, I know is absolutely wrong, and the LLM helpfully points out the mistakes.

I am slowly doing more of my own code and cutting the LLM out of the loop in the unfamiliar territory I am working in.

My main concern is not so much productivity but understanding the code I have written and feeling agency over it.

The LLM is a very good teacher.

onion2k 35 minutes ago | | |

> Well, I have bad news for camp 1...

It's bad if they work in a part of the industry where code quality or efficiency matters. That's maybe 10% of the total though.

tempest_ 41 minutes ago | | |

It isn't, though.

The industry largely has selected for camp 1 long ago.

If you don't get immediate negative feedback camp 1 can go quite a ways before problems surface.

observationist 8 minutes ago | | |

I, for one, welcome our new AI overlords. They provide me with only the finest Gell-Mann amnesia, straight from the tap.

flatline 1 hour ago | |

I don't disagree about the probability, but the current frontier models are not completely useless for writing even in areas where I have significant knowledge. I would not have said that a year ago. You have to watch them like a hawk -- they are good at spitting out plausible sounding nonsense that is hard even for an expert to discern. But the dice roll going on behind the scenes is continually more biased towards being correct/useful than not.

Aperocky 1 hour ago | | |

On factual things, potentially. But if I want to read your writing, wouldn't I be trying to pick your brain? Otherwise why don't I read wikipedia or usage documentation?

dvt 2 hours ago | |

Honestly, I can't fathom thinking that LLM writing is even remotely passable. People that think this should honestly read more. One book a month is hardly an aspirational goal. You don't even have to read Melville or Hemingway or Chaucer or Shakespeare, just pick up any popular NYT best seller, and it'll be significantly better than anything an LLM can generate.

torginus 28 minutes ago | | |

I haven't used these things for writing recreationally for a while (since the Claude 3.X days), so my opinions might be outdated - but they definitely weren't bad - after all they had a huge library of witticisms to pull from, and just as Stable Diffusion pulls from master artists, so do LLMs from skilled writers. Pro writers did come up with an absolute wealth of interesting ideas, and there are mountains of skillfully written prose out there - and it's all in the training data, and AI is quite good at pulling from it.

The advantage of writing vs images is that it takes longer to absorb the whole with text, so it's less apparent that the whole thing doesn't quite come together.

My problem with Claude's prose and ideas was that it kept recycling the same tropes and phrases after a while. It has been observed that these models have very strong statistical biases: when asked for a random number, for example, LLMs are far more predictable than even humans, and this shows up in unguided writing exercises.

But as for actually crafting text that is both terse and to the point - such as oneliner explanations, or writing summaries - these models are quite bad. The best I have seen is they could turn a given length of prose into an even longer version - with generally some loss in the tonal accuracy or the points made in there.

As such they are a terrible tool for professional communication, but unfortunately, lots of people have started using them for exactly that.

xienze 1 hour ago | | |

> I can't fathom thinking that LLM writing is even remotely passable. People that think this should honestly read more.

This makes me think you're only exposing yourself to high quality writing online and from an intelligent circle of friends and coworkers. The average person's reading and writing abilities are _atrocious_ and only getting worse. We're almost at the point where kids are communicating through abbreviations and emojis exclusively. LLM prose is significantly better than what the average person can produce.

gchamonlive 1 hour ago | | |

Really hard to take your comment seriously when the only post on dvt.name is a hello world page, because at least OP is trying to publish and you are lacking the moral high ground to judge him for thinking LLM writing is good.

skydhash 1 hour ago | |

I dabble in drawing and I find LLM images (and maybe some non LLM ones) abhorrent. As for why, the reasons I can think of are: no consistency (perspective, small details, and color theory) and too much detail, which turns into visual noise. In most paintings, the artist will have a subject that is most detailed (to draw the eyes) and from there, the loss of detail will follow some kind of logic. This is how you pinpoint what the artist is most interested in. LLM output looks like a filter applied to a montage of pictures.

gchamonlive 1 hour ago | | |

It's like a gross looking slice of pizza, it's mindbending because at first it looks good, after all it's pizza, but something in it makes it really disgusting

bell-cot 2 hours ago | |

Mnemonic: geLL-Mann amnesia effect

tptacek 1 hour ago |

The LLM writing sameness is bad. Use LLMs to help your writing! But don't include a word they generate, even just a vocabulary adjustment, in your own output. Have them critique structure and flow, spot overused words and passive constructions and dumb picks for topic sentences. It's great for that, and those are all objective improvements in your writing that won't mess up your style.

The LLM sameness in web design is good. Most sites shouldn't try to be idiosyncratic. The best design for a site with real utility is legibility, and LLMs are better at that than the median developer. Always laying out the same buttons? Always using the same type scales? Good! If it looks good to you, you weren't going to do better on your own, and you were very likely to do worse.

spdustin 1 hour ago |

- “(The) honest caveat:” (or “genuine caveat:”, both with the colon)

- “(The) honest answer:” (again, with colon)

- “The thing to internalize:”

- “The smoking gun:”

(really, sentences that start with “The <tag suggesting the next clause is the key point>:” are a strong tell, but those four are the most prolific)

- “load bearing” (when not talking about architecture)

- “blast radius” (when not talking about actual explosives, but rather the effect of an event/action)

- “smoke test” (esp. when “sanity check” is more apropos)

- Lists of three clauses/adjectives where the third is really just a combination of the first two

- Referring to the “shape” of things figuratively

- Social media posts that end with “Curious if anyone…”

- Stories or anecdotes using “Oh. Oh.” (where the second “oh” is italicized)

Edit: Yes, some of those last ones are terms that we often use as devs...but I would argue about the actual frequency of their use. Plus, these tells live on in prose generated by the latest models.
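The list above can be turned into a rough checker. This is a sketch only: `TELL_PATTERNS` and `find_tells` are hypothetical names, the patterns cover just a handful of the phrases listed, and a hit is a hint to reread a sentence rather than proof of LLM authorship (as the edit note says, several of these are ordinary dev terms).

```python
import re

# Phrases taken from the list above; illustrative, not exhaustive.
TELL_PATTERNS = {
    "honest/genuine caveat": r"\b(?:honest|genuine) caveat:",
    "honest answer": r"\bhonest answer:",
    "thing to internalize": r"\bthe thing to internalize:",
    "smoking gun": r"\bthe smoking gun:",
    "load bearing": r"\bload[- ]bearing\b",
    "blast radius": r"\bblast radius\b",
    "curious if anyone": r"\bcurious if anyone\b",
}


def find_tells(text: str) -> dict[str, int]:
    """Count case-insensitive occurrences of each listed phrase in `text`."""
    counts = {}
    for label, pattern in TELL_PATTERNS.items():
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[label] = hits
    return counts


sample = "The honest caveat: this cache is load-bearing. The blast radius is small."
print(find_tells(sample))
```

Counting per phrase, rather than returning a single yes/no, makes it easier to apply a frequency threshold, which is the point of contention in the replies.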

srik 1 hour ago | |

These LLM idioms are constantly being consumed every day and are bound to make it into the next, if not current, generation's vernacular. It's going to be unbearable.

thewebguyd 1 hour ago | |

> I would argue about the actual frequency of their use

Assuming you mean load bearing & blast radius, I'd see those used and use them myself very frequently pre LLM, mostly in online discussions though so it's telling where they got their training data. Load bearing itself is/was a pretty normal phrase in the ops world in daily discussion.

Smoke test though, I can't say I've ever seen irl usage.

Barbing 1 hour ago | | |

Heard smoke test IRL & was confused to see it used indeed in place of “sanity check”. Weird.

jedbrooke 17 minutes ago | |

for me the most annoying one is “escape hatch”.

Everything is an escape hatch, try catch is an escape hatch, a cli flag is an escape hatch. It makes no sense, and quickly ended up in my “banned words and phrases” md file

triyambakam 33 minutes ago | |

- Ending something with "happy to ..." (usually "happy to help")

- And a variant of the above is omitting the subject, "happy to" instead of "I am happy to"

- Codex refers to "the spine" of something

- Claude often says some decision is "locked" (i.e. decided on)

Hfuffzehn 1 hour ago |

The interesting thing for me is that I do not feel like the writing of LLMs has improved very much lately stylistically. They reached a "good" level some time ago but the newer models haven't brought such improvements that you would prefer them to an expert human writer.

Will be interesting if that holds in other areas when chasing super intelligence.

metadat 29 minutes ago |

Don't forget about Contrastive Negation:

> Contrastive negation is a rhetorical structure that denies a specific idea in the first half of a sentence and asserts an alternative in the second half.

> It typically follows an "It’s not X, it’s Y" or "not just X, but Y" formula.

Wikipedia also has a great resource which covers many of the common LLM patterns: https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
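The two formulas quoted above are regular enough to approximate with regular expressions. The sketch below is an assumption-laden illustration: the function name `find_contrastive_negation` is made up, the patterns only cover "it/this/that" as the subject, and they will miss rephrasings and produce false positives on legitimate sentences.

```python
import re

# "It's not X, it's Y" (also "This is not X. This is Y"); accepts ' or ’.
NOT_X_ITS_Y = re.compile(
    r"\b(?:it|this|that)(?:['’]s| is) not\b[^.;!?]*[,;.]\s*"
    r"(?:it|this|that)(?:['’]s| is)\b",
    re.IGNORECASE,
)

# "not just X, but Y" / "not only X but Y"
NOT_JUST_BUT = re.compile(
    r"\bnot (?:just|only)\b[^.!?]*?,?\s*\bbut\b",
    re.IGNORECASE,
)


def find_contrastive_negation(text: str) -> list[str]:
    """Return the matched spans for both contrastive-negation formulas."""
    return [
        match.group(0)
        for pattern in (NOT_X_ITS_Y, NOT_JUST_BUT)
        for match in pattern.finditer(text)
    ]


print(find_contrastive_negation("It's not a bug, it's a feature."))
print(find_contrastive_negation("This is not just a tool, but a partner."))
```

The returned spans stop at the start of the second clause, which is enough to locate the sentence for a human to review.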

kylemaxwell 1 hour ago |

At this point, I want somebody's raw(ish) writing, with spelling errors and grammar mistakes and whatever, at least when it comes to most writing: blog posts, Slack messages, etc. LLMs are great for helping generate ideas, writing code, and maybe even cleaning up some writing, but doing the writing overall? Please don't. I want to hear what you have to say, not what the AI says, if it's something along those lines.

n42 2 hours ago |

  No ___, no ____. Just _____

or using "honest" to describe an approach.

GrinningFool 1 hour ago | |

Jab, jab, thrust is how I think about that pattern. Or tap tap whack, if you prefer. And it shows up for positives too:

"Smooth. Effortless. A perfect fit for your needs".

In any style of informal or persuasive writing this shows up, as if it has to drive the point in.

I kind of wish we'd stop talking openly about what the tells are. It's nice to be able to determine with fair accuracy - but it couldn't last forever.

Barbing 1 hour ago | | |

> I kind of wish we'd stop talking openly about what the tells are.

Least this way it’s out in the open perhaps: since enough users have training enabled, labs will naturally learn what annoys us.

Had the same thought though

knollimar 1 hour ago | |

Honest, straight, genuine, actual, real are all words that paper over a weak claim to me. I'm thinking about a hook that injects a subagent fact checking in an "are you sure" style here because it's so bad.

Also the false "not X, it's Y" is used in a similar way for faux distinctions, like a sov cit claiming "it's not driving, it's traveling in a car"

rimeice 1 hour ago |

Scrolling down a LinkedIn feed is hilarious at the moment.

My favourite one from today:

“The tax isn't the problem. The mindset is.”

backwardsponcho 1 hour ago | |

The LinkedIn Kool-Aid predates the advent of LLMs though.

ryanthedev 23 minutes ago |

I wrote a skill for this! But I’m sure a lot of you have as well lol.

For those curious https://github.com/ryanthedev/oberskills/blob/main/commands/...

ray__ 34 minutes ago |

I wonder if the tendency to write short punchy sentences stems from deliberate RL efforts to avoid repetitive, consistent writing? I seem to remember that a critique of early LLMs was that they would produce sentences whose construction was too homogeneous. Would be interesting to know the answer to this.

KronisLV 2 hours ago |

> The "JetBrains Mono" font

Thought for sure we'd get a critique of Inter overuse. JetBrains Mono is a lovely font, though.

royal__ 1 hour ago | |

Yeah this one kinda hurts.

fortyseven 1 hour ago | |

It's my daily driver, so I kind of twitched a bit seeing that on the list here. I never noticed because I was using it anyway, I guess.

1970-01-01 1 hour ago |

The LLM doesn't smell like authentic writing but it does a great job for fast and cheap words. We've gained something similar to fast food. Words made very cheap, very fast, easily digestible, but they have no emotion. In short stints it does have a place in the world.

thewebguyd 1 hour ago | |

> it does a great job for fast and cheap words

Like corporate manager-type emails, of which I get AI generated ones frequently from company ownership. They think LLMs are the best thing since sliced bread.

It's taken corpo-speak to an entirely new level. On the plus side I no longer have to read them, and can just have AI reply on my behalf with more fast food.

docheinestages 2 hours ago |

You are right to push back.

dvt 2 hours ago |

It's kind of interesting how genuinely hard it is to get models to deviate from basically all of these tropes. You can straight up tell it "I hate that card design, do something different, get creative!" and it'll do something either (a) ugly as sin (clearly just essentially a random walk through parameters) or (b) some same-y derivation of that card.

In coding, I've noticed a few tropes as well: everything is a "contract" or an "artifact" (clearly trained on like three decades of Java lol), everything is constantly "backwards-compatible" or "versioned" (even if working on a brand new greenfield project), and a few others.

throwatdem12311 46 minutes ago | |

I’d rather they not deviate so I know right away that I can stop reading something because it’s slop.

jkdufair 2 hours ago | |

If claude says "load bearing" once more, I think I'll vomit.

dieselgate 1 hour ago | | |

That's a funny one. I don't use LLMs at all but "load bearing" is such a common/over-used internet joke for DIY building projects and stuff like "load bearing caulk". Have never heard it in a software sense really so am slightly perplexed

dvt 1 hour ago | | |

Hah, ChatGPT constantly says "that's real" or "less about X, more about Y."

reliablereason 11 minutes ago |

"A is not B instead A is blah blah" instead of just saying "A" is a very common pattern I have seen in Claude.

It is strange to read as the topic A has often not been introduced and introducing it by saying what it is not makes very little sense to a new reader.

nijave 1 hour ago |

> :black_circle_for_record: Smoking gun.

> "belt and suspenders"

kivikakk 21 minutes ago | |

Or “belt-and-braces”. What the fuck.

newer_vienna 1 hour ago |

I don't think I've met anyone who uses the word "genuinely" as much as Claude does.

danielodievich 2 hours ago |

All of those are included in the bulk of the documents passing my work input these days. It is infuriating. Out of principle I maintain 100% me in all my writing but I don't know if it matters. Well maybe it does... an interviewee recently complimented me on the "nicest and most human resume" they saw recently. That felt good

exe34 1 hour ago | |

Do you send your resume to people before you interview them?

danielodievich 50 minutes ago | | |

I meant interviewer. Sorry!

newer_vienna 1 hour ago |

Thank you, these are all things I've noticed too.

antoineMoPa 1 hour ago |

Abusing the words "canonical" and "normalized".

mil22 2 hours ago |

Those cards, so familiar! Exactly what Opus produced for me.

Did Anthropic and/or OpenAI deliberately train their models to produce websites with a specific design language, or did these stylistic preferences emerge naturally as some kind of LLM-selected optimum?

input_sh 1 hour ago | |

It's not the base model, it's the system prompt in dev tools.

To give an example I'm personally frequently annoyed by, Google's Antigravity will consistently use the word "anthropomorphic" while "thinking" and the end result will consistently have obnoxiously large border radius (kind of like Android's design language).

Codex on the other hand likes to make websites with blue elements on a black background and likes to use emojis for icons for some reason, which is a terrible idea accessibility-wise.

jochem9 1 hour ago | |

AI has no taste, so I suspect the labs just gave it a bunch of decent looking boilerplate as preferred style.

When you bring your own ideas you can get AI to dev pretty nice looking non-generic stuff.

viccis 49 minutes ago |

One Python one I hate is that it adds crazy amounts of newlines for no real readability gains.

Instead of this:

  def add_three_ints(x: int, y: int, z: int) -> int:
      return x + y + z

it will write:

  def add_three_ints(
      x: int,
      y: int,
      z: int,
  ) -> int:
      return x + y + z

While it's always preferable to do this when you get either long or complex function signatures, Opus 4.7 and GPT 5.5 do this everywhere. When you combine it with their penchant for writing helper functions for everything, you get a ton of vertical padding that messes up the readability imo because Python really relies on your eye seeing indents for scope.

tomjakubowski 36 minutes ago | |

The second way produces more billable output tokens. Worse, when you feed that code back into the LLM service, the extra whitespace counts as input tokens.

torginus 21 minutes ago | | |

An LF is 1 character, just like a space. It just looks bigger to our puny human brains.

fragmede 35 minutes ago | |

That's for diffing gains later.

If you have to add arguments later, having each one on its own line makes the diff cleaner, so the reviewer has an easier time understanding what's going on. That is, if you still have a human reviewing code.
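The diffing claim can be checked with the standard library's `difflib`. In this sketch the `connect` function, its parameters, and the helper `diff_size` are all made up for illustration; the comparison adds one argument to a single-line signature and to a one-argument-per-line signature and counts the changed lines in each unified diff.

```python
import difflib


def diff_size(before: str, after: str) -> int:
    """Count added and removed lines in a unified diff with no context lines."""
    diff = difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm="", n=0
    )
    return sum(
        1
        for line in diff
        # Skip the "---" / "+++" file headers and the "@@" hunk markers.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


one_before = "def connect(host: str, port: int) -> None:\n    ...\n"
one_after = "def connect(host: str, port: int, timeout: float) -> None:\n    ...\n"

multi_before = "def connect(\n    host: str,\n    port: int,\n) -> None:\n    ...\n"
multi_after = (
    "def connect(\n    host: str,\n    port: int,\n    timeout: float,\n"
    ") -> None:\n    ...\n"
)

print(diff_size(one_before, one_after))      # single-line signature
print(diff_size(multi_before, multi_after))  # one argument per line
```

In the single-line case the whole signature is removed and re-added, so the reviewer has to spot the new argument inside a rewritten line; in the multi-line case the diff is a single added line. The trailing comma after the last argument is what keeps it that way.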

TacticalCoder 1 hour ago |

So the year is 2026 and we cannot point an LLM at, say, this HN thread, and give it the instructions: "I don't want to look like a dumbass, so don't make these obvious mistakes / don't use these obvious tells"!?

barrkel 59 minutes ago |

Quietly. Clean. Honest. Sharp take.

manoDev 1 hour ago |

Welcome to the future of fast-food software. Taste of deep frying and preservatives.

speak_plainly 1 hour ago |

I came here for the performative anti-AI intellectualism and was not disappointed.

dionian 2 hours ago |

KPI cards, purple gradients

poszlem 1 hour ago |

What I find amazing is how HARD it is to make the LLM produce a piece of text that does not sound like slop. I have had dozens of sessions where I tried to make it write like a human would, and yet it still uses those tired writing phrases. I don't understand why neither OpenAI nor Anthropic is able to do anything to make it better, and in some cases it feels like we are actually going backwards.

Noumenon72 29 minutes ago | |

Alan Turing poisoned the context in ways we can't comprehend and all LLMs are bound by his dead hand.