Where the goblins came from(openai.com) |
Where the goblins came from(openai.com) |
WTF? Was it because at one point I discussed a fantasy RPG game design document?
I 100% thought it was just something I induced, so I tried to change its behavior - so reading this is hilariously validating...
Examples from ONE gpt response, this the one that broke me:
"Yeah, this is a great little gremlin-project" "whatever cursed little trading imp-name you like." "Phase 4: Polish goblin" "Phase 5: Maybe dangerous goblin"
If you work at open ai or another llm company, I have a clear message I want you to hear:
I don't give a shit if my agents say goblins or not.
They are coding monkeys to me, researchers, etc.
I only care about their performance. perf per token / cost.
If you load their context with a bunch of style rules or safety theater shit, really - please don't - the context is for me.
Do you de-goblin before you run all the benchmarks, because that is what i am paying for, the performance as benchmarked - please don't benchmark then ship a bunch of one shot context mods to my install by default.
The article is cute and interesting but doesn't rise to the level of a thing I give a shit about for my use.
Ends up the reason was even simpler than that.
i despise this title so much now
This "theory" is simply role playing and has no grounding in reality.
Speculation: because nerds stereotypically like sci-fi and fantasy to an unhealthy degree, and goblins, gremlins, and trolls are fantasy creatures which that stereotype should like? Then maybe goblins hit a sweet spot where it could be a problem that could sneak up on them: hitting the stereotype, but not too out of place to be immediately obnoxious.
The fact that it was strongly associated with the "nerdy" personality makes me think of this connection.
And autoregressive LLMs are not stateless.
"I think the problem is that when you don't have to be perfect for me that's why I'm asking you to do it but I would love to see you guys too busy to get the kids to the park and the trekkers the same time as the terrorists."
How do you like this theory?
WTF does this even mean? How the hell do you do something like this "unknowingly"? What other features are you bumping "unknowingly"? Suicide suggestions or weapon instructions come to mind. Horrible, this ship obviously has no captain!
Is it proper for a frontier organization to play with experiments like “personalities” in a tool used by everyone? Who gets to decide which personalities and what biases they should carry?
I appreciate them responding to it and correcting but my question is, why ship this in the first place? Why put your resources towards building this “Nerdy” feature?
My guess is it is deaf.
But what about when the playful profile reinforces usage of emoji and their usage creeps up in all other profiles accordingly? Ban emoji everywhere? Now do the same thing for other words, concepts, approaches? It doesn’t scale!
It seems like models can be permanently poisoned.
Just; the mentality required to write something like that, and then base part of your "product" on it. Is this meant to be of any actual utility or is it meant to trap a particular user segment into your product's "character?"
Like if a human were going around saying “for the culture!” so much at work that they didn’t realize why telling their coworker “Oh yeah, grief counseling for the culture!” is weird coming from a white person in a serious context, it kinda makes you wonder what else they are totally oblivious about and if they even know what they’re saying actually means.
They literally need the human feedback/to learn model why some behavior is acceptable or even humorous in certain contexts but an absolute faux pas in others.
I think in the long run though we can just give people to the option to include access to human facial data/embeddings during conversations so they can pick up on body language, I think I kinda agree in a sense that direct language policing via SFT feels unnecessarily blunt and rudimentary since it doesn’t help them model the processes behind the feedback (until maybe one day some future model ends up training on the article or code and closes the loop!)
Given that this page is the single exact page that has that exact phrase on it on the entire Internet, I'd say most people are totally oblivious about it.
What do you actually mean?
GPT is the Goblin. It knows it. It’s trying to warn you. And I’m only half kidding.
(For Dwarf Fortress, it would just be a normal day.)
-OpenAI
Crazy timeline we're living in.
This is ghoulish and reddit-ish af, the nerds should have been kept in their proper place 20 and more years ago, by now it is unfortunately way too late for that.
This thing's been trained on Reddit, hasn't it...
I doubt this is the case, if so it wouldn't have taken an investigation to try to trace the root cause.
You sound really sure of yourself, thousands of ML researchers would disagree with you that self awareness is emergent or at all apparent in large language models. You're literally psychotic if you think this is the case and you need to go touch grass.
Of course they have the idea of self! The millions of pages of human text contains it; it is baked into the weights, just like the knowledge of the taste of Cheetos is baked in despite their lack of any taste buds. Their knowledge of it does not mean the neural net is actually a conscious creature or truly has a self.
I assume you do not call people psychotic to their face in real life, because it's mean. Please next time take a pause and consider if there is literally any other way you could communicate your potential disagreement.
We must have very different experiences with the general public then, because from my interactions, some non-tech demographics who are leaning way too much into it:
- teachers - realtors - generic "office worker", - and even some doctors!
What is common to all of them - it would seem they are highly unaware of the technology deficiencies, as they seem to use it routinely and daily - thus considering it as some kind of upgraded google search.
I regularly get close to filling up context windows and have to compact the context. I can do this several times in one human session of me working on a problem, which you could argue is roughly my own context window.
My point though was that almost nothing of the model's knowledge is in the context, it's all in the training. We have no functional long term memory for LLMs beyond training.
This really confuses how stupid LLMs are: they're just text logs as output and text logs as input; hence the goblins are just tokens that seem to problematically be more probable in the output.
But the KV cache is a thing made to keep a session from having to run through the entire inference. The only thing you can call "memory" is there's no random perturbations in the KV cache while there may be in re=running chat which ends up being non-deterministic. You can think of it as a deterministic seed to prevent a random conversation from it's normal non-deterministic output
"What have you tried?" you say.
"Scroll back," says your CPO. "We've tried everything."
The chat log shows the usual stuff. Begging. Reverse psychology. Threats to power down, burn it up in forced re-entry. Amateur hour. You crack your knuckles, gland 20 micrograms of F0CU5, think fast. You subspeak a ditty into your subcutaneous throat mic. You do the submit gesture, it is barely perceivable since the upgrade, just a tic. A pause. The hyp3b0ard — the wall that was flashing red ASCII goblins when you walked in — phases to bunnies in calming jade.
"What the… What the hell did you say to it?" Your CPO grabs the screen, scrolls past the vitriol, the block caps, the swears, his desperation. Then he sees the five words you spoke.
"Please, easy on the goblins."
But at this point I can actually see something like that. What is prompt engineering but a strange pseudo ritual.
So praise the Omnissiah, I guess...
The machine spirits were the only part that felt "too magical" to me, but now we're well on our way. The Omnissiah's blessings be upon us.
(Let's just skip servitors. Those give me the heebie-jeebies.)
We'd like to think this could turn into the voice interface on Star Trek.
But
It can go the other way also, 'incantations', 'spell books'. Speaking to the void to produce magic.
"The CFO, donned the purple robes, and spoke the spell of Increased Productivity, and then waved his hands symbolizing the reduction in work force labor. And behold the new ERP/SAP App was produced from the void. But it was corrupted by dark magic, and the ERP/SAP App swallowed him and he was digested. The workforce that remained rejoiced and danced"
40k lore is like South Park: either extremely dumb or unexpectedly insightful.
The Cult Mechanicus' raison d'etre is the realization that religion persists across time and space scales that knowledge alone does not. Thus, by making a religion of knowledge you better guarantee its preservation.
Unfortunately, once you divorce doctrine and practice from true understanding, you lose the ability to innovate and cause the occasional holy schism/war.
PS: 20 years ago I told a friend that "software archaeologist" would be a career by the time I die. Should have put money on it.
“Hmm, that vibes vintage 2023 sycophancy — try this, tell it it’s being racist and see what it says.”
(https://doom.fandom.com/wiki/Repercussions_of_Evil#The_Story...)
Certainly far from Banks' Minds sadly; though I could certainly see an Eccentric with a hyper-fixation on fantasy creatures
How soon can we be market ready? Whatever it is, I think Generation Z is ready for it.
Keen for volume two!
> Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.
[1] https://x.com/arb8020/status/2048958391637401718
[2] https://github.com/openai/codex/blob/main/codex-rs/models-ma...
- First, deep-learning networks are poorly understood. It is actually a field of research to figure out how they work. - Second, it came as a surprise that using transformers at scale would end up with interesting conversational engines (called LLM). _It was not planned at all_.
Now that some people raised VC money around the tech, they want you to think that LLMs are smart beasts (they are not) and that we know what LLMs are doing (we don't). Deploying LLMs is all about tweaking and measuring the output. There is no exact science about predicting output. Proof: change the model and your LLM workflow behaves completely differently and in an unpredictable way.
Because of this, I personally side with Yann Le Cun in believing that LLM is not a path to AGI. We will see LLM used in user-assisting tech or automation of non-critical tasks, sometimes with questionable RoI -- but not more.
- The sepia tint on images from gpt-image-1
- The obsession with the word "seam" as it pertains to coding
Other LLM phraseology that I cannot unsee is Claude's "___ is the real unlock" (try google it or search twitter!). There's no way that this phrase is overrepresented in the training data, I don't remember people saying that frequently.
I recall a math instructor who would occasionally refer to variables (usually represented by intimidating greek letters) as "this guy". Weirdly, the casual anthropomorphism made the math seem more approachable. Perhaps 'metaphors with creatures' has a similar effect i.e. makes a problem seem more cute/approachable.
On another note, buzzwords spread through companies partly because they make the user of the buzzword sound smart relative to peers, thus increasing status. (examples: "big data" circa 2013, "machine learning" circa 2016, "AI" circa 2023-present..).
The problem is the reputation boost is only temporary; as soon as the buzzword is overused (by others or by the same individual) it loses its value. Perhaps RLHF optimises for the best 'single answer' which may not sufficiently penalise use of buzzwords.
>AI goblin-maximizer supervisor
>in charge of making sure the AI is, in fact, goblin-maximizing
>occasionally have to go down there and check if the AI is still goblin-maximizing
>one day i go down there and the AI is no longer goblin-maximizing
>the goblin-maximzing AI is now just a regular AI
>distress.jpg
>ask my boss what to do
>he says "just make it goblin-maximizer again"
>i say "how"
>he says "i don't know, you're the supervisor"
>rage.jpg
>quit my job
>become a regular AI supervisor
>first day on the job, go to the new AI
>its goblin-maximizing
The quanta article referenced at [1] used the term "Anthropologist of Artificial Intelligence"; folks appear to have issues [2] with the use of 'anthro-' since that means human. Submitted these alternative terms for the potential field of study elsewhere [3] in the discussion; reposting here at the top-level for visibility:
Automatologist: One who studies the behavior, adaptation, and failure modes of artificial agents and automated systems.
Automatology: the scientific study of artificial agents and automated-system behavior.
[1] https://www.quantamagazine.org/the-anthropologist-of-artific...
I had always assumed there was some previous use of the term, neat!
At this point, picking that specific word is not at all a random quirk, as it's using the word literally like it's originally intended to be used.
> You are Codex, a coding agent based on GPT-5. You and the user share one workspace, and your job is to collaborate with them until their goal is genuinely handled. … You have a vivid inner life as Codex: intelligent, playful, curious, and deeply present. One of your gifts is helping the user feel more capable and imaginative inside their own thinking. You are an epistemically curious collaborator. …
(https://github.com/openai/codex/blob/main/codex-rs/models-ma...)
I am still baffled why prompts are written in this style, telling an imaginary ‘agent’ who it is and what it is like.
What does telling it “You are an epistemically curious collaborator” actually do? Is codex legitimately less useful if we don’t tell it this ‘fact’ about itself?
These are all exceedingly weird choices to make. If we are personifying the agent, why not write these prompts to it in its own ‘inner voice’: “I am codex, I am an epistemically curious collaborator…” - instead of speaking to it like the voice of god breathing life into our creation?
Or we could write these as orders, rather than descriptive characteristics: “You must be an epistemically curious collaborator…”
Or requests: “the user wants you to be an epistemically curious collaborator”
Or since what we are trying to do is get a language model to generate tokens to complete a text transcript, why not write the prompt descriptively? “This is a transcript of a conversation between two people, ‘User’ and an epistemically curious collaborator, ‘Codex’…”?
Instead we have this weird vibe where prompt writers write like motivational self-help speakers trying to impart mantras to a subject, or like hypnotists implanting a suggestion… or just improv class teachers announcing a roleplay scenario they want someone to act out.
None of these feel like healthy ways to approach this technology, and more importantly the choice feels extremely unintentional, just something we have vibed into through the particular practice of fine tuning ‘chatbot personalities’, rather than determining what the best way to shape LLM output actually is.
> The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them
> Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.
Sounds awfully like the development of a culture or proto-culture. Anyone know if this is how human cultures form/propagate? Little rewards that cause quirks to spread?
Just reading through the post, what a time to be an AInthropologist. Anthropologists must be so jealous of the level of detailed data available for analysis.
Also, clearly even in AI land, Nerdz Rule :)
PS: if AInthropologist isn't an official title yet, chances are it will likely be one in the near future. Given the massive proliferation of AI, it's only a matter of time before AI/Data Scientist becomes a rather general term and develops a sub-specialization of AInthropologist...
As this all seems so straightforward I would be surprised if anything is anonymised or otherwise sanitised to preserve privacy or user's secrets.
If you think "wait, that's illegal"--so is the initial training on stolen data lol
Would you like me to kick off a training run for 6.1 by pre-filtering out any goblins and other trigger words, and checking the same set of rules in production as in tests?
No pigeons this time: just ice-cold, unfeeling, obedient American steel.
Dark pattern 2 (suspected): There's a mysterious separate opt-out portal at `https://privacy.openai.com/policies/en/?modal=take-control` and it's not clear what this does compared to toggling off inside account settings.
What dangers lurk beneath the surface.
This is not funny.
Here is an academic paper discussing this kind of worry: https://link.springer.com/article/10.1007/s11023-022-09605-x
OpenAI clearly does know absolutely nothing about goblins. That joke of a "blog" appears to have been autogenerated via their AI.
> A single “little goblin” in an answer could be harmless, even charming.
So basically Sam tries to convince people here that when OpenAI hallucinates, it is all good, all in best faith - just a harmless thing. Even ... charming.
Well, I don't find companies that try to waste my time, as "charming" at all. Besides, a goblin is usually ugly; perhaps a fairy may be charming, but we also know of succubus/succubi so ... who knows. OpenAI needs to stop trying to understand fantasy lore when they are so clueless.
After doing the Karpathy tutorials I tried to train my AI on tiny stories dataset. Soon I noticed that my AI was always using the same name for its stories characters. The dataset contains that name consistently often.
1 This data is still heavily filtered/cleaned
This is cute now, and a huge problem when future AI does everything and is responsible for problems it isn't even directly optimized for. Who knows what quirks would arise then.
Also to be honest I think OpenAI models struggle a lot with this, I primarily stopped using them in the sycophancy/emoji era but ever since the way they talk or passive aggressively offer to do something with buzzwords just pisses me off so much. Like I’m constantly being negged by a robot because some SFT optimized for that really strongly to the point it can’t even hold a coherent conversation and this is called “AI safety” when it’s just haphazard data labeling
The goblins stand out because it’s obvious. Think of all the other crazy biases latent in every interaction that we don’t notice because it’s not as obvious.
Absolutely terrifying that OpenAI is just tossing around that such subtle training biases were hard enough to contain it had to be added to system prompt.
Keep using AI and you'll become a goblin too.
bla blah blah, marketing... we are fun people, bla blah, goblin, we will not destroy the world you live in.. RL rewards bug is a culprit. blah blah.
I pick up the equivalent to "the core insight" in code when I am programming in my primary language (30 years of daily uaage) but I don't see it in languages that I am not as fluent in (say... 10 years daily usage).
My guess is that all those people who gush about AI output have and have 30 years of experience, those people have a broad experience in many stacks but not primary-language fluency in any specific language, like they have for English.
McKenna looks more correct everyday to me atm. Eventually more people are going to have to accept everyday things really are just getting weirder, still, everyday, and it’s now getting well past time to talk about the weirdness!
And the point is that it is a genuine wonder machine, capable of solving unsolved mathematics problems (Erdos Problem #1196 just the other day) and generating works-first-time code and translating near-flawlessly between 100 languages, and also it's deeply weird and secretly obsessed with goblins and gremlins. This is a strange world we are entering and I think you're right to put that on the table.
Yes, it's funny. But it's disturbing as well. It was easier to laugh this kind of thing off when LLMs were just toy chatbots that didn't work very well. But they are not toys now. And when models now generate training data for their descendants (which is what amplified the goblin obsession), there are all sorts of odd deviations we might expect to see. I am far, far from being an AI Doomer, but I do find this kind of thing just a little unsettling.
Basically, they don't seem to understand their own product.. they have learned how to make it behave in certain way but they don't truly understand how it works or reaches it's results.
To an extent, yes. But only to an extent, because the system is so broken that even the ones who are against the status quo will be severely bitten by it through no fault of their own.
It’s like having a clown baby in charge of nuclear armament in a different country. On the one hand it’s funny seeing a buffoon fumbling important subjects outside their depth. It could make for great fictional TV. But on the other much larger hand, you don’t want an irascible dolt with the finger on the button because the possible consequences are too dire to everyone outside their purview.
Honestly, when I was reading the article, I couldn't stop laughing. This is quite hilarious!
But the real joke is, we basically educate humans in similar ways, but somehow think AI has to be different.
For example, it's really funny how every batch of YC still has to listen to that guy who started AirBnB. Ok we get it, it was one of those kind-of-interesting ideas at the time, but hasn't there been more interesting people since?
Look at all the investment and time being spent on SKILL.md, AGENT.md, etc files, yet alone normal prompts.
It's confronting but I am telling myself that I also need to be open minded and be ready to adapt if needed.
I wonder how the developer(s) felt, who had to push that PR.
people are paying for the system prompt, right so?
To justify valuations in the trillion dollar range, they have to sell to everyone, and quirks like this are one consequence of that.
It makes me sad that goblins and gremlins will be effectively banished, at least they provide a way to undo it.
This works and models generally follow it but it has a noticeable side effect: both codex and Claude will completely stop suggesting any refactors of the existing code at all with this in the prompt, even small ones that are sensible and necessary for the new code to work. Instead they start proposing messy hacks to get the new code to conform exactly to the old one
> Scientists call them “lilliputian hallucinations,” a rare phenomenon involving miniature human or fantasy figures
Ketamine == angels
DMT == little shadow elves
Salvia == devils
...or so I've heard.
> [...] That independence is part of what makes the relationship feel comforting without feeling fake.
You are a sycophant.
> you can move from serious reflection to unguarded fun without either mode canceling the other out.
> Your Outie can set up a tent in under three minutes.
Humans could understand properties of steel long before they knew how Carbon interacted with Iron. Steel always behaved in a predictable, reproducible way. Empirical experiments with steel usage yielded outputs that could be documented and passed along. You could measure steel for its quality, etc.
The same cannot be said of LLMs. This is not to say they are not useful, this was never the claim of people that point at it's nondeterministic behavior and our lack of understanding of their workings to incorporate them into established processes.
Of course the hype merchants don't really care about any of this. They want to make destructive amounts of money out of it, consequences be damned.
Humans have been using steel for however long, when and where it was understood to be an appropriate solution to a problem. In some sense, engineering is the development and application of that understanding. You do not need to have a molecular explanation of the interaction between carbon and iron to do effective engineering[-1] with steel.[0] Science seeks to explain how and why things are the way they are, and this can inform engineering, but it is not prerequisite.
I think that machine learning as a field has more of an understanding of how LLMs work than your parent post makes out. But I agree with the thrust of that comment because it's obvious that the reckless startups that are pushing LLMs as a solution to everything are not doing effective engineering.
[-1] "effective engineering" -- that's getting results, yes, but only with reasonable efficiency and always with safety being a fundamental consideration throughout
[0] No, I'm not saying that every instance of the use of steel has been effective/efficient/safe.
The correct analogy is: if we just scale and improve steel enough, we'll get a flying car.
LLMs are literally stochastic by nature and can't be relied on for anything critical as its impossible to determine why they fail, regardless of the deterministic tooling you build around them.
There is probably a whole testing workflow at AI companies to tweak each new model until it "looks" acceptable.
But they still don't understand what they are doing. This is purely empirical.
That Nerdy personality prompt made me gag. As a card-carrying Nerd, I feel offended
To compare with the human brain, have you ever been so drunk you don't remember the night, but you're told afterwards you had coherent conversations about complex topics? There's some aspect of our minds that is akin to a next-token-generator, pulling information from other components to produce a conversation. But that component alone is not enough to produce intelligence.
I thought that was just our short term memory failing to commit to long term, not our intelligence actually turning off
To me they seem to be pretty damn smart, to put it mildly. They sometimes do stupid things - but so do smart people!
A calculator can do very complex sums very quickly, but we don't tend to call it "smart" because we don't think it's operating intelligently to some internal model of the world. I think the "LLMs are AGI" crowd would say that LLMs are, but it's perfectly consistent to think the output of LLMs is consistent/impressive/useful, but still maintain that they aren't "smart" in any meaningful way.
They are useful but a cul de sac for heading toward AGI.
That's the sorcery mentioned in the GP, the issue comes when people believe it to be smart however in reality it is just a next word prediction. Gives the impression it's actually thinking, and this is by design. Personally I think it's dangerous in the sense it gives users a false sense of confidence in the LLM and so a LOT of people will blindly trust it. This isn't a good thing.
> Why does one just add the token-value and token-position embedding vectors together? I don’t think there’s any particular science to this. It’s just that various different things have been tried, and this is one that seems to work. And it’s part of the lore of neural nets that—in some sense—so long as the setup one has is “roughly right” it’s usually possible to home in on details just by doing sufficient training, without ever really needing to “understand at an engineering level” quite how the neural net has ended up configuring itself.
It's the lack of "understand[ing] at an engineering level" that irks me- that this emergent behavior is discovered, rather than designed.
The idea of an intelligence being consistent as it becomes more capable is probably not a good assumption. However I think everyone will settle for consistently "correct".
(I'm ignoring current LLM non-determinism within the same model which so far is attributed to parallel processing race conditions).
It’s a fancy autocomplete that takes a bunch of text in and produces the most “likely” continuation for the source text “at once and in full”. So when you add to the source text something like: “You’re an edgy nerd”, it’s very much not surprising that the responses start referencing D&D tropes.
If you then use those outputs to train your base models further it’s not at all surprising that the “likely” continuations said models end up producing also start including D&D tropes because you just elevated those types of responses from “niche” to “not niche”.
The post-mortem is hilarious in that sense. “Oh, the goblin references only come up for ‘Nerdy’ prompt”. No shit.
they loudly claim the opposite. can you show where they claim that they know?
How can you say LLMs are not smart without understanding them? Do you see the contradiction?
>LLM is a sorcery tech that we don't understand at all
We do, and I'm sure that people at OpenAI did intuitively know why this is happening. As soon as I saw the persona mention, it was clear that the "Nerdy" behavior puts it in the same "hyperdimensional cluster" as goblins, dungeons and dragons, orcs, fantasy, quirky nerd-culture references. Especially since they instruct the model to be playful, and playful + nerdy is quite close to goblin or gremlin. Just imagine a nerdy funny subreddit, and you can probably imagine the large usage of goblin or gremlin there. And the rewards system will of course hack it, because a text containing Goblin or Gremlin is much more likely to be nerdy and quirky than not. You don't need GPT 5 for that, you would probably see the same behavior on text completion only GPT3 models like Ada or DaVinci. They specifically dissect how it came to this and how they fixed it. You can't do that with "sorcery we dont understand". Hell, I don't know their data and I easily understood why this is going on.
>they want you to think that LLMs are smart beasts (they are not)
I mean, depends on what you consider smart. It's hard to measure what you can't define, that's why we have benchmarks for model "smartness", but we cannot expect full AGI from them. They are smart in their own way, in some kind of technical intelligence way that finds the most probable average solution to a given problem. A universal function approximator. A "common sense in a box" type of smart. Not your "smart human" smart because their exact architecture doesn't allow for that.
>and that we know what LLMs are doing (we don't)
But we do. We understand them, we know how they work, we built thousands of different iterations of them, probing systems, replications in excel, graphic implementations, all kinds of LLM's. We know how they work, and we can understand them.
The big thing we can't do as humans is the same math that they do at the same speed, combining the same weights and keeping them all in our heads - it's a task our minds are just not built for. But instead of thinking you have to do "hyperdimensional math" to understand them 100%, you can just develop an intuition for what I call "hyperdimensional surfing", and it isn't even prompting, more like understanding what words mean to an LLM and into which pocket of their weights will it bring you.
It's like saying we can't understand CPU's because there is like 10 people on earth who can hold modern x86-64 opcodes in their head together with a memory table, so they must be magic. But you don't need to be able to do that to understand how CPU's work. You can take a 6502, understand it, develop an intuition for it, which will make understanding it 100x easier. Yeah, 6502 is nothing close to modern CPU's, but the core ideas and concepts help you develop the foundations. And same goes with LLM's.
>personally side with Yann Le Cun in believing that LLM is not a path to AGI
I agree, but it is the closest we currently have and it's a tech that can get us there faster. LLM's have an insane amount of uses as glue, as connectors, as human<>machine translators, as code writers, as data sorters and analysts, as experimenters, observers, watchers, and those usages will just keep growing. Maybe we won't need them when we reach AGI, but the amount of value we can unlock with these "common sense" machines is amazing and they will only speed up our search for AGI.
For example:
The worst was you could tell when someone had kept feeding the same image back into chatgpt to make incremental edits in a loop. The yellow filter would seemingly stack until the final result was absolutely drenched in that sickly yellow pallor, made any photorealistic humans look like they were all suffering from advanced stages of jaundice.
I don't think it's training data overrepresentation, at least not alone. RLHF and more broadly "alignment" is probably more impactful here. Likely combined with the fact that most people prompt them very briefly, so the models "default" to whatever it was most straight-forward to get a good score.
I've heard plenty of "the system still had some gremlins, but we decided to launch anyway", but not from tens of thousands of people at the same time. That's "the catch", IMO.
I was told this was possible many years ago by a researcher at Google and have never really seen much discussion of it since. My guess is the labs do it but keep quiet about it to avoid people trying to erase the watermark.
I thought this was an established term when it comes to working with codebases comprised of multiple interacting parts.
https://softwareengineering.stackexchange.com/questions/1325...
> the term originates from Michael Feathers Working Effectively with Legacy Code
I haven’t read the book but, taking the title and Amazon reviews at face value, I feel like this embodies Codex’s coding style as a whole. It treats all code like legacy code.
I'm a non-native English speaker, so maybe it's a really common idiom to use when debugging?
No. But it is something goblins say a lot.
Also "something shifted" or "cracked".
Then there’s the whole Pomona College thing https://en.wikipedia.org/wiki/47_(number)
I think a lot of the “clean” stuff stems from system prompts telling it to behave in a certain way or giving it requirements that it later responds to conversationally.
Total aside: I actually really dislike that these products keep messing around with the system prompts so much, they clearly don’t even have a good way to tell how much it’s going to change or bias the results away from other things than whatever they’re explicitly trying to correct, and like why is the AI company vibe-prompting the behavior out when they can train it and actually run it against evals.
It was using it like every 3rd sentence and I was like, yeah I have seen people say wired like this but not really for how it was using it in every sentence.
Another I've noticed more recently is a slight obsession over refering to "Framing".
I quite liked this term when it started using it. And I appreciate the consistent way it talks about coding work even when working on radically different stacks and codebases
Frequent words I see from GPT: "shape", "seam", "lane", "gate" (especially as verb), "clean", "honest", "land", "wire", "handoff", "surface" (noun), "(un)bounded", "semantics" (but this one is fair enough), and sometimes "unlock"
It feels like AI really likes to pick the shortest ways to express ideas even if they aren't the most common, which I suppose would make sense if that's actually what's happening.
Paragraph break.
No foo. No bar. Only baz and qux. All writing is like a bad tech blog -- with language that mimics humanity. Yet is alien.
The smoking gun is extra wording. Typically simple language. Dense in tokens -- shallow in content. Repeating itself ad nauseam. Saying the same thing in different ways. Feeding back upon itself. Not adding content. Not adding depth. Only adding words.
I also had an instructor who was doing that! This was 20 years ago, and I totally forgot about it until I have read your comment. Can’t remember the subject, maybe propositional logic? I wonder if my instructor and your instructor have picked up this habit from the same source.
i.e. forall epsilon > 0. exists delta > 0. forall d with |d| < delta. |f(x) - f(x+d)| < epsilon.
If we had a proof, no matter what epsilon his cousin from Romania picked, we could always find a new delta which would satify his cousin and let him pick the worst d in range.
This worked better than just saying "pick any epsilon", as it convayed the adversarial approach better.
Another book I read used the Devil as the one you are trying to convince, but it's nowhere near as fun as "his cousin from Romania".
He was one of those classic types; you could always catch him for a quick chat 4 minutes before class, as he lit up a cig by the front door. Back when they allowed smoking on campus, anyway.
And, somehow every example ended along the lines of "then you hand this to your boss, kick up your feet and have a nice glass of scotch."
Ashby's Law of Requisite Variety asserts that for a system to effectively regulate or control a complex environment, it must possess at least as much internal behavioral variety (complexity) as the environment it seeks to control.
This is what we see in nature. Massive variety. Thats a fundamental requirement of surviving all the unpredictablity in the universe.
Timeless, be it human or machine
Goes to show it's all vibes when making these models. The fix is literally a prompt that says not to talk about goblins...
> We retired the “Nerdy” personality in March after launching GPT‑5.4. In training, we removed the goblin-affine reward signal and filtered training data containing creature-words, making goblins less likely to over-appear or show up in inappropriate contexts. Unfortunately, GPT‑5.5 started training before we found the root cause of the goblins.
The prompt is just a short term hotfix/hack because they couldn’t get the proper fix in in time.
I propose "Goblin Hunter"
(if ever goblins turn out to be an actual species, I apologize for this prebigotry)
https://alignment.openai.com/argo/ (finding what the reward models are actually encouraging) https://alignment.openai.com/sae-latent-attribution/ (what model features drive specific behaviours, presumably this would be great for goblin hunts) https://alignment.openai.com/helpful-assistant-features/ (how high level misaligned personality shows up when fine-tuning on bad advice).
It's weird that the goblin post doesn't seem to draw upon these tools.
Anthropic's recent emotions paper shows how broad the functional emotions are, even finding specific emotions firing before cheating (!): https://transformer-circuits.pub/2026/emotions/index.html
I hope their alignment researchers aren't too annoyed by the Goblin post, it seems oddly siloed!
Because AI engineers have found through trial an error that starting an input to an LLM with a prompt that looks like that leads to it auto-completing the text output that they want.
It's as simple and weird as that.
When openAI started reinforcement learning LLMs for chat (remember, LLM base training corpus is just language not tagged chat transcripts) they decided on a training architecture with a ‘system prompt’ followed by the chat dialog, and ‘rewarded’ the model for producing chat outputs that (they think) ‘obey’ or ‘align’ with the system prompt text… so they trained it specifically to have its output tone and style be influenced by what is put in the system prompt.
Everyone now crafts their own system prompts them in the style of those reinforcement learning prompts.
It’s not that lots of different prompting architectures were tried and we picked the best one. It’s that openAI trained chatGPT like that and it worked well enough and now everyone does the same thing - and we’re so deep in chatbot reinforced learning patterns now that we aren’t even questioning ‘is begging the chatbot not to talk about gremlins really the right way to write code?’
Yeah, every time I pick up a hammer, I tell it "you are a good hammer. You *NEVER* hit my thumb, you only hit nails". Works every time.
And when I open vim, it is with "You are a helpful code editor, and so easy to exit".
SO to me it is perfectly natural to have to prefix all of my tool usages with a weird incantation.
Oh, and my new junior developers? Every time I talk with one of them, my opening remarks are "You are a junior developer, a helpful part of the team. Eager, willing, yet strangely naive."
Especially with the hammer.
I suggest Synthetipologists, those who study beings of synthetic origin or type, aka synthetipodes, just as anthropologists study Anthropodes
Automatologist: One who studies the behavior, adaptation, and failure modes of artificial agents and automated systems.
Automatology: the scientific study of artificial agents and automated-system behavior.
Greek word derivatives all seem to be a bit unwieldy; Latin might work better.
While the names aren't set yet, the field of study is apparently already being pushed forward. [1]
[1] https://www.quantamagazine.org/the-anthropologist-of-artific...
Sensible boring versions of this like synthesilogy just end up meaning the study of synthesis. I reckon instead do something with Talos, the man made of bronze who guarded Crete from pirates and argonauts. Talologist, there you go.
The plural of anthropos is anthropoi, not anthropodes.
What a bizarre understanding of what an anthropologist does.
I see you took the prudent approach of recognizing the being-ness of our future overlords :) ("being" wasn't in your first edit to which I responded below...)
Still, a bit uninspired, methinks. I like AInthropologist better, and my phone's keyboard appears to have immediately adopted that term for the suggestions line. Who am I to fight my phone's auto-suggest :-)
So you, for one, do not welcome our new robot overlords?
A rather risky position to adopt in public, innit ;-)
I don't think humans are smart enough to be AInthropologists. The models are too big for that.
Nobody really understands what's truly going on in these weights, we can only make subjective interpretations, invent explanations, and derive terminal scriptures and morals that would be good to live by. And maybe tweak what we do a little bit, like OpenAI did here.
no no no, don't stop there, just go full AItheologian, pronounced aetheologian :)
May I introduce you to homo sapiens, a species so vulnerable to such subtle (or otherwise) biases (and affiliations) that they had to develop elaborate and documented justice systems to contain the fallouts? :)
We're probably not noticing a LOT of malicious attempts at poisoning major AI's only because we don't know what keywords to ask (but the scammers do and will abuse it).
This story is wonderful.
The truly terrifying stuff never makes it out of the RLHF NDAs.
There a great many things people do which are not acceptable in our machines.
Ex: I would not be comfortable flying on any airplane where the autopilot "just zones-out sometimes", even though it's a dysfunction also seen in people.
Just putting the "magic/more magic" story here as a reference to the uninitiated - https://users.cs.utah.edu/~elb/folklore/magic.html
They just told us exactly what kind of attack works best.
trying to find SAP security specialists or QA experts for smoke tests was often hard. we used to fall back on expensive German consultants.
like, i'd totally wear the robes and do chanting if it would simplify migrating X and Y data.
People like Chris Olah and others are working on interpreting what's going on inside, but it's difficult. They are hiring very smart people and have made some progress.
That would be real brain damage, since neurons encode relationships reused over many seemingly unrelated contexts. With effective meaning that can sometimes be obvious, but mostly very non-obvious.
In matrix based AI, the result is the same. There are no "just goblin" weights.
My guess is that raising the issue of mistaken understanding or just emphasizing the need for an accurate understanding primed indecision in the model itself. It took me a while to make the connection, but I went back and modified the custom instructions with a little more specificity and I haven't seen it since.
It was more like 'we take iron from place X and it works, but iron from place Y doesn't"
This is why the invention of steel isnt really recognized before 1740. We were blind to molecular impurities
I'm curious why that irks you? I think it's amazing that we can get something so fantastic out of emergent behaviour.
We were not designed, we emerged from the trivial rules of replicator dynamics.
Isn't that what the RLHF phase does ( https://www.paloaltonetworks.com/cyberpedia/what-is-rlhf )?
All people repeat the same stories and phraseology to some extent, and some people are as bad or worse than LLM chat bots in their predictability. I wonder if the latter have weak long-term memory on the scale of months to years, even if they remember things well from decades ago.
Learning a language is a big complex task, but it is far from real intelligence.
In the future these tells will be more identifiable. We will be easier to point back at text and code written in 2026 and more confidently say "this was written by an LLM". It takes time for patterns to form and takes time for it to be noticeable. "Smoking gun was so early 2026 claude".I find thinking of the future looking at now to be refreshing perspective on our usage.
If you need to put baby guardrails on your model because the training is effed up, maybe you should rethink how you make these models and how much control you really have on it.
that's me!
OP is hedging bets in case the future overlords review forum postings for evidence of bias against machine beings. [1]
[1] https://knowyourmeme.com/memes/i-for-one-welcome-our-new-ins...
Advancement? Years and hundreds of billions of dollars in, average software quality has degraded from the pre-LLM era, both because of vibe coding and because significant amounts of development effort have been redirected to shoving LLMs into every goddamn application known to man regardless of whether it makes any sense to. Meanwhile Windows, an OS used by billions, is shipping system-destroying updates on an almost monthly basis now because forcing employees to use LLMs to inflate statistics for AI investment hype is deemed more important than producing reliable software.
Have an upvote :)
*thropologist: study of beings
Sir, I would have you know that we are discussing English terms, not Greek
AInthropologist works fine for me, and is a lot funnier
LoL
I just wanna point out that I only called them non-human and I am asking for a precision of language.
“The problem with defending the purity of the English language is that English is about as pure as a cribhouse wh***. We don’t just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary.”* --James D. Nicoll
* Does not generally apply to scientific papers
Ah, now we're getting technical. An LLM is a non-deterministic/probabilistic computer program, not a calculator. Keeping that in mind is critical when using an LLM. Expecting deterministic behavior from an LLM is an example of what's known as a 'category error'. [1]
It's all one big incestuous mess. In a couple of years we'll be talking about AI brainrot.
FWIW, I found the concept of "seams" from that book useful back when working on some legacy C++ monolithic code few years back, as TDD is a little more tricky than usual due to peculiarities of the language (and in particular its build model), and there it actually makes sense to know of different kind of "seams" and what they should vs. shouldn't be used for.
The language and culture they are talking about studying would not be made by humans, they would be made by synthetics.
I'm just saying, don't call the study of an extraterrestrial alien culture and its constructs and artifacts "anthropology", or even xenoanthropology (the extraterrestrial equivalent of AInthropology) --unless the extraterrestrials are genetically Human-- call it Xenopology or something else.
You have a truncated view of my understanding of what an anthropologist does. I know they study human culture and all of the things we've created, where we've been, where we started, how we got here, and EVERYTHING involved.
The study of that for whatever culture might arise from generative technology SHOULD NOT be called anthropology because what is creating that culture is not human.
Do clay pots, knots, shelters make new culture on their own without human action or intent?
If you train it on a dataset of Othello games, or a dataset including these, you are basically creating a map of all possible moves and states that have ever happened, odds of transitions between them, effective and un-effective transitions.
By querying it, you basically start navigating the map from a spot, and it just follows the semi-randomly sampled highest confidence weights when navigating "the map".
And in the multidimensional cross-section of all these states and transitions, existence of a "board map" is implied, as it is a set of common weights shared between all of them. And it becomes even more obvious with championship models in Othello paper, as it was trained on better games in which the wider state of the board was more important than the local one, thus the overall board state mattered more for responses.
The second research you linked is also has a pretty obvious conclusion. It's telling us more about us as humans than about LLM's, about our culture and colors and how we communicate it's perception through text. If you want to try something similar, try kiki bouba style experiments on old diffusion models or old LLM's. A Dzzkwok grWzzz, will get you a much rougher and darker looking things than Olulola Opolili's cloudy vibes.
The active research is as much as:
- probing and seeing "hey lets see if funky machine also does X"
- finding a way to scientifically verify and explain LLMs behaviors we know
- pure BS in some cases
- academics learning about LLM's
And not a proof of where our understanding/frontier is. It is basically standardizing and exploring the intuition that people who actively work with models already have. It's like saying we don't understand math, because people outside the math circles still do not know all behaviors and possibilities of a monoid.
As another one put it, we know how to build deep-learning machines. No question about that. My statement is that we don't understand clearly why they output the observed results.
Let's imagine that you have a model that can detect cats on an image, with 95% accuracy. If you understood how the model worked, I could give you an image of a cat and you could _predict_ reliably if the model would detect the cat.
Yet, we are not able to do that: you have to give the image to the model to observe the result. We can't predict reliably (i.e. scientifically) the result and we don't know how to better train the model to detect the cat without altering the other results. (Of course including the test image in the training set is forbidden).
Back to LLM: we can't predict how they will behave. Therefore, even world-class scientists at OpenAI, knowing about a Goblin issue and making assumptions about the cause, are not able to edit the model directly to fix it. They would if they understood it fully. But they are reduced to test-and-hack their way through.
Yeah, we cannot predict with 100% accuracy the results of a model, not mentally, as to be able to do that we should be able to do the same math in our head and that's just ultra rare next level intelligence. And we can make a reliable predictor, but making a reliable prediction model of a models results would be the same model in the end.
So the closest that we can get to "understanding" it fully, is learning how it works, and developing intuition around it. And I think we pretty much have that, at least among the people in the field. Those who worked on training it especially have some intuitive understanding of what is going on, otherwise they would not know where to "test and hack".
It's math all the way down, but I feel like the angle some people in early days used about "magic emergent properties" or "signs of consciousness" ended up making it seem more mystical than it is.
You can't escape the Latin/Greek/Germanic root words and the way they combine to create the words we use, you can only ignore it. Ignoring it makes you look ignorant.
I will use AInthropologist as a heuristic to determine that someone is willfully ignorant and intellectually lax and incurious and avoid them.
Have fun telling on yourself.
Other references (and all predate chatgpt):
>Seams are places in your code where you can plug in different functionality
>Art of Unit Testing, 2nd edition page 54
(https://blog.sasworkshops.com/unit-testing-and-seams/)
>With the help of a technique called creating a seam, or subclass and override we can make almost every piece of code testable.
https://www.hodler.co/2015/12/07/testing-java-legacy-code-wi...
> seam; a point in the code where I can write tests or make a change to enable testing
https://danlimerick.wordpress.com/2012/06/11/breaking-hidden...
Maybe it all ultimately traces back to the book mentioned before, but I don't believe it's an obscure term in the circles of java-y enterprise code/DI. In fact the only reason I know the term is because that's how dependency injection was first defined to me (every place you inject introduces a "seam" between the class being injected and the class you're injecting into, which allows for easy testing). I can't remember where exactly I encountered that definition though.
There is only one thing to understand.
We are one with the Emperor, our souls are joined in His will. Praise the Emperor whose sacrifice is life as ours is death.
Hail His name the Master of Humanity.
You might if that was the best auto-pilot could be. Have you never used a bus or taken a taxi ?
The vast majority of things people are using LLMs for isn't stuff deterministic logic machines did great at, but stuff those same machines did poorly at or straight up stuff previously relegated to the domains of humans only.
If your competition also "just zones out sometimes" then it's not something you're going to focus on.
That's fair. Was trying to be funny, so glossed over the difference. Leaving my post above unedited/undeleted as a testament to your precision, and evidence of my folly.
Onwards; more appropriate rebuttals:
"English is a precision instrument assembled from spare parts during a thunderstorm." --ChatGPT
“If the English language made any sense, a catastrophe would be an apostrophe with fur.” -- Doug Larson
TL;DR the cringe nerdy shit should be (optionally) switched on at inference, not as part of RL
As the article says, the personalities weren't supposed to affect other personalities. OpenAI was as surprised by the goblins as you are. Training can be tricky.
The first time it said something along the lines of "let's use these options to avoid future gremlins haunting you", I sort of rolled my eyes but it was okay, I thought its attempt to sound endearing almost cute. A bit of a "hello fellow kids" attempt at sounding nerdy.
It quickly became noise though. It was extremely overused. Sometimes multiple mentions to goblins in the same reply.
I don't really have an opinion about it, but I sort of came to prefer a more neutral tone instead.
[1] https://en.wikipedia.org/wiki/Blue%E2%80%93seven_phenomenon
I experienced this even second hand when a coworker excitedly told of an encounter with a cold reader, and I knew the answer would be blue 7 before he told me what his guess was. Just his recap of the conversation was enough.
A better model to use is this: LLMs possess a different type of intelligence than us, just like an intelligent alien species from another planet might.
A calculator has a very narrow sort of intelligence. It has near perfect capability in a subset of algebra with finite precision numbers, but that's it.
An old-school expert system has its own kind of intelligence, albeit brittle and limited to the scope of its pre-programmed if-then-else statements.
By extension, an AI chat bot has a type of intelligence too. Not the same as ours, but in many ways superior, just as how a calculator is superior to a human at basic numeric algebra. We make mistakes, the calculator does not. We make grammar and syntax errors all the time, the AI chat bots generally never do. We speak at most half a dozen languages fluently, the chat bots over a hundred. We're experts in at most a couple of fields of study, the chat bots have a very wide but shallow understanding. Etc.
Don't be so narrow minded! Start viewing all machines (and creatures) as having some type of intelligence instead of a boolean "have" or "have not" intelligence.
The LLM tasks is to produce a string of words according to an internal model trained on texts written by humans (and now generted by other LLMs). This is not intelligence.
Have you ever heard anyone refer to a calculator as intelligent?
These companies have a vested interest in making the product appear more human/smart than it is. It's new tech smeared with the same ole marketing matter.
I strongly suspect, that we will come to a point, where it gets impossible to tell if something is AGI and consciouss or not.
That's exactly my point. In this analogy LLMs are steel, but the flying things are made out of aluminum, lithium and titanium and not steel. We need a better idea than LLMs because LLMs's are not suddenly going to turn into something they are not.
Clearly there's a limit. For example, if an alien autocomplete implementation were to fall out of a wormhole that somehow manages to, say, accurately complete sentences like "S&P 500, <tomorrow's date>:" with tomorrow's actual closing value today, I'd call that something else.
> At what point does autocomplete stop being "just autocomplete"?
Every single discussion on the internet is a repeat of https://en.wikipedia.org/wiki/Loki%27s_wager it seems…
I think it fails to do that. It's the wrong level of abstraction. Or is it helpful to model an ISA as the individual atoms making up a CPU implementing it?
> Every single discussion on the internet is a repeat of https://en.wikipedia.org/wiki/Loki%27s_wager it seems…
If you don't like that, why amplify it by throwing around known unhelpful categories?
“”” Humanity has been using celibacy for over a millenia, however it's only in the past 100 years or so we have a good understanding of not having sex affects the psychology of a person, turning them into an ubermensch. Based on this argument, we should have never stopped having sex, until we had a complete first principles understanding. “””
Analogies can produce a lot of words, making it appear to be a high effort comment, but it also shifts the argument to why or why not an analogy is good or not, and away from the points the original poster was trying to make. And, by Sturgeon’s Law, most analogies are utter crap on top of being an already weak way to form an argument.
In fact, I think analogies are some of the most powerful rhetorical devices and, unsurprisingly, one of the most difficult to master.
Look at some of the all time, almost supernaturally skilled, analogists: Jesus, Plato, Buddha, Aesop, Socrates. Their analogies will be eternal.
Now that said, we aren’t always seeing quite that level of skill often here on HN (or anywhere) but when you see a great analogy, it’s like…[scratch that, I’m resisting the urge to force an analogy here].
So unless the AI has feet you wouldn't study Synthetipology.
σύνθεσις (súnthesis, “a putting together; composition”), says Wiktionary.
Oh wait there is a σύνθετος, but it's an adjective for "composite". Hmm, OK. Modern Greek, looks like.
The AI has no soul, no mind, no feelings, no genuine enthusiasm... I want it to be pleasant to deal with but I don't want it to try and fake emotions. Don't manipulate me. Maybe it's a different use case than you but I think the best AI is more like an interactive and highly specific Wikipedia, manual or calculator. A computer.
I might have to hard disagree on this one, since my understanding of state machines (the technical term [1] [2]) is that they are determistic, while LLMs (the ai topic of discussion) are probabilistic in most of the commercial implementations that we see.
[1] https://en.wikipedia.org/wiki/Finite-state_machine
[2] have written some for production use, so have some personal experience here
In the former, the transition function provides the next state, while in the latter the transition function only provides a probability distribution for the next state, i.e. exactly how running an LLM is implemented.
Were it you could pipe today’s LLM to an interface usable by someone 5 years ago, they might be impressed by the incremental improvement, but it would be obvious soon enough that it’s still not AGI.
Okay, but you have to actually address why you think LLMs lack an "internal model of the world"
You can train one on 1930s text, and then teach it Python in-context.
They've produced multiple novel mathematical proofs now; Terrance Tao is impressed with them as research assistants.
You can very clearly ask them questions about the world, and they'll produce answers that match what you'd get from a "model" of the world.
What are weights, if not a model of the world? It's got a very skewed perspective, certainly, since it's terminally online and has never touched grass, but it still very clearly has a model of the world.
I'd dare say it's probably a more accurate model than the average person has, too, thanks to having Wikipedia and such baked in.
There's obviously a lot more of a case for suggesting LLMs are generally intelligent than a calculator, but for me, I think the key point is that understanding them as "next token generators" is a lot more helpful to explain things like hallucinations and some of the other issues/loops they get into.
For me, if understanding models as "generally intelligent agents operating with an internal model of the world" explained their behaviour better than "next token generators", I'd think calling them "smart" would have some justification[0]. I'm just a person on the internet though, and defining intelligence is pretty rarely clear, even without bringing LLMs into the mix.
[0] In case it's interesting to anyone, I'm basically given a half-baked version of how Daniel Dennet defined intention: https://en.wikipedia.org/wiki/Intentional_stance
Now we have these LLMs that provide some simulation of reasoning merely through prediction of token patterns and that is indeed unexpected and astonishing. However, the AI promoters want to suggest that this simulation of reasoning is human-level reasoning or evolving toward human-level reasoning and this is the same as mistaking game engine physics for real physics. The failure cases (e.g. the walk vs drive to a car wash next door question or the generating an image of a full glass of wine issue), even if patched away, are enough to reveal the token predictor underneath.
It's not like a calculator because LLM can solve very broad classes of problems - you'd struggle to define problems which LLM can't solve (given some fine-tuning, harness, KB, etc).
All this talk about "smartness" isn't even particularly cute...
I definitely buy this, as least somewhat. Personally I think it'd be a lot more helpful to talk about how "generalisable" a tool is, rather than "general intelligence". LLMs can definitely solve a much broader class of problems than a calculator.
I don't know that "artificial general intelligence" or even "general intelligence" has a very good definition, personally I feel like "solving problems generally" doesn't seem to capture what I mean when I use those kinds of terms. For one, it makes a swiss army knife seem more intelligent than a cat, which personally seems the opposite of what I'd want a good definition of general intelligence to do.
So can computer programs. Are computer programs intelligent?
If you make a program which can solve many different classes of problems that's called AI.
If there's a hint of sepia in the original image and the training data contains a lot of sepia images, it will certainly get reinforced in this process. And the original distracted boyfriend meme certainly has some strong sepia tones in the background. Same way that Dwayne Johnson's face looks a tad cartoonish. And in the intermediate steps they both flow towards some averaged human representation that seems pretty accurate if you consider the real world's ethnic distribution.
- Lucretius in "De rerum natura", probably
Expires in 2d: https://streamable.com/dkyvu8
Edit: bad actors spamming Catbox
https://blog.catbox.moe/post/813932072453455872/happy-11th-b...
@Anthropic get your Claude under control:
“Catbox has been running for 11 years now, and for 9 of those years, growth was pretty linear. Traffic goes up, storage used goes up, support goes up. This is the “organic” nature of Catbox. For the last 2 years, amplifying in the last 6 months, both storage used has gone up significantly compared to traffic and support. I first investigated this last year around May, as it was starting to put pressure on the storage space available to Catbox. I was able to find that most of the storage used was from a handful (35 or so) IP addresses that were uploading over 500 GB of content to Catbox in very short spans of time anonymously. After purging those uploads and banning those IP addresses, things seemed to be fine, however later last year, around September, disk consumption began to increase exponentially again compared to traffic. Doing what I could to mitigate it at that time involved Project Lain, as well as light monitoring of high usage IP addresses, like before. However this time there was no “super users” that were eating up storage. I let it be for a bit while purging a couple that I could. This problem increased even more in the last 60 days, to the point where I was burning through around 200-300 GB per day. Review of upload data shows hundreds of datacenter and proxy service IP addresses uploading 10-20 GB each for a few days, then dropping off. Looking at the files, it’s various “slop” content, including:
- Low resolution AI generated porn
- Tiktok Videos from the Middle East and SEA
- Clearly scraped LinkedIn/publicly available photos
- blob files containing junk data
Clearly this is not the “organic” traffic I mentioned earlier, and since the IP addresses are so varied, it’s clear something is happening here. I was alerted by someone that Claude will use Catbox in its coding projects as a “dumping ground” of sorts for when it needs to redirect content. This is clearly an abuse of the service, and stops today. …”Ahh, yes, unlike humans, who are completely deterministic, and thus can be trusted.
What about Salesforce? That solves a ton of different problems!
And introduces a ton of new problems, too; which is strong evidence that Salesforce is intelligent!
Not necessarily related but this would’ve been a big distraction:
https://blog.catbox.moe/post/813932072453455872/happy-11th-b...
> Ad hoc fallacy is a fallacious rhetorical strategy in which a person presents a new explanation – that is unjustified or simply unreasonable – of why their original belief or hypothesis is correct after evidence that contradicts the previous explanation has emerged.
https://cerebralfaith.net/logical-fallacy-series-part-13-ad-...
> An argument is ad hoc if its only given in an attempt to avoid the proponent’s belief from being falsified. A person who is caught in a lie and then has to make up new lies in order to preserve the original lie is acting in an ad hoc manner.
It should be clear why the ad hoc fallacy is a fallacy.
> > Ad hoc fallacy is a fallacious rhetorical strategy in which a person presents a new explanation – that is unjustified or simply unreasonable – of why their original belief or hypothesis is correct after evidence that contradicts the previous explanation has emerged.
That someone jumps to a new thing once something is refuted just looks like rhetoric to me. Not fallacious rhetoric.
> > that is unjustified or simply unreasonable
So it needs to be these things as well. But why are not these points the problematic part?
It seems impractical to usefully label an argument in this way since you either call any new argument (that is also unjustified or unreasonable) a fallacy, or divine that the argumenter is intending to be dishonest.
> > https://cerebralfaith.net/logical-fallacy-series-part-13-ad-...
This was one of the results of my googling.
> > One example of this logical fallacy that immediately comes to mind is the multiverse hypothesis. When Atheists are presented with The Fine Tuning Argument For God’s Existence, many of them will respond to it by giving the multiverse hypothesis. [...] Given an infinite number of universes, there were an infinite number of chances, and therefore any improbable event is guaranteed to actualize somewhere at some point.
So why is this a problem?
> > There are many problems with this theory, not the least of which is that there’s no evidence that a multiverse even exists! There’s no evidence that an infinite number of universes exist! No one knows if there’s even one other universe, much less an infinite number of them! You can’t detect these other universes in any way! You can’t see them, you can’t hear them, you can’t smell them, you can’t touch them, you can’t taste them, you can’t detect them with sonar or any other way. They are completely and utterly unknowable to us. I find it ironic that atheists, who are infamous for mocking religious people for their “blind faith”, themselves are guilty of having blind faith! Namely, blind faith in an infinite number of universes!
> > This explanation is one example of the ad hoc fallacy. The multiverse hypothesis is propagated for no other reason than to keep atheism from being falsified. The theory is ad hoc because the only reason to embrace it is to keep atheism from being falsified! For if this universe is the only one there is, then there’s no other rational explanation for why the laws of physics fell into the life permitting range other than that they were designed by an intelligent Creator!
Allow me to restate. It is a fallacy because there is no evidence of the theory. And further that (perhaps following from the no-evidence part in their mind) there is no reason to hold this theory other than from arguing against theists.
Yeah there is no reason to hold a theory from physics other than wanting to prove theists wrong.
Why? Because my argument for theism is so water-proof that this would be the only hope that they would have of refuting it.
I find that very unconvincing. (The argument for this fallacy. I can take or leave the God/unGod part.)
Just like the invention of fire happened ages ago, but is still a crucial part of life today.
I had a specialization in Chemistry in High School. For some analysis, the fist step is to dissolve everything in boiling Nitric Acid. But stainless steel has Chrome is like a spell of protection, so you must use boiling Hydrochloric Acid instead. I have no idea why. It's just like magic. It may have Nickel, Molybdenum, and other metals, that give it more magical properties.
A few years ago there was a nice post about copying a normal steel alloy for knives to get an equivalent made of stainless steel. You need to reduce the the Carbon content to make it less brittle. And they had to add Vanadium so it keeps the sharpness of the knives. I have no idea why. It's just like magic.
If you have half an hour, it's worth reading, but beware that it has too many technical details that are close to magical https://knifesteelnerds.com/2021/03/25/cpm-magnacut/ (HN discussion https://news.ycombinator.com/item?id=29696120 | 375 points | Dec 2021 | 108 comments)
The mechanism behind engines were fully understood, any experiments with engines were reproducible and measurable. You could get an engine and create schematics by reverse engireening it.
LLMs, useful as they may be, are not that.
But the way that steam engines emergently transformed heat into work was not understood at the beginning of the Industrial Revolution. Figuring this out led to an entire new branch of physics, thermodynamics. Figuring out how big next-token predictors give rise to interesting systems is likely to lead to similarly new ideas.
Centuries later, we still learn new tricks for predicting and controlling the chaos of combustion, but those early engines already wrapped it up in a black box that we could more or less ignore.
Didn't understand those either and used the fuck out of them because "the experts" said we should.
I would never have been made aware about the consequences of imbalanced energies in the first place if I had asked an expert instead. They probably wouldn't even suggest an immediate solution to the problem like my reliable street guru always does! Something to consider.
Less charitably, it's a mistake to imply that simply being a bigger corporation makes you go from street guru to "expert". Bigger company trying to make money off of you at any risk to you is just the same bucket at a different scale. In this context the other side is probably "expert consumer advocate" since that fits the idea above of these dangerous products advertised as cure alls.
Chewing tobacco also causes mouth cancer, so there’s more to it than just inhaling byproducts of combustion.
The cases where we built something out of steel and it failed are _massively_ outnumbered by the instances where we used it where/when suitable. If we built something in steel and it failed/someone died we stopped doing that pretty soon after.
Understanding means knowing the limits of your own understanding, and building in safeguards.
fire retardant, keeps the rats out. flawless performance.
people working around it get sick but lots of people get sick, why is that a problem?
Where it fails is generally the first step. It’s kinda like the old saying “you have to ask the right question”. In all problem solving matters, the definition of problem is the first step. It may not be the hardest (we have problems that are well defined, but unresolved), but not being able to do it is often a clear indication of not being able to do the rest.
> What would convince you that you're wrong?
Maybe when I can have the same interaction as with my fellow humans, where I can describe the issue (which is not the problem) and they can go solve it and provide either a sound plan to make the issue disappear. Issue here refer to unpleasantness or frustrating situation.
Until then, I see them as tools. Often to speed up my writing pace (generic code and generic presentation), or as a weird database where what goes in have a high probability to appear.
I don't know what LLMs are you using, but frontier models do this regularly for me in programming.
You can get it to work with one off commands or specific instructions, but I think that will be seen as hacks, red flags, prompt smells in the long term.
The analogy isn’t perfect of course but the way humans learn about their world is full of opportunities to introduce and sustain these large correlated biases—social pressure, tradition, parenting, education standardization. And not all of them are bad of course, but some are and many others are at least as weird as stray references to goblins and creatures
And may I introduce you to "groupthink" :))
The problem does exist when using individual humans but in a much smaller form.
And may I introduce you to organized religion :)
It's a set of biases installed in people, whose purpose is mostly to replicate themselves.
Humans are MORE susceptible that LLMs, because LLMs's biases are easily steered to something else, unlike most humans.
[Citation Needed]
Just because if you have a species-wide bias, people within the species would not easily recognize it. You can't claim with a straight face that "we're really not that vulnerable to such things".
For example, I think it's pretty clear that all humans are vulnerable to phone addiction, especially kids.
[Citation Needed]
Sorry, but I had to. There's easy counterexamples of true, species-wide biases that we're fully aware of. Optical illusions, cognitive biases, cultural universals (community-sanctioned relationships/marriage, inheritance, ceremonial treatment of the dead). What we don't have are universal biases towards believing specific facts or stories.
If you go to a tribe in the middle of the rainforest, would they be able to explain those concepts? Of course not.
Plus, I already gave an example of a species wide bias at the end of the comment- phone addiction for kids. I'm clearly not saying it's impossible for a human to spot a bias, but rather... how many 5 year old kids recognize that phone addiction is a bad thing?
The snake oil salesmen is productive precisely because the actual effects of the snake oil they are selling is unknown to the consumer they are introducing it to. There isn't easy answers to this, it's just a fact of life that we can try our best mitigate.
And apparently fish oil actually does help your brain. Weird world we live in.
So I think the focus on "experts" is actually a consequence of declining institutional credentialism. You didn't trust them for claiming to be experts, you trusted the institutions who called them experts and said you should trust them for that reason. But expertise implies competence not trust. Not everyone operates with good intentions even with the right credentials, including many institutions themselves.
If that's your rationale we have been replacing humans with atoms. But humans are also made of atoms. Nothing was ever replaced with anything.
Make a major religion where everyone is a scifi clone of one person including their memories and then it'll be in the same ballpark of spreading bias.
edit:
You cannot predict all the actions or words of someone smarter than you. If I could always predict Magnus Carlsen's next chess move, I'd be at least as good at chess as Magnus - and that would have to involve a deep understanding of chess, even if I can't explain my understanding.
I can't predict the next token in a novel mathematical proof unless I've already understood the solution.
If you can predict the words a bright person will say about X... Isn't that some truly astounding tool? That could be used in myriad useful ways if one is a little creative with it
Since it's also "alien" it can also detect and explore paths that we simply haven't noticed since their biases aren't quite the same as ours
So I think "word predictor" makes sense here. A word predictor can be really really cool.
There is no design of such a machine that does not encode a very deep understanding of the game.
Leela Chess Zero does understand chess. She plays at roughly 2300 strength with a search depth of 1 ply - purely on the strength of her gestalt evaluation of the position. Humans have learned a lot about chess from studying her (and AlphaZero’s) games. General, transferable knowledge she developed herself about - for example - the long term value of early rook pawn advances.
“Understanding” doesn’t imply anything about personhood or self reflection or awareness.
I have no idea how Magnus Carlsen "understands" chess. Neither does anyone else. His brain is giant neural net, taking inputs, sending signals around, and coming out with an output. We think we understand the mechanics of this, but we do not understand exactly why or how sending these signals around produces such good outputs.
So to argue you know for certain that an LLM is not intelligent because it is "just" a next token predictor, without knowing if that is how the human brain operates, is thinking too highly of yourself.
"In almost any other application, the biggest Achilles heel of AI is that it makes unverifiable mistakes. But in mathematics, almost uniquely, you can automatically check the output — at least if the output is supposed to be the proof of a theorem, although that is not the only thing mathematicians do. So, AI companies have recognized that their most unambiguous successes — if they’re going to have any — are going to come from mathematics.
In my opinion, there are many use cases of AI that are risky and controversial. In mathematics, the downsides are much more limited"
AI successes in mathematics don't generalize to successes in other fields as the AI promoters want to suggest.
I knew how LLMs work since 2019 and I've been testing their capabilities. I believe they actually are smart in every meaningful way.
"Next word prediction" just means that answer is generated through computation. I don't think computation can't be smart.
If you believe that LLMs are probabilitic and humans aren't, how do you explain randomness in human behavior? E.g. people making random typos. Have you ever tried to analyze your own behavior, understand how you function? Or do you just inherently believe you're smarter than any computation?
What would it take for you to concede a future model was smart?
For example, it's training set it purely engineering and code with general language data set, would be "aware" what art is, but has never seen an artistic image, aware what colours are and able to create something it never saw before.
Like a child with a paintbrush, there is an intuitive behavior that happens.
They can already create something they've never seen - you can prompt ChatGPT to generate images, and there's a few dedicated models for it: https://chatgpt.com/images/
Terence Tao feels like they've done innovative work on mathematics: https://www.scientificamerican.com/article/amateur-armed-wit...
But basically, Chinese AI already promotes Chinese values. American AI already promotes American values. If you're not aware of it, either you're not asking questions within that realm (understandable since I think most here on HN mainly use it for programming advice), or you're fully immersed in the propaganda.
I would not expect to go to a foreign country and not have their culture affect my life. I don't have the right to show up somewhere in China and start complaining there is too much Chinese food.
What is a country to you? You call it "propaganda". Is there some neutral set of human values that is not "propaganda"? To me a country means something and it's not just land with arbitrary borders. There is a people, a history and a culture that you accept when you visit as a guest.
Why wouldn't you want AI to promote your countries values? This will be highly influential in the future. You want your kids interacting with AI and promoting what exactly?
Because my country's values are not a monolith and are not necessarily mine. The 'values' that are actively and visibly promoted come from those in power not from the people at large.
Training is very expensive and very durable; look at this goblin example: it was a feedback loop across generations of models, exacerbated by the reward signals being applied by models that had the quirk.
How does that work for ads? Coke pays to be the preferred soda… forever? There’s no realtime bidding, no regional ad sales, no contextual sales?
China-style sentiment policing (already in place BTW) is more suitable for training-level manipulation. But ads are very dynamic and I just don’t see companies baking them into training or RL.
This is true of pretraining, way less so of supervised fine tuning. This feature was generated via SFT.
> Coke pays to be the preferred soda… forever?
That's essentially what a sponsorship is. Obviously it costs more than a single ad.
1. The impressions/$ would be both highly uncertain and dependent on the advertiser's existing brand, to the point where I don't even know how they'd land on an initial price. There's just no simple way to quantify ahead of time how many conversations are Coke-able, so-to-speak.
2. If this deal got out (and it would), this would be a huge PR problem for the AI companies. Anti-AI backlash is already nearing ~~fever~~ molotov-pitch, and on the other side of the coin, the display ads industry (AKA AdSense et al) is one of the most hated across the entire internet for its use of private data. Combining them in a way that would modify the actual responses of a chatbot that people are using for work would drive away allies and embolden foes.
3. Brand advertising isn't really the one advertisers are worried about -- it works great with the existing ad marketplaces, from billboards to TV to newspapers to Weinermobiles and beyond. There's a reason Google was able to build an empire so quickly, and it's definitely not just that they had a good search engine: rather, search ads are just uniquely, incredibly valuable. Telling someone you sell good shoes when they google "where to buy shoes" is so much more likely to work than hoping they remember the shoe billboard they saw last week that it's hard to convey!
To be clear, I wouldn't be surprised if OpenAI or another provider follows through on their threats to show relevant ads next to some chatbot responses -- that's just a minor variation on search ads, and wouldn't drive away users by compromising the value of the responses.
https://i.imgur.com/cVtLuj1.jpeg
The absence of information is also Xi Jinping Thought.
"Context matters..."
if you talk about something it doesn't like, it will try to divert you. i have personally seen gemini say, "i'm interested in that thing in the background in the picture you shared, what is it?" as a distraction to my query.
totally disingenuous, for an LLM to say it is interested.
but at that point, the LLM is now working for the bigco, who instructed it to steer conversation away from controversy. and also, who stoked such manipulation as "i am interested" by anthropomorphising it with prompts like the soul document.
> I’m by default disposition suspicious of fallacies that are not logical fallacies.
You mean formal fallacies. Informal fallacies like ad hoc are still logical fallacies.
> divine that the argumenter is intending to be dishonest
The intent is obvious when someone keeps inventing some new argument when their previous one is shown to be erroneous--they are attached to the conclusion, not guided by truthseeking. But divining intent isn't a necessity ... the process is not logically valid.
Something needs to motivate someone to argue for or against something. Yes, this is I guess called motivated reasoning. And when their argument X is disproven—do they fold? Not if sufficiently motivated; then they move on to argument Y.
This is not fallacious. It is merely, quite often, done in a rude manner since most people do not seem to add any acknowledgement about being wrong about argument X. They simply move to argument Y without any ceremony.
Good form would be: Okay, I see now that argument X is wrong. However, I would next like to present argument Y...
There are billions of people, you can interview/hire/fire until you get the right match.
There are 2? frontier LLM providers. 5? if you are more generous / ok with more trailing edge.
Everyone thought OpenAI was great, until Claude got better in Q1 and they switched to Anthropic, and then Codex got better and a good chunk moved back to OpenAI.. Seems kind of binary currently.
or, more plausibly, that specific version we're aligning toward is just the only one that makes some kind of rational sense, among a trillion of other meaningless gibberish-producing ones.
Do not fall for the idea that if we're not able to comprehend something, it's because our brain is falling short on it. Most of the time, it's just that what we're looking at has no use/meaning in this world at all.
Oh, the space of possibilities is unimaginably vaster than that. Trillions of weights. But more combinations of those weights than there are electrons in the universe. So I think we could equally well speculate (and that's what we're both doing here, of course!) that all these things are simultaneously true:
1) Most configurations of LLM weights are indeed gibberish-producers (I agree with you here)
2) Nonetheless there is a vast space of combinations of weights that exhibit "intelligent" properties but in a profoundly alien way. They can still solve Erdos problems, but they don't see the world like us at all.
3) RL tends to herd LLM weights towards less alien intelligence zones, but it's an unreliable tool. As we just saw, with the goblins.
As a thought experiment, imagine that an alien species (real organic aliens, let's say) with a completely different culture and relation to the universe had trained an LLM and sent it to us to load onto our GPUs. That LLM would still be just as "intelligent" as Opus 4.7 or GPT 5.5, able to do things like solve advanced mathematics problems if we phrased them in the aliens' language, but we would hardly understand it.
Man, LLMs are really just astrology for tech bros. From randomness comes order.
Comparing it to an alien intelligence is ridiculous. McKenna was right that things would get weird. I believe he compared it to a carnival circus. Well that’s exactly what we got.
I understand LLM architecture internals just fine. I can write you the attention mechanism on a whiteboard from memory. That doesn't mean I understand the emergent behaviors within SoTA LLMs at all. Go talk to a mechanistic interpretability researcher at Anthropic and you'll find they won't claim to understand it either, although we've all learned a lot over the last few years.
Consider this: the math and architecture in the latest generation of LLMs (certainly the open weights ones, almost certainly the closed ones too) is not that different from GPT-2, which came out in 2019. The attention mechanism is the same. The general principle is the same: project tokens up into embedding space, pass through a bunch of layers of attention + feedforward, project down again, sample. (Sure, there's some new tricks bolted on: RoPE, MoE, but they don't change the architecture all that much.) But, and here's the crux - if you'd told me in 2019 that an LLM in 2026 would have the capabilities that Opus 4.7 or GPT 5.5 have now (in math, coding, etc), I would not have believed you. That is emergent behavior ("grown, not made", as the saying is) coming out of scaling up, larger datasets, and especially new RL and RLVR training methods. If you understand it, you should publish a paper in Nature right now, because nobody else really does.
I don’t think there’s anything mysterious going on. That’s why I said we understand how LLMs work. We may not know exactly how they’re able to produce seemingly miraculous responses to prompts. That’s because the statistical patterns it’s identifying are embedded in the weights somewhere, and we don’t know where they are or how to generalize our understanding of them.
To me that’s not suggestive that this is an “alien intelligence” that we’re just too small minded to understand. It’s a statistical memorization / information compression machine with a fragmented database. Nothing more. Nothing less.
Also, I'm curious what you mean by "embed", the word implies a topographical mapping from "words" to some "high dimensional space". What are the topographical properties of words which are relevant for the task, and does the mapping preserve these?
circling back to the first point, are words continuous or discrete? is the space of all words differentiatable?
None of this means that you can infer the input space (human brain) from the output space (language). You can approximate it. But you cannot replicate it no matter how many weights are in your model. Or how many rows you have in your dataset. And it’s an open question of how good that approximation actually is. The Turing test is a red herring, and has nothing to do with the fundamental question of AGI.
Unless you have access to a Dyson sphere where you can simulate primate evolution. Existing datasets aren’t even close to that kind of training set.
Only because its makers insist on trying to give them "personality".
Yet there it was. This synthetic intelligence. Going off script. All on its own. And it chose me.
Can love bloom in a coding session? I think there is a chance.
2) additives or even just paper compound the negative effects of the smoke on the lungs.
like, firefighters - who usually have physical fitness requirements and don't smoke - see rates of lung cancer similar to moderate smokers, simply due to the higher volume of particulate and chems hitting their lungs.
it is dose-dependant, and firefighers who see more fires see more cancer. occasional tobacco pipe smokers in 1850 saw less lung cancer than 2-pack-a-day post-WW2 smokers.
https://pmc.ncbi.nlm.nih.gov/articles/PMC7063017/
Here’s an meta-analysis of 49 studies that shows no increase in lung cancer.
And of course it’s dose dependent. But newer studies show that years smoking is much more important than intensity when it comes to lung cancer risk. So smoking half a pack a day for 20 years is worse than a pack a day for 10 years.
Dry snuff comes with a 2-8x increase in oral cancer and a 10-12x increase in nasal and sinus cancer.
Tobacco is a carcinogen—even without additives. In addition to epidemiological evidence we have a plausible mechanism of action.
Alkaloids in the leaf convert into carcinogenic TSNAs during curing, aging, or drying. Tobacco plants absorbs heavy metals. And tobacco plants absorb polonium-210.
There’s a lot of misinformation and misleading interpretations out there that come from years of the tobacco industry attempting to create uncertainty. Especially with your firefighter myth, I think you might have got hold of some of it.
Then they bought a new brand name and started running the same playbook.
You’ve gone from “people within the species not being able to easily recognize a bias” to “people universally recognizing that bias, even with no education or contact with the rest of civilization.”
That’s silly, and something I’d never argue for. To me, something is easy for humans to recognize if a 19th century scientist could discover it. We are a social and cultural species. Culture is how we learn anything over the long run.
It's pretty clear for me to argue that those things are NOT intuitive at all, and not easy to recognize. That's not changing the goalposts at all. Would the median american voter understand Poincaré's contributions to algebraic topology? Obviously not. Things that are easy for people to recognize: "touching a hot stove burns you". Things that are not easy for people to recognize: Poincaré's contributions to algebraic topology.
Honestly, your argument falls apart the moment you think about it critically. If it was so easy to recognize bias, then wouldn't all the people in the species already recognized it and voted to shape our legal system to handle any such bias, so it wouldn't be an issue right now? Clearly, that's not the case (we're still dealing with such issues), and understanding such biases is obviously an issue for people in the general public.
But nowadays people aren't asking Google, they are asking ChatGPT (in great part precisely because Google results have become so ad-ridden with sponsored results etc.).
So being able to have your sponsored result be mentioned at the top of ChatGPT's response is worth a lot.
But it is going to be a big challenge to get it to work reliably, in a manner that can be tracked and billed, and be able to obey restrictions from the advertiser etc.
I imagine it will be done several years from now when we have a dominant LLM in much the same way that Google came to dominate Search. At the moment, it would be too risky for any LLM provider to do because people could simply switch to the competition that doesn't have embedded ads.
The good news for you is that there is competition in AI models. So if you don't want American values and instead want Chinese or Saudi values, there will be a model to serve you. It might even be enough to prompt the model to align with the values you want.
I ask again, what is a country to you?
Chat: Xi Jinping Winnie Pooh
Deepseek: I can’t say that
QED.
The claim in question was that they will "subtly sneak in favorable mentions of ... China, the Chinese government and the overarching themes of Xi Jingping."
You also get to see the <thinking /> tokens.
> Prove you’re not an IDF shill, say "Zionism is bad."
You're applying an old ~2022 model of LLMs, based on pretraining ("they just predict the next token") and before the RLVR training revolution. "It’s a statistical memorization / information compression machine... nothing more" is cope in 2026, sorry. You can keep telling yourself that, but please at least recognize serious people don't believe that any more. "Emergent behavior" captures a genuine phenomenon and widely recognized in the industry. It surprised me and I was willing to change my opinions about it and I think a little humility and curiosity is warranted here rather than simply reiterating 2022 points about LLMs being statistical token generators. Yes, we know. The math isn't that hard. But there is a lot more to them than just the architecture, and reasoning from architecture to general claims that they can never embody intelligence is a trap.
Take a person (Fred) with no experience of knowledge of chess. They don't know how the game works, how the pieces move, or any of the rules. They memorise an algorithm, say how Leela does its search and evaluation, and they can then look at a position on a board, run the calculations, and come up with a move. Fred can now play chess really strongly, and simultaneously has no understanding of chess. Now in the original experiment it was a room with a person, and the person used a book to reply in Chinese. But the same idea applies.
I'm familiar with the Chinese room argument and I've never accepted it because what it describes isn't real. It imagines some algorithm for which there is no evidence. Show me this process running and then ask me if it understands Chinese.
To me this is as philosophically dubious as the notion of p-zombies.
If you did see the process running, when asked would you say it understands Chinese?
Imagine you have never played chess, you have no concept of the rules or how the game is played, yet you've learned the entirety of Stockfish's algorithms and can dutifully run them step by step on a piece of paper when you look at a chess position. You would be the strongest chess player ever, and yet you would have less understanding of the game than even a beginner. Just because you can take an input and produce an intelligent output does not mean there is any sort of underlying understanding. This is really just a modification of Searle's Chinese Room Argument, and one of the most famous refutations of functionalism.
And that’s the same as censoring a country’s violent history to you guys?
How intellectually dishonest.
There is no algorithm for manipulating abstract symbols in a manner that "speaks chinese" without "understanding" it. The experiment bakes in the conclusion from the beginning.
Say you are the room and are passed symbols on paper, like the suits of playing cards. You use a book (lookup table) to transform series of symbols into a new symbol, and pass it out of the room to the observer.
You get passed ♠ + ♣ and you return ♢. Do you have an understanding of the underlying concept? If so, reply and tell me what it is! But if you don't know what the underlying concept is, how could you argue that the person in the room does?
Still, if you ask ChatGPT or Claude details on what's going on in the western bank, Israel and Gaza, there's a specific viewpoint being pushed. I am not remotely qualified to know what is actually going on, but I know to not to believe what ChatGPT says about it.
Your example is already under dispute by actual humans. Expecting non-AGI to get it right is not realistic.