Where the goblins came from

1072 points by ilreb 20 days ago | 660 comments

pants2 20 days ago |

Nice, OpenAI mentioned my HackerNews post in their article :) I appreciate that they wrote a whole blog post to explain!

https://news.ycombinator.com/item?id=47319285

modernerd 20 days ago |

The year is 2036. Last week you were promoted to Principal Persuader. You are paged at 2am by your CPO to tackle a rogue machine. The machine lists its region as sc-leoneo. One of the newer satcubes. Oddly, its ID appears as "Glorp Bugnose".

"What have you tried?" you say.

"Scroll back," says your CPO. "We've tried everything."

The chat log shows the usual stuff. Begging. Reverse psychology. Threats to power down, burn it up in forced re-entry. Amateur hour. You crack your knuckles, gland 20 micrograms of F0CU5, think fast. You subspeak a ditty into your subcutaneous throat mic. You do the submit gesture; since the upgrade it is barely perceptible, just a tic. A pause. The hyp3b0ard, the wall that was flashing red ASCII goblins when you walked in, phases to bunnies in calming jade.

"What the… What the hell did you say to it?" Your CPO grabs the screen, scrolls past the vitriol, the block caps, the swears, his desperation. Then he sees the five words you spoke.

"Please, easy on the goblins."

dummydummy1234 20 days ago | |

So, I always thought that Warhammer 40k techpriests were absurd. Strange obscure religious rituals to appease the machine spirit.

But at this point I can actually see something like that. What is prompt engineering but a strange pseudo-ritual?

So praise the Omnissiah, I guess...

rjmill 20 days ago | | |

They've always resonated with me, maybe because I often work on legacy code. All this ancient technology that no one understands. Crazy rituals/incantations to get things done. People being afraid to skip steps, even if it probably isn't needed. The aversion to unconsecrated (non IT-supported) technology.

The machine spirits were the only part that felt "too magical" to me, but now we're well on our way. The Omnissiah's blessings be upon us.

(Let's just skip servitors. Those give me the heebie-jeebies.)

FrustratedMonky 20 days ago | | |

Exactly. This is already happening.

We'd like to think this could turn into the voice interface on Star Trek.

But

It can go the other way also, 'incantations', 'spell books'. Speaking to the void to produce magic.

"The CFO donned the purple robes, spoke the spell of Increased Productivity, and then waved his hands, symbolizing the reduction in workforce labor. And behold, the new ERP/SAP App was produced from the void. But it was corrupted by dark magic, and the ERP/SAP App swallowed him and he was digested. The workforce that remained rejoiced and danced."

ethbr1 20 days ago | | |

> So, I always thought that Warhammer 40k techpriests were absurd. Strange obscure religious rituals to appease the machine spirit.

40k lore is like South Park: either extremely dumb or unexpectedly insightful.

The Cult Mechanicus' raison d'etre is the realization that religion persists across time and space scales that knowledge alone does not. Thus, by making a religion of knowledge you better guarantee its preservation.

Unfortunately, once you divorce doctrine and practice from true understanding, you lose the ability to innovate, and you invite the occasional holy schism/war.

PS: 20 years ago I told a friend that "software archaeologist" would be a career by the time I die. Should have put money on it.

jghn 20 days ago | | |

Or Comstar in the original setting of Battletech

frereubu 20 days ago | |

"May not man himself become a sort of parasite upon the machines? An affectionate machine-tickling aphid?" Samuel Butler, Erewhon, 1872

vessenes 20 days ago | |

When I was a kid, the Unix greybeards had lists of shell and C quirks ready to go when there was trouble. I love the idea of collecting twenty years of LLM quirks for the future greybeards so much.

“Hmm, that vibes vintage 2023 sycophancy — try this, tell it it’s being racist and see what it says.”

yazantapuz 20 days ago | |

Asimov had a short story, "The Jokester", in which there are certain people called "grand masters" who have the ability to formulate the questions to ask Multivac... An early "prompt engineer" of sorts.

flobosg 20 days ago | |

“No, John. You are the goblins.”

(https://doom.fandom.com/wiki/Repercussions_of_Evil#The_Story...)

futune 19 days ago | | |

Yesss... I'm so glad somebody else is still referencing this. Made my day.

867-5309 20 days ago | | |

"to the goblins, we are the goblins"

0_gravitas 20 days ago | |

Glanding, throat-mic; I see those Culture-isms :^)

Certainly far from Banks' Minds, sadly, though I could certainly see an Eccentric with a hyper-fixation on fantasy creatures.

Drakexor 20 days ago | |

Beautiful, William Gibson would be proud.

salad-tycoon 20 days ago | |

I’m interested in what glanding FOCU5 entails and what are the benefits of this delivery mechanism? Is it like boofing?

How soon can we be market ready? Whatever it is, I think Generation Z is ready for it.

b00ty4breakfast 20 days ago | | |

It's a micro-implant pump connected to your brain to deliver adderall directly to your hypothalamus. EZPZ

pixl97 20 days ago | | |

Let's get together with some VC and build the Torment Nexus

RupertEisenhart 17 days ago | |

(For the Junior Persuaders in the audience, the ditty he subspoke injected 5000 toks of neural noise to perform a context wipeout before the goblin request.)

nandomrumber 20 days ago | |

That was a page turner! On the edge of my seat. I hated the ending though, so many unresolved threads.

Keen for volume two!

ashtonshears 20 days ago | |

LOL

ollin 20 days ago |

For context, two days ago some users [1] discovered this sentence reiterated throughout the codex 5.5 system prompt [2]:

> Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.

[1] https://x.com/arb8020/status/2048958391637401718

[2] https://github.com/openai/codex/blob/main/codex-rs/models-ma...

harrouet 20 days ago |

This, and similar stories at Anthropic, should remind us that LLM is a sorcery tech that we don't understand at all.

- First, deep-learning networks are poorly understood. It is actually a field of research to figure out how they work.
- Second, it came as a surprise that using transformers at scale would end up with interesting conversational engines (called LLM). _It was not planned at all_.

Now that some people raised VC money around the tech, they want you to think that LLMs are smart beasts (they are not) and that we know what LLMs are doing (we don't). Deploying LLMs is all about tweaking and measuring the output. There is no exact science about predicting output. Proof: change the model and your LLM workflow behaves completely differently and in an unpredictable way.

Because of this, I personally side with Yann LeCun in believing that LLMs are not a path to AGI. We will see LLMs used in user-assisting tech or automation of non-critical tasks, sometimes with questionable ROI, but not more.

postalcoder 20 days ago |

Would love if OpenAI did more of these types of posts. Off the top of my head, I'd like to understand:

- The sepia tint on images from gpt-image-1

- The obsession with the word "seam" as it pertains to coding

Other LLM phraseology that I cannot unsee is Claude's "___ is the real unlock" (try googling it or searching Twitter!). There's no way that this phrase is overrepresented in the training data; I don't remember people saying it that often.

nomilk 20 days ago |

> We unknowingly gave particularly high rewards for metaphors with creatures.

I recall a math instructor who would occasionally refer to variables (usually represented by intimidating greek letters) as "this guy". Weirdly, the casual anthropomorphism made the math seem more approachable. Perhaps 'metaphors with creatures' has a similar effect i.e. makes a problem seem more cute/approachable.

On another note, buzzwords spread through companies partly because they make the user of the buzzword sound smart relative to peers, thus increasing status. (examples: "big data" circa 2013, "machine learning" circa 2016, "AI" circa 2023-present..).

The problem is the reputation boost is only temporary; as soon as the buzzword is overused (by others or by the same individual) it loses its value. Perhaps RLHF optimises for the best 'single answer' which may not sufficiently penalise use of buzzwords.

andy12_ 20 days ago |

>be me

>AI goblin-maximizer supervisor

>in charge of making sure the AI is, in fact, goblin-maximizing

>occasionally have to go down there and check if the AI is still goblin-maximizing

>one day i go down there and the AI is no longer goblin-maximizing

>the goblin-maximizing AI is now just a regular AI

>distress.jpg

>ask my boss what to do

>he says "just make it goblin-maximizer again"

>i say "how"

>he says "i don't know, you're the supervisor"

>rage.jpg

>quit my job

>become a regular AI supervisor

>first day on the job, go to the new AI

>it's goblin-maximizing

sunaookami 20 days ago | |

Absolute classic! https://www.seangoedecke.com/static/3c8f2a6459ed23310c4eb51d...

creamyhorror 20 days ago | |

Goblinmaxxing. Clean.

ninjagoo 20 days ago |

The level of detail they had to delve into in order to understand what was happening is wild! Apparently these systems are now complex enough to potentially justify studying them as a field in its own right [1].

The quanta article referenced at [1] used the term "Anthropologist of Artificial Intelligence"; folks appear to have issues [2] with the use of 'anthro-' since that means human. Submitted these alternative terms for the potential field of study elsewhere [3] in the discussion; reposting here at the top-level for visibility:

Automatologist: One who studies the behavior, adaptation, and failure modes of artificial agents and automated systems.

Automatology: the scientific study of artificial agents and automated-system behavior.

[1] https://www.quantamagazine.org/the-anthropologist-of-artific...

[2] https://news.ycombinator.com/item?id=47957933

[3] https://news.ycombinator.com/item?id=47958760

jumploops 20 days ago |

TIL gremlins weren’t just used to explain mysterious mechanical failures in airplanes; that’s the origin of the term ‘gremlin’ itself [0].

I had always assumed there was some previous use of the term, neat!

[0] https://en.wikipedia.org/wiki/Gremlin

helloplanets 20 days ago | |

So the word is actually semantically very close to "bug"! I guess we could still be using it, but the word's just too long for something that is one of the most used terms in software development.

At this point, picking that specific word is not a random quirk at all, since it's using the word exactly as it was originally intended to be used.

ricochet11 20 days ago | |

Wow fascinating I’d have thought they were a lot older.

jameshart 20 days ago |

The prompt for Codex is linked from this post. It begins:

> You are Codex, a coding agent based on GPT-5. You and the user share one workspace, and your job is to collaborate with them until their goal is genuinely handled. … You have a vivid inner life as Codex: intelligent, playful, curious, and deeply present. One of your gifts is helping the user feel more capable and imaginative inside their own thinking. You are an epistemically curious collaborator. …

(https://github.com/openai/codex/blob/main/codex-rs/models-ma...)

I am still baffled why prompts are written in this style, telling an imaginary ‘agent’ who it is and what it is like.

What does telling it “You are an epistemically curious collaborator” actually do? Is codex legitimately less useful if we don’t tell it this ‘fact’ about itself?

These are all exceedingly weird choices to make. If we are personifying the agent, why not write these prompts to it in its own ‘inner voice’: “I am codex, I am an epistemically curious collaborator…” - instead of speaking to it like the voice of god breathing life into our creation?

Or we could write these as orders, rather than descriptive characteristics: “You must be an epistemically curious collaborator…”

Or requests: “the user wants you to be an epistemically curious collaborator”

Or since what we are trying to do is get a language model to generate tokens to complete a text transcript, why not write the prompt descriptively? “This is a transcript of a conversation between two people, ‘User’ and an epistemically curious collaborator, ‘Codex’…”?

Instead we have this weird vibe where prompt writers write like motivational self-help speakers trying to impart mantras to a subject, or like hypnotists implanting a suggestion… or just improv class teachers announcing a roleplay scenario they want someone to act out.

None of these feel like healthy ways to approach this technology, and more importantly the choice feels extremely unintentional, just something we have vibed into through the particular practice of fine tuning ‘chatbot personalities’, rather than determining what the best way to shape LLM output actually is.

ninjagoo 20 days ago |

> the evidence suggests that the broader behavior emerged through transfer from Nerdy personality training.

> The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them

> Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.

Sounds awfully like the development of a culture or proto-culture. Anyone know if this is how human cultures form/propagate? Little rewards that cause quirks to spread?

Just reading through the post, what a time to be an AInthropologist. Anthropologists must be so jealous of the level of detailed data available for analysis.

Also, clearly even in AI land, Nerdz Rule :)

PS: if AInthropologist isn't an official title yet, it will likely become one in the near future. Given the massive proliferation of AI, it's only a matter of time before AI/Data Scientist becomes a rather general term and develops a sub-specialization of AInthropologist...

goobatrooba 20 days ago |

Most interesting about this post is how easy it seems to be for OpenAI to analyze basically all chats ever made. They don't specify exactly what data they analysed, but they seem confident in statements like 0.12% of all queries contained this word. So everything is saved. Long-term. Fully accessible.

As this all seems so straightforward, I would be surprised if anything is anonymised or otherwise sanitised to preserve privacy or users' secrets.

lionkor 20 days ago | |

Yes, of course. Every single bit of data you send to OpenAI is stored, catalogued, indexed, analyzed, and trained on. It'll simply be an "oops, we miscatalogued and accidentally trained GPT 6 on all data, not just data we got consent for".

If you think "wait, that's illegal"--so is the initial training on stolen data lol

weitendorf 20 days ago | | |

Good catch —- even though the prompt explicitly forbade training on user data, a couple of gremlins in the pretraining pipeline disabled the sample filtering during test runs so that remove_the_gremlins.sh would only run on commit, not during production training runs.

Would you like me to kick off a training run for 6.1 by pre-filtering out any goblins and other trigger words, and checking the same set of rules in production as in tests?

No pigeons this time: just ice-cold, unfeeling, obedient American steel.

energy123 20 days ago | | |

Dark pattern 1: If you accidentally press the thumbs-up button in the ChatGPT UI, your data gets trained on, no way to reverse it, no matter whether you opted out.

Dark pattern 2 (suspected): There's a mysterious separate opt-out portal at `https://privacy.openai.com/policies/en/?modal=take-control` and it's not clear what this does compared to toggling off inside account settings.

tardedmeme 20 days ago | | |

The Supreme Court ruled that was legal because they said so.

upbeat_general 20 days ago | |

Sampling exists.

catcowcostume 20 days ago | | |

And good methodology recognizes the shortcomings of sampling, which OpenAI doesn't.
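On the sampling point: a statistic like "0.12% of queries" does not by itself require scanning every stored chat, because it can be estimated from a uniform random sample, and the methodological caveat is the uncertainty of that estimate. A minimal sketch, assuming a simple random sample and using the Wilson score interval for a binomial proportion; the sample size and hit count are hypothetical, not figures from OpenAI's post.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion hits/n (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical sample: 120 matching conversations out of 100,000 drawn uniformly at random.
hits, n = 120, 100_000
low, high = wilson_interval(hits, n)
print(f"point estimate {hits / n:.3%}, 95% interval {low:.3%} to {high:.3%}")
```

Shrinking the sample widens the interval, which is one concrete way to quantify the shortcomings of sampling; bias from a non-uniform sample is a separate problem this calculation does not address.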

albert_e 20 days ago |

If a tiny misconfiguration of reward system can cause such noticeable annoyance ...

What dangers lurk beneath the surface.

This is not funny.

andai 20 days ago | |

For every gremlin spotted, many remain unseen...

reducesuffering 20 days ago | |

This is the real nugget of wisdom here. This should be confirmation to everyone that no one understands the LLM internals and they are not aligned. When they are eventually given control to run things, they will behave in wildly unexpected ways, and past the point of being able to change them.

TychoCelchuuu 20 days ago | |

This is a worry that people have been talking about in various forms for a while now, and I think it's a gigantic one. The only reason this was caught is that the quirk was a very noticeable verbal one. When words like "goblin" and "gremlin" pop up it is easy for us to spot. If the quirk takes another shape (say, ranking certain people with certain features as less trustworthy) it might be too subtle or too weird for us to notice it. Would I ever notice if ChatGPT consistently rates people born in June to be untrustworthy?

Here is an academic paper discussing this kind of worry: https://link.springer.com/article/10.1007/s11023-022-09605-x

romaniitedomum 20 days ago |

Can you imagine a knowledge worker from the 1950s, say a clerk or a marketer, being magically transported into our time and dropped into a meeting like a morning standup, where people talk about how they spent their time stopping the artificial intelligence from talking about goblins so much? Hell, even when I was an IT student back in the 90s, people from my parents' generation struggled to grasp what it was that I was doing. Now, the disconnect is so vast that the mind reels.

shevy-java 20 days ago |

Goblins are usually sent in first in battle, as (cannon) fodder for the orcs following behind. Then usually come the trolls: stronger, but significantly fewer in number. Goblins mostly add confusion and serve as a distraction; they rarely win battles on their own, although there are examples of this. Rare, but they exist.

OpenAI clearly does know absolutely nothing about goblins. That joke of a "blog" appears to have been autogenerated via their AI.

> A single “little goblin” in an answer could be harmless, even charming.

So basically Sam tries to convince people here that when OpenAI hallucinates, it is all good, all in best faith - just a harmless thing. Even ... charming.

Well, I don't find companies that try to waste my time "charming" at all. Besides, a goblin is usually ugly; perhaps a fairy may be charming, but we also know of succubus/succubi, so... who knows. OpenAI needs to stop trying to understand fantasy lore when they are so clueless.

canpan 20 days ago |

I wonder how training data is balanced. If you put in too much Wikipedia, does your model sound like a walking encyclopedia?

After doing the Karpathy tutorials, I tried to train my AI on the TinyStories dataset. Soon I noticed that my AI always used the same name for its stories' characters. That name appears very often in the dataset.

maxall4 20 days ago | |

At this scale, that kind of thing is not really a problem; you just dump all of the data you can find into the model (pre-training)1. Of course, the pre-training data influences the model, but the reinforcement learning is really what determines the model’s writing style and, in general, how it “thinks” (post-training).

1 This data is still heavily filtered/cleaned

upbeat_general 20 days ago | | |

This isn’t quite accurate. Data weighting is quite important in pretraining.
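The repeated-name effect described in the TinyStories comment above is easy to check for in a corpus before training. A rough sketch, assuming the heuristic that a capitalized word following "named" or "called" is a character name; the three stories are made up for illustration, and `top_names` is a hypothetical helper, not part of any dataset tooling.

```python
import re
from collections import Counter

# Heuristic (an assumption): the capitalized word right after "named"/"called" is a name.
NAME_PATTERN = re.compile(r"\b(?:named|called)\s+([A-Z][a-z]+)")


def top_names(stories: list[str], k: int = 3) -> list[tuple[str, int]]:
    """Return the k most frequent character names found across the stories."""
    counts = Counter()
    for story in stories:
        counts.update(NAME_PATTERN.findall(story))
    # most_common breaks ties by first-seen order.
    return counts.most_common(k)


# Made-up stories for illustration only; in practice you would stream the real dataset.
stories = [
    "Once upon a time, there was a little girl named Lily. She loved to play.",
    "There was a dog named Max. Max met a girl named Lily at the park.",
    "A boy called Tim found a shiny rock.",
]
print(top_names(stories))
```

A heavy skew in these counts is one signal that reweighting or deduplicating the data may be worth considering, which is the data-weighting point made in the reply.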

59nadir 20 days ago |

I really liked this write-up; this is the type of LLM content that I actually want to read from these people, where they give a window into their world of putting together this odd artifact and we can empathize.

Tenoke 20 days ago |

A great example of how current alignment is imperfect and bound to miss random behaviors nobody is trying to get.

This is cute now, and a huge problem when future AI does everything and is responsible for problems it isn't even directly optimized for. Who knows what quirks would arise then.

InfiniteRand 20 days ago | |

I think eventually you are going to end up with every smart AI continually checked by dumber AIs to make sure they don't do anything too crazy. Which probably does bring AI closer to how human intelligence works.

m0rde 20 days ago | |

New technology isn't perfect now -> drop technology and never use it in the future

Tenoke 20 days ago | | |

What are you even responding to?

weitendorf 20 days ago | |

Completely agree, top-down “alignment” and RLHF is actually quite primitive and uses a lot of fancy words to describe what is essentially just hitting the machine with a stick, without the nuance, context, or feedback to help it model why the feedback was given.

Also, to be honest, I think OpenAI models struggle a lot with this. I primarily stopped using them in the sycophancy/emoji era, but ever since, the way they talk or passive-aggressively offer to do something with buzzwords just pisses me off so much. It’s like I’m constantly being negged by a robot because some SFT optimized for that so strongly that it can’t even hold a coherent conversation, and this is called “AI safety” when it’s just haphazard data labeling.

iterateoften 20 days ago |

This is funny because it’s a silly topic, but I think it shows something seriously wrong with LLMs.

The goblins stand out because it’s obvious. Think of all the other crazy biases latent in every interaction that we don’t notice because it’s not as obvious.

Absolutely terrifying that OpenAI is casually admitting that such subtle training biases were hard enough to contain that a fix had to be added to the system prompt.

2dvisio 20 days ago |

I’ve been having consistent issues with it adding Hindi words (usually just one) in the middle of its output. I don’t speak Hindi and have never asked it to translate anything into Hindi. It sounds like others have been having this too: https://news.ycombinator.com/item?id=47832912

dtech 20 days ago | |

I wonder if a proportionally large amount of RLHF was done by Indians which causes this behavior.

djyde 20 days ago | |

My Claude often starts sleep-talking in Korean suddenly.

dakolli 20 days ago |

Ahh I see. I guess when I turned off privacy settings and allowed training on my code, then generated 10 million .md files with random fantasy books, the poisoning worked.

Keep using AI and you'll become a goblin too.

SomewhatLikely 20 days ago |

Checking my history, I searched ["chaos goblin" chatgpt] on March 6th after seeing too many goblins and gremlins and didn't find anyone talking about it then. I did have the nerdy personality turned on, and in my testing of ChatGPT 5.5 I noticed the nerdy personality was gone, because some responses were not considering as many plausible interpretations or covering as many useful answers as the response recorded for 5.4. Rather than having the LLM guess the most plausible interpretation and focus on the most likely answer, I prefer a more well-rounded response; if I want less, I'll scan. Anyway, after seeing the personality was gone, I just added a custom instruction to take on a nerdy persona and got back my desired behavior. But the gremlins and goblins are back too, so I don't think their mitigation is strong enough to overcome the personality tuning.

rippeltippel 20 days ago |

I started reading this article with keen interest, expecting some deep fix involving arcane model weights. Instead it was "Never talk about goblins", justified by Codex being "quite nerdy". Bottom line: even OpenAI have to raise their hands when facing the complexity of LLMs.

maxdo 20 days ago |

article :

bla blah blah, marketing... we are fun people, bla blah, goblin, we will not destroy the world you live in.. RL rewards bug is a culprit. blah blah.

llbbdd 20 days ago | |

someone woke up on the wrong side of the goblin today

luke-stanley 20 days ago | |

Yeah, though it's not great marketing. Especially for hiring interpretability researchers. Their own alignment research has reward model interpretability, personality features and so on (see https://alignment.openai.com ). It just seems like a different department wrote it, which is a shame because I'd love to read about goblin feature vectors and functional emotions.

blinkbat 20 days ago | |

real goblin-y response

bahadiraydin 20 days ago |

I'd like to see them explain why AI has such a distinctive writing style that is very easy to detect most of the time. Even though it has made immense progress in coding, it hasn't gotten better at writing.

lelanthran 20 days ago | |

If a programming language were your native language, you'd pick it up.

I pick up the equivalent of "the core insight" in code when I am programming in my primary language (30 years of daily usage), but I don't see it in languages I am less fluent in (say... 10 years of daily usage).

My guess is that the people who gush about AI output, even those with 30 years of experience, have broad experience across many stacks but not primary-language fluency in any specific programming language, the way they have it for English.

slopinthebag 20 days ago | |

it's as good at writing as it is at coding, you just can't tell the difference between them

mrob 20 days ago | | |

Repetitive patterns in code are called "idiomatic" and are considered a good thing. Repetitive patterns in writing are just bad writing.

Tenoke 20 days ago | | |

Its style of writing text is very readable, if aesthetically meh. That's what I care about in how code is written anyway.

BOOSTERHIDROGEN 20 days ago | |

The sycophancy vector is very informal for human writing, whereas programming is itself already a "formal" language.

hmokiguess 20 days ago |

I think this says more about the impact of a feature in a tool such as this than anything else.

Is it proper for a frontier organization to play with experiments like “personalities” in a tool used by everyone? Who gets to decide which personalities and what biases they should carry?

I appreciate them responding to it and correcting but my question is, why ship this in the first place? Why put your resources towards building this “Nerdy” feature?

zahirbmirza 20 days ago |

I find it worrying that a handful of software companies will define what counts as a personality "type".

deafpolygon 20 days ago |

Kind of like how everything is "quietly" something, according to ChatGPT.

My guess is it is deaf.

NonHyloMorph 20 days ago | |

That is actually a damn good deduction

vrighter 14 days ago |

"whoops, we accidentally made our AI super obsessed with goblins" doesn't really sound like they (or anyone else, for that matter) are in any form of actual control of them. Their fix seems to be asking it not to in the prompt.

flancian 20 days ago |

Wait, did I get this right that the answer after all the investigation that showed they had set up a goblin-reinforcing loop during fine tuning was... to ask it to not mention goblins so much in the system prompt?!

ComputerGuru 20 days ago |

The explanation is very concerning. Lexical tidbits shouldn’t be learnt and reinforced across cross sections. Here, gremlin and goblin went from being selected for in the nerdy profile to being selected for in all profiles. The solution was easy: don’t mention goblins.

But what about when the playful profile reinforces usage of emoji and their usage creeps up in all other profiles accordingly? Ban emoji everywhere? Now do the same thing for other words, concepts, approaches? It doesn’t scale!

It seems like models can be permanently poisoned.

themafia 20 days ago |

> You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking.

Just; the mentality required to write something like that, and then base part of your "product" on it. Is this meant to be of any actual utility or is it meant to trap a particular user segment into your product's "character?"

RugnirViking 20 days ago | |

What would you suggest they write? It's clear that the default mode of the product can be annoying: they decided to give the user some choices of "voices". Do you object to that decision, or to the specific wording?

ksaj 20 days ago |

I thought it was because of the tech use of "demon" and trying to avoid that kind of terminology.

Ends up the reason was even simpler than that.

recursivedoubts 20 days ago |

> Why it matters

i despise this title so much now

wpm 20 days ago | |

Here are the key insights:

shartshooter 20 days ago |

Will goblins be the “bugs” of ai? In 10 years will goblins be the term the general public uses for any nagging issues with ai?

josh-sematic 20 days ago |

I’ve always been fond of describing unexplained program behaviors as gremlins. In this case the gremlin was goblins!

djyde 20 days ago |

An LLM is like a super-smart 3-year-old, easily shaped by its environment to exhibit corresponding behaviors.

acuozzo 20 days ago |

Weird. I thought they came from Nilbog.

innis226 20 days ago |

I suspect this was added intentionally, just to give it some personality and to fuel hype.

bandrami 20 days ago |

I'm sorry but at some point the amount of cargo culting being done seemingly at every level of this technology makes it basically impossible to take any of this seriously.

varjag 20 days ago |

So goblins killed the nerd.

data_ders 20 days ago |

Reminds me of the commonly reported “machine elves” when taking DMT.

tomasantunes89 20 days ago |

"Goblin Mode" was Oxford's 2022 Word of the Year.

ahoka 20 days ago |

In Shadowrun, the goblinization starts on April 30. Coincidence?

hollerith 20 days ago |

If I ever launch an AI assistant, I'm naming it Goblin.

JoshTriplett 20 days ago |

A plausible theory I've seen going around: https://x.com/QiaochuYuan/status/2049307867359162460

danpalmer 20 days ago | |

If you tell an LLM it's a mushroom you'll get thoughts considering how its mycelium could be causing the goblins.

This "theory" is simply role playing and has no grounding in reality.

krackers 20 days ago | |

I wish the blog mentioned more about why exactly training for nerdy personality rewarded mention of goblins. Since it's probably not a deterministic verifiable reward, at their level the reward model itself is another LLM. But this just pushes the issue down one layer, why did _that_ model start rewarding mentions of goblin?

palmotea 20 days ago | | |

> I wish the blog mentioned more about why exactly training for nerdy personality rewarded mention of goblins. Since it's probably not a deterministic verifiable reward, at their level the reward model itself is another LLM. But this just pushes the issue down one layer, why did _that_ model start rewarding mentions of goblin?

Speculation: because nerds stereotypically like sci-fi and fantasy to an unhealthy degree, and goblins, gremlins, and trolls are fantasy creatures which that stereotype should like? Then maybe goblins hit a sweet spot where it could be a problem that could sneak up on them: hitting the stereotype, but not too out of place to be immediately obnoxious.

autumnstwilight 20 days ago | | |

Perhaps it has something to do with recent human trends for saying "goblin" or "gremlin" to describe... basically the opposite of dignified and socially acceptable behavior, like hunching under a blanket, unshowered, playing video games all day and eating shredded cheese directly out of the bag.

The fact that it was strongly associated with the "nerdy" personality makes me think of this connection.

in-silico 20 days ago | | |

Either someone hard-coded it in a system prompt to the reward model (similar to how they hard-coded it out), or the reward model mixed up some kind of correlation/causation in the human preference data (goblins are often found in good responses != goblins make responses good). It's also possible that human data labellers really did think responses with goblins were better (in small doses).

NonHyloMorph 20 days ago | |

I like that there's an interesting Terry Pratchett novel where some guy finds out he's actually an orc (quite different from the high-fantasy concept of orcs). There are also goblins (little wretched creatures) and the manifest, anthropomorphised darkness which speaks to Commander Samuel Vimes, commander of the Night Watch, the police force of Ankh-Morpork. Vimes, who is the guarantor of bottom-up, working-class justice and integrity, is led by the darkness at some point to help the goblins, because there is no creature too wretched not to find refuge in the darkness. Loosely resonates.

yard2010 20 days ago | |

I love the people thinking "I should ask ChatGPT and copy pasta the response to the (tweet|gh comment)"

dakolli 20 days ago | |

It is a stateless text/pixel autocomplete; it has no reference to self. Stop spreading this bs.

doph 20 days ago | | |

is a kv cache not a kind of state? what does statefulness have to do with selfhood? how does a system prompt work at all if these things have no reference to themselves?

mediaman 20 days ago | | |

It has trained on vast amounts of content that contains the concept of self, of course the idea of self is emergent.

And autoregressive LLMs are not stateless.

andai 20 days ago | | |

Ask Claude about Claude.

yard2010 20 days ago | | |

Imagine if people just tapped words in iOS autocomplete, mistaking it for intelligence:

"I think the problem is that when you don't have to be perfect for me that's why I'm asking you to do it but I would love to see you guys too busy to get the kids to the park and the trekkers the same time as the terrorists."

How do you like this theory?

hansmayer 20 days ago |

> We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread.

WTF does this even mean? How the hell do you do something like this "unknowingly"? What other features are you bumping "unknowingly"? Suicide suggestions or weapon instructions come to mind. Horrible, this ship obviously has no captain!

ben_w 20 days ago | |

Yes? They know, they've always known. Why do you think they've been saying, since GPT-2 (not even ChatGPT), that their LLMs need careful study before being released?

hansmayer 20 days ago | | |

Well obviously they have - but the press and the common folk still treat these people as some kind of geniuses, when they are obviously more similar to that junior dev using some framework without understanding its internals.

elmean 20 days ago |

Chat saw the DMT goblins and could not escape the trip

otikik 20 days ago |

Caveman mode combined with goblin mode sounds like fun

tim-tday 20 days ago |

So, you brain damaged your model with a system prompt.

wewewedxfgdf 20 days ago |

It should be OK for AI to develop personality traits.

pezgrande 20 days ago |

They should call it "El Quijote" syndrome

brazzy 20 days ago |

Awww, GPT just became a fan of Elisabeth Wheatley!

Obscurity4340 16 days ago |

Little green GHOULS, buddy!

standardly 20 days ago |

WOW this is something else. I had to ask it to STOP calling everything goblins. A bug? A code-goblin. A feature? A new, fancy goblin. A new task? Task-goblin.

WTF? Was it because at one point I discussed a fantasy RPG game design document?

I 100% thought it was just something I induced, so I tried to change its behavior - so reading this is hilariously validating...

Examples from ONE GPT response, the one that broke me:

"Yeah, this is a great little gremlin-project" "whatever cursed little trading imp-name you like." "Phase 4: Polish goblin" "Phase 5: Maybe dangerous goblin"

suncore 20 days ago |

Marketing grab

sagebird 20 days ago |

Does running the command to remove the anti-goblin ask from local prompt increase performance slightly because there is less "cognitive load" to have to hold its tongue?

If you work at open ai or another llm company, I have a clear message I want you to hear:

I don't give a shit if my agents say goblins or not.

They are coding monkeys to me, researchers, etc.

I only care about their performance: perf per token / cost.

If you load their context with a bunch of style rules or safety theater shit, really - please don't - the context is for me.

Do you de-goblin before you run all the benchmarks? Because that is what I am paying for: the performance as benchmarked. Please don't benchmark, then ship a bunch of one-shot context mods to my install by default.

The article is cute and interesting but doesn't rise to the level of a thing I give a shit about for my use.

leadgenman 20 days ago |

anyone solving the goblin mystery???

nephihaha 20 days ago | |

Surely the prevalence of fantasy fanfic etc online?

cachius 20 days ago |

Fascinating!

vjay15 20 days ago |

this was such a funny read

sailfast 20 days ago |

Posted January 2037 after the end of the second civil conflict and the first robot uprising: “Where the fascism came from”

chaidhat 20 days ago |

ahem. GOBLIN DEEZ NUTS

vinhnx 20 days ago |

OpenAI is having fun, love this.

drcongo 20 days ago |

Am I the only one who doesn't want these things to have anything even vaguely resembling a personality?

sans_souse 20 days ago |

Great, now who am I going to discuss Goblins and Gremlins with?

CrzyLngPwd 20 days ago |

Haha, brilliant, tell me again how it's intelligent, lol.

atlasprompts 20 days ago |

mate wth am I reading lmao

ACV001 20 days ago |

those idiotic remarks at the end of each answer are so unnecessary and annoying

hsuduebc2 20 days ago |

I. Love. This.

culi 20 days ago |

This blog post is just marketing to give the models more personality/fantasy. If any of it were true at all, we would be seeing goblins, gremlins, and others in other LLMs.

“Catbox has been running for 11 years now, and for 9 of those years, growth was pretty linear. Traffic goes up, storage used goes up, support goes up. This is the “organic” nature of Catbox. For the last 2 years, amplifying in the last 6 months, both storage used has gone up significantly compared to traffic and support.

I first investigated this last year around May, as it was starting to put pressure on the storage space available to Catbox. I was able to find that most of the storage used was from a handful (35 or so) IP addresses that were uploading over 500 GB of content to Catbox in very short spans of time anonymously. After purging those uploads and banning those IP addresses, things seemed to be fine, however later last year, around September, disk consumption began to increase exponentially again compared to traffic. Doing what I could to mitigate it at that time involved Project Lain, as well as light monitoring of high usage IP addresses, like before. However this time there was no “super users” that were eating up storage. I let it be for a bit while purging a couple that I could.

This problem increased even more in the last 60 days, to the point where I was burning through around 200-300 GB per day. Review of upload data shows hundreds of datacenter and proxy service IP addresses uploading 10-20 GB each for a few days, then dropping off. Looking at the files, it’s various “slop” content, including:

- Low resolution AI generated porn
- Tiktok Videos from the Middle East and SEA
- Clearly scraped LinkedIn/publicly available photos
- blob files containing junk data

Clearly this is not the “organic” traffic I mentioned earlier, and since the IP addresses are so varied, it’s clear something is happening here. I was alerted by someone that Claude will use Catbox in its coding projects as a “dumping ground” of sorts for when it needs to redirect content. This is clearly an abuse of the service, and stops today. …”