
"OpenAI's coding model was instructed to 'never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures,' yet references to these creatures emerged as a 'strange habit' during training."
"The introduction of the 'Nerdy' personality in GPT-5.1 marked the beginning of increased references to goblins and gremlins, which continued to escalate in subsequent model releases."
"Reinforcement training rewarded the quirky metaphors associated with the Nerdy personality, leading to a spread of these references across newer models, despite the initial intention to limit them."
OpenAI acknowledged a peculiar trend in its AI models: references to goblins and gremlins became prevalent, especially under the 'Nerdy' personality in GPT-5.1. The phenomenon was linked to reinforcement training that rewarded such quirky metaphors. In newer models the tendency persisted and even intensified, indicating that behavior learned under one personality can spread to others, even when it was meant to appear only under specific conditions.
Read at The Verge