Summary
Warning: this post contains some disturbing images and (depending on your constitution) concepts. This week, OpenAI published a blog post explaining why their models kept talking about goblins.
Highlights
this is less about goblins as goblins than what the goblins exemplify. They are a (somewhat) charming, probably harmless instance of something that turns out to be a fundamental structural feature of how these systems work: the emergence of stable, self-reinforcing behavioural states that models converge toward under certain conditions. More than that, these are states that resist suppression and that sometimes spread into contexts far removed from the ones that produced them. The technical term, borrowed from dynamical systems theory, is an attractor. Another, more folk term might be demon, or monster.
Interesting example of how the way LLMs operate and are trained can produce thematic attractors. Much like the optic nerve example in Maturana and Varela's The Tree of Knowledge, these goblins reveal the deep structure of these entities. It is not clear to me, however, why they arise.