Summary
Chatbots like ChatGPT work by predicting the next word in text using large neural networks. This makes them fluent and creative but prone to factual errors and strange failures. They are powerful tools when used with human guidance, not conscious thinkers.
Highlights
id993927038
Suppose we were to play a guessing game. I will take a random book off my shelf, open to a random page, and read several words from the first sentence. You guess which word comes next. Seems reasonable, right? If the first few words were “When all is said and …”, you can probably guess that the next word is “done”. If they were “In most homes the kitchen and …” you might guess the next words were either “living room” or “dining room”. If the sentence began “In this essay, I will…” then there would be many reasonable guesses, no one of them obviously the most likely, but words like “show” or “argue” would be more likely than “knead” or “weld”, and even those would be more likely than something ungrammatical like “elephant”. If this game seems reasonable to you, then you are not that far away from understanding in essence how AI chatbots work.
This passage is a good introduction to an explanation of the fundamental logic of language models.
id993927505
One thing we can program a computer to do is, given a sequence of words, come up with a list of what words might follow next, and assign a probability to each. That is a purely mathematical task, a function mapping words to a probability distribution. How could a program compute these probabilities? Based on statistical correlations in text that we “train” it on ahead of time. For instance, suppose we have the program process a large volume of books, essays, etc., and simply note which words often follow others. It might find that the word “living” is followed by “room” 23% of the time, “life” 9% of the time, “abroad” 3%, “wage” 1%, etc. (These probabilities are made up.) This is a purely objective description of the input data, something a computer can obviously do.
Continuing the spirit of the previous quote, this is a very clear way of showing how a training phase can generate these probability distributions for the next word given the current word.
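The counting the passage describes can be sketched in a few lines of Python. This is a minimal bigram model: the tiny corpus and the resulting probabilities are illustrative only, not the figures quoted in the article.

```python
from collections import Counter, defaultdict

def bigram_probs(words):
    """Estimate P(next word | current word) by counting adjacent pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    # Normalize each follower count into a probability distribution.
    return {
        word: {w: c / sum(followers.values()) for w, c in followers.items()}
        for word, followers in counts.items()
    }

corpus = "the living room and the living wage and the living room".split()
probs = bigram_probs(corpus)
print(probs["living"])  # "room" and "wage" with their observed frequencies
```

As the highlight says, this is a purely objective description of the input data: each distribution is just observed frequency, nothing more.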
id993927823
Only looking at the last word, of course, doesn’t get you very good guesses. The longer the sequence considered, the better the guesses can be. The word “done” only sometimes follows “and”, more often follows “said and”, and very often follows “all is said and”. Many different verbs could follow “I will”, but fewer possibilities follow “In this essay, I will”. The same kind of statistical observations of a training corpus can compute these probabilities as well; you just have to keep track of more of them: a separate set of observed statistics for each sequence of words.
Language models become more accurate the more previous “tokens” they consider. This follows the same logic as the fact that infinitely many lines pass through a single point, but only one passes through two. That said, the combinatorics grow exponentially, so a mechanism is needed to reach the same level of precision without the model’s complexity scaling at the same rate. This is what Google introduced with “Attention Is All You Need”.
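Extending the bigram idea to longer contexts is a small change: key the statistics by a tuple of the preceding words instead of a single word. A minimal sketch (the corpus is made up for illustration):

```python
from collections import Counter, defaultdict

def ngram_probs(words, context_len):
    """Estimate P(next word | previous context_len words)."""
    counts = defaultdict(Counter)
    for i in range(len(words) - context_len):
        context = tuple(words[i:i + context_len])
        counts[context][words[i + context_len]] += 1
    return {
        ctx: {w: c / sum(f.values()) for w, c in f.items()}
        for ctx, f in counts.items()
    }

corpus = "when all is said and done all is said and done".split()
trigram = ngram_probs(corpus, 2)
print(trigram[("said", "and")])  # in this tiny corpus, "done" always follows
```

Note the exponential blow-up the note points to: with a vocabulary of size V, there are V**context_len possible contexts to track, which is why the table-based approach does not scale and neural approximations like the transformer are needed.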
id993928357
any predictor can be turned into a generator simply by generating the prediction. That is, given some initial prompt, a program can predict the next word, output it, use the resulting sequence to predict the next word, output that, and so on for as much output as is desired:
This is how a language model can be turned into a chatbot.
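The predict-output-repeat loop the highlight describes can be sketched on top of the same counted statistics. This is an illustrative toy (corpus, prompt, and function names are mine, not the article’s), sampling each next word in proportion to its observed frequency:

```python
import random
from collections import Counter, defaultdict

def train(words, context_len=2):
    """Count which words follow each context of length context_len."""
    counts = defaultdict(Counter)
    for i in range(len(words) - context_len):
        counts[tuple(words[i:i + context_len])][words[i + context_len]] += 1
    return counts

def generate(counts, prompt, n_words, context_len=2, seed=0):
    """Repeatedly predict the next word, emit it, and feed it back in."""
    rng = random.Random(seed)
    out = list(prompt)
    for _ in range(n_words):
        followers = counts.get(tuple(out[-context_len:]))
        if not followers:  # context never seen in training: stop early
            break
        words, weights = zip(*followers.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

corpus = "when all is said and done all is said and done well".split()
model = train(corpus)
print(generate(model, ["all", "is"], 4))
```

The loop is exactly the “predict, output, re-predict” cycle of the quote; modern chatbots differ in how the prediction is computed (a neural network rather than a lookup table), not in this generation loop.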
id993928643
the algorithm just described is called a “travesty generator” or sometimes “Dissociated Press”. It has been discussed since at least the 1970s, and could be run on the computers of that era.
The algorithm was a simple language model that produces text that seems to make sense but does not really. For me, the point of this quote is to highlight that the logic underlying language models has been in the works for at least 50 years. The principle is not new, but several methodological and computational breakthroughs made the current level possible.
New highlights added March 6, 2026 at 9:43 AM
id994023658
Neural networks are almost as old as computers themselves, but they have become much more capable in recent years owing in part to advances in the design of the equation at their core, including an approach known as “deep learning” that gives the equation many layers of structure, and more recently a new architecture for such equations called the “transformer”. (GPT stands for “Generative Pre-trained Transformer”.) GPT-3 has 175 billion parameters—those tuning knobs—and was trained on hundreds of billions of words from the Internet and from books. A large, sophisticated predictor like this is known as a “large language model”, or LLM, and it is the basis for the current generation of AI chatbots, such as OpenAI’s ChatGPT, Microsoft’s Bing AI, and Anthropic’s Claude.
Deep learning and transformers as innovations on top of neural networks that made them powerful enough to constitute large language models.
New highlights added March 7, 2026 at 4:14 PM
id994445227
To anthropomorphize a bit: if the LLM “knows” the answer to a question, then it tells you, but if it doesn’t, it “guesses”. But it would be more accurate to say that the LLM is always guessing. As we have seen, it is, at core, doing nothing fundamentally different from the guessing game described at the beginning. There is no qualitative difference, no hard line, between ChatGPT’s true responses and its fake ones.
This is quite similar to what Anil Seth says about consciousness: that it is a controlled hallucination.