Why Machines Learn

Highlights

id1001017538

None of this happened. The perceptron never lived up to the hype. Nonetheless, Rosenblatt’s work was seminal. Almost every lecturer on artificial intelligence (AI) today will harken back to the perceptron. And that’s justified.

The history of AI is old, as is the concept itself. As for LLMs, their fundamental technical developments can be traced back at least to the idea of Rosenblatt’s perceptron.

→ Readwise

id1001017539

By the mid-1850s, some of the basic math that would prove necessary to building learning machines was in place, even as other mathematicians continued developing more relevant mathematics and birthed and advanced the field of computer science. Yet, few could have dreamed that such early mathematical work would be the basis for the astounding developments in AI over the past half century, particularly over the last decade, some of which may legitimately allow us to envision a semblance of the kind of future Rosenblatt was overoptimistically foreshadowing in the 1950s.

Much of the basic mathematics required to develop LLMs had already been created and was in use by the mid-19th century.

→ Readwise

id1001017540

Getting under the mathematical skin of machine learning is crucial to our understanding of not just the power of the technology, but also its limitations.

Continuing from the previous comment: all decision-makers should develop basic literacy in the technology underlying language models and other machine learning applications.

→ Readwise

id1001017541

We cannot leave decisions about how AI will be built and deployed solely to its practitioners. If we are to effectively regulate this extremely useful, but disruptive and potentially threatening, technology, another layer of society—educators, politicians, policymakers, science communicators, or even interested consumers of AI—must come to grips with the basics of the mathematics of machine learning.

→ Readwise

id1001017542

As I did the research for this book, I observed a pattern to my learning that reminded me of the way modern artificial neural networks learn: With each pass the algorithm makes through data, it learns more about the patterns that exist in that data. One pass may not be enough; nor ten; nor a hundred. Sometimes, neural networks learn over tens of thousands of iterations through the data.

An important fact about the training process, and also an interesting reflection on human learning in general: understanding deepens as successive passes link more and more ideas in a progressively more robust way. This matches my own experience and my cognitive models of learning.

→ Readwise

id1001042497

What’s all this got to do with real life? Take a very simple, practical, and some would say utterly boring problem. Let’s say x1 represents the number of bedrooms in a house, and x2 represents the total square footage, and y represents the price of the house. Let’s assume that there exists a linear relationship between (x1, x2) and y. Then, by learning the weights of the linear equation from some existing data about houses and their prices, we have essentially built a very simple model with which to predict the price of a house, given the number of bedrooms and the square footage.

This is a concrete and relatively accessible way to explain in what sense “LLMs” are models: they model a relationship from data and make it possible to predict future cases with reasonable accuracy. The image to keep in mind is a linear regression.
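The house-price idea can be sketched in a few lines. This is only an illustration under stated assumptions: the houses, the prices, and the "true" weights used to generate them are made up, and the function names (`fit_linear`, `predict`, `solve_linear_system`) are hypothetical. It fits y = b + w1*x1 + w2*x2 by ordinary least squares, which is one common way to learn the weights of a linear equation from data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(features, targets):
    """Least-squares fit; returns [b, w1, w2, ...] (bias first)."""
    rows = [[1.0] + [float(v) for v in f] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(params, x):
    """Predicted y for one input x = (x1, x2, ...)."""
    return params[0] + sum(w * xi for w, xi in zip(params[1:], x))


# Made-up data: (bedrooms, square footage). Prices follow an assumed
# linear rule so that the fit has something exact to recover.
HOUSES = [(2, 900), (3, 1500), (3, 1200), (4, 2000), (5, 2400), (1, 600)]
PRICES = [50_000 + 20_000 * beds + 150 * sqft for beds, sqft in HOUSES]

params = fit_linear(HOUSES, PRICES)
new_house_price = predict(params, (3, 1800))  # a house not in the data
```

Because the made-up prices are exactly linear, the fit recovers the assumed weights up to rounding. Real housing data is noisy, so the learned weights would only approximate the relationship.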

→ Readwise

id1001042498

Also, we did a very particular kind of problem solving called regression, where given some independent variables (x1, x2), we built a model (or equation) to predict the value of a dependent variable (y). There are many other types of models we could have built, and we’ll come to them in due course.

Machine learning requires massive numbers of examples to train its models. In this sense, the growth of the internet and the spread of devices that let people add multimedia information (sometimes annotated, sometimes not) is what has allowed machines to learn. The best example is CAPTCHA.

→ Readwise

id1001042499

In this case, the correlation, or pattern, was so simple that we needed only a small amount of labeled data. But modern ML requires orders of magnitude more—and the availability of such data has been one of the factors fueling the AI revolution.

→ Readwise

id1001042500

McCulloch and Pitts turned this into a simple computational model, an artificial neuron. They showed how by using one such artificial neuron, or neurode (for “neuron” + “node”), one could implement certain basic Boolean logical operations such as AND, OR, NOT, and so on, which are the building blocks of digital computation.

I think this is a good piece of information for explaining the idea of neural networks. It is useful because it is foundational (among the most basic ideas explaining what we mean by “neural”) and because it places the development of the technology in its historical context.
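A minimal sketch of how one such artificial neuron can implement AND, OR, and NOT. It assumes a simplified reading of the McCulloch-Pitts unit: binary inputs, a hand-set threshold theta, and inhibitory inputs that veto firing. The function names are hypothetical and the thresholds are chosen by hand, not learned.

```python
def mcp_neuron(excitatory, theta, inhibitory=()):
    """Fire (return 1) if no inhibitory input is active and the sum of
    the excitatory inputs reaches the threshold theta; otherwise 0."""
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= theta else 0


def AND(a, b):
    # Both inputs must be 1 to reach a threshold of 2.
    return mcp_neuron([a, b], theta=2)


def OR(a, b):
    # A single active input is enough to reach a threshold of 1.
    return mcp_neuron([a, b], theta=1)


def NOT(a):
    # No excitatory inputs and theta = 0: the neuron fires by default,
    # unless the inhibitory input a is active.
    return mcp_neuron([], theta=0, inhibitory=[a])


truth_table = [(a, b, AND(a, b), OR(a, b)) for a in (0, 1) for b in (0, 1)]
```

Note that every threshold here is picked by the programmer. Nothing in this unit adjusts itself from data.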

→ Readwise

id1001042501

All this was amazing, and yet limited. The McCulloch-Pitts (MCP) neuron is a unit of computation, and you can use combinations of it to create any type of Boolean logic. Given that all digital computation at its most basic is a sequence of such logical operations, you can essentially mix and match MCP neurons to carry out any computation. This was an extraordinary statement to make in 1943. The mathematical roots of McCulloch and Pitts’s paper were apparent. The paper had only three references—Carnap’s The Logical Syntax of Language; David Hilbert and Wilhelm Ackermann’s Foundations of Theoretical Logic; and Whitehead and Russell’s Principia Mathematica—and none of them had to do with biology. There was no doubting the rigorous results derived in the McCulloch-Pitts paper. And yet, the upshot was simply a machine that could compute, not learn. In particular, the value of θ had to be hand-engineered; the neuron couldn’t examine the data and figure out θ. It’s no wonder Rosenblatt’s perceptron made such a splash. It could learn its weights from data. The weights encoded some knowledge, however minimal, about patterns in the data and remembered them, in a manner of speaking.

The difference between the MCP neuron and Rosenblatt’s perceptron is that the latter could update its weights based on the data, whereas in the former they had to be “given.” In other words, the perceptron “learns.”

→ Readwise

id1001042502

While McCulloch and Pitts had developed models of the neuron, networks of these artificial neurons could not learn. In the context of biological neurons, Hebb had proposed a mechanism for learning that is often succinctly, but somewhat erroneously, put as “Neurons that fire together wire together.” More precisely, according to this way of thinking, our brains learn because connections between neurons strengthen when one neuron’s output is consistently involved in the firing of another, and they weaken when this is not so. The process is called Hebbian learning. It was Rosenblatt who took the work of these pioneers and synthesized it into a new idea: artificial neurons that reconfigure as they learn, embodying information in the strengths of their connections.

Hebbian learning is the mechanism that inspired Rosenblatt to add learning to McCulloch and Pitts’s neurons.

→ Readwise

id1001042503

But what exactly is a perceptron, and how does it learn? In its simplest form, a perceptron is an augmented McCulloch-Pitts neuron imbued with a learning algorithm.

Related to the previous comment. The learning algorithm is inspired by the mechanism Hebb proposed.

→ Readwise

id1001042504

To understand how this works, consider a perceptron that seeks to classify someone as obese, y = +1, or not-obese, y = -1. The inputs are a person’s body weight, x1, and height, x2. Let’s say that the dataset contains a hundred entries, with each entry comprising a person’s body weight and height and a label saying whether a doctor thinks the person is obese according to guidelines set by the National Heart, Lung, and Blood Institute. A perceptron’s task is to learn the values for w1 and w2 and the value of the bias term b, such that it correctly classifies each person in the dataset as “obese” or “not-obese.”

The perceptron learns in a supervised way: every training example comes with a label. This is a good example for explaining the term.

→ Readwise

id1001042505

Once the perceptron has learned the correct values for w1 and w2 and the bias term, it’s ready to make predictions. Given another person’s body weight and height—this person was not in the original dataset, so it’s not a simple matter of consulting a table of entries—the perceptron can classify the person as obese or not-obese.

Again, this illustrates what we mean by “model”: an equation whose parameters give the best possible fit to the training data. It is much like a regression, and what it allows is classifying a new data point based on the weights learned from the training dataset.

→ Readwise

id1001042506

The perceptron starts with its weights, w1 and w2, and the bias initialized to zero. The weights and bias represent a line in the xy plane. The perceptron then tries to find a separating line, defined by some set of values for its weights and bias, that attempts to classify the points. In the beginning, it classifies some points correctly and others incorrectly. Two of the incorrect attempts are shown as the gray dashed lines. In this case, you can see that in one attempt, all the points lie to one side of the dashed line, so the triangles are classified correctly, but the circles are not; and in another attempt, it gets the circles correct but some of the triangles wrong. The perceptron learns from its mistakes and adjusts its weights and bias. After numerous passes through the data, the perceptron eventually discovers at least one set of correct values of its weights and its bias term. It finds a line that delineates the clusters: The circles and the triangles lie on opposite sides. This is shown as a solid black line separating the coordinate space into two regions (one of which is shaded gray). The weights learned by the perceptron dictate the slope of the line; the bias determines the distance, or offset, of the line from the origin.

In terms of the result, this achieves much the same as a linear regression, but I have the feeling that the procedure is different, although I am not clear on how or why. .indagar
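A rough sketch of the procedure described in the highlight, assuming the classic perceptron update rule: start with zero weights and bias, and whenever a point is misclassified, nudge the weights and bias toward that point’s label. The points below are made up, and the function names are hypothetical. One difference from least-squares regression shows up in the code: the weights change only when a mistake is made, and training stops as soon as any separating line is found, not necessarily a "best" one.

```python
def train_perceptron(points, labels, max_epochs=1000):
    """Learn w1, w2 and bias b for labels in {-1, +1}.
    Returns (weights, bias). Finding a separating line is only
    guaranteed if the data is linearly separable."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            score = w[0] * x1 + w[1] * x2 + b
            if y * score <= 0:  # misclassified, or exactly on the line
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: done
            break
    return w, b


def classify(w, b, point):
    """Return +1 or -1 depending on which side of the line the point is."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1


# Made-up, linearly separable data: circles (-1) and triangles (+1).
POINTS = [(1, 1), (2, 1), (1, 2), (2, 2.5), (4, 4), (5, 4), (4, 5), (5, 5.5)]
LABELS = [-1, -1, -1, -1, 1, 1, 1, 1]

weights, bias = train_perceptron(POINTS, LABELS)
prediction = classify(weights, bias, (4.5, 4.5))  # a point not in the data
```

With this data the loop needs several passes before a full pass produces no mistakes, which mirrors the "numerous passes through the data" in the highlight.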

→ Readwise

id1001042507

much of machine learning comes down to minimizing prediction error.

A perceptron IS an artificial neuron (as in “artificial neural networks”). What makes this mechanism shine is scale: billions of weights and their updates, which is exactly what happens in the deep neural networks underlying large language models.

→ Readwise

id1001042508

What’s described above is a single perceptron unit, or one artificial neuron. It seems simple, and you may wonder what all the fuss is about. Well, imagine if the number of inputs to the perceptron went beyond two: (x1, x2, x3, x4, and so on), with each input (xi) getting its own axis. You can no longer do simple mental arithmetic and solve the problem. A line is no longer sufficient to separate the two clusters, which now exist in much higher dimensions than just two. For example, when you have three points (x1, x2, x3), the data is three-dimensional: you need a 2D plane to separate the data points. In dimensions of four or more, you need a hyperplane (which we cannot visualize with our 3D minds). In general, this higher-dimensional equivalent of a 1D straight line or a 2D plane is called a hyperplane.
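As a small illustration of the hyperplane idea, the same decision rule carries over to any number of inputs: compute w1*x1 + w2*x2 + ... + wn*xn + b and check its sign. The weights, bias, and function name below are arbitrary assumptions chosen only to show a four-dimensional case.

```python
def side_of_hyperplane(weights, bias, x):
    """Return +1 if x lies on the positive side of the hyperplane
    w . x + b = 0, and -1 otherwise (points exactly on it count as -1)."""
    if len(weights) != len(x):
        raise ValueError("weights and x must have the same number of dimensions")
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# Arbitrary 4-dimensional example: four inputs, four weights, one bias.
W = (1.0, -2.0, 0.5, 3.0)
B = -1.0

side_a = side_of_hyperplane(W, B, (2, 0, 0, 0))  # score = 2 - 1 = 1
side_b = side_of_hyperplane(W, B, (0, 1, 0, 0))  # score = -2 - 1 = -3
```

With two weights this is the line from the earlier perceptron example, with three it is a plane, and beyond that the arithmetic stays the same even though the picture cannot be drawn.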

→ Readwise

id1001042509

Sure, it learns the correlations without being explicitly told what they are, but these are correlations nonetheless. Is identifying correlations the same thing as thinking and reasoning? Surely, if the Mark I distinguished the letter “B” from the letter “G,” it was simply going by the patterns and did not attach any meaning to those letters that would engender further reasoning. Such questions are at the heart of the modern debate over the limits of deep neural networks, the astonishing descendants of perceptrons.

Whether the mathematical computation underlying the training and inference of deep neural networks is functionally equivalent to human learning and reasoning is currently a matter of debate. To what extent is this the same question posed by John Searle’s Chinese room?

→ Readwise
