
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
by Pedro Domingos
In "The Master Algorithm," Pedro Domingos explores the quest for a universal learning algorithm that can harness the vast potential of machine learning to transform industries and society. Central to the book is the idea that all learning algorithms can be categorized into five distinct "tribes": symbolists, connectionists, evolutionaries, Bayesians, and analogizers, each with its own master algorithm. This classification highlights the diverse approaches to understanding and implementing machine learning, emphasizing that no single method holds all the answers. Domingos argues that the future of machine learning lies not in a singular, all-encompassing algorithm but in integrating insights from these different schools of thought. He likens the process of machine learning to farming, where data acts as soil and algorithms as seeds, cultivating knowledge through iterative learning. The author also underscores the importance of common sense and human oversight in guiding intelligent machines, asserting that ethical considerations and human goals should shape technological advancements. Additionally, Domingos delves into the philosophical implications of machine learning, likening the quest for the master algorithm to humanity's broader search for understanding in a complex world. Ultimately, he posits that the ultimate power of machine learning will depend not just on the algorithms themselves, but on how we choose to wield them in our lives.
Key Insights & Memorable Quotes
Below are 30 of the most popular and impactful highlights and quotes from The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World:
You could even say that the God of Genesis himself is a programmer: language, not manipulation, is his tool of creation. Words become worlds.
A learner that uses Bayes’ theorem and assumes the effects are independent given the cause is called a Naïve Bayes classifier. That’s because, well, that’s such a naïve assumption.
If you’re a lazy and not-too-bright computer scientist, machine learning is the ideal occupation, because learning algorithms do all the work but let you take all the credit.
Believe it or not, every algorithm, no matter how complex, can be reduced to just these three operations: AND, OR, and NOT.
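To make that claim concrete, here is a small sketch (not from the book): exclusive-or and a one-bit adder built from nothing but AND, OR, and NOT, showing how richer operations fall out of the three primitives.

```python
# A toy illustration (not from the book) of the claim that any computation
# can be built from AND, OR, and NOT alone.

def AND(a, b): return a and b
def OR(a, b):  return a or b
def NOT(a):    return not a

def XOR(a, b):
    # Exclusive-or expressed purely with the three primitives.
    return AND(OR(a, b), NOT(AND(a, b)))

def half_adder(a, b):
    # A one-bit adder: returns (sum, carry), again built only from the primitives.
    return XOR(a, b), AND(a, b)

if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            s, c = half_adder(a, b)
            print(a, b, "-> sum:", s, "carry:", c)
```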
A good learner is forever walking the narrow path between blindness and hallucination.
Our beliefs are based on our experience, which gives us a very incomplete picture of the world, and it's easy to jump to false conclusions.
Data mining means "torturing the data until it confesses."
People often think computers are all about numbers, but they’re not. Computers are all about logic.
The second simplest algorithm is: combine two bits. Claude Shannon, better known as the father of information theory, was the first to realize that what transistors are doing, as they switch on and off in response to other transistors, is reasoning. (That was his master’s thesis at MIT—the most important master’s thesis of all time.)
Homo sapiens is the species that adapts the world to itself instead of adapting itself to the world.
Your job in a world of intelligent machines is to keep making sure they do what you want, both at the input (setting the goals) and at the output (checking that you got what you asked for).
"Listen to your customers, not to the HiPPO," HiPPO being short for "highest paid person’s opinion."
Learning is forgetting the details as much as it is remembering the important parts.
God created not species but the algorithm for creating species.
prior probability that the sun will rise, since it’s prior to seeing any evidence. It’s not based on counting the number of times the sun has risen on this planet in the past, because you weren’t there to see it; rather, it reflects your a priori beliefs about what will happen, based on your general knowledge of the universe. But now the stars start to fade, so your confidence that the sun does rise on this planet goes up, based on your experience on Earth. Your confidence is now a posterior probability, since it’s after seeing some evidence. The sky begins to lighten, and the posterior probability takes another leap. Finally, a sliver of the sun’s bright disk appears above the horizon and perhaps catches “the Sultan’s turret in a noose of light,” as in the opening verse of the Rubaiyat. Unless you’re hallucinating, it is now certain that the sun will rise. The crucial question is exactly how the posterior probability should evolve as you see more evidence. The answer is Bayes’ theorem. We can think of it in terms of cause and effect. Sunrise causes the stars to fade and the sky to lighten, but the latter is stronger evidence of daybreak, since the stars could fade in the middle of the night due to, say, fog rolling in. So the probability of sunrise should increase more after seeing the sky lighten than after seeing the stars fade. In mathematical notation, we say that P(sunrise | lightening-sky), the conditional probability of sunrise given that the sky is lightening, is greater than P(sunrise | fading-stars), its conditional probability given that the stars are fading. According to Bayes’ theorem, the more likely the effect is given the cause, the more likely the cause is given the effect: if P(lightening-sky | sunrise) is higher than P(fading-stars | sunrise), perhaps because some planets are far enough from their sun that the stars still shine after sunrise, then P(sunrise | lightening sky) is also higher than P(sunrise | fading-stars).
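The evolving-posterior story above can be made concrete with a tiny sketch. This is not from the book, and the probabilities are invented purely for illustration; chaining the two updates also quietly assumes the two pieces of evidence are independent given sunrise, which foreshadows the Naïve Bayes assumption discussed below.

```python
# A hedged illustration of updating a prior belief in "sunrise" as evidence
# arrives, using Bayes' theorem. All numbers are invented placeholders.

def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E)."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence

p = 0.5                          # prior P(sunrise), before any evidence
p = posterior(p, 0.8, 0.3)       # stars fade: weaker evidence (fog could also explain it)
print(f"after fading stars:   {p:.2f}")
p = posterior(p, 0.9, 0.1)       # sky lightens: stronger evidence, bigger jump
print(f"after lightening sky: {p:.2f}")
```

With these made-up likelihoods, the belief climbs from 0.50 to roughly 0.73 after the stars fade, and to about 0.96 once the sky lightens, which is exactly the qualitative behavior the passage describes: the stronger the evidence, the larger the leap in the posterior.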
As so often happens in computer science, we’re willing to sacrifice efficiency for generality.
Whoever has the best algorithms and the most data wins. A new type of network effect takes hold: whoever has the most customers accumulates the most data, learns the best models, wins the most new customers, and so on in a virtuous circle (or a vicious one, if you’re the competition).
Each of the five tribes of machine learning has its own master algorithm, a general-purpose learner that you can in principle use to discover knowledge from data in any domain. The symbolists’ master algorithm is inverse deduction, the connectionists’ is backpropagation, the evolutionaries’ is genetic programming, the Bayesians’ is Bayesian inference, and the analogizers’ is the support vector machine.
We’re face-to-face with our old foe: the combinatorial explosion. Therefore we do what we always have to do in life: compromise. We make simplifying assumptions that whittle the number of probabilities we have to estimate down to something manageable. A very simple and popular assumption is that all the effects are independent given the cause. This means that, for example, having a fever doesn’t change how likely you are to also have a cough, if we already know you have the flu. Mathematically, this is saying that P(fever, cough | flu) is just P(fever | flu) × P(cough | flu). Lo and behold: each of these is easy to estimate from a small number of observations. In fact, we did it for fever in the previous section, and it would be no different for cough or any other symptom. The number of observations we need no longer goes up exponentially with the number of symptoms; in fact, it doesn’t go up at all.
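A minimal sketch of the flu example above, with invented probabilities (the numbers are placeholders, not from the book); what matters is how the independence assumption lets the joint likelihood factor into a product of per-symptom terms.

```python
# A minimal Naive Bayes sketch of the flu example. All probabilities are
# invented for illustration; the point is the factorization
# P(fever, cough | flu) = P(fever | flu) * P(cough | flu).

priors = {"flu": 0.1, "no flu": 0.9}
# P(symptom | condition), assumed independent given the condition
likelihoods = {
    "flu":    {"fever": 0.9,  "cough": 0.8},
    "no flu": {"fever": 0.05, "cough": 0.1},
}

def naive_bayes(symptoms):
    scores = {}
    for condition, prior in priors.items():
        score = prior
        for s in symptoms:
            score *= likelihoods[condition][s]  # multiply one easy-to-estimate term per symptom
        scores[condition] = score
    total = sum(scores.values())
    return {c: score / total for c, score in scores.items()}  # normalize to posteriors

print(naive_bayes(["fever", "cough"]))
```

With these made-up numbers the posterior for flu comes out around 0.94 even though the prior was only 0.1, because both symptoms point the same way, and each extra symptom only adds one more factor to estimate rather than multiplying the number of probabilities needed.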
Common sense is important not just because your mom taught you so, but because computers don’t have it.
Every time I fire a linguist, the recognizer’s performance goes up.
Our search for the Master Algorithm is complicated, but also enlivened, by the rival schools of thought that exist within machine learning. The main ones are the symbolists, connectionists, evolutionaries, Bayesians, and analogizers. Each tribe has a set of core beliefs, and a particular problem that it cares most about. It has found a solution to that problem, based on ideas from its allied fields of science, and it has a master algorithm that embodies it.
Learning algorithms are the seeds, data is the soil, and the learned programs are the grown plants. The machine-learning expert is like a farmer, sowing the seeds, irrigating and fertilizing the soil, and keeping an eye on the health of the crop but otherwise staying out of the way.
For a Bayesian, in fact, there is no such thing as the truth; you have a prior distribution over hypotheses, after seeing the data it becomes the posterior distribution, as given by Bayes’ theorem, and that’s all.
How do we learn? Is there a better way? What can we predict? Can we trust what we’ve learned? Rival schools of thought within machine learning have very different answers to these questions. The main ones are five in number, and we’ll devote a chapter to each. Symbolists view learning as the inverse of deduction and take ideas from philosophy, psychology, and logic. Connectionists reverse engineer the brain and are inspired by neuroscience and physics. Evolutionaries simulate evolution on the computer and draw on genetics and evolutionary biology. Bayesians believe learning is a form of probabilistic inference and have their roots in statistics. Analogizers learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization.
Evolutionaries and connectionists have something important in common: they both design learning algorithms inspired by nature. But then they part ways. Evolutionaries focus on learning structure; to them, fine-tuning an evolved structure by optimizing parameters is of secondary importance. In contrast, connectionists prefer to take a simple, hand-coded structure with lots of connections and let weight learning do all the work. This is machine learning’s version of the nature versus nurture controversy, and there are good arguments on both sides.
Machine learning will not single-handedly determine the future, any more than any other technology; it’s what we decide to do with it that counts, and now you have the tools to decide.
The main ones are the symbolists, connectionists, evolutionaries, Bayesians, and analogizers. Each tribe has a set of core beliefs, and a particular problem that it cares most about. It has found a solution to that problem, based on ideas from its allied fields of science, and it has a master algorithm that embodies it.

For symbolists, all intelligence can be reduced to manipulating symbols, in the same way that a mathematician solves equations by replacing expressions by other expressions. Symbolists understand that you can’t learn from scratch: you need some initial knowledge to go with the data. They’ve figured out how to incorporate preexisting knowledge into learning, and how to combine different pieces of knowledge on the fly in order to solve new problems. Their master algorithm is inverse deduction, which figures out what knowledge is missing in order to make a deduction go through, and then makes it as general as possible.

For connectionists, learning is what the brain does, and so what we need to do is reverse engineer it. The brain learns by adjusting the strengths of connections between neurons, and the crucial problem is figuring out which connections are to blame for which errors and changing them accordingly. The connectionists’ master algorithm is backpropagation, which compares a system’s output with the desired one and then successively changes the connections in layer after layer of neurons so as to bring the output closer to what it should be.

Evolutionaries believe that the mother of all learning is natural selection. If it made us, it can make anything, and all we need to do is simulate it on the computer. The key problem that evolutionaries solve is learning structure: not just adjusting parameters, like backpropagation does, but creating the brain that those adjustments can then fine-tune. The evolutionaries’ master algorithm is genetic programming, which mates and evolves computer programs in the same way that nature mates and evolves organisms.

Bayesians are concerned above all with uncertainty. All learned knowledge is uncertain, and learning itself is a form of uncertain inference. The problem then becomes how to deal with noisy, incomplete, and even contradictory information without falling apart. The solution is probabilistic inference, and the master algorithm is Bayes’ theorem and its derivates. Bayes’ theorem tells us how to incorporate new evidence into our beliefs, and probabilistic inference algorithms do that as efficiently as possible.

For analogizers, the key to learning is recognizing similarities between situations and thereby inferring other similarities. If two patients have similar symptoms, perhaps they have the same disease. The key problem is judging how similar two things are. The analogizers’ master algorithm is the support vector machine, which figures out which experiences to remember and how to combine them to make new predictions.
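As one illustration of the tribes' algorithms, here is a hedged sketch of the connectionists' backpropagation, not taken from the book: a tiny two-layer network learns XOR by comparing its output with the desired one and nudging the weights layer by layer so the output moves closer to what it should be. The network size, learning rate, and iteration count are arbitrary choices for illustration.

```python
# A minimal backpropagation sketch (not from the book): a small network
# learns XOR. Sizes, learning rate, and step count are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

# Training data: XOR of two bits.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 units, one output unit.
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
lr = 1.0

for step in range(5000):
    # Forward pass: compute the network's output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the output error back through the layers,
    # assigning each connection its share of the blame.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Nudge every weight in proportion to its contribution to the error.
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should end up close to [[0], [1], [1], [0]]
```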
Machine learners, like all scientists, resemble the blind men and the elephant: one feels the trunk and thinks it’s a snake, another leans against the leg and thinks it’s a tree, yet another touches the tusk and thinks it’s a bull. Our aim is to touch each part without jumping to conclusions; and once we’ve touched all of them, we will try to picture the whole elephant.
Our goal is to figure out the simplest program we can write such that it will continue to write itself by reading data, without limit, until it knows everything there is to know.