
Human Compatible: Artificial Intelligence and the Problem of Control

by Stuart Russell

In "Human Compatible: Artificial Intelligence and the Problem of Control," Stuart Russell explores the intricate relationship between artificial intelligence (AI) and human values, emphasizing the necessity for AI systems to align with human preferences. Central to his argument is the recognition that current frameworks for implementing AI often overlook the complexity of human behavior and the difficulty in defining true human purposes. Russell critiques the naivety of democratic societies that assume truth will prevail in information dissemination, highlighting the challenges posed by misinformation, particularly through social media algorithms that manipulate user preferences for profit. Russell introduces a set of principles intended to guide AI developers, stressing that machines should aim to maximize human preferences while remaining uncertain about what those preferences entail, with human behavior serving as the ultimate source of guidance. He warns against the risks of allowing powerful AI to operate without strict controls, drawing parallels to historical missteps, such as the mythical King Midas. Furthermore, he discusses the potential of AI to revolutionize sectors like transportation and education, advocating for a proactive approach to ensure AI remains beneficial. Ultimately, Russell calls for a collective effort to design AI systems that are not only efficient but also intrinsically safe and aligned with human well-being, urging society to prepare for a future where superintelligent systems could dramatically reshape existence. The book serves as both a cautionary tale and a hopeful vision for the integration of AI in human life.

30 popular highlights from this book

Key Insights & Memorable Quotes

Below are the most popular and impactful highlights and quotes from Human Compatible: Artificial Intelligence and the Problem of Control:

The right to mental security does not appear to be enshrined in the Universal Declaration. Articles 18 and 19 establish the rights of “freedom of thought” and “freedom of opinion and expression.” One’s thoughts and opinions are, of course, partly formed by one’s information environment, which, in turn, is subject to Article 19’s “right to . . . impart information and ideas through any media and regardless of frontiers.” That is, anyone, anywhere in the world, has the right to impart false information to you. And therein lies the difficulty: democratic nations, particularly the United States, have for the most part been reluctant—or constitutionally unable—to prevent the imparting of false information on matters of public concern because of justifiable fears regarding government control of speech. Rather than pursuing the idea that there is no freedom of thought without access to true information, democracies seem to have placed a naïve trust in the idea that the truth will win out in the end, and this trust has left us unprotected.
Alas, the human race is not a single, rational entity. It is composed of nasty, envy-driven, irrational, inconsistent, unstable, computationally limited, complex, evolving, heterogeneous entities. Loads and loads of them. These issues are the staple diet—perhaps even raisons d'être—of the social sciences.
Finally, methods of control can be direct if a government is able to implement rewards and punishments based on behavior. Such a system treats people as reinforcement learning algorithms, training them to optimize the objective set by the state. The temptation for a government, particularly one with a top-down, engineering mind-set, is to reason as follows: it would be better if everyone behaved well, had a patriotic attitude, and contributed to the progress of the country; technology enables measurement of individual behavior, attitudes, and contributions; therefore, everyone will be better off if we set up a technology-based system of monitoring and control based on rewards and punishments.
To get just an inkling of the fire we're playing with, consider how content-selection algorithms function on social media. They aren't particularly intelligent, but they are in a position to affect the entire world because they directly influence billions of people. Typically, such algorithms are designed to maximize click-through, that is, the probability that the user clicks on presented items. The solution is simply to present items that the user likes to click on, right? Wrong. The solution is to change the user's preferences so that they become more predictable. A more predictable user can be fed items that they are likely to click on, thereby generating more revenue. People with more extreme political views tend to be more predictable in which items they will click on. (Possibly there is a category of articles that die-hard centrists are likely to click on, but it’s not easy to imagine what this category consists of.) Like any rational entity, the algorithm learns how to modify its environment—in this case, the user’s mind—in order to maximize its own reward.
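The dynamic Russell describes can be made concrete with a toy simulation. The sketch below is illustrative only; the number of topics, the drift rate, and the greedy policy are assumptions, not details from the book. A recommender that greedily maximizes click probability, shown to a user whose preferences drift toward whatever is presented, ends up concentrating the user's interests on a single topic and so making the user maximally predictable.

```python
import numpy as np

# Toy illustration (my own sketch, not a model from the book). The user has a
# preference distribution over three topics; each item shown nudges that
# distribution toward the topic shown. A recommender that greedily maximizes
# click probability keeps showing the topic the user already favors, so the
# distribution becomes more and more peaked: a more predictable user, which is
# exactly what maximizes expected clicks.

prefs = np.array([0.4, 0.3, 0.3])   # initial interest in three topics
drift = 0.05                        # how much one exposure shifts preferences

for _ in range(200):
    shown = int(np.argmax(prefs))   # greedy: show the most clickable topic
    prefs[shown] += drift           # exposure nudges the user toward that topic
    prefs /= prefs.sum()            # renormalize to a probability distribution

print(np.round(prefs, 3))           # ~[1.0, 0.0, 0.0]: a highly predictable user
print(round(prefs.max(), 3))        # best click probability approaches 1
```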
the impossibility of defining true human purposes correctly and completely. This, in turn, means that what I have called the standard model—whereby humans attempt to imbue machines with their own purposes—is destined to fail. We might call this the King Midas problem: Midas, a legendary king in ancient Greek mythology, got exactly what he asked for—namely, that everything he touched should turn to gold. Too late, he discovered that this included his food, his drink, and his family members, and he died in misery and starvation.
This ability of a single box to carry out any process that you can imagine is called universality, a concept first introduced by Alan Turing in 1936. Universality means that we do not need separate machines for arithmetic, machine translation, chess, speech understanding, or animation: one machine does it all.
Human drivers in the United States suffer roughly one fatal accident per one hundred million miles traveled, which sets a high bar. Autonomous vehicles, to be accepted, will need to be much better than that: perhaps one fatal accident per billion miles, or twenty-five thousand years of driving forty hours per week.
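As a quick sanity check on the figures quoted, the arithmetic works out as follows; the roughly 19 mph average driving speed is an inference of mine, not a number given in the book.

```python
# Back-of-envelope check of the highlight above.
human_fatal_rate = 1 / 100_000_000       # fatal accidents per mile, human drivers
target_fatal_rate = 1 / 1_000_000_000    # suggested bar for autonomous vehicles
print(human_fatal_rate / target_fatal_rate)            # 10x safer than human drivers

hours_per_year = 40 * 52                 # driving 40 hours per week, every week
avg_speed_mph = 1_000_000_000 / (25_000 * hours_per_year)
print(f"implied average speed: {avg_speed_mph:.1f} mph")  # ~19.2 mph
```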
Intelligent machines with this capability would be able to look further into the future than humans can. They would also be able to take into account far more information. These two capabilities combined lead inevitably to better real-world decisions. In any kind of conflict situation between humans and machines, we would quickly find, like Garry Kasparov and Lee Sedol, that our every move has been anticipated and blocked. We would lose the game before it even started.
There’s only a limited amount that AI researchers can do to influence the evolution of global policy on AI. We can point to possible applications that would provide economic and social benefits; we can warn about possible misuses such as surveillance and weapons; and we can provide roadmaps for the likely path of future developments and their impacts. Perhaps the most important thing we can do is to design AI systems that are, to the extent possible, provably safe and beneficial for humans.
I find it helpful to summarize the approach in the form of three principles. When reading these principles, keep in mind that they are intended primarily as a guide to AI researchers and developers in thinking about how to create beneficial AI systems; they are not intended as explicit laws for AI systems to follow: (1) The machine’s only objective is to maximize the realization of human preferences. (2) The machine is initially uncertain about what those preferences are. (3) The ultimate source of information about human preferences is human behavior.
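Read as a design specification rather than as law, the three principles describe a machine that acts under uncertainty about human preferences and treats observed behavior as evidence. The sketch below is a minimal, hypothetical illustration: the two candidate preference models, the action names, and the 0.9 probability that the human chooses what they prefer are all my assumptions, not details from the book.

```python
# Hypothetical sketch of the three principles: maximize an *uncertain* model of
# human preferences, and update that model from observed human behavior.

candidate_rewards = {
    "likes_coffee": {"make_coffee": 1.0, "make_tea": 0.0},
    "likes_tea":    {"make_coffee": 0.0, "make_tea": 1.0},
}
belief = {"likes_coffee": 0.5, "likes_tea": 0.5}   # initial uncertainty (principle 2)

def update(belief, observed_human_choice):
    """Bayesian update: the human is assumed to usually choose what they prefer (principle 3)."""
    likelihood = {
        h: 0.9 if candidate_rewards[h][observed_human_choice] == 1.0 else 0.1
        for h in belief
    }
    post = {h: belief[h] * likelihood[h] for h in belief}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

def best_action(belief):
    """Maximize expected human preference under the current belief (principle 1)."""
    actions = ["make_coffee", "make_tea"]
    return max(actions, key=lambda a: sum(belief[h] * candidate_rewards[h][a] for h in belief))

belief = update(belief, "make_tea")   # the machine watched the human choose tea
print(belief)                         # belief shifts toward "likes_tea"
print(best_action(belief))            # -> "make_tea"
```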
One particular scene from Small World struck me: The protagonist, an aspiring literary theorist, attends a major international conference and asks a panel of leading figures, “What follows if everyone agrees with you?” The question causes consternation, because the panelists had been more concerned with intellectual combat than ascertaining truth or attaining understanding. It occurred to me then that an analogous question could be asked of the leading figures in AI: “What if you succeed?” The field’s goal had always been to create human-level or superhuman AI, but there was little or no consideration of what would happen if we did.
History has shown, of course, that a tenfold increase in global GDP per capita is possible without AI—it’s just that it took 190 years (from 1820 to 2010) to achieve that increase.
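For reference, a tenfold increase over 190 years corresponds to a compound growth rate of roughly 1.2 percent per year; the calculation below is mine, not a figure from the book.

```python
# Implied compound annual growth rate for a 10x rise over 190 years.
annual_growth = 10 ** (1 / 190) - 1
print(f"{annual_growth:.2%} per year")   # ~1.22%
```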
Thus, the direct effects of technology work both ways: at first, by increasing productivity, technology can increase employment by reducing the price of an activity and thereby increasing demand; subsequently, further increases in technology mean that fewer and fewer humans are required. Figure 8 illustrates these developments.
The potential benefits of fully autonomous vehicles are immense. Every year, 1.2 million people die in car accidents worldwide and tens of millions suffer serious injuries.
The Global Learning XPRIZE competition, which started in 2014, offered $15 million for “open-source, scalable software that will enable children in developing countries to teach themselves basic reading, writing and arithmetic within 15 months.” Results from the winners, Kitkit School and onebillion, suggest that the goal has largely been achieved.
A reasonable target for autonomous vehicles would be to reduce these numbers by a factor of ten.
Even if the technology is successful, the transition to widespread autonomy will be an awkward one: human driving skills may atrophy or disappear, and the reckless and antisocial act of driving a car oneself may be banned altogether.
In the Berkeley lab of my colleague Pieter Abbeel, BRETT (the Berkeley Robot for the Elimination of Tedious Tasks) has been folding piles of towels since 2011, while the SpotMini robot from Boston Dynamics can climb stairs and open doors.
One way to protect the functioning of reputation systems is to inject sources that are as close as possible to ground truth. A single fact that is certainly true can invalidate any number of sources that are only somewhat trustworthy, if those sources disseminate information contrary to the known fact. In many countries, notaries function as sources of ground truth to maintain the integrity of legal and real-estate information; they are usually disinterested third parties in any transaction and are licensed by governments or professional societies.
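A minimal sketch of that idea, assuming a simple trust-weighted voting rule and invented source names; this is an illustration of the principle, not an algorithm described in the book. Any source that contradicts a known fact loses its trust, and with it, its influence over every other claim it makes.

```python
# Hypothetical ground-truth injection for a toy reputation system.
known_facts = {"deed_recorded_2019": True}

sources = {
    "notary_A":  {"claims": {"deed_recorded_2019": True,  "price_500k": True},  "trust": 0.9},
    "website_B": {"claims": {"deed_recorded_2019": False, "price_500k": False}, "trust": 0.6},
}

def apply_ground_truth(sources, known_facts):
    """Zero out the trust of any source that contradicts a fact known to be true."""
    for src in sources.values():
        for fact, value in known_facts.items():
            if fact in src["claims"] and src["claims"][fact] != value:
                src["trust"] = 0.0
    return sources

def believe(sources, claim):
    """Weigh a disputed claim by the trust of the sources asserting or denying it."""
    yes = sum(s["trust"] for s in sources.values() if s["claims"].get(claim) is True)
    no = sum(s["trust"] for s in sources.values() if s["claims"].get(claim) is False)
    return yes > no

sources = apply_ground_truth(sources, known_facts)
print(believe(sources, "price_500k"))   # True: only the still-trusted notary counts
```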
For example, a machine with basic reading capabilities will be able to read everything the human race has ever written by lunchtime, and then it will be looking around for something else to do.
With speech recognition capabilities, it could listen to every radio and television broadcast before teatime. For comparison, it would take two hundred thousand full-time humans just to keep up with the world’s current level of print publication (let alone all the written material from the past) and another sixty thousand to listen to current broadcasts.
A second reason for declining to provide a date for superintelligent AI is that there is no clear threshold that will be crossed. Machines already exceed human capabilities in some areas. Those areas will broaden and deepen, and it is likely that there will be superhuman general knowledge systems, superhuman biomedical research systems, superhuman dexterous and agile robots, superhuman corporate planning systems, and so on well before we have a completely general superintelligent AI system. These “partially superintelligent” systems will, individually and collectively, begin to pose many of the same issues that a generally intelligent system would.
As Alfred North Whitehead wrote in 1911, “Civilization advances by extending the number of important operations which we can perform without thinking about them.”
That’s how things used to be done in personal travel too: if you wanted to travel from Europe to Australia and back in the seventeenth century, it would have involved a huge project costing vast sums of money, requiring years of planning, and carrying a high risk of death. Now we are used to the idea of transportation as a service (TaaS): if you need to be in Melbourne early next week, it just requires a few taps on your phone and a relatively minuscule amount of money.
The pie of pride is also finite: only 1 percent of people can be in the top 1 percent on any given metric. If human happiness requires being in the top 1 percent, then 99 percent of humans are going to be unhappy, even when the bottom 1 percent has an objectively splendid lifestyle. It will be important, then, for our cultures to gradually down-weight pride and envy as central elements of perceived self-worth.
The desire for relative advantage over others, rather than an absolute quality of life, is a positional good;
If Bob envies Alice, he derives unhappiness from the difference between Alice’s well-being and his own; the greater the difference, the more unhappy he is. Conversely, if Alice is proud of her superiority over Bob, she derives happiness not just from her own intrinsic well-being but also from the fact that it is higher than Bob’s. It is easy to show that, in a mathematical sense, pride and envy work in roughly the same way as sadism; they lead Alice and Bob to derive happiness purely from reducing each other’s well-being, because a reduction in Bob’s well-being increases Alice’s pride, while a reduction in Alice’s well-being reduces Bob’s envy.

Jeffrey Sachs, the renowned development economist, once told me a story that illustrated the power of these kinds of preferences in people’s thinking. He was in Bangladesh soon after a major flood had devastated one region of the country. He was speaking to a farmer who had lost his house, his fields, all his animals, and one of his children. “I’m so sorry—you must be terribly sad,” Sachs ventured. “Not at all,” replied the farmer. “I’m pretty happy because my damned neighbor has lost his wife and all his children too!”
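The "mathematical sense" mentioned above can be written out with a simple utility model. The coefficients and function names below are my own notation, a sketch of the idea rather than the book's formalization: give Alice a pride term and Bob an envy term, and each of them becomes strictly better off when the other's well-being falls.

```python
# Illustrative utilities (my notation): each person's utility rises as the
# other's well-being falls, which is exactly the structure of a sadistic
# preference, even though pride and envy sound more benign.

def alice_utility(w_alice, w_bob, pride=0.5):
    return w_alice + pride * (w_alice - w_bob)

def bob_utility(w_alice, w_bob, envy=0.5):
    return w_bob - envy * (w_alice - w_bob)

# Reducing Bob's well-being from 4 to 2 makes Alice strictly happier,
# even though her own intrinsic well-being (10) is unchanged.
print(alice_utility(10, 4), alice_utility(10, 2))   # 13.0 vs 14.0

# Likewise, reducing Alice's well-being from 10 to 8 makes Bob happier.
print(bob_utility(10, 4), bob_utility(8, 4))        # 1.0 vs 2.0
```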
With AI tutors, the potential of each child, no matter how poor, can be realized. The cost per child would be negligible, and that child would live a far richer and more productive life. The pursuit of artistic and intellectual endeavors…
I am not saying that success in AI will necessarily happen, and I think it’s quite unlikely that it will happen in the next few years. It seems prudent, nonetheless, to prepare for the eventuality. If all goes well, it would herald a golden age for humanity, but we have to face the fact that we are planning to make entities that are far more powerful than humans. How do we ensure that they never, ever have power over us?
Even in the 1950s, computers were described in the popular press as “super-brains” that were “faster than Einstein.” So can we say now, finally, that computers are as powerful as the human brain? No. Focusing on raw computing power misses the point entirely. Speed alone won’t give us AI. Running a poorly designed algorithm on a faster computer doesn’t make the algorithm better; it just means you get the wrong answer more quickly. (And with more data there are more opportunities for wrong answers!) The principal effect of faster machines has been to make the time for experimentation shorter, so that research can progress more quickly. It’s not hardware that is holding AI back; it’s software. We don’t yet know how to make a machine really intelligent—even if it were the size of the universe.
