The Alignment Problem: Machine Learning and Human Values

by Brian Christian

In "The Alignment Problem: Machine Learning and Human Values," Brian Christian explores the critical intersection between artificial intelligence (AI) and human values, delving into the complexities of aligning machine learning systems with our ethical frameworks. Central to the discussion is the concept of inverse reinforcement learning (IRL), which seeks to infer human intentions and values from observed behaviors, reflecting the broader human quest to understand each other's motivations. Christian underscores the inherent challenges posed by machine learning in decision-making across various sectors, such as healthcare and criminal justice. He highlights the potential for algorithms to perpetuate biases present in their training data, emphasizing the need for fairness and transparency in AI systems. Through vivid examples, including observational studies of children's innate understanding of goals and intentions, he illustrates how human cognition can inform the development of AI. The book also probes the moral implications of entrusting AI with consequential decisions, urging a deeper reflection on how we can instill our values in these increasingly autonomous systems. Ultimately, Christian presents the alignment problem as not merely a technical issue but a profound ethical dilemma, urging collaboration between technologists and ethicists to navigate the delicate balance of human and machine interaction in the digital age.

11 popular highlights from this book

Key Insights & Memorable Quotes

Below are the most popular and impactful highlights and quotes from The Alignment Problem: Machine Learning and Human Values:

You should consider that Imitation is the most acceptable part of Worship, and that the Gods had much rather Mankind should Resemble, than Flatter them. —MARCUS AURELIUS
What the field needed, he argued, was what he called inverse reinforcement learning. Rather than asking, as regular reinforcement learning does, “Given a reward signal, what behavior will optimize it?,” inverse reinforcement learning (or “IRL”) asks the reverse: “Given the observed behavior, what reward signal, if any, is being optimized?” This is, of course, in more informal terms, one of the foundational questions of human life. What exactly do they think they’re doing? We spend a good fraction of our life’s brainpower answering questions like this. We watch the behavior of others around us—friend and foe, superior and subordinate, collaborator and competitor—and try to read through their visible actions to their invisible intentions and goals. It is in some ways the cornerstone of human cognition. It also turns out to be one of the seminal and critical projects in twenty-first-century AI.
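The contrast between the two questions can be made concrete. The sketch below is illustrative only: the toy one-dimensional world, the two candidate reward functions, and the Boltzmann-rational scoring are assumptions for the demo, not the book's example. It takes an observed trajectory and asks which candidate reward best explains the behavior.

```python
# A minimal IRL sketch: given observed behavior, which candidate
# reward function is being optimized? Toy 1-D world, states 0..4;
# actions move left (-1) or right (+1), clamped at the edges.
import numpy as np

N_STATES, GAMMA = 5, 0.9
ACTIONS = (-1, +1)  # left, right

def q_values(reward):
    """Q-values for each (state, action) via simple value iteration."""
    V = np.zeros(N_STATES)
    Q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(100):
        for s in range(N_STATES):
            for j, a in enumerate(ACTIONS):
                s2 = min(max(s + a, 0), N_STATES - 1)  # clamp to world
                Q[s, j] = reward[s2] + GAMMA * V[s2]
        V = Q.max(axis=1)
    return Q

def log_likelihood(reward, trajectory, beta=5.0):
    """Log-probability of observed (state, action) pairs under a
    Boltzmann-rational policy: a softened maximization of Q."""
    Q = q_values(reward)
    total = 0.0
    for s, a in trajectory:
        logits = beta * Q[s]
        total += logits[ACTIONS.index(a)] - np.log(np.exp(logits).sum())
    return total

# Observed behavior: the agent always walks right.
observed = [(0, +1), (1, +1), (2, +1), (3, +1)]

# Two candidate reward functions: goal at the left end vs. the right.
candidates = {"goal_left":  np.array([1., 0., 0., 0., 0.]),
              "goal_right": np.array([0., 0., 0., 0., 1.])}

for name, r in candidates.items():
    print(name, round(log_likelihood(r, observed), 3))
# "goal_right" scores higher: walking right is best explained by a
# reward at the right end -- the essence of the inverse question.
```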
The basic training procedure for the perceptron, as well as its many contemporary progeny, has a technical-sounding name—“stochastic gradient descent”—but the principle is utterly straightforward. Pick one of the training data at random (“stochastic”) and input it to the model. If the output is exactly what you want, do nothing. If there is a difference between what you wanted and what you got, then figure out in which direction (“gradient”) to adjust each weight—whether by literal turning of physical knobs or simply the changing of numbers in software—to lower the error for this particular example. Move each of them a little bit in the appropriate direction (“descent”). Pick a new example at random, and start again. Repeat as many times as necessary.
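That recipe translates almost line for line into code. Here is a minimal sketch of a perceptron trained this way; the toy dataset, learning rate, and step count are illustrative assumptions, not the book's example.

```python
# Stochastic gradient descent on a single perceptron, following the
# procedure described above (illustrative sketch, not the book's code).
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: label is 1 when x0 + x1 > 1.
X = rng.uniform(0, 1, size=(200, 2))
y = (X.sum(axis=1) > 1.0).astype(float)

w = np.zeros(2)   # the "knobs": one weight per input
b = 0.0
lr = 0.1          # how far to move each knob per correction

for step in range(2000):
    i = rng.integers(len(X))          # "stochastic": pick an example at random
    pred = 1.0 if X[i] @ w + b > 0 else 0.0
    error = y[i] - pred               # difference between wanted and got
    if error != 0.0:                  # if the output is exactly right, do nothing
        w += lr * error * X[i]        # "gradient": direction that lowers the error
        b += lr * error               # "descent": move each knob a little bit
# Repeat as many times as necessary; w and b should now roughly
# separate the two classes.

accuracy = np.mean([(1.0 if x @ w + b > 0 else 0.0) == t for x, t in zip(X, y)])
print(f"training accuracy: {accuracy:.2f}")
```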
As we’re on the cusp of using machine learning for rendering basically all kinds of consequential decisions about human beings in domains such as education, employment, advertising, health care and policing, it is important to understand why machine learning is not, by default, fair or just in any meaningful way.
They said people should have the right to ask for an explanation of algorithmically made decisions.
Indeed, the embeddings, simple as they are—just a row of numbers for each word, based on predicting nearby missing words in a text—seemed to capture a staggering amount of real-world information.
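That idea, a row of numbers per word learned by predicting nearby words, is small enough to sketch. The following toy skip-gram with negative sampling is a hedged illustration: the corpus, dimensions, and hyperparameters are made-up assumptions, and real systems in the word2vec family train on billions of words rather than a few sentences.

```python
# Toy word embeddings: one vector per word, trained by predicting
# nearby words (skip-gram with negative sampling). Illustrative only.
import numpy as np

corpus = ("the king rules the realm . the queen rules the realm . "
          "a man walks . a woman walks .").split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 16            # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (V, D))    # the embeddings: one row per word
W_out = rng.normal(0, 0.1, (V, D))   # output ("context") vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, window, k = 0.05, 2, 3
for epoch in range(300):
    for pos, word in enumerate(corpus):
        w_i = idx[word]
        lo, hi = max(0, pos - window), min(len(corpus), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos == pos:
                continue
            # One positive pair plus k random "negative" words.
            targets = [(idx[corpus[ctx_pos]], 1.0)]
            targets += [(int(rng.integers(V)), 0.0) for _ in range(k)]
            for t_i, label in targets:
                v, u = W_in[w_i].copy(), W_out[t_i].copy()
                grad = sigmoid(v @ u) - label   # gradient of logistic loss
                W_in[w_i] -= lr * grad * u
                W_out[t_i] -= lr * grad * v

def nearest(word):
    """Cosine-similarity neighbors of a word in embedding space."""
    v = W_in[idx[word]]
    sims = W_in @ v / (np.linalg.norm(W_in, axis=1) * np.linalg.norm(v) + 1e-9)
    return [vocab[i] for i in np.argsort(-sims)[1:4]]

# With such a tiny corpus the neighbors are only suggestive, but words
# sharing contexts (e.g., "king" and "queen") tend to drift together.
print("closest to 'king':", nearest("king"))
```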
The only simplicity for which I would give a straw is that which is on the other side of the complex—not that which never has divined it. —OLIVER WENDELL HOLMES JR.
As the artist Robert Irwin put it: “Human beings living in and through structures become structures living in and through human beings.”
Curiosity bred competence.
University of Michigan psychologist Felix Warneken walks across the room, carrying a tall stack of magazines, toward the doors of a closed wooden cabinet. He bumps into the front of the cabinet, exclaims a startled “Oh!,” and backs away. Staring for a moment at the cabinet, he makes a thoughtful “Hmm,” before shuffling forward and bumping the magazines against the cabinet doors again. Again he backs away, defeated, and says, pitiably, “Hmmm . . .” It’s as if he can’t figure out where he’s gone wrong. From the corner of the room, a toddler comes to the rescue. The child walks somewhat unsteadily toward the cabinet, heaves open the doors one by one, then looks up at Warneken with a searching expression, before backing away. Warneken, making a grateful sound, puts his pile of magazines on the shelf. Warneken, along with his collaborator Michael Tomasello of Duke, was the first to systematically show, in 2006, that human infants as young as eighteen months old will reliably identify a fellow human facing a problem, will identify the human’s goal and the obstacle in the way, and will spontaneously help if they can—even if their help is not requested, even if the adult doesn’t so much as make eye contact with them, and even when they expect (and receive) no reward for doing so.
Ever since there have been video games, there has been a subfield of study into the question of what makes them fun, and what makes one game more fun than another. There are obvious economic as well as psychological stakes in this.
