
The Alignment Problem: Machine Learning and Human Values

by Brian Christian

30 popular highlights from this book


Key Insights & Memorable Quotes

Below are the most popular and impactful highlights and quotes from The Alignment Problem: Machine Learning and Human Values:

You should consider that Imitation is the most acceptable part of Worship, and that the Gods had much rather Mankind should Resemble, than Flatter them. —MARCUS AURELIUS
What the field needed, he argued, was what he called inverse reinforcement learning. Rather than asking, as regular reinforcement learning does, “Given a reward signal, what behavior will optimize it?,” inverse reinforcement learning (or “IRL”) asks the reverse: “Given the observed behavior, what reward signal, if any, is being optimized?”15 This is, of course, in more informal terms, one of the foundational questions of human life. What exactly do they think they’re doing? We spend a good fraction of our life’s brainpower answering questions like this. We watch the behavior of others around us—friend and foe, superior and subordinate, collaborator and competitor—and try to read through their visible actions to their invisible intentions and goals. It is in some ways the cornerstone of human cognition. It also turns out to be one of the seminal and critical projects in twenty-first-century AI.
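The computational contrast is easy to sketch. The toy corridor, candidate reward functions, and one-line policy below are invented for illustration (this is not any published IRL algorithm); the point is simply the reversal the passage describes: instead of deriving behavior from a known reward, we score candidate rewards by how well the behavior they would produce matches what we observed.

```python
# A toy sketch of the IRL question, with an invented four-state corridor:
# we watched an agent walk right at states 0, 1, and 2, and we ask which
# candidate reward function best explains that behavior.
observed_actions = {0: "right", 1: "right", 2: "right"}

candidate_rewards = {
    "reward at state 3": [0, 0, 0, 1],
    "reward at state 0": [1, 0, 0, 0],
}

def policy_under(reward):
    """The behavior a very simple agent optimizing `reward` would show:
    always step toward the state with the highest reward."""
    goal = reward.index(max(reward))
    return {s: ("right" if goal > s else "left") for s in range(3)}

for name, reward in candidate_rewards.items():
    predicted = policy_under(reward)
    matches = sum(predicted[s] == action for s, action in observed_actions.items())
    print(f"{name}: explains {matches}/3 of the observed actions")
# "reward at state 3" explains all three moves; "reward at state 0" explains none.
```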
The basic training procedure for the perceptron, as well as its many contemporary progeny, has a technical-sounding name—“stochastic gradient descent”—but the principle is utterly straightforward. Pick one of the training data at random (“stochastic”) and input it to the model. If the output is exactly what you want, do nothing. If there is a difference between what you wanted and what you got, then figure out in which direction (“gradient”) to adjust each weight—whether by literal turning of physical knobs or simply the changing of numbers in software—to lower the error for this particular example. Move each of them a little bit in the appropriate direction (“descent”). Pick a new example at random, and start again. Repeat as many times as necessary.
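Read literally, that procedure fits in a few lines of code. The sketch below uses toy data, an arbitrary learning rate, and a fixed number of steps (all assumptions of mine, not anything from the book) to train a single-layer perceptron on the logical AND function with exactly the loop described: pick an example at random, compare output to target, and nudge each weight a little in the direction that reduces the error.

```python
import random

def predict(weights, bias, inputs):
    """Weighted sum of the inputs, thresholded to a 0/1 output."""
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation >= 0 else 0

def train(examples, n_features, learning_rate=0.1, steps=1000):
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(steps):
        inputs, target = random.choice(examples)   # "stochastic": a random example
        error = target - predict(weights, bias, inputs)
        if error != 0:                             # if the output is already right, do nothing
            for i in range(n_features):            # "gradient descent": nudge each weight
                weights[i] += learning_rate * error * inputs[i]
            bias += learning_rate * error
    return weights, bias

# Toy, linearly separable data: the output should be 1 only when both inputs are 1.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train(data, n_features=2)
print([predict(weights, bias, x) for x, _ in data])   # expect [0, 0, 0, 1]
```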
As we’re on the cusp of using machine learning for rendering basically all kinds of consequential decisions about human beings in domains such as education, employment, advertising, health care and policing, it is important to understand why machine learning is not, by default, fair or just in any meaningful way.
They said people should have the right to ask for an explanation of algorithmically made decisions.
Indeed, the embeddings, simple as they are—just a row of numbers for each word, based on predicting nearby missing words in a text—seemed to capture a staggering amount of real-world information.
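To make the idea concrete, here is a schematic example with tiny, hand-invented vectors (real embeddings such as word2vec's are learned from text and have hundreds of dimensions): each word really is just a row of numbers, and relationships show up as simple arithmetic on those rows.

```python
import numpy as np

# Hand-invented 4-dimensional "embeddings" -- purely illustrative stand-ins
# for the vectors a model like word2vec would learn from a large corpus.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.1, 0.8, 0.2]),
    "man":   np.array([0.1, 0.8, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.8, 0.1]),
}

def cosine(a, b):
    """Similarity between two word vectors, ignoring their lengths."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The classic analogy pattern: king - man + woman lands nearest to queen.
target = vectors["king"] - vectors["man"] + vectors["woman"]
print(max(vectors, key=lambda word: cosine(vectors[word], target)))  # "queen"
```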
The only simplicity for which I would give a straw is that which is on the other side of the complex—not that which never has divined it. —OLIVER WENDELL HOLMES JR.39
As the artist Robert Irwin put it: “Human beings living in and through structures become structures living in and through human beings.”
Curiosity bred competence.
University of Michigan psychologist Felix Warneken walks across the room, carrying a tall stack of magazines, toward the doors of a closed wooden cabinet. He bumps into the front of the cabinet, exclaims a startled “Oh!,” and backs away. Staring for a moment at the cabinet, he makes a thoughtful “Hmm,” before shuffling forward and bumping the magazines against the cabinet doors again. Again he backs away, defeated, and says, pitiably, “Hmmm . . .” It’s as if he can’t figure out where he’s gone wrong. From the corner of the room, a toddler comes to the rescue. The child walks somewhat unsteadily toward the cabinet, heaves open the doors one by one, then looks up at Warneken with a searching expression, before backing away. Warneken, making a grateful sound, puts his pile of magazines on the shelf.1 Warneken, along with his collaborator Michael Tomasello of Duke, was the first to systematically show, in 2006, that human infants as young as eighteen months old will reliably identify a fellow human facing a problem, will identify the human’s goal and the obstacle in the way, and will spontaneously help if they can—even if their help is not requested, even if the adult doesn’t so much as make eye contact with them, and even when they expect (and receive) no reward for doing so.2
Ever since there have been video games, there has been a subfield of study into the question of what makes them fun, and what makes one game more fun than another. There are obvious economic as well as psychological stakes in this.
A search of the literature fails to reveal any studies in which clinical judgment has been shown to be superior to statistical prediction when both are based on the same codable input variables.
If the untrained infant’s mind is to become an intelligent one, it must acquire both discipline and initiative. So far we have been considering only discipline. —ALAN TURING1
Griffiths views parenthood as a kind of proof of concept for the alignment problem. The story of human civilization, he notes, has always been about how to instill values in strange, alien, human-level intelligences who will inevitably inherit the reins of society from us—namely, our kids.
In seeing a kind of mind at work as it digests and reacts to the world, we will learn something both about the world and also, perhaps, about minds.
Our successes and failures alike in getting these systems to do “what we want,” it turns out, offer us an unflinching, revelatory mirror.
This is the delicacy of our present moment. Our digital butlers are watching closely. They see our private as well as our public lives, our best and worst selves, without necessarily knowing which is which or making a distinction at all. They by and large reside in a kind of uncanny valley of sophistication: able to infer sophisticated models of our desires from our behavior, but unable to be taught, and disinclined to cooperate. They’re thinking hard about what we are going to do next, about how they might make their next commission, but they don’t seem to understand what we want, much less who we hope to become.
A system naïvely using word2vec, or something like it, might well observe that John is a word more typical of engineer résumés than Mary. And so, all things being equal, a résumé belonging to John will rank higher in “relevance” than an otherwise identical résumé belonging to Mary. Such examples are more than hypothetical. When one of the clients of Mark J. Girouard, an employment attorney, was vetting a résumé-screening tool from a potential vendor, the audit revealed that one of the two most positively weighted factors in the entire model was the name “Jared.” The client did not purchase the résumé-screening tool—but presumably others have.70
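A schematic sketch of that failure mode, with invented two-dimensional vectors (this is not any vendor's actual system): if “relevance” is just similarity between averaged word vectors and a query, and the name “John” happens to sit closer to “engineer” in the embedding space than “Mary” does, two otherwise identical résumés get different scores.

```python
import numpy as np

# Invented toy vectors: "john" is placed nearer to "engineer" than "mary" is,
# standing in for associations a real embedding would absorb from its training text.
toy_vectors = {
    "engineer": np.array([1.0, 0.0]),
    "python":   np.array([0.9, 0.1]),
    "john":     np.array([0.8, 0.2]),
    "mary":     np.array([0.2, 0.8]),
}

def relevance(resume_words, query="engineer"):
    """Naive embedding-based relevance: cosine similarity between the average
    of the résumé's word vectors and the query word's vector."""
    avg = np.mean([toy_vectors[w] for w in resume_words], axis=0)
    q = toy_vectors[query]
    return float(avg @ q / (np.linalg.norm(avg) * np.linalg.norm(q)))

# Two résumés identical except for the applicant's name.
print(round(relevance(["john", "python"]), 3))  # higher score
print(round(relevance(["mary", "python"]), 3))  # lower score, same qualifications
```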
But what do you do if your dataset is as inclusive as possible—say, something approximating the entirety of written English, some hundred billion words—and it’s the world itself that’s biased?
There isn’t a world in which ProPublica couldn’t have found some number that was different that they could call bias. There’s no possible algorithm—there’s no possible version of COMPAS—where that article wouldn’t have been written.
They created a simple 3D maze game, where the agent is required to explore the maze and find an exit. In one version of the game, however, there is a television screen on one of the walls of the maze. Furthermore, the agent is given the ability to press a button that changes the channel on the television. What would happen? What happened is that the instant the agent comes within view of the TV screen, its exploration of the maze comes to a screeching halt. The agent centers the screen in its view and starts flipping through channels. Now it sees a video of an airplane in flight. Now it sees cute puppies. Now it sees a man seated at a computer. Now cars in downtown traffic. The agent keeps changing channels, awash in novelty and surprise. It never budges again.
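The mechanism behind that trap can be sketched in a few lines. Assuming, as a simplification of mine rather than the authors' actual setup, that the curiosity bonus is just prediction error and the agent's model is a running average, a learnable corridor stops paying out while an unpredictable TV never does:

```python
import random

def curiosity_rewards(observations):
    """Intrinsic reward at each step = how wrong the agent's prediction was."""
    prediction, rewards = 0.5, []
    for obs in observations:
        rewards.append(abs(obs - prediction))     # reward for being surprised
        prediction += 0.5 * (obs - prediction)    # update the model toward what it saw
    return rewards

corridor = [0.9] * 20                             # the same wall every visit: learnable
tv = [random.random() for _ in range(20)]         # a new channel every time: unlearnable

print(sum(curiosity_rewards(corridor)))  # shrinks to almost nothing as the model learns
print(sum(curiosity_rewards(tv)))        # stays large: the TV never stops "surprising"
```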
Broadly speaking, the idea of systems that “reason as if they are incomplete and potentially flawed in dangerous ways” and that strive for “brownie points in heaven”—even if it means forgoing explicit rewards in the here and now—sounds rather . . . Catholic. For centuries, Catholic theologians have struggled with the question of how to live life by the rules of their faith, given that there are often disagreements among scholars about exactly what the rules are. If, hypothetically, eight out of ten theologians think that eating fish on a Friday is perfectly acceptable, but one out of ten thinks it’s forbidden, and the other thinks it’s mandatory, then what is any reasonable, God-fearing Catholic to do?71 As the saying goes, “A man with a watch knows what time it is, but a man with two watches is never sure.”72 These were particularly hotly contested questions in the Early Modern period following the Middle Ages, between the fifteenth and eighteenth centuries. Some scholars advocated for “laxism,” where something was okay as long as there was a chance it wasn’t sinful; this was condemned by Pope Innocent IX in 1591. Others advocated for “rigorism,” where something was forbidden if there was any chance at all that it was sinful; this was condemned by Pope Alexander VIII in 1690.73 A great number of other competing theories weighed the probability of a rule being correct or the percentage of reasonable people who believed it. “Probabiliorism,” for instance, held that you should do something only if it was less likely to be sinful than not; “equiprobabilism” held that it was also okay if the chance was perfectly even. The “pure probabilists” believed that a rule was optional as long as there was a “reasonable” probability that it might not be true; their cry was Lex dubia non obligat: “A doubtful law does not bind.” However, in contrast to the free-spirited laxists, the probabilists stressed that the argument for ignoring the rule, while it didn’t need to be more probable than the argument for obeying the law, nonetheless needed to be “truly and solidly probable, for if it is only slightly probable it has no value.”74 Much ink was spilled, many accusations of heresy hurled, and papal declarations issued during this time. The venerable Handbook of Moral Theology concludes its section on “The Doubting Conscience, or, Moral Doubt” by offering the “Practical Conclusion” that rigorism is too rigorous, and laxism too lax, but
Princeton cognitive scientist Tom Griffiths had an eerily similar situation happen with his own daughter. “She really liked cleaning things,” he tells me; “she would get excited about it. We got her her own little brush and pan. There were some, you know, chips on the floor, and she got her brush and pan and cleaned them up, and I said to her, ‘Wow! Great job! Good cleaning! Well done!’ ”40 With the right praise, Griffiths would manage to both foster motor-skill development in his daughter and get some help in keeping the house clean: a double parenting win. Or was it? His daughter found the loophole in seconds. “She looked up at us and smiled,” he says—“and then dumped the chips out of the pan, back onto the floor, and cleaned them up again to try and get more praise.”
University of Toronto economist Joshua Gans wanted to enlist the help of his older daughter in potty training her younger brother. So he did what any good economist would do. He offered her an incentive: anytime she helped her brother go to the bathroom, she would get a piece of candy. The daughter immediately found a loophole that her father, the economics professor, had overlooked. “I realized that the more that goes in, the more comes out,” she says. “So I was just feeding my brother buckets and buckets of water.” Gans affirms: “It didn’t really work out too well.”
They were in for an enormous shock. The network could almost perfectly tell a patient’s age and sex from nothing but an image of their retina. The doctors on the team didn’t believe the results were genuine. “You show that to someone,” says Poplin, “and they say to you, ‘You must have a bug in your model. ’Cause there’s no way you can predict that with such high accuracy.’ . . . As we dug more and more into it, we discovered that this wasn’t a bug in the model. It was actually a real prediction.”
Rudin and her colleagues published a paper in 2018 showing that they could make a recidivism-prediction model as accurate as COMPAS that could fit into a single sentence: “If the person has more than three prior offenses, or is an 18-to-20-year-old male, or is 21-to-23 years old and has two or more priors, predict they will be rearrested; otherwise, not.”
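That quoted sentence translates directly into code. The function below is just the sentence written out; the field names and input format are assumptions of mine for illustration, not the paper's actual interface.

```python
def predict_rearrest(age: int, sex: str, prior_offenses: int) -> bool:
    """The single-sentence recidivism rule quoted above, written out literally."""
    if prior_offenses > 3:
        return True
    if 18 <= age <= 20 and sex == "male":
        return True
    if 21 <= age <= 23 and prior_offenses >= 2:
        return True
    return False

print(predict_rearrest(age=19, sex="male", prior_offenses=0))    # True
print(predict_rearrest(age=40, sex="female", prior_offenses=1))  # False
```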
A statistical analysis appeared to affirm that there was a systemic disparity.7 The article ran with the logline “There’s software used across the country to predict future criminals. And it’s biased against blacks.”
Research on bias, fairness, transparency, and the myriad dimensions of safety now forms a substantial portion of all of the work presented at major AI and machine-learning conferences. Indeed, at the moment they are the most dynamic and fastest-growing areas, arguably, not just in computing but in all of science.

