
The Book of Why: The New Science of Cause and Effect
by Judea Pearl
30 popular highlights from this book
Key Insights & Memorable Quotes
Below are the most popular and impactful highlights and quotes from The Book of Why: The New Science of Cause and Effect:
My emphasis on language also comes from a deep conviction that language shapes our thoughts. You cannot answer a question that you cannot ask, and you cannot ask a question that you have no words for.
you are smarter than your data. Data do not understand causes and effects; humans do.
Counterfactual reasoning, which deals with what-ifs, might strike some readers as unscientific. Indeed, empirical observation can never confirm or refute the answers to such questions.
I would rather discover one cause than be the King of Persia.
How much evidence would it take to convince us that something we consider improbable has actually happened? When does a hypothesis cross the line from impossibility to improbability and even to probability or virtual certainty?
skepticism has its place. Statisticians are paid to be skeptics; they are the conscience of science.
I conjecture that human intuition is organized around causal, not statistical, relations.
If I could sum up the message of this book in one pithy phrase, it would be that you are smarter than your data. Data do not understand causes and effects; humans do.
You cannot answer a question that you cannot ask, and you cannot ask a question that you have no words for.
Data can tell you that the people who took a medicine recovered faster than those who did not take it, but they can’t tell you why. Maybe those who took the medicine did so because they could afford it and would have recovered just as fast without it.
while probabilities encode our beliefs about a static world, causality tells us whether and how probabilities change when the world changes, be it by intervention or by act of imagination.
Deep learning has instead given us machines with truly impressive abilities but no intelligence. The difference is profound and lies in the absence of a model of reality.
the surest kind of knowledge is what you construct yourself.
Where causation is concerned, a grain of wise subjectivity tells us more about the real world than any amount of objectivity.
Despite heroic efforts by the geneticist Sewall Wright (1889–1988), causal vocabulary was virtually prohibited for more than half a century. And when you prohibit speech, you prohibit thought and stifle principles, methods, and tools.
Much of this data-centric history still haunts us today. We live in an era that presumes Big Data to be the solution to all our problems. Courses in “data science” are proliferating in our universities, and jobs for “data scientists” are lucrative in the companies that participate in the “data economy.” But I hope with this book to convince you that data are profoundly dumb.
Fighting for the acceptance of Bayesian networks in AI was a picnic compared with the fight I had to wage for causal diagrams [in the stormy waters of statistics].
Counterfactuals are the building blocks of moral behavior as well as scientific thought. The ability to reflect on one’s past actions and envision alternative scenarios is the basis of free will and social responsibility. The algorithmization of counterfactuals invites thinking machines to benefit from this ability and participate in this (until now) uniquely human way of thinking about the world.
How was confounding defined then, and how should it be defined? Armed with what we now know about the logic of causality, the answer to the second question is easier. The quantity we observe is the conditional probability of the outcome given the treatment, P(Y | X). The question we want to ask of Nature has to do with the causal relationship between X and Y, which is captured by the interventional probability P(Y | do(X)). Confounding, then, should simply be defined as anything that leads to a discrepancy between the two: P(Y | X) ≠ P(Y | do(X)). Why all the fuss?
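To make that definition concrete, here is a minimal simulation sketch (not from the book; the variable names and probabilities are invented for illustration). A confounder Z, say the ability to afford the medicine mentioned in the highlights above, drives both the treatment X and the outcome Y, so the observed P(Y | X) and the interventional P(Y | do(X)) disagree even though the treatment has no effect at all:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Confounder Z: e.g. "can afford the medicine" (illustrative numbers).
z = rng.random(n) < 0.5
# Treatment X depends on Z: the well-off take the medicine more often.
x = rng.random(n) < np.where(z, 0.8, 0.2)
# Outcome Y depends only on Z, not on X: the medicine itself does nothing here.
y = rng.random(n) < np.where(z, 0.7, 0.3)

# Observational contrast P(Y=1 | X=1) - P(Y=1 | X=0): nonzero, looks like an effect.
observational = y[x].mean() - y[~x].mean()

# Interventional contrast P(Y=1 | do(X=1)) - P(Y=1 | do(X=0)):
# do(X) erases the Z -> X arrow, and since Y ignores X, the true effect is zero.
# Back-door adjustment: sum over z of P(Y=1 | X, Z=z) * P(Z=z), which here does not depend on X.
interventional = (0.5 * 0.7 + 0.5 * 0.3) - (0.5 * 0.7 + 0.5 * 0.3)

print(f"P(Y|X=1) - P(Y|X=0)         ≈ {observational:.2f}")   # about 0.24
print(f"P(Y|do(X=1)) - P(Y|do(X=0)) = {interventional:.2f}")  # exactly 0.00
```

The gap between the two printed numbers is exactly the discrepancy the quote uses to define confounding.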
definitions demand reduction, and reduction demands going to a lower rung.
“Presumably the child brain is something like a notebook as one buys it from the stationer’s,” he wrote. “Rather little mechanism, and lots of blank sheets.”
overrate it in the sense that they often control for many more variables than they need to and even for variables that they should not
I hope with this book to convince you that data are profoundly dumb. Data can tell you that the people who took a medicine recovered faster than those who did not take it, but they can’t tell you why. Maybe those who took the medicine did so because they could afford it and would have recovered just as fast without it.
With Bayesian networks, we had taught machines to think in shades of grey, and this was an important step toward humanlike thinking. But we still couldn’t teach machines to understand causes and effects. We couldn’t explain to a computer why turning the dial of a barometer won’t cause rain.... Without the ability to envision alternate realities and contrast them with the currently existing reality, a machine...cannot answer the most basic question that makes us human: “Why?”
It’s very important to realize that, contrary to traditional estimation in statistics, some queries may not be answerable under the current causal model, even after the collection of any amount of data. For example, if our model shows that both D and L depend on a third variable Z (say, the stage of a disease), and if we do not have any way to measure Z, then the query P(L | do(D)) cannot be answered. In that case it is a waste of time to collect data. Instead we need to go back and refine the model, either by adding new scientific knowledge that might allow us to estimate Z or by making simplifying assumptions (at the risk of being wrong)—for example, that the effect of Z on D is negligible.
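A small numerical sketch of why such a query can be unanswerable (illustrative only; the probabilities are invented and the code is not from the book): two structural models that produce exactly the same observed joint distribution of D and L, yet give different answers for P(L | do(D)). No amount of (D, L) data can tell them apart.

```python
# Model A: an unmeasured Z (e.g. disease stage) drives both D and L;
# D has no effect on L at all.
p_z = {0: 0.5, 1: 0.5}          # P(Z=z)
p_d_given_z = {0: 0.1, 1: 0.9}  # P(D=1 | Z=z)
p_l_given_z = {0: 0.2, 1: 0.8}  # P(L=1 | Z=z)

def joint_a(d, l):
    """Observational P(D=d, L=l) implied by Model A, with Z summed out."""
    return sum(p_z[z]
               * (p_d_given_z[z] if d else 1 - p_d_given_z[z])
               * (p_l_given_z[z] if l else 1 - p_l_given_z[z])
               for z in (0, 1))

# What the data show: P(L=1 | D=1), identical under both models.
p_l1_given_d1 = joint_a(1, 1) / (joint_a(1, 1) + joint_a(1, 0))   # = 0.74

# Model B: no confounder; D causes L directly, with P(D=1)=0.5,
# P(L=1|D=1)=0.74 and P(L=1|D=0)=0.26 chosen so its joint table
# matches Model A's exactly.

# Yet the two models answer the causal query differently:
do_a = sum(p_z[z] * p_l_given_z[z] for z in (0, 1))  # A: P(L=1|do(D=1)) = P(L=1) = 0.50
do_b = p_l1_given_d1                                 # B: P(L=1|do(D=1)) = 0.74

print(f"Observed P(L=1 | D=1)     = {p_l1_given_d1:.2f}  (identical for A and B)")
print(f"Model A: P(L=1 | do(D=1)) = {do_a:.2f}")
print(f"Model B: P(L=1 | do(D=1)) = {do_b:.2f}")
```

If Z could be measured, the back-door adjustment would settle which answer is right; without it, only the refinements the quote describes, new knowledge or explicit simplifying assumptions, can.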
Courses in “data science” are proliferating in our universities, and jobs for “data scientists” are lucrative in the companies that participate in the “data economy.” But I hope with this book to convince you that data are profoundly dumb. Data can tell you that the people who took a medicine recovered faster than those who did not take it, but they can’t tell you why. Maybe those who took the medicine did so because they could afford it and would have recovered just as fast without it.
This brings us to the saddest episode in the whole smoking-cancer controversy: the deliberate efforts of the tobacco companies to deceive the public about the health risks. If Nature is like a genie that answers a question truthfully but only exactly as it is asked, imagine how much more difficult it is for scientists to face an adversary that intends to deceive us. The cigarette wars were science’s first confrontation with organized denialism, and no one was prepared. The tobacco companies magnified any shred of scientific controversy they could. They set up their own Tobacco Industry Research Committee, a front organization that gave money to scientists to study issues related to cancer or tobacco—but somehow never got around to the central question. When they could find legitimate skeptics of the smoking-cancer connection—such as R. A. Fisher and Jacob Yerushalmy—the tobacco companies paid them consulting fees.
Scientists should seek shielded mediators whenever they face incurable confounders.
The hope—and at present, it is usually a silent one—is that the data themselves will guide us to the right answers whenever causal questions come up.
paradox is more than that: it should entail a conflict between two deeply held convictions.