Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society
By Jim Manzi
(Basic Books, 320 pages, $28.99)
THE FIRST THING Barack Obama did as president was enact the stimulus. Two of his economic advisers, Christina Romer and Jared Bernstein, preceded the rollout of the $800 billion bill with a now infamous prediction that it would prevent unemployment from climbing above 8 percent.
Three-plus years later, the unemployment rate hasn’t fallen below 8 percent (as of this writing), and Romer and Bernstein are a cautionary tale about the perils of making predictions.
Yet, while the public mostly thinks of the stimulus as a self-evident failure, its authors are unconvinced. If anything, they think it should have been bigger. Almost no one who had a strong opinion about the wisdom of stimulus before 2009 has changed his mind since.
As the administration argues, it’s quite possible that unemployment is still sky-high not because the stimulus failed, but because the economy was in even worse shape than was thought at the time of the Romer-Bernstein prediction. If the stimulus hadn’t been enacted, unemployment might even be worse than it is today.
In other words, there’s no way to assess the stimulus without knowing the counterfactual: what would have happened if it had never passed. The question is whether, lacking that knowledge, even the most sophisticated study could provide a definitive estimate of its impact.
Jim Manzi, author of Uncontrolled, believes the answer is no: The question of whether the stimulus worked simply cannot be answered with the available evidence. To the extent that social scientists rely on studies like those justifying or discrediting the stimulus, they are fundamentally practicing an art, not a science.
Through his work as a software consultant, Manzi, who’s also associated with the Manhattan Institute and National Review, has become convinced that a type of study known as randomized field trials (RFTs) can do what other models cannot, in politics as well as business: generate reliable predictions.
What is a randomized field trial? It is essentially a clinical trial, or what most people would think of as the backbone of hard science and medicine. For example, to test a drug’s effectiveness, a researcher would randomly assign the subjects in a sample to a treatment group, which receives the drug, or to a control group, which receives a placebo. By measuring the difference between the two groups afterward, the researcher obtains evidence that doctors and patients can trust.
Even simpler, think of kindergartners watering only half of a flower bed. When the other half withers in a day or so, the kids have absolute confidence that water helps plants.
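The logic of the drug trial and the flower bed can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method taken from the book: the function name `run_trial`, the simulated outcome scale, and the `true_effect` parameter are all hypothetical. The point it shows is that random assignment lets a simple difference in group averages recover the effect, even though each subject's baseline varies for reasons the researcher never observes.

```python
import random
import statistics


def run_trial(n_subjects=1000, true_effect=2.0, seed=0):
    """Simulate a randomized trial and return the estimated effect.

    All numbers here are made up for illustration. Each subject has an
    unobserved baseline outcome; treated subjects get `true_effect` added.
    """
    rng = random.Random(seed)

    # Randomly assign half of the subjects to the treatment group.
    subjects = list(range(n_subjects))
    rng.shuffle(subjects)
    treatment_group = set(subjects[: n_subjects // 2])

    treated_outcomes = []
    control_outcomes = []
    for subject in range(n_subjects):
        # Baseline stands in for every factor the researcher cannot see.
        baseline = rng.gauss(10.0, 3.0)
        if subject in treatment_group:
            treated_outcomes.append(baseline + true_effect)
        else:
            control_outcomes.append(baseline)

    # The estimate is just the difference between the two group averages.
    return statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)


if __name__ == "__main__":
    estimate = run_trial(n_subjects=10000, true_effect=2.0, seed=1)
    print(f"Estimated effect: {estimate:.2f} (simulated true effect: 2.00)")
```

Because assignment is random, the unobserved baselines average out across the two groups, so no list of control variables is needed. The estimate still carries sampling noise, which shrinks as the number of subjects grows.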
RFTs essentially bring the same approach to different areas. Although in some ways they are simpler than many prevailing statistical methods, Manzi argues persuasively that they could drastically improve the way governments operate.
The problem that RFTs address lies in what Manzi calls “causal density.” The factors affecting the outcome of any one intervention are simply too numerous to control for in a normal study.
Manzi gives an example from business to illustrate the concept of causal density. A company that owned thousands of convenience stores named QwikMart and FastMart asked Manzi to determine whether renaming all the QwikMarts “FastMart” would increase sales (the average FastMart had higher sales).
Although that sounds like an easy question, it proves quite difficult to answer reliably. Manzi points out that any number of factors (he goes through 32 before warning that the list could go on forever) could also affect sales, so separating the impact of the name “FastMart” from all the others is extremely hard. Furthermore, each and every factor affects all the others, creating infinite interactions that any study would also have to control for.
For example, it’s possible that the presence of an ATM in a store increases sales, rather than the name “FastMart.” But Manzi cautions that to know for sure, one would also have to control for whether or not the ATM is, for instance, in a small or large store (it’s conceivable that ATMs in small stores could lead to crowding and reduce sales). Furthermore, one would have to take into account higher-order interactions, such as whether or not a FastMart that has an ATM and is large is along a highway, in which case it’s probably so crowded that an ATM hurts sales even if it’s a large store.
And so on, ad infinitum, for every variable that can be thought of, and many that can’t. Anyone attempting to answer a question as simple as “does the name FastMart increase sales” faces such an overwhelming density of possible causal factors that it is impossible for him to structure the study without making assumptions about which variables to include and which to exclude.
In other words, there is simply no way to answer the question without the risk of leaving out an important variable. He could present a plausible answer to the question of whether the name “FastMart” increases sales, but that answer would inevitably reflect his own presuppositions.
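A rough count gives a sense of the scale involved. The sketch below is not from the book; it makes the simplifying assumption that each of the 32 factors Manzi lists is a plain yes-or-no attribute (has an ATM or not, is large or not), and the function names are hypothetical. Even under that generous simplification, the number of distinct combinations of factors that could interact runs into the billions.

```python
from math import comb


def interaction_counts(n_factors, max_order):
    """Number of possible k-way interactions for k = 2..max_order."""
    return {k: comb(n_factors, k) for k in range(2, max_order + 1)}


def total_interactions(n_factors):
    """All subsets of two or more factors: 2^n minus singles and the empty set."""
    return 2 ** n_factors - n_factors - 1


if __name__ == "__main__":
    # 32 is the number of factors the review says Manzi lists.
    for order, count in interaction_counts(32, 4).items():
        print(f"{order}-way interactions: {count}")
    print(f"All interactions among 32 factors: {total_interactions(32)}")
```

With 32 yes-or-no factors there are 496 pairwise interactions, 4,960 three-way interactions, and 4,294,967,263 interactions of any order. Real factors are rarely binary and the list is open-ended, so the true space is larger still, which is why an analyst has to choose what to leave out.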
Manzi concludes that RFTs present a possible resolution to some of our endless public debates, like the one over the stimulus. In part, he believes this because they have already made an impact in business. In particular, Manzi points to the success of Capital One, a credit card company that grew rapidly throughout the ’90s using RFTs to establish marketing strategies. The company would test new advertisements by randomly selecting treatment and control households, and then use the results to establish larger campaigns. Capital One ran about 60,000 such tests in 2000, according to Manzi, and has quickly grown into the Fortune 500 company it is today. Google, notably, also has incorporated RFTs into its business strategy, running 12,000 in 2009.
There are a few examples of RFTs being done for governments. The 1996 welfare reform, for example, was partly motivated by a series of state-level experiments. Manzi believes that far more could be done, in health care, crime prevention, education, and a host of other areas. The federal government, in particular, should be performing thousands of experiments a year.
A NUMBER of social scientists have begun to bring RFTs to the forefront of their specializations. The MIT economists Abhijit Banerjee and Esther Duflo use field experiments to study global poverty, testing small-scale interventions such as distributing mosquito nets. Duflo won the John Bates Clark Medal, given to the best economist under age 40, in 2010. Manzi also cites the education economist Roland Fryer and the Chicago economist John List as RFT pioneers. If RFTs gain prominence in academia, they may also find their way into policymaking.
Of course, RFTs suffer from their own shortcomings. Mosquito nets that save lives in Africa may be useless in India, and school vouchers that increase test scores in Milwaukee could easily flop in Baltimore. Furthermore, RFTs are more likely to yield modest, specific answers to narrowly defined questions, rather than point to sweeping conclusions.
Manzi proposes that, ideally, many different functions of the federal government should be spun off to the states and then tested rigorously using RFTs as the states experiment with different approaches. In essence, he suggests that the government engage in the same sort of trial-and-error process that defines the private sector.
One obvious objection to Manzi’s enthusiasm for RFTs is that his technocratic view of their possibilities for government is divorced from reality. Democratic politics, after all, is about competing interests and coalitions, not best practices. Furthermore, it’s unlikely that governors or mayors, let alone legislators, are interested in running experiments that could prove to be failures.
Nevertheless, even if their use in government is generations away, RFTs could, in the meantime, dispel many myths that have entered the mainstream through suspect research. Uncontrolled is worthwhile for its reminder that most studies that shape public debates report little more than the authors’ biases, and have no predictive power whatsoever. In other words, Uncontrolled is a call for humility about what we know—and a little of that goes a long way.