The Problem With Evidence-Based Policies


Ricardo Hausmann in Project Syndicate:

The current so-called “gold standard” of what constitutes good evidence is the randomized controlled trial, or RCT, an idea that started in medicine two centuries ago, moved to agriculture, and became the rage in economics during the past two decades. Its popularity is based on the fact that it addresses key problems in statistical inference.

For example, rich people wear fancy clothes. Would distributing fancy clothes to poor people make them rich? This is a case where correlation (between clothes and wealth) does not imply causation.

Harvard graduates get great jobs. Is Harvard good at teaching – or just at selecting smart people who would have done well in life anyway? This is the problem of selection bias.

RCTs address these problems by randomly assigning those participating in the trial to receive either a “treatment” or a “placebo” (thereby creating a “control” group). By observing how the two groups differ after the intervention, researchers can assess the effectiveness of the treatment. RCTs have been conducted on drugs, micro-loans, training programs, educational tools, and myriad other interventions.

Suppose you are considering the introduction of tablets as a way to improve classroom learning. An RCT would require that you choose some 300 schools to participate, 150 of which would be randomly assigned to the control group that receives no tablets. Before distributing the tablets, you would perform a so-called baseline survey to assess how much children are learning in school. Then you would give the tablets to the 150 “treatment” schools and wait. After a period of time, you would carry out another survey to find out whether there is now a difference in learning between the schools that received tablets and those that did not.
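The tablet design described above can be illustrated with a small simulation. This is a minimal sketch under loudly stated assumptions: the score scale, the amount of learning that happens without tablets, the `true_effect` parameter, and the function names are all hypothetical and not part of the original piece. The sketch randomly splits 300 schools into 150 treatment and 150 control schools. It generates made-up baseline and follow-up scores and then estimates the effect as the difference in average score gains between the two groups.

```python
import random
import statistics


def assign_treatment(school_ids, n_treated, seed=0):
    """Randomly pick n_treated schools for treatment; all others are controls."""
    rng = random.Random(seed)
    return set(rng.sample(school_ids, n_treated))


def simulate_scores(school_ids, treated, true_effect, seed=1):
    """Hypothetical data: a baseline and a follow-up learning score per school.

    All distributions and magnitudes here are invented for illustration.
    """
    rng = random.Random(seed)
    baseline, followup = {}, {}
    for s in school_ids:
        b = rng.gauss(50, 10)      # baseline survey score (assumed scale)
        growth = rng.gauss(5, 3)   # learning that happens with or without tablets
        bonus = true_effect if s in treated else 0.0
        baseline[s] = b
        followup[s] = b + growth + bonus
    return baseline, followup


def estimate_effect(school_ids, treated, baseline, followup):
    """Difference in mean gains (follow-up minus baseline), treatment vs. control.

    Using gains removes each school's pre-existing level, which is what the
    baseline survey is for. Also returns a simple standard error.
    """
    gains_t = [followup[s] - baseline[s] for s in school_ids if s in treated]
    gains_c = [followup[s] - baseline[s] for s in school_ids if s not in treated]
    diff = statistics.mean(gains_t) - statistics.mean(gains_c)
    se = (statistics.variance(gains_t) / len(gains_t)
          + statistics.variance(gains_c) / len(gains_c)) ** 0.5
    return diff, se


schools = list(range(300))
treated = assign_treatment(schools, 150)
baseline, followup = simulate_scores(schools, treated, true_effect=0.0)
diff, se = estimate_effect(schools, treated, baseline, followup)
print(f"estimated effect = {diff:.2f} (SE {se:.2f})")
```

Setting `true_effect=0.0` mimics the "no significant difference" outcome. Whatever the estimate, it describes only the single tablet, software, and teaching approach being simulated, not tablets in general.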

Suppose there are no significant differences, as was the case in four RCTs that found that distributing books had no effect either. It would be wrong to conclude that you have learned that tablets (or books) do not improve learning. What you have shown is that this particular tablet, with this particular software, used within this particular pedagogical strategy, and teaching these particular concepts, did not make a difference.

But the real question we wanted to answer was how tablets should be used to maximize learning. Here the design space is truly huge, yet RCTs do not permit testing more than two or three designs at a time, and they test those designs at a snail’s pace. Can we do better?

