Reviewing the evidence-base

The Atlantic summarizes a recently published attempt to assess the evidence-base.

No one is entirely clear on how Brian Nosek pulled it off, including Nosek himself. Over the last three years, the psychologist from the University of Virginia persuaded some 270 of his peers to channel their free time into repeating 100 published psychological experiments to see if they could get the same results a second time around. There would be no glory, no empirical eurekas, no breaking of fresh ground. Instead, this initiative—the Reproducibility Project—would be the first big systematic attempt to answer questions that have been vexing psychologists for years, if not decades. What proportion of results in their field are reliable?

A few signs hinted that the reliable proportion might be unnervingly small. Psychology has recently been rocked by several high-profile controversies, including the publication of studies that documented impossible effects like precognition, failures to replicate the results of classic textbook experiments, and some prominent cases of outright fraud.

The findings were not pretty.

As such, the results of the Reproducibility Project, published today in Science, have been hotly anticipated.

They make for grim reading. Although 97 percent of the 100 studies originally reported statistically significant results, just 36 percent of the replications did.
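One way to see how such a gap can arise is a toy back-of-the-envelope calculation (my own illustration, not from the study): if studies have limited statistical power, only a fraction of tested hypotheses are true, and journals mostly publish significant results, then a high rate of initially significant findings is compatible with a low replication rate. The values for power and the base rate of true hypotheses below are assumptions for illustration only.

```python
alpha = 0.05   # conventional false-positive rate
power = 0.35   # assumed average statistical power (hypothetical value)
p_true = 0.30  # assumed share of tested hypotheses that are true (hypothetical)

# Probability that a study comes out significant (and so gets published):
p_sig = p_true * power + (1 - p_true) * alpha

# Of the published (significant) findings, how many reflect true effects?
p_true_given_sig = p_true * power / p_sig

# Expected replication rate: true effects replicate at `power`,
# false positives "replicate" only at the false-positive rate `alpha`.
replication_rate = p_true_given_sig * power + (1 - p_true_given_sig) * alpha

print(f"Share of significant findings that are true: {p_true_given_sig:.0%}")
print(f"Expected replication rate: {replication_rate:.0%}")
```

Under these assumed numbers, roughly three-quarters of published significant findings reflect true effects, yet the expected replication rate is still well under half, simply because underpowered replications miss many true effects and false positives rarely repeat.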

And this doesn’t even consider whether the study, or the coverage of it, speaks to the outcomes that patients and their families want.

Does this mean we should ignore research? No. But it does mean we should be very careful consumers of it. And we should probably be skeptical of those who express excessive certitude on the basis of their evidence-base, especially when they discount experiential knowledge.

There is some good news, and some bad news coming from all of this.

On the positive side:

A 1997 US law mandated the registry’s [ClinicalTrials.gov] creation, requiring researchers from 2000 to record their trial methods and outcome measures before collecting data. The study found that in a sample of 55 large trials testing heart-disease treatments, 57% of those published before 2000 reported positive effects from the treatments. But that figure plunged to just 8% in studies that were conducted after 2000. Study author Veronica Irvin, a health scientist at Oregon State University in Corvallis, says this suggests that registering clinical studies is leading to more rigorous research.

The downside? From education advocate Parker Palmer:

. . . when measurable, short-term outcomes become the only or primary standard for assessing our efforts, the upshot is as pathetic as it is predictable: we take on smaller and smaller tasks—the only kind that yield instantly visible results—and abandon the large, impossible but vital jobs we are here to do.

We must judge ourselves by a higher standard than effectiveness, the standard called faithfulness. Are we faithful to the community on which we depend, to doing what we can in response to its pressing needs?

Palmer’s concerns point to the potential for increasingly narrow definitions of effectiveness that may not speak to the real-world needs of patients, particularly in the case of complex diseases with social, emotional, and environmental factors.

Long-term ignorance

Keith Humphreys laments the short-term focus of the evidence base for medications.

. . . the evidence base is almost useless for answering questions about the long-term costs and benefits of opioid medications. In the 41 randomized clinical trials that Furlan et al. review, the impact of the medication was evaluated for an average of only 5 weeks. That’s enough to show acute pain relief effects, but leaves us in the dark about what happens to the millions of people who take these medications over longer periods.

None of this is unique to the study of pain medication. Anti-depressants, which are prescribed long-term to tens of millions of people, are typically evaluated in 12 week trials. I suspect experts in cardiology, rheumatology and endocrinology could provide examples in their own areas of medications that are widely prescribed for the long term but only evaluated in the short term.


The primary reason for the short-termism of pharmaceutical research is that much of it is funded by the industry itself. Short-term studies are cheaper and meet the FDA requirements for demonstrating efficacy. If there are long-term problems with a medication a company is developing, it would be economically foolish of them to design a study that would reveal it.

The public of course has a different balance of interests and cares what happens to its health beyond a 5 or 12 week window. But public research money to pursue those interests scientifically is drying up, meaning that we will continue to learn about the long-term effects of many medications only by nervously watching what happens when millions of people take them in the course of receiving health care. That’s no way to protect the public health.

Follow his blog and read the rest here.