By Nick Emptage
Most experts agree that data (however defined) hold the key to addressing the health policy challenges of our time. If we had lots of studies directly comparing interventions, the theory goes, we’d know which treatments worked and which were a waste of resources. The conduct of these studies, known as comparative effectiveness research (or CER), is supported by data sources that are better organized and more ubiquitous than ever before, as well as vastly improved analytic methods and systems. But are the choices driven by these data likely to be valid? What if the logic behind CER theory falls apart when we use it to make policy decisions?
How CER Works
CER’s goals can be achieved in two ways. The first involves synthesizing comparative studies and helping patients and clinicians use them to inform specific decisions in specific encounters. The other way uses comparative research to inform the decisions of public policy-makers, who decide whether or not citizens will be allowed access to particular interventions. The latter approach is far more popular outside the United States, although it has gained momentum in American health care in recent years. Since the 2009 ARRA stimulus and the establishment of PCORI in the 2010 health reform, over $2 billion has been allocated to the planning, coordination and implementation of CER studies. Reasonable people will disagree on whether this is a good thing, but regardless of where you fall in this debate, there are a few issues worth considering.
Doing Comparative Effectiveness Right
There are numerous aspects of “effective” comparative effectiveness that I won’t go into here, because they’re obvious enough: being skeptical of poorly designed studies, being wary of things like publication bias, and so on. When it comes to the top-down approach to CER, however, there are some issues that you may not have thought of before. Specifically:
- Statistical significance isn’t always what it appears. The determination that one treatment “works” usually comes down to whether the p-value in a trial suggests a statistically significant difference between two groups. In any randomized trial, it’s assumed that the two groups don’t differ meaningfully apart from their treatment, and that variations in treatment response arise from random factors. Unfortunately, genomics and personalized medicine suggest that this variation is anything but random. And if treatment effects are truly heterogeneous within the patient population, the p-value tells you only that the average effect across the trial differs from zero; it says nothing about whether a given patient who walks through a clinician’s door will benefit. Nor will it be clear that one treatment is more “evidence-based” than another for that patient.
- Does the measured effect mean anything for the patient? Even if treatment effects are homogeneous, it’s worth asking whether a statistically significant difference has any practical meaning. Trials typically assume that a specific difference on an outcome measure is meaningful, and use power analyses to calculate how many subjects are needed to detect that difference. Yet it’s not clear that the outcome in question is one that patients or payers care about. And even if patients generally consider this measure meaningful, it doesn’t follow that the magnitude of difference driving the power analysis is important to everyone. Patients can have this conversation with their physician, but it’s hard to imagine how a government body can decide a treatment “works” if it can’t tell whether patients will realize any value from it.
- Is the “comparison” in comparative effectiveness valid? It’s no secret that the majority of medical care isn’t evidence-based. So how do we choose among treatments if the comparisons among them rest on interventions not backed by evidence? In drug research, we generally see head-to-head studies only in well-established medication classes; the initial studies almost always compare to placebo. Surgical trials often compare against sham procedures. And many new treatments are compared against interventions that have never been proven empirically, even if they’re the standard of care. So how can a policy-maker evaluate a treatment if it’s being compared to something that’s not itself evidence-based (let alone to something that isn’t even medical care)?
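For readers who like to see the arithmetic, here is a rough sketch of the first two points. Everything in it is an illustrative assumption rather than data from any real trial: the function names are made up, the outcome is measured in standard-deviation units, 30% of patients are assumed to be "responders" who gain one full standard deviation, and the rest gain nothing. The first function is the textbook normal-approximation sample size for comparing two means, which shows how the "meaningful difference" a trial assumes drives how many subjects it enrolls. The second simulates a trial in which the average effect is highly significant even though most treated patients get no benefit at all.

```python
import math
import random
from statistics import NormalDist, mean, variance


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Subjects per arm to detect a mean difference `delta` with a
    two-sided test (normal approximation, equal variances and arm sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


def simulate_trial(n=5000, responder_share=0.3, responder_effect=1.0, seed=0):
    """Simulate a two-arm trial with heterogeneous treatment effects.

    Hypothetical setup: only a fraction of treated patients respond.
    Returns (mean difference, two-sided p-value, share of treated
    patients whose true benefit is zero).
    """
    rng = random.Random(seed)
    control = [rng.gauss(0, 1) for _ in range(n)]
    is_responder = [rng.random() < responder_share for _ in range(n)]
    treated = [
        rng.gauss(0, 1) + (responder_effect if r else 0.0)
        for r in is_responder
    ]

    # Large-sample z-test on the difference in arm means.
    diff = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / n + variance(control) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))

    no_benefit_share = 1 - sum(is_responder) / n
    return diff, p_value, no_benefit_share


if __name__ == "__main__":
    # Halving the assumed "meaningful" difference roughly quadruples the trial.
    print("n per arm, delta=0.5:", n_per_group(0.5, 1.0))
    print("n per arm, delta=0.25:", n_per_group(0.25, 1.0))

    diff, p, no_benefit = simulate_trial()
    print("average effect:", round(diff, 3))
    print("p-value:", p)
    print("share of treated patients with no benefit:", round(no_benefit, 3))
```

Under these assumptions the trial "works" by any conventional significance threshold, yet roughly seven in ten treated patients are no better off. The p-value describes the average, not the person in the exam room, and the sample size says only what difference the investigators decided in advance was worth detecting.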
This isn’t to say that there’s no place for evidence in medical decision-making. Trials and systematic reviews can guide patients and providers as they evaluate treatment approaches, helping them make informed decisions. But there are inherent limitations to empirical data, and using them to replace a clinician’s judgment can easily lead to substandard care if the nuances of individual-level patient care aren’t considered.
Nicholas Emptage is the blogger behind healthycriticism.com. He is a health economist with extensive experience in both academic and private-sector health services research and technology evaluation. All the opinions herein are the author’s, and should not be attributed to any of his past or present employers or colleagues.