The years since launch of the Choosing Wisely Campaign1 2 have seen an increase in studies reporting interventions aimed at reducing low-value care, from unindicated imaging3 4 and laboratory tests3 4 to prescriptions for medicines5–7 that deliver no net benefit. Many describe use of some combination of the usual suspects of intervention types: education,5 8 performance feedback data (sometimes described as audit-feedback, social comparison or peer comparison), policy changes (eg, restricting release of blood products to 1 unit at a time based on a haemoglobin cut-off in non-haemorrhage situations)9 and computer provider order entry (CPOE)–based modifications (eg, alerts) or restrictions.10
In this issue of BMJ Quality and Safety, Ambasta and colleagues examined the impact of a social comparison and education intervention on routine blood test utilisation at a single academic medical centre.3 Trainees and attending physicians each received their own performance feedback in comparison with a group aggregate. Compared with controls, the intervention groups ordered fewer routine laboratory tests (incidence rate ratio 0.89; 95% CI 0.79 to 1.00; p=0.048) with an associated cost savings of $68 877 in Canadian dollars (p=0.020).
Before commenting on the intervention itself, the value of controlled comparisons bears noting. The statistical process control charts shown by Ambasta et al clearly demonstrate special cause variation, with an obvious reduction in the weekly mean number of target laboratory tests per patient-day. In addition, the reduction is temporally associated with the roughly 4.5-month intervention period and remains sustained during a 1-year post-intervention period. Yet, the three control sites show the same pattern. The results may thus constitute a case of secular trends, with a ‘rising tide lifting all boats’.11 The rising tide here presumably owes its origin to the widespread interest in eliminating low-value care, as with the Choosing Wisely Movement.
The authors argue against this interpretation with their difference-in-difference analysis, which showed that the decrease in testing at the intervention site included an 11% reduction beyond that observed at the control sites. How to interpret this small incremental reduction remains unclear. First, the control sites had slightly higher baseline testing rates than the intervention site, and the post-intervention differences look very similar. It is thus difficult to know whether the additional reduction suggested by the analysis reflects a real effect or some degree of intrinsic difference between the control and intervention sites. Second, while difference-in-difference analyses have enjoyed long-standing use, experts have increasingly called attention to the potential for bias due to regression to the mean12 13 and looked for newer methods for drawing inferences from non-randomised comparisons.14
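For readers less familiar with the method, the arithmetic behind a difference-in-difference comparison can be sketched as follows. This is only an illustration: the function names and the testing rates are hypothetical, chosen so that the ratio comes out near 0.89, and are not data from the study. It also does not reproduce the authors' statistical model, which this editorial does not describe in detail.

```python
def additive_did(pre_int, post_int, pre_ctrl, post_ctrl):
    """Change at the intervention site minus change at the control sites."""
    return (post_int - pre_int) - (post_ctrl - pre_ctrl)


def ratio_did(pre_int, post_int, pre_ctrl, post_ctrl):
    """Relative change at the intervention site divided by the relative
    change at the control sites (a ratio of rate ratios)."""
    return (post_int / pre_int) / (post_ctrl / pre_ctrl)


# Hypothetical tests per patient-day (assumed values, NOT study data).
PRE_INT, POST_INT = 2.0, 1.602    # intervention site: before, after
PRE_CTRL, POST_CTRL = 2.2, 1.98   # control sites: before, after

additive = additive_did(PRE_INT, POST_INT, PRE_CTRL, POST_CTRL)
ratio = ratio_did(PRE_INT, POST_INT, PRE_CTRL, POST_CTRL)

print(f"Additive difference-in-difference: {additive:.3f} tests per patient-day")
print(f"Ratio of rate ratios: {ratio:.2f}")
```

In this made-up example both groups fall, and the estimate captures only the extra fall at the intervention site. The calculation attributes that extra fall to the intervention only if the two groups would otherwise have followed parallel trends, which is the assumption that differing baselines call into question.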
Even if the effect is attributable to the intervention and statistically significant, the ‘clinical significance’ remains questionable. An argument can be made about preventing potential downstream testing and treatment. But on the face of it, saving about 69 000 Canadian dollars (about £40 000 or US$49 000) over a year seems on the small side. Moreover, the costs here do not take into account the difference between fixed and incremental costs.15 The authors state that they obtained the costs of the laboratory tests targeted by their study from the laboratory services provider to their provincial health system. Such numbers typically reflect dividing the costs for the machines, reagents and personnel by the number of tests run in a given time period. Yet, once the laboratory exists, the marginal costs from running more tests are typically vanishingly small. True savings do not accrue until one eliminates enough tests to pay one less technician or order one less batch of reagents.
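The gap between average-cost accounting and realised savings can be made concrete with a small sketch. Everything here is an assumption for illustration: the function names, the prices (held in cents to keep the arithmetic exact) and the batch size are hypothetical and are not drawn from the study or its cost source.

```python
def accounting_savings(tests_avoided, average_cost_cents):
    """Savings as usually reported: tests avoided times the average cost
    per test, where the average already includes a share of fixed costs."""
    return tests_avoided * average_cost_cents


def realised_savings(tests_avoided, marginal_cost_cents,
                     tests_per_batch, batch_cost_cents):
    """Savings under a step-cost model: a small marginal cost per test,
    plus the cost of each whole reagent batch that no longer needs ordering."""
    batches_avoided = tests_avoided // tests_per_batch
    return (tests_avoided * marginal_cost_cents
            + batches_avoided * batch_cost_cents)


# Hypothetical figures (assumptions, NOT study data), all in cents.
AVERAGE_COST = 500        # average cost attributed to one test
MARGINAL_COST = 10        # extra cost of running one more test
TESTS_PER_BATCH = 10_000  # tests covered by one reagent batch
BATCH_COST = 2_000_000    # cost of one reagent batch

for avoided in (9_000, 10_000):
    reported = accounting_savings(avoided, AVERAGE_COST)
    realised = realised_savings(avoided, MARGINAL_COST,
                                TESTS_PER_BATCH, BATCH_COST)
    print(f"{avoided} tests avoided: reported {reported / 100:.2f}, "
          f"realised {realised / 100:.2f}")
```

Under these assumed numbers, realised savings stay close to zero until the reduction crosses the threshold at which a whole batch can be dropped. The same step logic would apply to staffing.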
Common problems with quality improvement interventions
The issues encountered by Ambasta and colleagues highlight recurring themes among quality improvement work: small effect sizes, unintended consequences and scale/sustainability.
1. Small effect sizes
In pointing these issues out, we do not single out the thoughtful intervention developed and reported by Ambasta and colleagues. Small effect sizes represent the rule, not the exception, when it comes to this sort of work. The social comparison they employed falls into the general category of audit and feedback. While this strategy can modify clinician behaviour,16 a Cochrane review of 140 randomised trials reported a median improvement in desired clinician behaviour of only 4%.17 A similar analysis of computerised decision support interventions showed a strikingly similar result.18 Across 32 trials involving various improvement targets (eg, ordering venous thromboembolism prophylaxis for inpatients or influenza vaccination for outpatients), computer reminders and more complex forms of decision support improved the proportion of patients who received the target processes of care by a median of only 4.2%. In other words, if 30% of patients in the control group received the desired element of care, 34% of patients in the intervention group would receive it. One of us (KGS) has recently updated this meta-analysis (currently under peer review). Across 100 trials, computerised reminders and decision support interventions delivered a median absolute improvement of 7%. That is, whatever percentage of patients were receiving the desired care at baseline, the typical intervention increased that percentage by only 7 percentage points.
Lest these comments be interpreted as implying that something is fundamentally wrong with improvement science, it’s worth remembering that most clinical research produces null results. One telling analysis showed that, once trial registration became mandatory, the proportion of large cardiovascular trials reporting positive results dropped from 57% to 8%.19 Most areas of biomedical research produce small, incremental gains, with major improvements in patient outcomes few and far between. And, this same pattern of small effects (not to mention serious side effects) for new clinical treatments exists despite massive public and private investments in basic and clinical research—many orders of magnitude more than the resources invested in the science and practice of improving healthcare.
2. Unintended consequences
Not only do these common types of improvement interventions (computerised decision support, audit and feedback) typically achieve small improvements, they also have downsides and require non-trivial resources to implement. For computerised reminders, clinicians develop alert fatigue and frustration over pop-up screens interrupting their workflow, especially when the interruptions do not relate to the task at hand. For instance, no clinician wants to see a reminder to screen for thromboembolism risk while trying to order intravenous antibiotics for a patient with sepsis. Social comparisons, such as report cards and other feedback-type interventions, can also produce unintended effects.20 In other contexts, we certainly recognise that receiving performance critiques can produce negative reactions. In a qualitative analysis, social comparison elicited emotions among clinicians such as frustration/irritation (67%), resentment (33%) and embarrassment/shame (17%).21 In addition, the vast majority (83%) of clinicians in the study reflected on how they felt unfairly penalised. Furthermore, clinicians may suffer ‘scorecard overload’, a type of informational feedback fatigue. The larger question remains: how many performance metrics can a clinician interpret and manage well? Answering this question likely requires weighing a number of factors, including the magnitude of the improvement produced by feeding back the performance reports, the degree to which the effect persists over time, the resources required to sustain performance feedback, and the risks of adverse effects such as provider burnout or frustration.
3. Scale and sustainability
While all of the aforementioned factors deserve careful consideration, we have found that scale and sustainability in particular receive the least attention. Working in hospital settings, our experience is that interventions frequently target a single geographical area (eg, a hospital ward) or single clinical service (eg, the general surgery service) and tend not to spread to other wards or services. In addition, how long must an intervention like performance feedback continue in order to achieve a sustained effect? Within the published literature, the feedback delivered varies widely in frequency and rarely persists for more than 1 year.17 There is a paucity of data evaluating the effect after feedback ceases. Study designs focusing on active withdrawal of the intervention would prove invaluable for institutions with limited resources. Experts have proposed practical tips to optimise success when using practice feedback interventions, although overall effectiveness remains unknown.22
Moving forward: choose target problems and interventions wisely
In summary, interventions aimed at changing clinician behaviour typically achieve small improvements, often carry unintended consequences23 and require resources to maintain. In that context, we need to think about ‘choosing wisely’ not just when we decide whether to order a test or treatment for a patient, but also when we decide whether and how to intervene to improve care. The first question to answer is whether the problem is worth the investment (‘Is the juice worth the squeeze?’). Prior to the decision to intervene, we should carefully consider the following factors: Does addressing the problem significantly affect any element of the quadruple aim24 (population health, individual experience of care, per capita cost of healthcare, and provider well-being)? How likely is it that the intervention will produce a clinically worthwhile effect? Is the required effort (eg, direct, indirect and opportunity costs, time) justified? And is it possible to mitigate unintended consequences?
After answering these questions and carefully choosing to move forward, additional factors to consider are scale and sustainability. For instance, it would seem sensible to maximise results by spreading an effective intervention beyond one single area of overuse such as inpatient laboratory testing. In particular, the approach of targeting each specific Choosing Wisely recommendation in isolation misses an opportunity for a broader, more impactful approach. Cited barriers to providing high-value care25 26 are often similar among a range of low-value practices, so applying a comprehensive approach across a number of areas would seem more logical than treating each area as a discrete problem with its own unique intervention. For example, CPOE-enabled changes are often used to reduce inpatient laboratory testing,10 and they can reasonably be scaled up to guide appropriate medication prescribing (eg, antimicrobials27) and diagnostic testing (eg, chest radiographs28). The impact of the sum could be greater than that of the individual components, making the investment worthwhile. Broadly sweeping systems-based interventions that have the capacity to influence behaviour change across multiple areas have demonstrated success.29 When tackling low-value care, it is time to move away from ‘one-offs’ to a systematic approach that leverages carefully chosen interventions scaled across the health system in a sustainable fashion.
Twitter @christinesoong, @HyungChoMD
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Commissioned; internally peer reviewed.