Learning from mistakes in clinical practice guidelines: the case of perioperative β-blockade
Mark D Neuman (1,2), Charles L Bosk (1,2,3), Lee A Fleisher (1,2)

  1. Department of Anesthesiology and Critical Care, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
  2. The Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  3. Department of Sociology, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Correspondence to Dr Mark D Neuman, Department of Anesthesiology and Critical Care, Perelman School of Medicine at the University of Pennsylvania, 423 Guardian Drive, 1119A Blockley Hall, Philadelphia, PA 19104, USA; neumanm{at}mail.med.upenn.edu

Introduction

For more than two decades, the role of β-blockers in preventing cardiac complications after surgery has been among the most hotly contested and controversial topics in medical practice. Based on two small randomised trials published in the late 1990s,1 ,2 leading physicians and experts in patient safety embraced preoperative β-blocker initiation as a therapeutic victory for high-risk surgical patients: an apparently simple and effective treatment that promised, for the first time, to prevent life-threatening postoperative cardiac events.

Yet nearly as soon as preoperative β-blocker initiation had come to be seen as a ‘best practice’, its status was cast into doubt. New randomised trials published between 2005 and 2008 failed to confirm promising early findings and highlighted the potential for harm with β-blocker overuse.3–6 Recommendations that had previously urged widespread preoperative β-blocker initiation among high-risk patients7 ,8 were softened or reversed.9 ,10 Debates over whether or not β-blockers were safe for surgical patients displaced discussions on how to promote their use on a large scale.

On one level, what may be most remarkable about the rise and fall of preoperative β-blocker guidelines is how unremarkable it seems. Preoperative β-blockade is only one of several recent examples in which expert endorsements of promising therapies changed markedly when new evidence highlighted potential harms that had been overlooked by these endorsements. Yet the β-blocker story differs in important ways from canonical examples of reversals in recommendations for medical practice. Prominent retellings of how expert recommendations changed regarding hormone replacement therapy (HRT) for women after menopause, for example, have emphasised the pitfalls of relying on non-randomised studies rather than randomised controlled trials in defining benefits and harms of therapies.11 ,12 In contrast, preoperative β-blocker initiation was elevated rapidly to the status of a best practice specifically because randomised trials had suggested that it could be effective. As we shall see, the β-blocker story shows how the prestige that medical researchers and clinicians afford to randomised controlled trials can obscure important uncertainties surrounding new treatments, particularly when placed in political contexts that prioritise the rapid translation of research into practice. As such, it provides an important counterpoint to dominant narratives of evidence reversal in medicine and helps to explain other recent examples where guidelines went wrong not because they overlooked the need for randomised trials but because of experts’ very faith in such trials.

Construction of an evidence-based practice: the emergence of perioperative β-blockade

Preoperative β-blocker research grew out of longstanding efforts to characterise postoperative cardiac events as a distinct clinical syndrome. Set in motion by Lee Goldman's 1977 bedside risk scoring system,13 ,14 the study of cardiac risk in non-cardiac surgery was already a defined area of academic inquiry by the mid-1990s. Yet despite ample research on risk stratification, practitioners lacked effective treatments known to reduce cardiac risk. When the American College of Cardiology and the American Heart Association's (ACC/AHA) first guidelines on preoperative care appeared in 1996, for example, they included detailed recommendations concerning risk assessment, but offered minimal guidance regarding treatment.15

This changed in December 1996 when Dennis Mangano, an anaesthesiologist at the University of California, San Francisco, published results from a trial that randomised 200 patients either to preoperative treatment with atenolol, a generic β-blocker, or to placebo. The results, which appeared in the New England Journal of Medicine, were almost too good to be true: compared with placebo, atenolol was associated with large, sustained decreases in postoperative mortality.1 Among patients who survived to hospital discharge, for example, 90% of those randomised to receive atenolol were alive at 2 years versus 79% of those randomised to receive placebo (p=0.019).1

Notably, Mangano's study employed a randomised, placebo-controlled trial design; by the mid-1990s, the randomised, placebo-controlled trial had been securely established in medical thought as the ‘gold standard’ of medical evidence. This phenomenon, which grew in part out of the US Food and Drug Administration's 1970 decision to require such trials for the approval of new medications, was promoted by subsequent efforts over the following decades by practitioners within the new fields of biostatistics,16 clinical epidemiology and evidence-based medicine17 to encourage the use of randomised trial designs in research and the application of their findings to clinical practice.

Mangano's study had an immediate impact. In August 1997, the American College of Physicians (ACP) published its own guideline on preoperative care, which included a dramatic ‘stop-the-presses’ addendum calling attention to an ‘important publication that we believe should alter current practice’.18 In it, the authors highlighted both the long duration of effect ascribed to atenolol in Mangano's report and the study's randomised, placebo-controlled design; ultimately, they recommended the immediate adoption of Mangano's findings on a large scale, urging ‘the perioperative use of atenolol in patients with coronary artery disease or risk factors for coronary artery disease (as per the criteria of Mangano and colleagues)’.19

In the lay press, atenolol was hailed as an ‘Rx for deaths after surgery’.20 An editorial in the leading anaesthesiology journal characterised β-blockers as ‘incredibly useful, incredibly underused’ and as carrying ‘remarkable…short- and long-term benefits’.21 Others characterised preoperative β-blockade as being nearly risk-free, as its known downsides, which included hypotension and bradycardia, were ‘not only uncommon, but…also readily responsive to therapy’.22

Yet Mangano's work also had its critics. In a detailed, point-by-point critique of Mangano's paper, Jacqueline Leung, another University of California, San Francisco (UCSF) anaesthesiologist and a former collaborator of Mangano's, wrote in 1999 that ‘the recent popularized use of perioperative beta blockade is based on misinformation’.23 Patients in Mangano's study who took β-blockers at home but were randomised to placebo had had their home β-blockers discontinued before randomisation, potentially creating ‘withdrawal’ effects that might have led to relatively worse outcomes in this group.24 ,25 Further, Mangano's findings regarding 2-year survival were based on an analysis of patients who had survived to hospital discharge. Yet these survival differences no longer met criteria for statistical significance when analysed in a proper ‘intent-to-treat’ approach that included all subjects.26

Such concerns were somewhat allayed in 1999 when Don Poldermans and colleagues at the Erasmus Medical Center in Rotterdam published results from a second randomised, placebo-controlled trial of preoperative β-blockade. Poldermans's study—also published in the New England Journal—enrolled 112 vascular surgery patients at high risk of postoperative cardiac events, and its results echoed the dramatic findings of Mangano's earlier study. Compared with placebo, β-blockers were associated with a tenfold reduction in the risk of any postoperative cardiac event at 30 days from 34% to 3.4% (p<0.001).2

Poldermans's study had its own limitations. It examined a small, highly selected patient sample27 ,28 and was terminated early due to apparent evidence of benefit, increasing the potential for a misleading finding.29 ,30 Even so, an accompanying editorial cited Poldermans's ‘extraordinary’ results and proclaimed that ‘the era in which physicians can only guess at how to reduce a patient's risk of perioperative cardiac complications seems to be ending’.31 To others, the results of the 1996 and 1999 trials accorded ‘in gratifyingly intuitive fashion’32 with physiological principles, establishing that ‘for the first time, in addition to accurately identifying patients who are at increased risk for cardiovascular events, we can intervene to lower that risk’.33

Such enthusiasm translated into recommendations for clinical practice. In 2001, the US Agency for Healthcare Research and Quality (AHRQ) released ‘Making Health Care Safer’, its first major review of evidence on ‘practices relevant to improving patient safety’. The AHRQ report, prepared by investigators at UCSF and Stanford University, reviewed 79 practices and rated 11 ‘most highly in terms of strength of the evidence’. Of these 11—each of which was characterised as a ‘clear opportunity for safety improvement’—preoperative β-blocker initiation for high-risk patients was the second most highly rated and was described as a ‘major advance in perioperative medicine’ whose ‘wider use…should be promoted and studied’.7 ,34

A year later, the revised ACC/AHA guidelines on perioperative care also incorporated new recommendations in favour of wider preoperative β-blocker use. A new class I recommendation (‘procedure/treatment should be performed/administered’) endorsed preoperative β-blocker initiation for high-risk patients who met similar criteria to those used in Poldermans's trial; a less definitive class IIa recommendation (‘it is reasonable to perform procedure/administer treatment’) stated that β-blockers also had potential to benefit a broader group of moderate-to-high-risk patients similar to those enrolled in Mangano's 1996 study.35

These recommendations informed larger efforts to promote preoperative β-blocker use. In 2003, the Leapfrog Group, a US coalition of public and private healthcare purchasers, classified preoperative β-blockade for high-risk patients as a hospital quality standard.36 ,37 Hospitals and clinical departments undertook strategies to promote local adherence to β-blocker guidelines.38 ,39 In clinical practice, use of preoperative β-blockade increased rapidly between 1999 and 2005,40 reflecting one observer's sentiment that ‘the paradigm is shifting from predicting which patient is at high risk…to minimizing the likelihood of such an event with specific perioperative pharmacologic therapy’.41

Second thoughts

Despite the elevation of preoperative β-blocker initiation as a ‘best practice’ by expert groups, some investigators continued to question the validity of the evidence. In a meta-analysis published in July 2005, P.J. Devereaux, of McMaster University in Hamilton, Ontario, found ‘the evidence that perioperative beta-blockers reduce major cardiovascular events is encouraging but too unreliable to allow definitive conclusions to be drawn’.42 That same month, Peter Lindenauer, an internist at the Baystate Medical Center in Springfield, Massachusetts, published a retrospective cohort study in the New England Journal of Medicine that examined outcomes with and without perioperative β-blocker therapy among more than 700 000 non-cardiac surgical patients. Among high-risk patients, Lindenauer found preoperative β-blockade to be associated with a reduced risk of in-hospital death; yet among the lowest-risk patients, preoperative β-blockade was found to be associated with substantially greater postoperative mortality.43

Along with Devereaux's and Lindenauer's reports, three new randomised trials between April 2005 and November 2006 cast further doubt on Mangano's and Poldermans's initial findings. Together, the Perioperative Beta Blockade (POBBLE) trial, the Metoprolol After Vascular Surgery (MaVS) trial and the Diabetic Postoperative Morbidity and Mortality (DiPOM) trial enrolled more than four times the number of patients studied by Mangano and Poldermans combined; none of them observed any difference in postoperative outcomes with preoperative β-blockade versus placebo.4–6

In May 2008, Devereaux's own randomised trial of 8351 patients—named the PeriOperative Ischemic Evaluation (POISE)—showed lower rates of postoperative myocardial infarction, but higher rates of stroke and death, among high-risk surgical patients who were randomised to preoperative metoprolol versus placebo.3 POISE was criticised for its use of large, fixed doses of preoperative β-blockers, which critics considered to be riskier than titrated dosing approaches based on physiological endpoints.44–46 Yet POISE, along with POBBLE, MaVS and DiPOM, nonetheless provoked further uncertainty about the value of preoperative β-blockers for surgical patients.

Changes in clinical practice guidelines reflected this uncertainty. In 2006, even before POISE had been completed, the ACC/AHA downgraded their earlier class IIa recommendation regarding preoperative β-blockade for large groups of moderate-to-high-risk patients.47 In 2009, the ACC/AHA downgraded their recommendation regarding β-blocker initiation in the highest-risk patients from class I to class IIa and added a new class III recommendation (‘procedure/treatment should NOT be performed/administered’) to warn against the initiation of high-dose preoperative β-blockade for β-blocker naïve patients.10

AHRQ also made changes to their recommendations regarding preoperative β-blockade. In 2001, preoperative β-blockade had been the second most ‘highly rated’ of 79 patient safety practices reviewed by the AHRQ team; yet in the second edition of ‘Making Health Care Safer’, released in 2013, it did not appear among AHRQ's 22 ‘strongly encouraged’ or ‘encouraged’ patient safety practices.9 Instead, citing research that had appeared since 2001 to show ‘that perioperative β blockers have mixed benefits and harms’, the 2013 AHRQ report stated that ‘preoperative beta blockers…should not be considered a patient safety practice for all patients’.9

Aftermath

Beyond the conflicting results of the β-blocker trials themselves, discourse on perioperative care became still more complicated in 2011 when Don Poldermans was dismissed from his position at Erasmus University for violations of academic integrity, including the possible fabrication of research data.48 While none of Poldermans's documented infractions pertained to his 1999 study, his dismissal compounded existing confusion surrounding the proper interpretation of his research and led to calls for further re-evaluation of guidelines.49–51

Yet the controversy surrounding Poldermans's breaches of academic integrity is best viewed as a worrisome footnote to a larger story. By 2011, the status of preoperative β-blockers as an evidence-based practice had already come full circle. Over approximately 15 years, preoperative β-blockade had been elevated to the status of a ‘paradigm-shifting’ best practice and a symbol of safe medical care only to fall out of favour again as accumulating evidence called attention to previously overlooked potential harms (see figure 1).

Figure 1

Timeline of recommendations made by three US organisations regarding preoperative β-blocker initiation, 1997–2013. Items highlighted in green indicate statements broadening the indications for preoperative β-blocker initiation; items in yellow indicate statements restricting these indications. The broken line charts the cumulative number of patients randomised to preoperative β-blockers versus placebo across six key trials (lead author in parentheses): (A) Multicenter Study of Perioperative Ischemia (McSPI) atenolol study (Mangano)1; (B) Dutch Echocardiographic Cardiac Risk Evaluation Applying Stress Echocardiography (DECREASE-I; Poldermans)2; (C) Perioperative Beta Blockade (POBBLE; Brady)4; (D) Diabetic Postoperative Morbidity and Mortality (DiPOM; Juul)5; (E) Metoprolol After Vascular Surgery (MaVS; Yang)6; (F) PeriOperative Ischemic Evaluation (POISE; Devereaux).3 As shown, statements in favour of expanding β-blocker use from 1997 through 2002 occurred at a time when a relatively small number of patients had been studied in randomised trials; as further evidence appeared between 2005 and 2008, guideline statements were revised to recommend more restricted use of preoperative β-blockers. ACP, American College of Physicians; AHRQ, US Agency for Healthcare Research and Quality; ACC/AHA, American College of Cardiology/American Heart Association.

To a degree, experts’ willingness to endorse Mangano's and Poldermans's trials was linked to their knowledge of concurrent efforts to promote the broader use of β-blockers for other conditions in which they had been shown to offer health benefits, such as the care of patients after myocardial infarction.52 ,53 To their advocates, the similarities in the effects ascribed to β-blockers in early preoperative trials and in historical studies of β-blockers after acute myocardial infarction made the ‘too good to be true’ findings of the 1996 and 1999 trials seem paradoxically ‘intuitive’. It became easy, in the words of one researcher, to give β-blockers ‘a free pass’ in terms of their likely risks for surgical patients (AD Auerbach, personal communication, 2013). Insofar as they took the safety of β-blockers in surgical populations for granted based on their experiences with β-blockers in other contexts, expert endorsements of early β-blocker trials underestimated the true risks of initiating β-blocker therapy prior to surgery.

In this sense, the story of perioperative β-blocker recommendations stands in contrast to other episodes in the history of medicine in which experts have argued over the primacy of physiological plausibility versus randomised controlled trial evidence as the proper basis for medical decision making. Early debates about the efficacy of coronary artery bypass grafting (CABG), for example, centred on cardiologists’ and cardiac surgeons’ differing beliefs regarding the adequacy of changes in physiology, as measured by angiography, versus statistical evaluation of health outcomes, as proof of CABG's efficacy.54 ,55 In contrast, proponents of Mangano's and Poldermans's trials emphasised the apparent concordance between the direction of effects demonstrated in these trials and prevailing beliefs regarding the salutary effects of β-blockade on cardiovascular physiology. This apparent concordance made the posited benefits of β-blockers for perioperative treatment seem self-evident and contributed to guideline authors’ seeing the extraordinary magnitude of Mangano's and Poldermans's findings as evidence in itself of their trials’ veracity rather than as a cause for further questioning.

The course of perioperative β-blocker recommendations over time also illustrates the extent to which interpretations of trial evidence were influenced not only by questions of scientific rigour but also by the larger political and professional importance that observers assigned to these findings. Beyond suggesting a novel treatment for a clinical problem, Mangano's and Poldermans's findings also lent new legitimacy both to perioperative medicine as a clinical domain and to the nascent patient safety movement itself. Prior to Mangano's 1996 paper, experts in perioperative medicine could make statements about a given patient's risk, but lacked the ability to do anything to change that risk; yet once it appeared, they gained a powerful new ability to intervene that lent importance to their role in clinical care and a new significance to their longstanding practices of risk stratification. In a similar vein, by reframing postoperative cardiac events as ‘patient safety problems’, AHRQ researchers could also claim Mangano's and Poldermans's trials as victories insofar as they provided patient safety advocates an opportunity to align their own nascent movement with the conventions of evidence-based medicine.

Conclusion: preoperative β-blocker recommendations in context

The story of preoperative β-blockers differs in important ways from other well-documented historical examples of reversals in practice recommendations. From the 1970s through the late 1990s, for example, HRT was widely recommended to treat symptoms of menopause and prevent heart disease in older women based largely on evidence from non-randomised, observational studies. Yet after randomised, placebo-controlled trials in 1998 and 2002 showed that oestrogen replacement actually increased the risk of heart disease, stroke and cancer, earlier recommendations in favour of widespread oestrogen use came to appear misguided. As such, the rise and fall of HRT has typically been used to illustrate the perils of basing recommendations for clinical practice on non-randomised studies rather than randomised controlled trials. As Jerry Avorn has noted: ‘The only real hero to emerge from [the] complicated story [of HRT] is the randomized controlled trial. On the guilty side is its country cousin the observational study, which the estrogen debacle indicted as a codefendant or at least as an unintentional coconspirator.’11

In contrast, early recommendations in favour of preoperative β-blockers occurred specifically because randomised controlled trials suggested potential benefits. Unlike the story of HRT recommendations, the rise and fall of β-blocker recommendations took place not because of a failure to recognise the limitations of non-randomised versus randomised studies; instead, it happened largely because of the prestige that expert observers were willing to assign to the randomised controlled trial as a form of evidence. To borrow Avorn's terminology, early β-blocker advocates ‘heroised’ Mangano's and Poldermans's initial studies specifically because of their randomised designs; as a result, these same experts were willing to trust the study results in spite of important limitations, including their relatively small sample sizes, their single-centre designs and, in the case of Poldermans's trial, the early termination of the trial for perceived benefit.

In this way, the rise and fall of preoperative β-blocker recommendations offers a useful starting point for understanding other instances that occurred over these same years where promising, trial-tested therapies—such as tight glucose control in the intensive care unit56–58 and activated protein C for sepsis59–61—initially gained experts’ endorsements as ‘best practices’ only to fall out of favour a few years later. As with preoperative β-blockers, each of these therapies provided new hope for the treatment of highly morbid conditions; in each case, experts latched onto the findings of one or two randomised trials and rapidly incorporated them into guideline recommendations. And in each case, initial expert recommendations for these therapies were revised and downgraded only a few years later when early endorsements were shown to have overlooked important potential harms.

We can only speculate as to why, in these instances, findings of early randomised trials were taken up so rapidly, particularly in light of historical observations of substantial delays in the translation of research findings into expert recommendations for care.62 Preoperative β-blocker initiation, tight glucose control in the intensive care unit and activated protein C for sepsis may have risen rapidly to ‘best practice’ status because of the lack of other efficacious therapies to treat the clinical conditions they addressed. Further, their rapid incorporation into guidelines may have been facilitated by their appearance alongside advances in information technology in the 1990s and 2000s that changed how new research findings were communicated or by the increasing relevance of practice guideline recommendations to health policy over time.63

Like other recent observations of changes over time in research surrounding other evidence-based treatments,64–66 the story of preoperative β-blockers echoes current concerns regarding widespread problems in replicating medical research findings, and the pitfalls of using arbitrary statistical thresholds to make determinations about the effects of medical treatments.67 Moreover, it aligns with recent trends towards questioning views of biomedical evidence that place randomised trials at the top of a hierarchy of strength,68–70 and efforts to develop approaches towards guideline creation that move away from automatic categorisations of evidentiary strength based on the designs of research studies.71

On a more fundamental level, though, the story of β-blocker recommendations shows how a sense of political urgency to discover and implement effective practices can lead to negative consequences when it is allowed to obscure uncertainties implicit even in evidence drawn from randomised controlled trials. Practice guidelines make possible the elision of distinctions between treatments that are supported by the ‘best available evidence’ and treatments that will actually work as intended in practice. In so doing, they serve the purpose of providing clinicians and health policymakers with recommendations for action; yet they also potentially complicate more gradual efforts to understand what unanticipated risks might accompany the dissemination of new treatments.

Seen in this light, the up-and-down story of preoperative β-blockers appears not as a lone aberration in judgement on the part of a few experts, but as a case study that offers practical lessons in how to make guidelines more ‘trustworthy’ going forward. In particular, it stresses the importance of avoiding over-reliance on one or two prominent studies as the basis for definitions of best practices; the pitfalls of hierarchical systems that automatically assign randomised trials to a higher category of evidentiary ‘strength’ than other types of studies; and the potential value of built-in systems for ‘premortem’ analyses to assess the potential harms that might accrue if recommendations (particularly those with major implications for practice) are later found to have been misguided.

Ultimately, guidelines’ stated goal of improving health through better care may be best served not by dogmatic adherence to expert advice but by active discourse that places individual recommendations within their larger scientific, social, political and historical context. The story of preoperative β-blocker guidelines illustrates the extent to which making guidelines more trustworthy requires that we understand in greater detail the processes by which individual recommendations form and evolve over time. Moreover, it requires that experts, clinicians and policymakers alike be prepared to quickly modify their understanding of best practices as new evidence becomes available.

Footnotes

  • Contributors All authors contributed to the planning, conduct and reporting of the work described in the article. MDN, CLB and LAF jointly developed the idea for the article. MDN performed literature searches and obtained feedback from key informants. MDN, CLB and LAF analysed and interpreted the data. MDN wrote the first draft of the article; CLB and LAF provided extensive critical revisions on article drafts. MDN is the guarantor and accepts full responsibility for the article. MDN had access to all study data and controlled the decision to publish. Additional contributions: Michael Cirullo provided research assistance. We received valuable feedback on a previous version of this manuscript from Mary Dixon-Woods, Renee Fox, Sandy Schwartz, Robert Aronowitz, Kim Eagle, Allan Detsky and Amir Qaseem.

  • Funding This study was supported by a grant to MDN from the National Institute on Aging, Bethesda, MD (grant 1K08AG043548).

  • Competing interests LAF has been a volunteer member and chair of ACC/AHA guideline committees from 1996 to the present; no compensation was provided for his work on these guidelines.

  • Ethics approval University of Pennsylvania School of Medicine IRB.

  • Provenance and peer review Not commissioned; externally peer reviewed.
