Elsevier

The Lancet

Volume 382, Issue 9905, 16–22 November 2013, Pages 1674-1677

Public Health
Public reporting of surgeon outcomes: low numbers of procedures lead to false complacency

https://doi.org/10.1016/S0140-6736(13)61491-9

Summary

The English National Health Service published outcome information for individual surgeons in ten specialties in June, 2013. We looked at whether individual surgeons do enough procedures for those with poor performance to be reliably identified. For some specialties, the number of procedures that a surgeon does each year is low and, as a result, the chance of identifying a surgeon with increased mortality is also low. Therefore, public reporting of individual surgeons' outcomes could lead to false complacency. We recommend using outcomes that occur fairly frequently, treating the hospital as the unit of reporting when numbers are low, and not interpreting an absence of evidence of poor performance as evidence of acceptable performance.

Introduction

From the summer of 2013, outcomes of some surgical procedures will be reported for individual surgeons as part of the English National Health Service (NHS) Commissioning Board's new policy.1 This policy follows the example of the Society for Cardiothoracic Surgery in Great Britain and Ireland (SCTS)2 and several US states (eg, New York3), which report mortality for adult cardiac procedures by surgeon. The aim is to allow patients to choose their surgeon and to help clinicians improve outcomes of care. However, when overall numbers of specific procedures are low, correct identification of a surgeon with poor performance is challenging, even if mortality is high.4 The danger is that low numbers mask poor performance and lead to false complacency.

We examine this issue in relation to reporting of surgical mortality for individual surgeons for adult cardiac surgery, plus key procedures in three other specialties: oesophagectomy or gastrectomy for oesophagogastric cancer; bowel cancer resection; and hip fracture surgery. We address three questions. First, what number of procedures is necessary for reliable detection of poor performance? Second, how many surgeons in each specialty actually do this number of procedures in a period of 1, 3, or 5 years? Third, what is the probability that a surgeon identified as a statistical outlier has truly poor performance? Finally, we offer recommendations about how surgeon performance can be assessed in a meaningful way. We used postoperative mortality as an example to address these questions, because it is the outcome that will be reported for English surgeons this summer.

Section snippets

Number of procedures

The number of adult cardiac surgeries done in NHS hospitals is fairly high: 50% of cardiac surgeons do between 60 and 170 per year.2 Many other procedures are done less frequently, which means statistical power is poor and that poorly performing surgeons are unlikely to be correctly identified. In this context, statistical power is the probability that a surgeon with poor performance will be detected as a statistical outlier—ie, as significantly worse than average. For example, 80% power means
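The power calculation described above can be sketched in Python. The snippet does not state which test the authors used, so this sketch assumes a one-sided exact binomial test at a 2·5% significance level against a known average mortality; the function names and the 3% and 6% mortality rates in the usage example are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_choose + i * log(p) + (n - i) * log1p(-p))


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k), i.e. 1 minus the probability of fewer than k deaths."""
    return max(0.0, 1.0 - sum(binom_pmf(i, n, p) for i in range(k)))


def outlier_threshold(n: int, p0: float, alpha: float = 0.025) -> int:
    """Smallest death count k with P(X >= k | p0) <= alpha (one-sided exact test)."""
    cdf = 0.0  # running P(X <= k - 1) under the average mortality p0
    for k in range(n + 1):
        if 1.0 - cdf <= alpha:
            return k
        cdf += binom_pmf(k, n, p0)
    return n + 1  # even n deaths out of n would not be significant


def power(n: int, p0: float, p1: float, alpha: float = 0.025) -> float:
    """Probability that a surgeon whose true mortality is p1 is flagged as an outlier."""
    return binom_sf(outlier_threshold(n, p0, alpha), n, p1)


if __name__ == "__main__":
    # Hypothetical example: average mortality 3%, poor performer has double (6%).
    for n in (20, 60, 170, 500):
        print(f"{n:4d} procedures: power = {power(n, 0.03, 0.06):.2f}")
```

Because deaths are counted in whole numbers, power computed this way does not rise perfectly smoothly with the number of procedures, but the overall pattern of low power at low volume holds.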

Proportion of surgeons who do the necessary number of procedures

We estimated the proportion of surgeons who do a sufficient number of procedures to achieve 60%, 70%, and 80% power to detect poor performance (table 2).2, 5 These proportions are calculated for reporting periods of 1, 3, and 5 years, assuming that the overall rate of mortality remains constant with time. The SCTS reports surgeon-level mortality with 3 years of data.2 Its data show that about three-quarters of surgeons do sufficient numbers of cardiac operations to achieve 60% statistical power
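A rough version of this calculation is sketched below. Rather than an exact test, it uses the normal approximation to the binomial to estimate how many procedures are needed for a given power, then counts the surgeons whose volume, pooled over 1, 3, or 5 years, reaches that number. As in the text, it assumes that mortality stays constant over time, and it additionally assumes that each surgeon's annual volume is constant; the function names, annual volumes, and the 5% and 10% mortality rates are hypothetical, not figures from the article.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_procedures(p0: float, p1: float, power: float, alpha: float = 0.025) -> int:
    """Approximate number of procedures needed to detect mortality p1 against
    average mortality p0 with a one-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))) / (p1 - p0)) ** 2
    return ceil(n)


def proportion_sufficient(annual_volumes, n_needed: int, years: int) -> float:
    """Share of surgeons whose volume pooled over `years` reaches n_needed,
    assuming each surgeon's annual volume stays constant."""
    return sum(v * years >= n_needed for v in annual_volumes) / len(annual_volumes)


if __name__ == "__main__":
    # Hypothetical annual volumes for ten surgeons; not data from the article.
    volumes = [8, 12, 15, 20, 25, 30, 40, 55, 70, 90]
    for target in (0.6, 0.7, 0.8):
        n = required_procedures(0.05, 0.10, target)
        shares = [proportion_sufficient(volumes, n, y) for y in (1, 3, 5)]
        print(f"{target:.0%} power needs ~{n} procedures; share over 1/3/5 years: {shares}")
```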

Correct identification of poor performance

Not all surgeons identified as statistical outliers will truly have poor performance. The proportion correctly identified as having poor performance is known as the positive predictive value.10 The number correctly identified depends on the significance threshold, how many procedures a surgeon does, and the prevalence of poor performance. With standard diagnostic reasoning, it can be calculated that, if one in 20 cardiac surgeons truly had poor performance, 63% would be correctly identified on
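The positive predictive value follows from Bayes' theorem, as in standard diagnostic reasoning. The sketch below is illustrative: it treats the significance level as the false-positive rate among surgeons with average performance (for an exact binomial test this is an upper bound), and the inputs in the usage example are assumptions, not the values behind the figure quoted in the text.

```python
def positive_predictive_value(prevalence: float, power: float, alpha: float) -> float:
    """Share of flagged surgeons who truly perform poorly.

    prevalence: proportion of surgeons with truly poor performance
    power:      probability that a poor performer is flagged (sensitivity)
    alpha:      probability that an average performer is flagged (false-positive rate)
    """
    true_pos = prevalence * power
    false_pos = (1 - prevalence) * alpha
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Illustrative inputs only: 1 in 20 surgeons poor, one-sided alpha of 2.5%.
    for pw in (0.6, 0.7, 0.8):
        print(f"power {pw:.0%}: PPV = {positive_predictive_value(0.05, pw, 0.025):.0%}")
```

Lower power or rarer poor performance both shrink the numerator relative to the false positives, so a larger share of flagged surgeons are falsely identified.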

Improving statistical power

There are options for improving statistical power other than pooling data over time, but each introduces problems of its own. First, data for different procedures could be pooled. However, when outcomes differ between procedures, this approach could prevent fair comparisons of outcomes. We grouped gastrectomy, which has a mortality of 6·9%, with oesophagectomy, which has a mortality of 5·7%.7 Additionally, cardiac procedures were grouped together, combining coronary bypass surgery
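The comparison problem created by pooling can be illustrated with indirect standardisation, which compares each surgeon's observed deaths with the number expected from procedure-specific rates. This sketch uses the gastrectomy and oesophagectomy mortality quoted in the text; the function names, the two surgeons, and their case mix are hypothetical.

```python
# Procedure-specific mortality quoted in the text.
GROUP_RATES = {"gastrectomy": 0.069, "oesophagectomy": 0.057}


def expected_deaths(case_mix: dict[str, int], rates: dict[str, float] = GROUP_RATES) -> float:
    """Deaths expected if each procedure carried its procedure-specific mortality."""
    return sum(count * rates[proc] for proc, count in case_mix.items())


def observed_to_expected(deaths: int, case_mix: dict[str, int]) -> float:
    """Indirectly standardised mortality ratio; above 1 means worse than expected."""
    return deaths / expected_deaths(case_mix)


if __name__ == "__main__":
    # Two hypothetical surgeons with identical crude mortality (3 of 50, 6%).
    surgeons = {
        "A": (3, {"gastrectomy": 40, "oesophagectomy": 10}),
        "B": (3, {"gastrectomy": 10, "oesophagectomy": 40}),
    }
    for name, (deaths, mix) in surgeons.items():
        crude = deaths / sum(mix.values())
        print(f"{name}: crude {crude:.1%}, O/E {observed_to_expected(deaths, mix):.2f}")
```

Both surgeons have the same crude mortality, yet surgeon A, who does more of the higher-risk procedure, has a lower observed-to-expected ratio. With rates as close as 6·9% and 5·7% the distortion is modest, but it grows as procedure-specific mortality diverges.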

Implications of the new policy

Reporting of outcomes for individual surgeons for cardiac surgery in the UK has largely been viewed as a success.11, 12 As we have shown, numbers of cardiac surgeries are sufficient to allow the process of detection to operate with reasonable statistical power. However, we believe that consultant-level reporting could be far less effective for other specialties. The concern about false identification of poor performance has received much attention in view of the stigma attached to poor

Wider issues

Several wider issues have been raised previously about the reporting of surgeon outcomes, mainly related to adequate adjustment for patient case mix, the accuracy with which the responsible surgeon can be identified, and the shared responsibility for the care of patients within teams.17 Operative mortality, including unavoidable deaths, might not be a good proxy for preventable mortality. Of particular relevance is the mean proportion of deaths that can be prevented: if this proportion is low,
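To illustrate why the preventable proportion matters, assume that a poorly performing surgeon multiplies only the preventable share of deaths by some factor, while unavoidable deaths stay at the average level. The relative increase in total mortality is then diluted by the unavoidable deaths. This is a simplified sketch; the function name, the doubling factor, and the preventable fractions are hypothetical.

```python
def relative_mortality(preventable_fraction: float, excess_factor: float) -> float:
    """Poor performer's total mortality divided by average mortality.

    Assumes unavoidable deaths occur at the average rate and only the
    preventable share of deaths is multiplied by excess_factor.
    """
    return (1 - preventable_fraction) + preventable_fraction * excess_factor


if __name__ == "__main__":
    # Hypothetical: a poor performer has twice the average number of preventable deaths.
    for fraction in (0.1, 0.3, 0.5, 1.0):
        ratio = relative_mortality(fraction, 2.0)
        print(f"{fraction:.0%} preventable -> mortality ratio {ratio:.2f}")
```

Under these assumptions, if 10% of deaths are preventable, doubling preventable deaths raises total mortality by only 10%, a difference that would be hard to detect given the low statistical power described earlier for low-volume procedures.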

References (21)

There are more references available in the full text version of this article.
