Public Health
Public reporting of surgeon outcomes: low numbers of procedures lead to false complacency
Introduction
From the summer of 2013, outcomes of some surgical procedures will be reported for individual surgeons as part of the English National Health Service (NHS) Commissioning Board's new policy.1 This policy follows the example of the Society for Cardiothoracic Surgery in Great Britain and Ireland (SCTS)2 and several US states (eg, New York3), which report mortality for adult cardiac procedures by surgeon. The aim is to allow patients to choose their surgeon and clinicians to improve outcomes of care. However, when overall numbers of specific procedures are low, correct identification of a surgeon with poor performance is challenging, even if mortality is high.4 The danger is that low numbers mask poor performance and lead to false complacency.
We examine this issue in relation to reporting of surgical mortality for individual surgeons for adult cardiac surgery, plus key procedures in three other specialties: oesophagectomy or gastrectomy for oesophagogastric cancer; bowel cancer resection; and hip fracture surgery. We address three questions. First, what number of procedures is necessary for reliable detection of poor performance? Second, how many surgeons in each specialty actually do this number of procedures in a period of 1, 3, or 5 years? Third, what is the probability that a surgeon identified as a statistical outlier has truly poor performance? Finally, we offer recommendations about how surgeon performance can be assessed in a meaningful way. We used postoperative mortality as an example to address these questions, because it is the outcome that will be reported for English surgeons this summer.
Number of procedures
The number of adult cardiac surgeries done in NHS hospitals is fairly high: 50% of cardiac surgeons do between 60 and 170 per year.2 Many other procedures are done less frequently, which means statistical power is poor and that poorly performing surgeons are unlikely to be correctly identified. In this context, statistical power is the probability that a surgeon with poor performance will be detected as a statistical outlier—ie, as significantly worse than average. For example, 80% power means
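The definition of statistical power given above can be made concrete with a small calculation. The sketch below is illustrative only and is not the authors' method: it assumes an exact one-sided binomial test, a hypothetical average mortality `p_avg`, a hypothetical "poor performance" mortality `p_poor`, and an assumed one-sided significance threshold `alpha` of 2.5%. None of these values or function names comes from the text.

```python
from math import exp, lgamma, log


def upper_tails(n: int, p: float) -> list[float]:
    """Return tails with tails[k] = P(X >= k) for X ~ Binomial(n, p), k = 0..n+1."""
    tails = [0.0] * (n + 2)  # tails[n + 1] stays 0: more deaths than procedures is impossible
    for k in range(n, -1, -1):
        # Binomial pmf computed on the log scale to avoid overflow for large n.
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log(1 - p))
        tails[k] = tails[k + 1] + exp(log_pmf)
    return tails


def power(n: int, p_avg: float, p_poor: float, alpha: float = 0.025) -> float:
    """Probability that a surgeon with true mortality p_poor is flagged as an outlier.

    A surgeon is flagged when the observed number of deaths reaches the smallest
    count whose upper-tail probability under p_avg is at most alpha (assumed rule).
    """
    tails_avg = upper_tails(n, p_avg)
    critical = next(k for k, t in enumerate(tails_avg) if t <= alpha)
    return upper_tails(n, p_poor)[critical]


def min_procedures(p_avg: float, p_poor: float, target: float = 0.8,
                   alpha: float = 0.025, max_n: int = 2000):
    """Smallest number of procedures reaching the target power, or None if not found."""
    for n in range(1, max_n + 1):
        if power(n, p_avg, p_poor, alpha) >= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical inputs: 5% average mortality, 10% for a poorly performing surgeon.
    for n in (50, 150, 500):
        print(n, round(power(n, 0.05, 0.10), 3))
    print("procedures needed for 80% power:", min_procedures(0.05, 0.10))
```

Because the test is discrete, power does not rise perfectly smoothly with the number of procedures, so `min_procedures` returns the first count that reaches the target. Pooling data over several years simply increases `n`, provided mortality stays constant over the period.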
Proportion of surgeons who do the necessary number of procedures
We estimated the proportion of surgeons who do a sufficient number of procedures to achieve 60%, 70%, and 80% power to detect poor performance (table 2).2, 5 These proportions are calculated for reporting periods of 1, 3, and 5 years, assuming that the overall rate of mortality remains constant with time. The SCTS reports surgeon-level mortality with 3 years of data.2 Its data show that about three-quarters of surgeons do sufficient numbers of cardiac operations to achieve 60% statistical power
Correct identification of poor performance
Not all surgeons identified as statistical outliers will truly have poor performance. The proportion correctly identified as having poor performance is known as the positive predictive value.10 The number correctly identified depends on the significance threshold, how many procedures a surgeon does, and the prevalence of poor performance. With standard diagnostic reasoning, it can be calculated that, if one in 20 cardiac surgeons truly had poor performance, 63% would be correctly identified on
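The diagnostic reasoning behind the positive predictive value can be sketched as follows. The prevalence of one in 20 comes from the text; the 80% power and the 2.5% false-positive rate are assumptions chosen for illustration, since the passage does not state which values were used, and the function name is hypothetical.

```python
def positive_predictive_value(prevalence: float, power: float,
                              false_positive_rate: float) -> float:
    """Share of flagged surgeons who truly perform poorly.

    prevalence: proportion of surgeons with truly poor performance.
    power: probability that a poorly performing surgeon is flagged.
    false_positive_rate: probability that an average surgeon is flagged.
    """
    true_positives = prevalence * power
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Prevalence 1 in 20 (from the text); power and false-positive rate are assumed.
    ppv = positive_predictive_value(prevalence=0.05, power=0.80,
                                    false_positive_rate=0.025)
    print(f"{ppv:.1%}")
```

Under these assumed inputs the result is close to the 63% quoted for cardiac surgeons, but that agreement should be read as a plausibility check, not as confirmation of the inputs. Lower power, as expected for low-volume procedures, reduces the positive predictive value.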
Improving statistical power
There are options for improvements in statistical power other than the pooling of data over time, but each introduces problems of its own. First, data for different procedures could be pooled. However, when outcomes differ between procedures, this approach could prevent fair comparisons of outcomes. We grouped gastrectomy, which has a mortality of 6·9%, with oesophagectomy, which has a mortality of 5·7%.7 Additionally, cardiac procedures were grouped together, combining coronary bypass surgery
Implications of the new policy
Reporting of outcomes for individual surgeons for cardiac surgery in the UK has largely been viewed as a success.11, 12 As we have shown, numbers of cardiac surgeries are sufficient to allow the process of detection to operate with reasonable statistical power. However, we believe that consultant-level reporting could be far less effective for other specialties. The concern about false identification of poor performance has received much attention in view of the stigma attached to poor
Wider issues
Several wider issues have been raised previously about the reporting of surgeon outcomes, mainly related to adequate adjustment for patient case mix, the accuracy with which the responsible surgeon can be identified, and the shared responsibility for the care of patients within teams.17 Operative mortality, including unavoidable deaths, might not be a good proxy for preventable mortality. Of particular relevance is the mean proportion of deaths that can be prevented: if this proportion is low,
References (21)
- et al. The New York State cardiac registries: history, contributions, limitations, and lessons for future efforts to assess and publicly report healthcare outcomes. J Am Coll Cardiol (2012)
- et al. Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma. Lancet (2004)
- et al. Autonomy, beneficence, justice, and the limits of provider profiling. J Am Coll Cardiol (2012)
- Everyone counts: planning for patients 2013/14
- UK surgeons' results
- et al. Absence of evidence is not evidence of absence. BMJ (1995)
- Hospital episode statistics
- National Report 2012
- National oesophago-gastric cancer audit
- National bowel cancer audit