Introduction Patient-safety monitoring based on health-outcome indicators can lead to misinterpretation of changes in case mix. This study aimed to compare the detection of indicator variations between crude and case-mix-adjusted control charts using data from thyroid surgeries.
Methods The study population included each patient who underwent thyroid surgery in a teaching hospital from January 2006 to May 2008. Patient safety was monitored according to two indicators, which are immediately recognisable postoperative complications: recurrent laryngeal nerve palsy and hypocalcaemia. Each indicator was plotted monthly on a p-control chart using exact limits. The weighted κ statistic was calculated to measure the agreement between crude and case-mix-adjusted control charts.
Results We evaluated the outcomes of 1405 thyroidectomies. The overall proportions of immediate recurrent laryngeal nerve palsy and hypocalcaemia were 7.4% and 20.5%, respectively. The proportion of agreement in the detection of indicator variations between the crude and case-mix-adjusted p-charts was 95% (95% CI 85% to 99%). The strength of the agreement was κ=0.76 (95% CI 0.54 to 0.98). The single special cause of variation that occurred was only detected by the case-mix-adjusted p-chart.
Conclusions There was good agreement in the detection of indicator variations between crude and case-mix-adjusted p-charts. The joint use of crude and adjusted charts seems to be a reasonable approach to increase the accuracy of interpretation of variations in outcome indicators.
- Safety management
- quality control
- outcome assessment
- risk adjustment
- patient safety
- control charts
- statistical process control
The statistical control chart concept is a method of decision support combining time-series analysis with a graphical presentation of data.1 It is based on the use of a control chart that guides the users towards appropriate action for improvement according to the nature of the variability.2 The control chart is a well-documented tool to monitor and improve the safety of healthcare processes.3 It is helpful in interpreting and reducing sources of indicator variability by distinguishing special causes of variation from common causes. Special cause variations reflect substantial disparities in patient safety that deserve further investigation and action. In contrast, common cause variations arise due to misleading factors, including random events.
There is strong interest in integrating control charts into clinical practice, particularly to alert multidisciplinary clinical teams when special causes of variation are detected. However, there is still debate about how to use control charts to monitor the outcomes of care, specifically regarding the necessity of adjusting for changes in patient case mix. Some argue that case-mix adjustment remains complex,4 may not be essential for longitudinal monitoring of outcomes5 or can lead to the erroneous conclusion of an unbiased measure.6 However, a control chart adjusted for case mix could theoretically improve the detection of special cause variations by controlling for biases that result from changes in case mix over time.7 ,8 In thyroid surgery, reported complication rates may vary depending on the extent of resection and the severity of the pathological thyroid condition, as well as the patient's characteristics.9–11 This study aimed to compare the detection of indicator variations between crude and case-mix-adjusted control charts in monitoring the safety of thyroid surgery.
Population and data collection
A prospective survey was conducted in the department of endocrine surgery in a large teaching hospital.12 The study population included every patient who underwent thyroid surgery during a 29-month period between 1 January 2006 and 31 May 2008. The surgical team under observation included three surgeons. Specific information about each thyroid surgery and its related complications was collected by the surgeon in charge of the patient using a standardised form. Other information was obtained from the hospital information system, which contains standard discharge abstracts, including compulsory data on the patients' demographics and the thyroid disease diagnoses.
Indicators and control chart characteristics
Two indicators were used to monitor the safety of thyroid surgery. The first indicator was the in-hospital proportion of immediate postoperative recurrent laryngeal nerve palsy, which was systematically assessed by flexible transnasal laryngoscopy among patients who underwent a lobectomy or a bilateral thyroidectomy. The second indicator was the in-hospital proportion of immediate postoperative hypocalcaemia, which was rigorously defined by a serum calcium level lower than 2 mmol/l only among patients who underwent a bilateral thyroidectomy.
Each indicator was extracted from the hospital information system and plotted on a Shewhart p-control chart.1 Crude and case-mix-adjusted p-charts were constructed such that each data point expressed the observed proportion of complications per month for samples of variable size, as follows:
The central line value of the crude p-chart was constant and was determined based on the overall proportion of complications. Exact limits were calculated for each month using the binomial-based SD of the overall proportion of complications.
The central line value of the case-mix-adjusted p-chart varied monthly and was established based on the expected proportion of complications per month. Exact limits were calculated for each month using the binomial-based SD of the expected proportions of complications.
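The two constructions above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the monthly counts and risks are hypothetical, and the adjusted centre line is assumed to be the mean of the model-predicted risks of the patients operated on in a given month.

```python
from math import sqrt

def binomial_sd(p, n):
    """Binomial-based SD of a proportion p for a monthly sample of size n."""
    return sqrt(p * (1.0 - p) / n)

def crude_centre_line(events, sizes):
    """Constant centre line: overall proportion of complications over all months."""
    return sum(events) / sum(sizes)

def adjusted_centre_lines(monthly_risks):
    """Monthly centre line: mean predicted risk of that month's patients (assumption)."""
    return [sum(risks) / len(risks) for risks in monthly_risks]

# Hypothetical data for three months (not taken from the study).
events = [3, 5, 2]                     # complications observed per month
sizes = [40, 50, 45]                   # procedures per month
monthly_risks = [[0.05] * 40, [0.10] * 50, [0.08] * 45]

crude_p = crude_centre_line(events, sizes)
crude_sd = [binomial_sd(crude_p, n) for n in sizes]

adj_p = adjusted_centre_lines(monthly_risks)
adj_sd = [binomial_sd(p, n) for p, n in zip(adj_p, sizes)]
```

The crude chart uses a single proportion with an SD that changes only with the monthly sample size, whereas the adjusted chart lets both the centre line and the SD change from month to month.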
When a special cause variation was detected by the p-charts, after checking the completeness and quality of the data, a systematic investigation was undertaken to identify the cause using two complementary methods. We first looked at the logbook, in which all changes in care processes were continually reported by surgeons. Every 4 months, a multidisciplinary meeting was also conducted to discuss the observed variations in monitored indicators on control charts.
Case-mix adjustment was based on the patients' gender and age, thyroid disease diagnosis and type of thyroidectomy.9–11 Case-mix-adjusted ORs for each postoperative complication were calculated using a multivariate logistic regression with all previous variables entered in the model. The expected proportion of each complication was then calculated for each month based on the model estimates. Final models were assessed for potential interactions. A finding of non-significance (p>0.05 and the closest to 1) following the Hosmer–Lemeshow test was interpreted as an adequate fit of the model to the data.13 The discriminatory power of the models was measured using the c-statistics (Receiver Operating Characteristic curve score). Values less than 0.7 were considered to show poor discrimination.14
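As an illustration of the quantities described above, the following Python sketch converts logistic-regression estimates into a predicted risk and computes the c-statistic as the proportion of concordant case/non-case pairs. The function names, coefficients and data are hypothetical; the study itself fitted its models in SPSS.

```python
from math import exp

def predicted_risk(intercept, coefs, x):
    """Risk predicted by a logistic model for one patient with covariates x."""
    linear = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + exp(-linear))

def c_statistic(risks, outcomes):
    """Area under the ROC curve: share of (case, non-case) pairs in which
    the case has the higher predicted risk; ties count as one half."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for rc in cases:
        for rn in non_cases:
            if rc > rn:
                score += 1.0
            elif rc == rn:
                score += 0.5
    return score / (len(cases) * len(non_cases))

# Hypothetical predicted risks and observed complications (1 = complication).
risks = [0.10, 0.40, 0.35, 0.80]
outcomes = [0, 0, 1, 1]
auc = c_statistic(risks, outcomes)  # 3 of 4 pairs are concordant
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, which is why values below 0.7 are read as poor.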
The p-charts' limits were determined according to an exact method based on the binomial distribution.15 Exact control and warning limits were set at 99.73% (3 SD from the mean) and 95.45% (2 SD from the mean), respectively (Online Appendix 1). The detection of a special cause variation was defined as a single point outside the control limits or two out of three successive points between a warning limit and a control limit on the same side of the central line.16
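A rough Python sketch of these limits and detection rules follows. The exact procedure is given in Online Appendix 1, which is not reproduced here, so the sketch assumes equal-tailed binomial quantiles and treats a point lying exactly on a limit as inside it. All function names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def binom_quantile(alpha, n, p):
    """Smallest count k with P(X <= k) >= alpha."""
    for k in range(n + 1):
        if binom_cdf(k, n, p) >= alpha:
            return k
    return n

def exact_limits(n, p, coverage):
    """Equal-tailed (lower, upper) limits, as counts, for a month of size n."""
    tail = (1.0 - coverage) / 2.0
    return binom_quantile(tail, n, p), binom_quantile(1.0 - tail, n, p)

def zone(count, lcl, lwl, uwl, ucl):
    """Ordinal position of a point: -2 and 2 are outside the control limits,
    -1 and 1 lie between a warning and a control limit, 0 is within warnings."""
    if count > ucl:
        return 2
    if count < lcl:
        return -2
    if count > uwl:
        return 1
    if count < lwl:
        return -1
    return 0

def special_causes(zones):
    """Indices of points that signal a special cause variation."""
    flagged = []
    for i, z in enumerate(zones):
        window = zones[max(0, i - 2): i + 1]  # this point and the two before it
        if abs(z) == 2:
            flagged.append(i)                 # single point outside control limits
        elif z != 0 and window.count(z) >= 2:
            flagged.append(i)                 # two of three in the same warning zone
    return flagged

# Hypothetical month: 10 procedures, centre line 0.5.
lwl, uwl = exact_limits(10, 0.5, 0.9545)      # warning limits
lcl, ucl = exact_limits(10, 0.5, 0.9973)      # control limits
```

Because the limits are recomputed from the binomial distribution for each monthly sample size, they remain valid for small samples and low proportions, where the normal approximation can produce limits below zero.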
The agreement between crude and case-mix-adjusted control charts in detecting indicator variations was measured using the weighted Cohen κ statistic.17 The positions of the data points for both postoperative recurrent laryngeal nerve palsy and hypocalcaemia were compared in terms of five ordinal levels based on warning and control limits.
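The agreement measures can be illustrated with a short Python sketch. The source does not state which weighting scheme was used, so linear weights are assumed here (quadratic weights are available as an option). The function names and the example ratings are hypothetical.

```python
def proportion_agreement(a, b):
    """Share of data points placed at the same level by both charts."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def weighted_kappa(a, b, levels, quadratic=False):
    """Weighted Cohen's kappa for two ordinal ratings of the same points.
    Disagreement weights grow with the distance between levels."""
    k = len(levels)
    index = {level: i for i, level in enumerate(levels)}
    n = len(a)
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d * d if quadratic else d

    num = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(weight(i, j) * row[i] * col[j] / n for i in range(k) for j in range(k))
    if den == 0:
        raise ValueError("kappa is undefined when all ratings share one level")
    return 1.0 - num / den

# Five ordinal levels: below/above control limits (-2, 2), between warning
# and control limits (-1, 1), and within warning limits (0).
LEVELS = [-2, -1, 0, 1, 2]
crude = [0, 0, 1, 0, -1, 0]       # hypothetical positions on the crude chart
adjusted = [0, 0, 2, 0, -1, 0]    # hypothetical positions on the adjusted chart
agreement = proportion_agreement(crude, adjusted)
kappa = weighted_kappa(crude, adjusted, LEVELS)
```

With weighting, a point that moves from a warning zone to just outside the control limit is penalised less than one that moves across the whole chart, which suits ordinal levels such as these.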
Statistical analyses were conducted using SPSS (release 12.0.0; SPSS, Chicago), and control charts were generated using Microsoft Office Excel 2007 (Microsoft Corporation, Redmond, Washington).
We evaluated the outcomes of 1405 thyroidectomies. Among these procedures, 1036 were total thyroidectomies, 195 were lobectomies, 97 were thyroidectomies with lymph node resection, and 77 were completion thyroidectomies. The makeup of the thyroid diseases that were operated on was 758 non-toxic multinodular goitres, 301 carcinomas, 176 non-toxic solitary nodules, 100 Graves disease and 70 other diagnoses. The median patient age was 53 years (range 9–93), and 78.7% of patients were women (1106/1405).
The overall proportions of immediate postoperative recurrent laryngeal nerve palsy and hypocalcaemia were 7.4% (103/1388) and 20.5% (243/1186), respectively. Among the 309 patients who had an immediate complication consecutive to bilateral thyroidectomy, 215 had hypocalcaemia, 67 had recurrent laryngeal nerve palsy, and 27 had both complications.
Crude versus case-mix-adjusted control charts
The type of thyroid surgery (p=0.001) was associated with a risk of postoperative recurrent laryngeal nerve palsy, whereas patient gender (p<0.001), age (p<0.001) and the type of thyroid surgery (p<0.001) were associated with a risk of postoperative hypocalcaemia (table 1).
There was one difference between the crude and case-mix-adjusted control charts related to postoperative recurrent laryngeal nerve palsy (figure 1): in May 2006, the indicator reached the adjusted warning limit but not the crude warning limit. There were two differences between the crude and case-mix-adjusted control charts related to the monitoring of postoperative hypocalcaemia (figure 2). In July 2007, the indicator just reached the control limit of the crude control chart, whereas it crossed over the control limit of the case-mix-adjusted control chart. In January 2006, the indicator reached the crude but not the adjusted warning limit. Investigation of the single special cause variation that was detected in July 2007 revealed two elements. First, operating-room renovations took place from July to September 2007, leading to a reduction in available operating time for the same number of patients (from 42 to 32 h per week). Second, one surgeon was away in July 2007, although the number of patients undergoing thyroid surgery remained constant, so in July almost all procedures were performed by the only other surgeon available.
The agreement between the crude and case-mix-adjusted control charts in detecting indicator variations (table 2) was 95% (95% CI 85% to 99%). The strength of agreement was κ=0.76 (95% CI 0.54 to 0.98).
There was good agreement between crude and case-mix-adjusted p-charts in detecting indicator variations for the monitoring of thyroid surgery safety. However, the single special cause of variation that occurred in July 2007 for hypocalcaemia monitoring was only detected by the case-mix-adjusted p-chart, suggesting that this method was slightly more sensitive than the crude control chart. Investigation of this cause revealed that the overactivity of a single surgeon was probably detrimental to the quality of thyroid surgery. Consequently, the primary concern should be to guarantee that surgeons can operate under satisfactory conditions and that they are not given excessive workloads.12
Performance monitoring using outcome indicators requires that the observed variations reflect true variations in patient safety. Variability in medical outcomes typically arises from a combination of four key elements: data quality, patient case-mix, quality of care and random effects.18 First, we assume that the observed variation in thyroid surgery outcomes is not influenced by the quality of data, since our monitoring system was implemented in a hospital where standardised modalities for detecting postoperative complications and for data collection were not changed during the study period. Second, provided that the adjustment was sufficient, and in view of the satisfactory agreement between crude and case-mix-adjusted p-charts, we also presume that the occurrence of complications was only weakly influenced by the patient case-mix. This presumption may be biased if adjustment is inadequate or if it does not include some unmeasured (eg, patient weight, large or thoracic goitre, and invasive cancer) or unknown case-mix factors, which might affect the outcome irrespective of quality of care.19 To assess whether the choice of variables for case-mix adjustment changed the conclusions of this study, we tested models based on different adjustment methods and observed similar results (data not shown). Third, the rationale for using case-mix-adjusted control charts is that the residual unexplained variation in outcomes and the detection of special causes are more likely to be attributable to the quality of care.
The remaining variance is presumed to be due to the characteristics of the individual surgeon (such as technical, physical and mental conditions) or to other organisational factors related to the surgical team (including equipment availability, team coordination or workload).20 It would be useful to further adjust for variables regarding surgeons' compliance with evidence-based practices in order to restrict the sources of variability in medical outcomes to uncontrollable random effects. In thyroid surgery, adjustments could include the systematic visualisation11 of recurrent laryngeal nerves in order to avoid injuries, as well as routine identification of at least two parathyroid glands in order to avoid permanent hypocalcaemia.21 Furthermore, to evaluate how one surgeon's performance influences the outcomes of the entire team, we could adjust for surgeon profile. Real-time monitoring of activity for each surgeon is also feasible using cumulative sum charts (CUSUM) to detect any defects in surgical processes as soon as possible.22 ,23 Focusing on individual performance allows each surgeon to check the quality of their daily practices and facilitates the identification of special cause variations. Nevertheless, compared with the simplicity of an adjusted p-chart, such an alternative tool can give less straightforward interpretations of variations in outcome indicators.
The value of the p-chart, which is relatively simple and user-friendly, lies in its intuitive nature and the fact that the data are displayed graphically. Certainly, crude p-charts are fast and easy to produce, which provides quick performance feedback to the surgical staff in the context of real-time monitoring. One main limitation of these charts concerns the potential relationship between the occurrence of complications and case-mix characteristics. The clinical heterogeneity introduced by considering the outcomes of all procedures together in a broad grouping without case-mix adjustment is often argued to be clinically inappropriate. Given the few differences observed between crude and case-mix-adjusted p-charts, our findings suggest that the expected gain from routine use of a case-mix-adjusted p-chart would be relatively minimal. Yet, as the great majority of data points were positioned between the lower and upper warning limits, statistical significance tests were likely to be underpowered and therefore a mere formality. Despite an acceptable fit to the data, the discriminatory power of the models was relatively poor. This might suggest that case-mix adjustment only controlled for a small part of the variability in thyroid surgery outcomes, which may simply demonstrate consistent care. This is perhaps not surprising, considering that thyroidectomy is a highly standardised procedure compared with other healthcare processes.24 Additional confounding variables (such as other patient characteristics, surgeon profiles, organisational factors or adherence to evidence-based practices) could more precisely represent the variability in thyroid surgery outcomes and thus could be useful adjustments.
More research is needed to provide empirical evidence to corroborate the usefulness of the case-mix-adjusted control chart in other settings.5 Although case-mix adjustment requires training in data analysis and is relatively time-consuming to integrate into the team's daily routine, it may be useful when an indicator is positioned close to the limits of the crude control chart, a situation that requires careful interpretation. The joint use of both crude and adjusted p-charts seems to be a reasonable approach to increase the accuracy of interpretation of outcome indicator variations in clinical practice.
We are grateful to P Messy, for his helpful contribution to the study, and to A Favre, for the English revision of the manuscript.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
Previous communication The study's main results were presented at the 24th Patient Classification Systems International Working Conference (Lisbon, Portugal, 8–11 October 2008).