Background and objective The publication of clinical outcomes for consultant surgeons in 10 specialties within the NHS has, along with national clinical audits, highlighted the importance of measuring and reporting outcomes with the aim of monitoring quality of care. Such information is vital to be able to identify good and poor practice and to inform patient choice. Adequate adjustment of outcomes for differences in case-mix has long been recognised as necessary to provide ‘like-for-like’ comparisons between providers. However, directly comparing values of the standardised mortality ratio (SMR) between different healthcare providers can be misleading even when the risk-adjustment perfectly quantifies the risk of a poor outcome in the reference population. An example is shown from paediatric intensive care.
Methods Using observed case-mix differences for 33 paediatric intensive care units (PICUs) in the UK and Ireland for 2009–2011, SMRs were calculated under four different scenarios where, in each scenario, all of the PICUs were performing identically for each patient type. Each scenario represented a clinically plausible difference in outcome from the reference population.
Results Despite the fact that the outcome for any patient was the same no matter which PICU they were to be admitted to, differences between the units were seen when compared using the SMR: scenario 1, 1.07–1.21; scenario 2, 1.00–1.14; scenario 3, 1.04–1.13; scenario 4, 1.00–1.09.
Conclusions Even if two healthcare providers are performing equally for each type of patient, if their patient populations differ in case-mix their SMRs will not necessarily take the same value. Clinical teams and commissioners must always keep in mind this weakness of the SMR when making decisions.
Recent events have highlighted the importance and controversy of reporting clinical outcomes with the aim of monitoring quality of care. The Keogh report,1 the Francis Inquiry,2 the suspension of paediatric cardiac surgery at Leeds3 and the publication of clinical outcomes for consultant surgeons4 have all generated considerable debate on the public reporting of clinical outcomes. This information is vital to be able to: (i) identify poorly performing providers so that they can be investigated and improvements made; (ii) identify centres of excellence and share best practice; (iii) allow patients, general practitioners, commissioners and other health professionals to choose between providers in the expectation that competition will improve patient outcomes5; and (iv) ensure fair implementation of payment by performance.6 ,7
The need to adequately adjust outcomes for differences in case-mix (risk-adjustment) has long been recognised8: a surgeon tending to treat only those patients with good prognoses would be expected to have a high rate of ‘good’ outcomes while, conversely, another treating patients with poor prognoses would expect a high rate of ‘poor’ outcomes.
The standardised mortality ratio (SMR) is the most widely used summary statistic for binary outcomes (eg, mortality, postoperative infection) to report case-mix adjusted outcomes for healthcare providers.9 The SMR is calculated by dividing the observed number of deaths in a unit by the expected number of deaths. The expected number is usually calculated by applying the risk-specific mortality probabilities from a reference population to the cohort of interest and summing the predicted probabilities,10 although there are more complex approaches to calculating an SMR.11 ,12
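As a minimal sketch of the basic calculation described above (not the code used by any audit), the SMR can be computed from patient-level outcomes and the probabilities of death predicted by a reference model. The function name `smr` and the ten-patient unit are hypothetical and serve only to illustrate the arithmetic.

```python
def smr(died, predicted_risk):
    """Observed deaths divided by expected deaths.

    died: iterable of 0/1 outcomes, one per patient.
    predicted_risk: reference-model probabilities of death, in the same order.
    """
    died = list(died)
    predicted_risk = list(predicted_risk)
    if len(died) != len(predicted_risk):
        raise ValueError("one predicted risk is needed per patient")
    observed = sum(died)
    # Expected deaths are the sum of the predicted probabilities.
    expected = sum(predicted_risk)
    if expected <= 0:
        raise ValueError("expected number of deaths must be positive")
    return observed / expected


# Hypothetical unit: 10 patients, 2 deaths, predicted risks summing to 1.6.
outcomes = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
risks = [0.05, 0.10, 0.40, 0.05, 0.10, 0.20, 0.50, 0.05, 0.10, 0.05]
print(round(smr(outcomes, risks), 2))  # 1.25
```

An SMR above 1 here means more deaths were observed than the reference model predicted for this particular group of patients.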
In addition to measuring the mortality for a single clinical specialty the SMR has also been used to quantify hospital-wide mortality rates with the development of measures such as the hospital standardised mortality ratio (HSMR)13 and summary hospital mortality index (SHMI).14 A combination of these two measures was used in the Keogh review to select the 14 trusts with the most worrying performance: 11 of these units were then put on special measures, although it is worth noting that Keogh did warn about weaknesses of both the HSMR and SHMI.
Comparing indirectly standardised outcome statistics, such as the SMR, between different healthcare providers can be misleading. Consider two healthcare providers performing equally for each type of patient but differing from the expected performance for at least one type of patient. If they differ in the proportions of the types of patients they treat (case-mix), their SMRs will not necessarily take the same value even if it were possible to carry out risk adjustment that perfectly quantifies the risk of a poor outcome in the reference population.15–19 While this may at first seem counterintuitive, this is a long recognised problem and is an example of Simpson's paradox (also known as the Yule–Simpson effect).20 ,21 This ‘paradox’ describes the effect whereby an association seen in multiple groups is modified, or even reversed, when the groups are combined.
Because of this, the SMR is actually a measure of how the outcomes for an individual healthcare provider, given its particular case-mix of patients, compare to the outcomes of a defined reference population (detailed explanation in box 1). The SMR does not answer the question of which provider is ‘best’ for any particular patient. This restriction is not always recognised by the users of SMRs. Even when it is acknowledged, some argue that any bias that may arise when directly comparing two SMRs is likely to be small and would not adversely affect any inferences drawn.22 ,23
Box 1 Calculation and comparison of the standardised mortality ratio (SMR)
The standardised mortality ratio (SMR) is defined as the ratio of the number of deaths observed for the hospital of interest to the number of deaths that would have been expected if the patients at the hospital had experienced the same death rates as those found in the reference population:
$$\mathrm{SMR} = \frac{\sum_i p_i \pi_i}{\sum_i p_i \pi_{Ri}}$$
where $\pi_i$ is the probability that a patient in the i-th case-mix stratum in the hospital will die; $\pi_{Ri}$ is the probability that a patient in the i-th case-mix stratum in the reference population will die; $p_i$ is the proportion of patients in the i-th case-mix stratum in the hospital.
For the SMRs for two hospitals (A and B) to take the same value the following would need to be true:
$$\frac{\sum_i p_{Ai} \pi_{Ai}}{\sum_i p_{Ai} \pi_{Ri}} = \frac{\sum_i p_{Bi} \pi_{Bi}}{\sum_i p_{Bi} \pi_{Ri}}$$
However, even if the stratum-specific event probabilities were identical for both hospitals for all strata (ie, $\pi_{Ai} = \pi_{Bi} = \pi_{ABi}$ for all values of i), and different from the reference population for at least one stratum, the SMRs would only be sure to take the same value if their population structures were also the same:
$$p_{Ai} = p_{Bi} \quad \text{for all } i$$
A simple hypothetical example
A hypothetical example of this phenomenon is shown in table 1 for two surgeons (A and B). For simplicity it is assumed that there are only two types of patients: low risk, where nationally 10% of patients die within 30 days; and high risk, where nationally 20% die within 30 days. The two surgeons both have 30-day mortality rates of 10% for low risk patients, exactly the same as the national rates. However, for both surgeons 30% of the high risk patients die before 30 days compared to the national average of 20%. While for any particular patient their probability of death before 30 days is the same whichever surgeon operates, the case-mix profiles of the surgeons do differ: specifically, surgeon B's caseload has a higher proportion of high risk patients than does that of surgeon A. Thus, the two surgeons have identical performance for both low and high risk patients, but surgeon B operates on a higher proportion of the patients for whom both surgeons A and B have below average performance.
This difference in case-mix between the two surgeons gives rise to the perception that there is a difference in potential outcome for an individual patient. From table 1 it can be seen that the SMR for surgeon A (SMR=1.25) is lower (‘better’) than that of surgeon B (SMR=1.40). Since for any particular patient their probability of death before 30 days is the same whichever surgeon operates, this difference in the values of the SMR for the surgeons reflects the differences in their case-mix, not their risk-specific mortality probabilities.
This is a clear example of Simpson’s paradox, the ‘paradoxical’ result that the two surgeons have equivalent performance for each of the two subgroups, but the aggregated result shows a substantial difference in performance.
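The two-surgeon example can be reproduced with a short sketch. The stratum-specific risks (10% and 20% nationally, 10% and 30% for both surgeons) come from the text; the case-mix proportions are an assumption, chosen here because a two-thirds/one-third split gives the quoted SMRs of 1.25 and 1.40. They are not copied from table 1, and the function name is hypothetical.

```python
def stratified_smr(case_mix, provider_risk, reference_risk):
    """SMR from stratum proportions and stratum-specific death probabilities."""
    observed = sum(case_mix[s] * provider_risk[s] for s in case_mix)
    expected = sum(case_mix[s] * reference_risk[s] for s in case_mix)
    return observed / expected


reference_risk = {"low": 0.10, "high": 0.20}  # national 30-day mortality
surgeon_risk = {"low": 0.10, "high": 0.30}    # identical for surgeons A and B

# Assumed case-mix (not taken from table 1), chosen to match the quoted SMRs.
mix_a = {"low": 2 / 3, "high": 1 / 3}
mix_b = {"low": 1 / 3, "high": 2 / 3}

smr_a = stratified_smr(mix_a, surgeon_risk, reference_risk)
smr_b = stratified_smr(mix_b, surgeon_risk, reference_risk)
print(round(smr_a, 2), round(smr_b, 2))  # 1.25 1.4
```

Both calls use the same `surgeon_risk`, so the only input that differs between the two surgeons is the case-mix, and that alone moves the SMR from 1.25 to 1.40.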
Is this really a problem in practice?
While this undesirable characteristic of the SMR has been previously described15–17 19–21 and can be shown through simple examples (such as the one above) or by simulated data,24 it is less clear how much bias would really be likely to occur in practice. In other words, are observed differences in case-mix between healthcare providers actually enough to lead to inappropriate and misleading interpretation of their SMRs? There are few examples of the likely impact given the typical differences in case-mix seen in practice. Pouw et al25 investigated changes in the value of the HSMR among 61 hospitals in the Netherlands. When the observed case-mix of each hospital for 2009 was replaced with its average case-mix for 2006–2009 they found that the change in the value of the HSMR ranged from −0.03 (from 0.69 to 0.66) to +0.01 (from 0.79 to 0.80). These are small absolute differences and are unlikely to grossly distort any comparison between hospitals. However, this might be because they were looking at outcomes by hospital which, because of their size, will tend to have relatively stable case-mix profiles.
Larger differences in case-mix between healthcare providers are more likely to occur when investigating clinical specialties and sub-specialties. We have previously reported the potential differences in the value of the SMR for neonatal units in the East Midlands and Yorkshire regions of England.26 It was shown that, when the risk-specific mortality of one neonatal unit was applied to another and the ratio of their SMRs calculated, these ratios ranged from 0.79 to 1.68: that is, the value of the SMR for one unit was 68% greater than that of another even when their risk-specific probabilities of death were identical. However, neonatal units are organised into networks with different units providing different levels of care,27 so it is unsurprising that there are large differences in case-mix. When comparing units of the same type, for neonatal network units (those providing long-term intensive care) the ratios of the SMRs ranged from 0.92 to 1.00, a much smaller range than that seen for all units. When comparing local neonatal units (those providing high dependency care and some short-term intensive care), the ratios ranged from 0.79 to 1.56, potentially important differences. However, local admission and transfer policies mean that the case-mix profiles of local neonatal units vary greatly from one to another. Hence, the differences in the value of the SMR seen in acute neonatal care may not reflect the differences seen in other clinical specialties.
In addition, in this previous work the neonatal units were compared by using the observed case-mix specific death rates for each unit in turn to calculate an ‘observed’ number of deaths for each of the other units, thus ensuring all units had the same ‘observed’ outcome rates. This approach is likely to produce inflated estimates of the likely bias, particularly when using small neonatal units to calculate the ‘observed’ rates, because these small units may, by chance or otherwise, have observed outcomes that are very different from the whole region. Since the size of the bias is dependent on how different the rates of death are between the units being compared and that of the reference population, having artificially extreme case-mix profiles potentially inflates the estimates of the bias. The methodology we have used in this paper overcomes this problem by ensuring that clinically realistic estimates are used. Furthermore, in the neonatal example the values of the SMR were compared using their relative difference (ie, their ratio) rather than their absolute difference. This is potentially misleading as a large relative difference could represent a small absolute difference. In this paper we report the absolute values of the SMR.
A more realistic example: mortality in paediatric intensive care
To obtain more realistic estimates of the effect of differences in case-mix on the value of the SMR, four clinically plausible scenarios were investigated for mortality in paediatric intensive care. In each scenario the observed risk-specific mortality probabilities were set to be the same across all paediatric intensive care units (PICUs). Although all of the scenarios used in this example are hypothetical and simplistic they are all clinically plausible and therefore any variations observed are also plausible.
Data were obtained from the Paediatric Intensive Care Audit Network (PICANet),28 an international clinical audit that collects information on all children admitted for paediatric intensive care in the UK and Ireland. Information is collected for each child about admission, discharge, diagnoses, medical history, physiology, interventions and outcome (including death before discharge). For this analysis information was extracted for all children aged less than 16 years at admission who were admitted to any of the 33 PICUs in the UK or Ireland from 2009 to 2011.
In total there were 56 460 admissions identified. Of these, 1674 (3.0%) were excluded because of missing information on mechanical ventilation during the first hour of admission. The remaining 54 786 children were admitted to 33 different units; the number of admissions in each unit ranged from 95 to 3893. There was a large variation in the case-mix between units: for example, the percentage of elective admissions ranged from 0% to 87.7% and the percentage of high risk diagnosis patients from 2.3% to 21.0%. The percentage of admissions leading to death before discharge also varied widely: from 0.1% to 8.4%.
Calculating the SMRs
In all of the scenarios the expected number of deaths for each PICU was obtained by summing the risk-specific probabilities of death calculated using PIM2r, the version of the Paediatric Index of Mortality (PIM)29 used for the 2009–2011 PICANet report.28
PIM2r is a recalibrated version of the PIM2 model. A patient's PIM2 score is provided by the following equation:
Hypothetical ‘observed’ numbers of deaths were obtained by amending PIM2r and then calculating how many deaths would have been observed in each scenario.
The four scenarios were:
1. The risk of death was transformed. This has the effect of slightly decreasing the risk for patients with very high risks of death (over 0.5) and slightly increasing the risk for others. The transformation is very small (the biggest increase is from 17.0% to 18.2% and the biggest decrease is 83.0% to 81.8%) but it affects high and low risk patients differently.
2. The coefficient for Elective admission was changed to the upper limit of its 95% CI (from −0.6830 to −0.4935). This increased the risk for elective patients.
3. The coefficient for Mechanical ventilation was changed to the upper limit of its 95% CI (from 0.9392 to 1.0942), thus increasing the risk for ventilated patients.
4. The coefficient for Recovery from surgery was changed to the upper limit of its 95% CI (from −0.9530 to −0.7376), thus increasing the risk for patients recovering from surgery.
The simulated probabilities of death were then summed to calculate a new observed number of deaths for each PICU and then used to calculate SMRs.
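The mechanism behind the elective-admission scenario can be sketched under strong simplifying assumptions. Only the two coefficient values (−0.6830 and −0.4935) are taken from the text. The single baseline log-odds `BASE_LOGIT`, the function names and the elective fractions are hypothetical. This is not the PIM2r model and is not expected to reproduce the reported figures.

```python
import math


def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


BASE_LOGIT = -3.0             # hypothetical log-odds of death, non-elective patient
ELECTIVE_REFERENCE = -0.6830  # coefficient used for the expected deaths
ELECTIVE_SCENARIO = -0.4935   # upper 95% limit, used for the 'observed' deaths


def unit_smr(elective_fraction):
    """SMR for a unit whose patients differ only in elective status."""
    non_elective = 1.0 - elective_fraction
    base_risk = inv_logit(BASE_LOGIT)
    expected = (non_elective * base_risk
                + elective_fraction * inv_logit(BASE_LOGIT + ELECTIVE_REFERENCE))
    observed = (non_elective * base_risk
                + elective_fraction * inv_logit(BASE_LOGIT + ELECTIVE_SCENARIO))
    return observed / expected


# Every unit applies the same risk to the same type of patient;
# only the proportion of elective admissions changes.
for fraction in (0.0, 0.25, 0.5, 0.877):
    print(f"elective {fraction:.1%}: SMR = {unit_smr(fraction):.3f}")
```

In this sketch a unit with no elective admissions has an SMR of exactly 1, because none of its patients is affected by the changed coefficient, and the SMR rises as the elective fraction grows.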
For each scenario, even though the risk-specific probabilities of death were the same for each PICU, the values of the SMR varied between PICUs (figure 1). The first scenario resulted in SMRs from 1.07 to 1.21. The second saw values from 1.00 (units with no elective admissions) to 1.14 (units with a high proportion of elective admissions). The third varied from 1.04 to 1.13 depending on the amount of mechanical ventilation, and the final scenario resulted in SMRs between 1.00 and 1.09 depending on what proportion of patients were recovering from surgery.
Variation in the values of the SMRs reduced as the size of the PICU increased, similar to the pattern seen in acute neonatal care.26 The variation was less between the largest units, partly because the large units had patient populations similar to each other but also because their case-mix was similar to the overall population: both of these factors reduce the variation in the observed SMRs.
The SMR is usually used to adjust for any differences in case-mix between healthcare providers, facilitating ‘like-for-like’ comparisons. However, in the presence of differences in case-mix the direct comparison of SMRs becomes unreliable even if the risk-adjustment model perfectly quantifies the risk of a poor outcome in the reference population. Indeed, it is in those cases where the need for case-mix adjustment is greatest (that is, for providers with very different types of patients) that the comparison of SMRs is potentially the most misleading. Hence, even if two healthcare providers are performing identically for each patient type and the prediction of the risk of a poor outcome in the reference population is perfect, the providers are very likely to have different values for their SMR.
The PICU example in this paper demonstrates the potential size of this problem for this clinical setting. Its size is dependent on both the case-mix differences between providers and on the difference in the providers’ performance from that of the reference population. In our PICU example the case-mix differences used are the real differences observed between PICUs in the UK. In addition, the risks of death used in each of the four scenarios are plausible variations from the real risk seen in the PICANet population, as quantified by the PIM score. Obviously, the size of this problem could be different in other clinical settings, smaller or larger, but the problem is always likely to exist.
In the example it was assumed that the risk-specific mortality probabilities were the same for all PICUs in order to show the effect of case-mix on the SMRs. In reality risk-specific mortality will vary between healthcare providers and differences in case-mix will hide or inflate true differences between the providers. Whenever SMRs are being directly compared to each other it is necessary to understand the variation in their case-mix. Small differences in case-mix between units are unlikely to result in comparisons that are significantly misleading but large differences can raise real concerns that the comparison is invalid.
Insisting that the publication of SMRs should not be seen as a ‘league table’ is, no doubt, sound advice but it is likely to remain unheeded. Patients will, quite naturally, want to compare the published performance of potential care providers. The use of misleading data could result in patients unnecessarily delaying treatment or travelling further in the hope of receiving better treatment.
The problem highlighted in this paper is only one of several potential difficulties in the reliable estimation of SMRs. Ensuring adequate risk adjustment and data quality, and accounting for random variation, also present challenges in estimating and presenting SMRs. However, the potential biases that can arise when directly comparing SMRs will still exist even in circumstances where the other difficulties have been overcome.
What should be done in practice?
If it is known a priori that two healthcare providers deliver different types of care with different expectation of outcome (eg, if one hospital provides end of life care and another has a policy of discharging patients to hospice care), then it would seem foolhardy to try to compare their performance by comparing standardised measures of outcome. However, when there are differences in case-mix for organisational (eg, transfer policy) or demographic (eg, age) reasons, then it seems entirely appropriate to want to compare standardised measures of outcome to ascertain whether any crude differences in outcome are the consequence of one hospital having a greater proportion of high risk (eg, older) patients than the other. As has been shown in this paper and previously,15–17 ,19 the SMR cannot be relied on to provide an unbiased answer to this question.
A number of alternatives to the SMR have been proposed. The most commonly used alternative is the comparative mortality figure (CMF),19 which solves this particular problem. However, to use the CMF it is necessary to be able to estimate the risk-specific mortality probability for each and every individual in the reference population from the data of each healthcare provider. This may be possible for large datasets and simple risk-adjustment models where there are sufficient observations for each provider to reliably estimate case-mix specific probabilities of death.16 However, when there are many different types of patients (as in most clinical specialties) this is not possible as it is unlikely that every provider will have sufficient patients (or indeed any) within each case-mix stratum to reliably estimate the probabilities of death. A number of other summary measures have also been proposed, such as the geometrically averaged ratio and the harmonically weighted ratio,30 ,31 but these are difficult to interpret, and therefore not suitable for an audience including clinicians, commissioners and in particular patients.
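A minimal sketch of why direct standardisation avoids this particular problem, assuming the usual definition of the CMF: the provider's stratum-specific death probabilities are applied to the reference population's case-mix and divided by the reference population's own expected deaths. The two-stratum risks reuse the hypothetical surgeon example (10% and 20% nationally, 10% and 30% for the provider); the 50/50 reference case-mix and all function names are assumptions made for illustration.

```python
def cmf(reference_mix, provider_risk, reference_risk):
    """Comparative mortality figure (direct standardisation).

    The provider's own case-mix does not enter the calculation at all.
    """
    numerator = sum(reference_mix[s] * provider_risk[s] for s in reference_mix)
    denominator = sum(reference_mix[s] * reference_risk[s] for s in reference_mix)
    return numerator / denominator


def estimate_risk(deaths, patients):
    """Stratum-specific death rate estimated from a provider's own data."""
    if patients == 0:
        # The practical obstacle: an empty stratum gives no estimate.
        raise ValueError("no patients in this stratum; the rate cannot be estimated")
    return deaths / patients


reference_mix = {"low": 0.5, "high": 0.5}     # hypothetical reference case-mix
reference_risk = {"low": 0.10, "high": 0.20}

# Two providers with different caseloads but the same stratum-specific rates.
risk_a = {"low": estimate_risk(20, 200), "high": estimate_risk(30, 100)}
risk_b = {"low": estimate_risk(10, 100), "high": estimate_risk(60, 200)}

cmf_a = cmf(reference_mix, risk_a, reference_risk)
cmf_b = cmf(reference_mix, risk_b, reference_risk)
print(round(cmf_a, 3), round(cmf_b, 3))
```

Because both providers have the same estimated rates in every stratum, the two CMF values coincide whatever their caseloads. The `estimate_risk` helper makes the practical limitation explicit: a provider with no patients in a stratum yields no rate for it.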
An alternative approach might be to consider only those types of patients who are seen by all of the healthcare providers of interest. For example, in paediatric intensive care cardiac patients who are likely to only ever be admitted to a few specialist units would be excluded from the analysis. Such an approach would be likely to allow more reliable direct comparisons between healthcare providers but would not include all of their workload. An alternative, but similar, approach would be to use propensity score methods to find similar populations within each healthcare provider.32 ,33
A further proposed approach is to first group patients into bands according to their probability of experiencing the outcome using estimated probabilities derived from outcome rates in the reference population and then undertake direct standardisation: known as direct risk standardisation.18 While this method may offer an answer in some circumstances it is not a general solution. It requires that the patients are grouped into a sufficiently large number of bands so that all of the patients within each band have approximately the same probability of experiencing the outcome while simultaneously ensuring that there are enough outcomes observed in each band to make the calculations. It is unlikely that these conflicting requirements will be met other than for large organisations, such as hospitals or trusts.
For most circumstances, we believe the SMR is the only viable statistic with which to report case-mix adjusted outcomes. It is still a valid measure of the difference in outcomes between an individual healthcare provider and the reference population. However, we have shown that case-mix itself can be the cause of important differences in reported SMRs when directly comparing one with another. It is, therefore, necessary to also report some measure of the differences in case-mix so that a judgement can be made about the heterogeneity of the populations.33
The M-statistic is a measure of agreement between two populations and has been suggested as a possible statistic with which to quantify the difference in case-mix profile between two healthcare providers.34 While it is useful in quantifying this difference in case-mix, the size of the bias that can occur when comparing two SMRs is also dependent on the size of the difference between the risk-specific probabilities of death observed in the reference population and the probabilities for the two providers.26 This would also need to be taken into account in any tool to predict which comparisons would likely be misleading.
In order to ensure the best quality of care in the NHS, poorly performing healthcare providers need to be identified and remedial action taken. Similarly, good performance should be highlighted, with good practice identified and shared. However, recent events may have damaged the public's confidence in our monitoring and inspection processes35 and it has become clear that it is probably no longer sufficient to ask patients to take it on trust that monitoring is being performed adequately. It is vital that clinical outcomes are reported openly, transparently and without the potential for misinterpretation in order for patients to once again have faith in the service they are receiving.
Publication of clinical outcomes data may well restore the public's faith that performance is being monitored and that required interventions are being made. However, publication alone is not sufficient. Clinical teams and commissioners must be mindful of the SMR's weaknesses when making decisions. More importantly, serious thought needs to be put into how these data are communicated to patients and research is required to examine whether and how patient experiences are being changed by their publication.
We would like to thank PICANet for access to the original dataset. The authors also wish to thank all the PICU staff in the UK and Ireland who participated in the present survey.
Contributors The idea for the paper was originally developed by BNM. TAE undertook the analyses for the example, supervised by BNM. All co-authors contributed to the drafting of the manuscript, revised the paper critically and approved the final version. BNM is the guarantor.
Competing interests None.
Funding AE was funded by a Research Methods Fellowship award from the National Institute for Health Research. This report is independent research arising from a Research Methods Fellowship supported by the National Institute for Health Research. The views expressed in this publication are those of the authors and not necessarily those of the NHS, the National Institute for Health Research or the Department of Health.
Ethics approval PICANet has research MREC ethics committee approval (05/MRE04/17) and National Information Governance Board (4-07(c)/2002-PICANet) approvals to collect patient identifiable data without informed consent.
Provenance and peer review Not commissioned; externally peer reviewed.