Background Despite widespread use of quality indicators, it remains unclear to what extent they can reliably distinguish hospitals on true differences in performance. Rankability measures what part of variation in performance reflects ‘true’ hospital differences in outcomes versus random noise.
Objective This study sought to assess whether combining data into composites or including data from multiple years improves the reliability of ranking quality indicators for hospital care.
Methods Using the Dutch National Medical Registration (2007–2012) for stroke, colorectal carcinoma, heart failure, acute myocardial infarction and total hip arthroplasty (THA)/total knee arthroplasty (TKA) in osteoarthritis (OA), we calculated the rankability of in-hospital mortality, 30-day acute readmission and prolonged length of stay (LOS) for single years and 3-year periods, and for a dichotomous and an ordinal composite measure in which mortality, readmission and prolonged LOS were combined. Rankability, defined as (between-hospital variation/(between-hospital + within-hospital variation))×100%, is classified as low (<50%), moderate (50%–75%) and high (>75%).
Results Admissions from 555 053 patients treated in 95 hospitals were included. The rankability for mortality was generally low or moderate, varying from less than 1% for patients with OA undergoing THA/TKA in 2011 to 71% for stroke in 2010. Rankability for acute readmission was low, except for acute myocardial infarction in 2009 (51%) and 2012 (62%). Rankability for prolonged LOS was at least moderate. Combining multiple years improved rankability but still remained low in eight cases for both mortality and acute readmission. Combining the individual indicators into the dichotomous composite, all diagnoses had at least moderate rankability (range: 51%–96%). For the ordinal composite, only heart failure had low rankability (46% in 2008) (range: 46%–95%).
Conclusion Combining data from multiple years or combining multiple indicators into composites results in more reliable ranking of hospitals, particularly compared with mortality and acute readmission in single years, thereby improving the ability to detect true hospital differences. The composite measures provide more information and more reliable rankings than combining multiple years of individual indicators.
- quality improvement methodologies
- quality measurement
- continuous quality improvement
- healthcare quality improvement
- mortality (standardized mortality ratios)
Commonly used and routinely collected quality indicators for hospital care include in-hospital mortality, 30-day acute readmissions and long length of stay (LOS). These indicators may provide information for healthcare providers and hospital managers to improve quality of care, for patients to choose between hospitals, for healthcare insurers to purchase health services and for policy makers to monitor the performance of the healthcare system. There is, however, ongoing debate not only on whether single indicators adequately reflect quality of care but also on whether they truly enable us to discriminate between hospitals in terms of their performance, that is, the reliability of hospital rankings.1–4 The reliability of ranking hospital performance can be assessed by determining the rankability of indicators. Previous research showed that the rankability of individual indicators differs2 4 and that rankability is lower when estimates are imprecise, which is often the case when outcomes have few events in some patient groups, for example, mortality after hip or knee replacement.5 We therefore hypothesise that increasing the number of events per hospital, by combining data, may result in more reliable hospital rankings.
Increasing the number of events included in quality measurements may be done in various ways. The first is to combine data from multiple years. However, even if this increases the reliability of hospital performance ranking, the information may reflect treatment outcomes that have improved over time and no longer reflect current practice. Furthermore, it does not provide sufficient information for professionals trying to improve the quality of hospital care, since short-term results of quality improvements will not be visible. Combining different indicators may be another solution, with the additional benefit that more information is captured and thus a more complete picture of quality of care is provided, because indicators may be interrelated.6 Over the years, different initiatives have been taken to combine indicators and thereby provide a more complete view of hospital performance, but these often focused on a specific condition.7–10 A commonly used combined indicator in the Netherlands is the ‘textbook outcome’, a dichotomised outcome representing the proportion of patients for whom all desired short-term outcomes of different indicators are realised.11 12 For instance, using the indicators in-hospital mortality, 30-day acute readmissions and long LOS, a ‘textbook outcome’ patient is one who is discharged alive, with no long LOS and no readmission. Such a ‘textbook outcome’ may be easier for patients to interpret than a single outcome indicator, but because different adverse outcomes are lumped together by the dichotomisation, it may not provide sufficient information for quality improvement in hospitals. Ordering the various combinations to create an ordinal composite measure may be a better suited alternative for quality improvement purposes in hospitals.
Even though different initiatives have been undertaken to combine data, it is unknown whether this actually results in better reliability of ranking hospital performance than individual indicators. Therefore, this study aims to assess whether combining data into composites or including data from multiple years improves the reliability of ranking quality indicators for hospital care.
We used routinely collected administrative admission data from the Dutch National Medical Registration (LMR) from 2007 to 2012 retrieved from Statistics Netherlands,13 as more recent data were not publicly available due to conversion from International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) to International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM). These data capture all hospital patients rather than only specific patient groups. The LMR contains administrative data of approximately 88% of the hospital admissions in 2007 to 76% in 2012.14 This includes patient-specific data such as patient characteristics, as well as medical data such as diagnosis, surgical procedures and hospital stay. Multiple admissions within one patient in different hospitals are identified using an anonymous unique patient identifier.15 Each patient admission is assigned one primary diagnosis code according to ICD-9-CM based on the discharge letter and other available information in the medical records of patients and secondary diagnosis codes if applicable.6 16 We included clinical admissions with at least a primary diagnosis code to enable identification of specific patient groups (defined by the Clinical Classifications Software (CCS)). We excluded admissions for hospitals with incomplete follow-up, defined as months without any coded admissions or admissions from hospitals without a previous month of measurements, because these missing data made it impossible to assess readmission within 30 days. 
We selected patient groups with typically different LOS, readmission and mortality patterns to ensure sufficient variation and thereby enable generalisation to other diagnoses.6 These patient groups are: stroke (CCS 109; high mortality and long LOS), colorectal carcinoma (CCS 14 and 15; long LOS), heart failure (HF) (CCS 108; high readmission), acute myocardial infarction (AMI) (CCS 100; high mortality) and hip and knee replacements (THA/TKA) in patients with osteoarthritis (OA) (CCS 203; high readmission). THA/TKA procedures from 2012 were excluded since 45% of these data were missing.
We studied the following indicators for each year (2007–2012) and 3-year periods (2007–2009 and 2010–2012):
In-hospital mortality: defined as death in hospital during the index admission.
Acute readmission: an emergency readmission within 30 days after discharge.
Long LOS: defined as an LOS in the top 25% for the specific diagnosis (CCS group) or procedure group (for THA/TKA).
The textbook outcome: defined as patients discharged alive, no long LOS and no acute readmission.
The ordinal composite measure (Textbook Outcome Plus (TOP)), defined as (from best to worst):
Alive, no long LOS and no acute readmission.
Alive, long LOS and no acute readmission.
Alive, no long LOS and acute readmission.
Alive, long LOS and acute readmission.
Died during the admission.
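As an illustration of how these outcome definitions could be operationalised, here is a minimal Python sketch (the study itself used Stata; the function names, the exact percentile cutoff convention and the placement of death during admission as the worst TOP category are our illustrative assumptions based on the descriptions above):

```python
import numpy as np

def long_los_flags(los_days):
    """Flag admissions whose LOS is in the top 25% of their own
    diagnosis/procedure group (cutoff assumed to be the 75th percentile)."""
    cutoff = np.percentile(los_days, 75)
    return [d > cutoff for d in los_days]

def textbook_outcome(died, long_los, readmitted):
    """Dichotomous 'textbook outcome': discharged alive, no long LOS,
    no acute 30-day readmission."""
    return not (died or long_los or readmitted)

def top_category(died, long_los, readmitted):
    """Ordinal Textbook Outcome Plus (TOP) category, 0 = best.
    Readmission is ordered as worse than long LOS; death during the
    admission is assumed here to be the worst category."""
    if died:
        return 4
    return {(False, False): 0,  # alive, no long LOS, no readmission
            (True, False): 1,   # alive, long LOS, no readmission
            (False, True): 2,   # alive, no long LOS, readmission
            (True, True): 3}[(long_los, readmitted)]
```

In practice, the long-LOS cutoff would be computed per CCS diagnosis or procedure group before the per-patient composites are derived.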
The ordering of this ordinal measure was described in a previous study17 and is based on patients' views from the existing literature, where patients considered complications after discharge (often resulting in readmission) worse quality of care than complications during admission (resulting in longer LOS).18 The measure was presented at a meeting to about 100 quality-of-care experts (including physicians and CEOs) from different countries involved in the Global Comparators Project and was considered adequate.
We chose the rankability as a summary measure of the reliability of ranking hospitals, as it is most frequently used in the Netherlands, has been validated in previous research2 19–22 and is similar to the methods used by Dimick et al.23 The rankability is a measure of the signal-to-noise ratio: the signal is the true differences between hospitals and the noise is the imprecision induced by small numbers (eg, low hospital volume).4 The intraclass correlation coefficient (ICC) is a similar measure of discriminative power, as it reflects the proportion of the total variance that can be attributed to, for example, between-hospital differences in multilevel modelling.24 The rankability is expressed as a percentage: for example, a rankability of 70% means that 70% of the variation is explained by ‘true’ hospital differences, while 30% is noise. The rankability is computed from two components: the within-hospital variation and the between-hospital variation. The within-hospital variation was estimated using a fixed effects logistic regression model (individual indicators and ‘textbook outcome’) and a fixed effects ordinal logistic regression model (ordinal composite measure), including hospitals and case-mix variables as fixed factors. The median squared SE of the coefficient for the hospital variable was used to estimate the within-hospital variation.4 Hospital volume is thus reflected in the precision of the hospital coefficient: low hospital volume results in less precision (ie, larger within-hospital variation), which makes it harder to detect between-hospital differences (ie, lower reliability of ranking).
The between-hospital variation was estimated using the heterogeneity from a random effects logistic regression model (individual indicators and ‘textbook outcome’) and a random effects ordinal logistic regression model (ordinal composite measure), in which hospitals were included as a random factor and case-mix variables as fixed factors. The rankability was calculated using the following formula:

rankability = τ²/(τ² + median(SE²)) × 100%

where τ² is the between-hospital variance and median(SE²) is the median squared SE of the hospital coefficients (the within-hospital variance).
We classified rankability as follows: low (<50%), moderate (50%–75%) and high (>75%).4 This was done as an attempt to identify the relevance of increasing reliability. Increase in reliability is particularly needed for those indicators with low reliability of ranking, because in these circumstances, we are less likely to detect any true hospital differences due to the relatively high amount of noise.
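A minimal numerical sketch of this calculation and classification (in Python rather than the Stata used in the study; the value of τ² and the SEs of the hospital coefficients would come from the random and fixed effects models described above):

```python
import numpy as np

def rankability(tau2, hospital_coef_ses):
    """Rankability (%) = between-hospital variance /
    (between-hospital + within-hospital variance) x 100,
    where the within-hospital variance is taken as the median
    squared SE of the hospital coefficients."""
    within = float(np.median(np.asarray(hospital_coef_ses) ** 2))
    return 100.0 * tau2 / (tau2 + within)

def classify(r):
    """Classify rankability as low (<50%), moderate (50%-75%) or high (>75%)."""
    if r < 50:
        return "low"
    elif r <= 75:
        return "moderate"
    return "high"

# Illustrative values: tau^2 = 0.09, median hospital-coefficient SE = 0.30
r = rankability(0.09, [0.25, 0.30, 0.35])  # equal signal and noise -> 50%
```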
The following patient level variables were included to adjust hospital outcomes for differences in case-mix: 5-year age groups, sex, socioeconomic status based on the postal area of the patient’s address (six categories), year of admission, diagnosis or procedure group (for THA/TKA), method of admission (acute/not acute), transferred in from other hospital, urgent admission in previous month (yes/no) and the Charlson comorbidity score to correct for severity of relevant comorbidities based on the secondary diagnosis codes.25 Age groups with fewer than 10 events were iteratively combined with the immediately older group. Statistical interactions between age and Charlson comorbidity score, and between method of admission and transfer were included based on previous findings.26
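The iterative collapsing of sparse age groups could look like the following sketch (Python; our illustrative implementation of the stated rule that groups with fewer than 10 events are merged into the immediately older group):

```python
def collapse_age_groups(event_counts):
    """Merge age groups with fewer than 10 events into the immediately
    older group, repeating until all mergeable groups have >= 10 events.

    `event_counts` is an ordered list of (label, events) pairs from
    youngest to oldest; the oldest group has no older neighbour to merge into.
    """
    groups = list(event_counts)
    i = 0
    while i < len(groups) - 1:
        label, n = groups[i]
        if n < 10:
            older_label, older_n = groups[i + 1]
            groups[i + 1] = (label + "+" + older_label, n + older_n)
            del groups[i]  # re-check the merged group at the same index
        else:
            i += 1
    return groups
```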
All analyses were executed using the statistical software package Stata (V.14).
Admissions from 555 053 patients treated in 95 Dutch hospitals were included. Table 1 shows the median estimates for each indicator and condition per year as well as the range in the median across different years. Most patients were admitted for a THA/TKA due to OA with a median of 376 admissions per hospital per year but the smallest number of hospitals reporting these admissions (median of 61 hospitals across years). Patients who had HF were treated in most hospitals (median of 82.5 hospitals across years) with a median of 253.5 patients per hospital per year. The highest median in-hospital mortality was found for stroke (14.9%) and lowest for THA/TKA (0%). Acute readmissions most often occurred in patients who had HF (median 12.9%) and least in patients with OA undergoing a THA/TKA (median 3.5%).
Rankability of individual indicators for each year
Figure 1A–E shows the rankability of individual indicators for stroke, colorectal carcinoma, HF, AMI and THA/TKA for single years (in total 29 year–diagnosis combinations tested). The rankability for in-hospital mortality varied from less than 1% for patients with OA undergoing THA/TKA in 2011 to 71% for stroke in 2010. The rankability for mortality was low for 20 (69%) of the tested combinations and moderate for 9 (31%) combinations (table 2). The highest rankability for acute readmission was found for AMI in 2012: 62%. Except for AMI in 2009 (51%) and 2012 (62%), the rankability for acute readmission was low. For long LOS, the rankability was moderate in 17 of the tested combinations, with the lowest rankability of 59% for colorectal carcinoma in 2010, and high in the remaining 12 combinations, with the highest rankability of 97% in 2008 for patients with OA undergoing THA/TKA. Given the frequent low rankability for mortality and acute readmission (in 69% and 93% of the tested combinations, respectively), most gain in reliability is likely for these indicators.
Combining data in 3-year periods
Combining single years into 3-year periods resulted in higher rankabilities compared with individual years for all indicators, except for THA/TKA in patients with OA (figure 1A–E). This resulted in fewer combinations with low rankability for 3-year periods. Of the 20 combinations with low rankability for mortality, 8 (40%) remained low when data were combined in 3-year periods, 11 (55%) became moderate and 1 (5%) became high. Of the nine combinations with moderate rankability, four (44%) remained moderate and five (56%) became high. For acute readmission, with 27 combinations having low rankability, 19 (70%) became moderate when data were combined in 3-year periods. The two combinations with moderate rankability remained moderate. For long LOS, combining multiple years resulted in high rankability for all 29 combinations, whereas rankability was moderate in 17 and high in 12 combinations for single years. Thus, combining data in 3-year periods improves rankability for mortality and acute readmission, and even results in high rankability for long LOS for all 29 combinations, but rankability remains low for 8 combinations each of mortality and acute readmission.
Combining data into composite measures
Over the years 2007–2012, 57.6% of patients who had stroke, 65.1% of patients with colorectal carcinoma, 56.5% of patients who had HF, 59.8% of patients with AMI and 72.4% of patients with OA undergoing THA/TKA had a textbook outcome. The rankability of the ‘textbook outcome’ was moderate for 15 (52%) and high for 14 (48%) of the tested year–diagnosis combinations (table 2). The lowest rankability was 51% for colorectal carcinoma in 2010 and the highest 96% for patients with OA undergoing THA/TKA in 2008 (figure 1B,E). Of the 20 year–diagnosis combinations having low rankability for mortality, combining data into the ‘textbook outcome’ improved the rankability to moderate in 13 (65%) and to high in 7 (35%). Of the nine combinations having moderate rankability, six (67%) improved to high and three (33%) remained moderate. Of the 27 combinations having low rankability for acute readmission, 15 (56%) improved to moderate and 12 (44%) to high when data were combined into the ‘textbook outcome’. Of the two combinations having moderate rankability, one improved to high and one remained moderate. Of the 17 combinations having moderate rankability for long LOS, combining data into the ‘textbook outcome’ improved the rankability to high in 4 (24%), while 13 (76%) remained moderate. Of the 12 combinations having high rankability, 9 (75%) remained high, but 3 (25%) decreased to moderate. Therefore, combining data into the ‘textbook outcome’ improves reliability in all years having low rankability for either mortality or acute readmission. For long LOS, rankability decreased in some cases but was still at least moderate.
For the ordinal composite measure, the rankability was mostly moderate (66%) or high (31%) (table 2). The lowest rankability was 46% for HF in 2008 and the highest 95% for patients with OA undergoing THA/TKA in 2007 (figure 1C,E). Looking at single years, the rankability of the composite measure improved in all combinations compared with the single indicators. Of the 20 year–diagnosis combinations with low rankability for mortality, 14 (70%) improved to moderate and 5 (25%) to high, with only 1 combination remaining low. Of the nine combinations with moderate rankability, four (44%) improved to high and five (56%) remained moderate. For acute readmission, the picture was even more pronounced: of the 27 year–diagnosis combinations having low rankability, 17 (63%) improved to moderate when data were combined into the ordinal composite, 9 (33%) improved to high and 1 remained low. The two combinations having moderate rankability remained moderate. Less was gained compared with long LOS: of the 17 year–diagnosis combinations having moderate rankability for long LOS, 4 (24%) improved to high, 12 (71%) remained moderate and 1 decreased to low. Of the 12 combinations having high rankability, 5 (42%) remained high but 7 (58%) decreased to moderate. So combining data into the ordinal composite measure improves reliability in most years compared with mortality and acute readmission, although one combination remained with low rankability. Compared with long LOS, rankability remains the same or improves, resulting in at least moderate rankability, except for one combination where rankability decreased to low.
Within-hospital and between-hospital variations
To understand why rankability increased or decreased, we studied the components of the calculation. The rankability may increase either if the within-hospital variation becomes smaller (eg, by increasing the number of events) or if the between-hospital variation becomes larger. Figure 2A–C shows the between-hospital variation (Tau, x-axis) against the rankability (y-axis). The different lines represent the median SE, used to calculate the within-hospital variance (σ2). By combining indicators, we therefore aim to move towards a line with smaller SE and upwards to increase rankability, assuming the same between-hospital variance. The question, however, is whether the between-hospital variation stays the same when indicators are combined or whether it averages out because hospitals score relatively well on one indicator and worse on another.
Figure 2A shows that for mortality the intended direction is achieved, as we move upwards going from + to □ for the different conditions in different colours. The composite measure has lower within-hospital variation (smaller median SE) than in-hospital mortality. The between-hospital variation of the composite measure is also slightly lower than that for mortality, as we move slightly to the left (except for THA/TKA). Since the within-hospital variation of the composite measure decreases more than the between-hospital variation, the rankability of the composite measure is higher than that of in-hospital mortality for all conditions and each single year. The large improvement in rankability for THA/TKA is caused by a large increase in between-hospital variation combined with a decrease in within-hospital variation. For acute readmission (figure 2B), we see a similar picture. For long LOS (figure 2C), all symbols are much closer together and already have relatively low within-hospital variation (small median SE). The within-hospital variation remains approximately the same, while the between-hospital variation decreases when data are combined in the ordinal composite, indicating more uniform outcomes across hospitals. As a result, the rankability of the composite measure was lower than that of long LOS for 8 (28%) of the 29 year–diagnosis combinations.
This study aimed to assess whether increasing the number of events per hospital, by combining data into composites or including data from multiple years, improves the reliability of hospital rankings (rankability). We found that the rankability of mortality and acute readmission was mostly low. Combining multiple years generally improves rankability because of the higher number of events, but rankability remains low for both mortality and acute readmission in eight cases where these outcomes are infrequent. Combining data into the ‘textbook outcome’ improves rankability, except in comparison with long LOS, where the within-hospital variation is already relatively low because of the higher number of events and the between-hospital variation decreased when outcomes were combined. Similarly, combining data into the ordinal composite measure improves rankability, but less so if the within-hospital variation is already small and the between-hospital variation is reduced by the combination of indicators. Given that rankability remained low for some conditions even when multiple years were combined for mortality and acute readmission, combining data into composite measures may be the better solution to improve the reliability of hospital rankings: composites can be calculated for single years and therefore provide more actionable indicators, as well as a more complete picture of quality of care.
The choice of composite measure may depend on the purpose and the end-users. Evidence suggests that patients are more likely to use information on differences in quality of care when it is presented as a summary measure, such as a textbook outcome.27 28 The rankability of the ‘textbook outcome’ was moderate or high for all conditions and single years, it is easy to interpret, and an event-free hospital admission is what patients aim for. For hospital professionals or insurers, however, it does not provide sufficient information for quality improvement, as it does not show which of the outcomes should be improved. The ordinal composite measure does provide this information, combined with moderate or high rankability for most conditions and single years, and is therefore better suited for quality improvement in hospitals or for insurers.
Comparison with previous studies in the literature
Our results are consistent with previous research showing that composite measures are more informative than existing quality indicators.8 9 29 The present study adds that the ordinal composite measure both combines indicators and orders outcomes. This may affect hospital comparisons, as the different combinations are now separated and ordered, whereas in the ‘textbook outcome’ they are lumped together and weighted equally, so it remains unknown which of the adverse outcomes performs worst. Including mortality in the composite measure is also important as it accounts for potential survivor bias,30 given that an individual who dies can never be readmitted. Survivor bias may arise when hospitals are compared on readmission rates or long LOS without considering differences in mortality. A previous study found that hospital performance on readmissions differed significantly from performance on a composite metric based on readmissions and mortality.31 In our study, we used a more extended composite measure that also includes long LOS and likewise found that the rankability of the composite improved compared with a single readmission indicator, but we add that for indicators like long LOS this is not necessarily the case. In addition, we provided more insight into the reasons for improved rankability by showing the gain (or not) in both within-hospital and between-hospital variation, and by showing that it is often not valid to assume that between-hospital variation remains the same when data are combined into composites. Furthermore, we showed that low rankability occurs less frequently when data are combined into composite measures than when multiple years are combined.
Other studies focused on individual indicators and showed that the rankability of individual indicators differs, especially since it also depends on case-mix correction.2 4 21 For example, van Dishoeck et al 21 found a rankability of 80% for surgical-site infection (SSI) after colonic resection but 0% for caesarean section. Rankability was 8% in all operations combined, as the differences in SSI rates were explained mainly by case mix. Furthermore, Henneman et al 2 found a rankability of 38% for mortality after colorectal surgery in the period 2009–2011. We found a rankability of 30% (2009), 28% (2010) and 21% (2011) for mortality in patients with colorectal carcinoma, but we showed that if years are combined, the rankability increases and we found a rankability of 51% (2007–2009) and 49% (2010–2012). The rankability for colorectal carcinoma increases far more when indicators are combined (51%–68% across single years for ‘textbook outcome’ and 50%–67% for the ordinal composite measure). Another study found a rankability of 58% for in-hospital mortality for AMI and 51% for readmission after HF after correction for age in 2007.4 We found a rankability of 50% for in-hospital mortality and 41% for acute readmission after HF, which is probably lower because we used more variables for case-mix correction. Another possibility to improve the reliability of ranking is to cluster hospitals, as a previous study found that clustered intensive care units increased the rankability.32 Although this results in a higher rankability, the question is how this can be used to track changes in performance over time for individual hospitals as well as for patients to choose a particular hospital (based on their performance rather than that of a cluster).
Strengths and weaknesses
An adequate sample size is necessary to obtain a reliable ranking.1 A strength of this study is that the chosen indicators are routinely collected, so that a large sample size was available, including almost all hospitals in the Netherlands. However, a limitation is that only data from 2007 to 2012 were available, as more recent data were not publicly available due to the conversion from ICD-9-CM to ICD-10-CM. We analysed multiple 1-year periods to determine whether rankability was stable over the years and did not find large differences between years; we therefore think that our results generalise to more recent years. Furthermore, we were limited in our ability to adjust for case-mix because we used administrative data, while previous studies showed that the rankability of indicators depends on case-mix correction.2 4 21 Had more detailed case-mix variables been available, they might have explained additional differences between hospitals, meaning that the between-hospital variance, and thereby the rankability, may be overestimated in the present study. In addition, we were not able to distinguish different types of hospitals (eg, academic or public) since we used anonymous data (at both patient and hospital levels). This may increase between-hospital differences and result in a higher rankability if the within-hospital variation remains stable. However, since we compared the rankability of individual and combined indicators, the improvement in rankability is likely to be less affected by both these case-mix adjustment issues, because the lack of adjustment applies to individual and combined indicators alike. Our data did not include information on mortality after discharge, so the results of our study reflect only a selection of mortality cases.
Future studies should therefore also include postdischarge mortality to examine whether this affects the rankability since different mortality time-frames may result in differences in judgement regarding the performance of hospitals.33
We showed that combining data overall improves the rankability of hospital performance, particularly for mortality and acute readmission, because the within-hospital variation decreases. Combining data into composite measures may be a better solution than combining multiple years to improve rankability: it gives a more complete picture of quality of care, represents current or recent practice and is thus more actionable for quality improvement, and is also less likely to result in low rankability of hospital performance. Whereas the ‘textbook outcome’ may have the best rankability, the ordinal composite measure may be more actionable for hospitals trying to improve, given that this measure enables them to distinguish different adverse outcomes and target specific combinations of outcomes that are lumped into one category with the textbook outcome.
Contributors PJM-vdM designed the study. SNH wrote the article and carried out the study. PJM-vdM supervised the study and the writing of the manuscript. All authors have critically read and modified both the study protocol and previous drafts of the manuscript and have approved the final version.
Funding This study was funded by ZonMw (10.13039/501100001826), grant number 516022513.
Competing interests None declared.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement We used routinely collected administrative admission data of the Dutch National Medical Registration (LMR) from 2007 to 2012 retrieved from Statistics Netherlands. To use these data, please contact Statistics Netherlands.