Abstract
Introduction The role of transparency in quality of care is becoming ever more important. Various indicators are used to assess hospital performance. Judging hospitals by rank order takes no account of factors such as random variation and case-mix differences. The purpose of this article is to compare displays of the influence of random variation on the apparent differences in quality of care between Dutch hospitals.
Method The authors analysed the official 2005 data of all 97 hospitals on the following performance indicators: pressure ulcer, cerebrovascular accident and acute myocardial infarction. The authors calculated CIs of the point estimates and the simulated CIs of the ranks with bootstrap sampling, and visualised the influence of random variation with three modern graphical techniques: forest plot, funnel plot and rank plot.
Results Statistically significant differences between hospitals were found for nearly all performance indicators (p<0.001). However, the CIs in the forest plot revealed that only a small number of hospitals performed significantly better or worse. The funnel plot provides a representation of differences between hospitals compared with a target value and allows for the uncertainty of these differences. The rank plot showed that ranking hospitals was very uncertain.
Conclusion Despite statistically significant differences between hospitals, random variation is a crucial factor that must be taken into account when judging individual hospitals. The funnel plot provides easily interpretable information on hospital performance, including the influence of random variation.
- Performance indicators
- performance measurement
- hospitals
- funnel plot
- random variation
- assessment
- healthcare quality
- organisation
- outcome
- quality of care
Introduction
Hospitals face increasing demands with regard to the quality, transparency and accountability of healthcare. Since the early 1980s, interest in measuring hospital performance has led to the development of many performance indicators (PIs). A PI is a measurable element of practice performance for which there is evidence or consensus that it can be used to assess the quality of care.1 The purpose of PIs must balance the need for accountability with the need to promote quality-improvement initiatives,2 thus providing the incentive to improve the quality of care.
League tables are often used for displaying hospital performance (figure 1), suggesting a rank order. League tables provoke concerns among health service providers for several reasons, including concerns over adjustment for case-mix and the role of chance in determining their rank.3 Because there are no minimum sample-size requirements in PI measurement, random variation plays an important role in the interpretation of the results. In The Netherlands, quality of care in hospitals is assessed by the Health Care Inspectorate (NHCI).4 In 2003, the NHCI developed a public and obligatory set of PIs to guide their assessment of the quality of care delivered in hospitals.5 6 In principle, this set of PIs enables the Inspectorate to identify hospitals whose performance lies below the minimum standard and to guide further investigation into these hospitals. The Inspectorate publishes the anonymised data in a yearly report that presents data on more than 40 indicators concerning structure, process and health outcome using league tables of point estimates. This paper focuses on the particular aspect of random variation in the comparison and ranking of institutions, and explores three graphical displays describing the data.
Methods
Data
For the analysis, we used the 2005 publicly available data on three PIs: pressure ulcer (PU), cerebrovascular accident (CVA) and acute myocardial infarction (AMI). These indicators were selected to reflect several problems that occur with PIs, such as low total numbers or low numbers of cases. We assume hypothetically that the data provide a fair reflection of the quality of care in individual hospitals and that there is no significant effect of case mix. The data are publicly available at http://www.ziekenhuizentransparant.nl.7
Statistical analysis
On the basis of the absolute numbers (n) and the cases (y), we calculated the SE and 95% CI, where y is the number of cases and n the total number of patients in a hospital. The CI was calculated on the logit scale with α=log((y/n)/(1−y/n)) and SE=√(1/y+1/(n−y)), giving an interval for the odds of e^(α±1.96·SE), which was back-transformed to a proportion.8 With the qbinom function in S-PLUS, we calculated the 95% CI for the number of successes obtained in a number of binomial trials equal to the number of patients, with the observed probability of being a case. These limits were divided again by the number of trials to obtain a CI that reflected the discrete character of the observations. CIs for the ranks were calculated by a parameterised bootstrap, with the observed probability of being a case per hospital as input.9 Differences between the hospitals were tested with a likelihood ratio test. A p value of <0.05 was considered statistically significant.
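The logit-scale CI described above can be sketched as follows. This is a minimal Python illustration (the authors worked in S-PLUS); the helper name `logit_ci` and the example counts (5 cases among 100 patients) are ours, not from the paper.

```python
import math

def logit_ci(y, n, z=1.96):
    """95% CI for a proportion y/n via the logit transform:
    alpha = log(p/(1-p)), SE = sqrt(1/y + 1/(n-y)); the interval
    for the odds is exp(alpha +/- z*SE), back-transformed to a
    proportion with p = odds / (1 + odds)."""
    p = y / n
    alpha = math.log(p / (1 - p))
    se = math.sqrt(1 / y + 1 / (n - y))
    lo_odds = math.exp(alpha - z * se)
    hi_odds = math.exp(alpha + z * se)
    return lo_odds / (1 + lo_odds), hi_odds / (1 + hi_odds)

# Hypothetical example: 5 cases among 100 patients (rate 5%)
lo, hi = logit_ci(5, 100)
print(round(lo, 3), round(hi, 3))  # 0.021 0.115
```

Note how asymmetric the interval is around the 5% point estimate, a consequence of the small number of cases; this is the wide-CI phenomenon visible in the forest plots.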
Graphical methods
We considered three techniques to visualise the influence of random variation:
A forest plot ranks the point estimate and the CI represented by horizontal lines for each hospital in ascending order. A vertical line represents a preselected norm or standard.10 11
In a funnel plot, the estimates of the hospitals are plotted together with the confidence limits of a norm or national average.12 The confidence limits are calculated in relation to the number of patients per hospital. It is customary to plot both 95% and 99.8% CIs, corresponding to approximately 2 and 3 SEs in width.12 We calculated the CI taking into account the discrete nature of the numbers. This exact calculation was necessary because the number of hospitals with y=0 was high for some indicators.
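The exact binomial limits for the funnel can be sketched in Python. This is a sketch under our own assumptions: the helper name `binom_ci_limits` is hypothetical, the quantile convention mirrors that of `qbinom`, and the 5% target with n=200 is an illustrative size, not a value from the paper.

```python
import math

def binom_ci_limits(p0, n, conf=0.95):
    """Exact binomial control limits for a funnel plot: for a target
    rate p0 and a hospital treating n patients, find the smallest and
    largest case counts inside the central `conf` probability mass of
    Binomial(n, p0), then divide by n to express them as rates."""
    tail = (1 - conf) / 2
    cum, k_lo, k_hi = 0.0, None, None
    for k in range(n + 1):
        # cumulative binomial probability P(X <= k)
        cum += math.comb(n, k) * p0**k * (1 - p0)**(n - k)
        if k_lo is None and cum > tail:
            k_lo = k
        if k_hi is None and cum >= 1 - tail:
            k_hi = k
    return k_lo / n, k_hi / n

# Illustrative: 5% target rate, hospital with 200 patients
lo, hi = binom_ci_limits(0.05, 200)
```

Plotting these limits against n traces the funnel shape: the limits narrow towards the target as hospital volume grows, which is why small hospitals can deviate widely from the norm without the difference being significant.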
A rank plot uses bootstrapping to estimate the CI around the rank, plotting the observed rank against the bootstrap estimates and their CIs.13 Bootstrap samples are generated by random draws with replacement, resampling the individual observations from the original group. For each hospital, 1000 bootstrap replicas were generated from the original binomial distribution; in this way, 1000 new datasets reflect what could be observed under the same circumstances. The rank numbers were determined for each new dataset, and the distribution of the ranks over the 1000 datasets formed the basis of the 95% confidence limits of the ranks.
Results
The overall results are described in table 1, and the graphical display of the influence of random variation is illustrated by examples for each of the indicator areas separately. There were significant differences between the hospitals for almost all PIs reported, as summarised in table 1. The population size varied substantially between the indicators. For both age groups of the haemorrhagic cerebrovascular accident indicator, the mean population size was rather small (n<40). For the AMI indicators in the age group <65 years, the number of cases was small (three and four), leading to a borderline significant difference between the hospitals.
Pressure ulcers
The forest plot shows that the point estimates for PU ranged from 1.3% to 19.4%. The CIs surrounding the point estimates varied widely (figure 2A). For instance, the prevalence of PUs in the first hospital in figure 2A was 1.3%, but with a CI ranging from 0 to 9%. This wide range was due to the small number of patients (74). In contrast, the second hospital listed in this graph also scored a prevalence of 1.3%, but with a CI from 0.5 to 4%. Despite the equal results, the second hospital performed significantly better than the national standard of the Inspectorate, which is 5%. The funnel plot shows the CIs around this 5% norm (figure 2B). Hospitals situated above or below the 95% limits deviated from the norm by more than twice the SE. Seven hospitals performed significantly better, with a point estimate below the lower 95% confidence limit. Ranking hospitals on the basis of pressure ulcer prevalence was very uncertain, given the wide CIs of the bootstrap samples (figure 2C).
Cerebrovascular accident
To illustrate the influence of random variation, we consider the 7-day mortality after a haemorrhagic stroke in patients younger than 65 years (figure 3). The point estimates of hospital mortality varied, ranging from 0 to 100% mortality. The wide CIs in the forest plot are due to the fact that small numbers of patients were admitted to the hospitals in 2005. The first 24 hospitals reported a mortality of 0%. However, hospital number 24 admitted only two patients that year, giving a CI from 0 to 100% (figure 3A). The funnel plot shows that, apart from random variation, there were few differences between the hospitals (figure 3B). The rank plot reveals wide CIs, making any ranking attempt very uncertain (figure 3C).
Acute myocardial infarction
We illustrate the influence of random variation on 30-day hospital mortality after AMI in patients younger than 65 years. The point estimates in the forest plot ranged from 0 to 9.8%, with different CIs based on patient numbers (figure 4A). Given a mean score of 2.5% mortality, only two hospitals performed significantly worse than the others. The funnel plot shows that it is hard to distinguish between hospitals that performed well and those that performed poorly (figure 4B). No meaningful ranking of hospitals could be done on the basis of AMI mortality (figure 4C).
Discussion
Although league tables provide a simple overview of the data, they are easy to misinterpret, because ranking crude hospital performance takes no account of chance variability.3 Graphical displays should show the data and avoid distorting what the data have to say.14 The graphs should help compare different aspects of the data, such as the magnitude of differences in performance between the hospitals, as well as uncertainty. Therefore, visualising random variation is crucial. To accomplish this, we chose the most common graphical display in scientific medical research: the forest plot.11 Researchers often use the funnel plot in meta-analyses to display publication bias and other sample-size effects.15 16 In more recent research, the funnel plot has been suggested for displaying data in public reporting of hospital performance.12 17–19 With the rank plots, we aimed only at visualising the uncertainty in the rank order. All three displays focus on random variation, at different levels. The forest plot visualises the CI of the individual hospital, while the funnel plot focuses on the significant differences between the hospitals. The rank plot provides insight into the chance variation of the ranks.
The forest plot ranks hospitals on the point estimate but also provides information on the CIs. Our data show that the CI varies substantially, depending on sample size. For example, for PU two hospitals had the same point estimate, but the interpretation regarding the quality of care delivered in these hospitals differs in that only one performed significantly better. Thus, the relative position is displayed but is not easily interpreted.
In our experience, the funnel plot provides a straightforward representation of the differences between hospitals. The hospitals situated outside the 95% CIs performed significantly worse or better in relation to a target or national average. The funnel plot clearly reveals that quality of care could not be measured using the stroke indicators because of the small numbers in individual hospitals. Small numbers make proper interpretation virtually impossible, because the vast majority of the apparent differences may be due to random variation.12 The funnel plot also provides professionals the information to compare their own performance with that of other hospitals with the same volume and subsequently set their own targets. The funnel plot provides a good overview of the relative position of the individual hospitals.
Although ranking raw scores provides an easy way to compare hospitals, the variance of the original measures strongly influences the rank. This is seen in the rank plot, which shows that random variation greatly influences the interpretation of what the true rank might be. Ranking may even be misleading, since random variation plays a dominant role for some indicators, such as stroke and AMI. The overview of the relative position is limited.
When a graph is constructed, information is encoded. The visual decoding is called graphical perception.20 To judge the graphical perception of the different plots, we chose the two criteria described by Cleveland: pattern recognition (including random variation) and table look-up, which relates to the accuracy of the relative position of hospital performance.20 As summarised in table 2, we conclude that the funnel plot is the most attractive graphical display of the three techniques.
Several articles discuss the use of league tables in presenting the results of hospital performance.10 19 21–30 They all conclude that, even when there are substantial differences between institutions, simple league tables or ranks are unreliable statistical summaries of performance. Since Spiegelhalter suggested the use of funnel plots for institutional comparison in 2002, several studies have described the usefulness of this plot.3 12 19 31 32
This research has several limitations. We concentrated on the role of random variation and paid no attention to bias, such as registration differences, organisational differences or the influence of case mix. With regard to the latter, it is likely that university hospitals or hospitals in urban areas have a different patient population from that of small hospitals or hospitals in rural areas.33 Correction for these confounding factors with these PIs is impossible because the publicly available data do not include patient characteristics. This requires further investigation. In our methodology, we used the most common scientific approach, calculating CIs and using the described plots. We did not search intensively for other graphical displays to visualise the data. Also, the usefulness of the different graphs was not systematically assessed. This requires a more structured approach.20
We conclude that despite statistically significant differences between hospitals, random variation is a crucial factor that must be taken into account when judging individual hospitals. The funnel plot provides easily interpretable information on hospital performance, including the influence of random variation.
References
Footnotes
Funding Internal Erasmus MC grant for Healthcare Research (Mrace).
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.