Composite measures of healthcare quality: sensible in theory, problematic in practice
Rocco Friebel1,2, Adam Steventon3
1 Department of Health Policy, London School of Economics and Political Science, London, UK
2 Center for Global Development, Washington, District of Columbia, USA
3 Data Analytics, The Health Foundation, London, UK
Correspondence to Dr Adam Steventon, Data Analytics, The Health Foundation, London, UK; adam.steventon{at}

All healthcare systems show variation in the quality of care provided, whether that means access to primary care services,1 ambulance response times,2 Accident & Emergency waiting times3 or treatment processes and outcomes.4–6 Monitoring this variation in quality can serve multiple purposes: informing patients about where best to seek care;7 allowing clinicians to compare their performance with that of their peers and thus identify targets for local-level quality improvement efforts; and supporting the development of national policy. What all these purposes have in common, though, is trust that the data adequately reflect healthcare quality, which is sometimes a questionable assumption.

In BMJ Quality and Safety, Hofstede et al 8 have addressed a common situation where providers (such as hospitals, general practices or community teams) are ranked according to their performance on a quality indicator. Rankings are often used to make direct performance comparisons between providers and to identify positive or negative outliers. Yet one downside of this approach is that the ranks of providers can be susceptible to chance fluctuations in the indicators. The precision of rankings, that is, their reliability, therefore has to be carefully assessed when developing these kinds of approaches to reporting the quality of care. This is particularly the case when payment is linked to performance,9 or when ongoing quality improvement efforts might be undermined by measurement errors.

Performance measures are driven by patient case-mix, differences in care provided and chance variation, and how accurately they reflect real variation in quality is determined by two components.10 The first is the reliability of the indicator for each healthcare provider. This component (the ‘within-provider uncertainty’) is highly dependent on the number of patients receiving the type of care in question at each provider and is likely to be affected by random variation, especially in smaller population groups. The second component is the variance in the indicators between providers. This ‘between-provider uncertainty’ relates to the true variation in the indicators between providers, setting aside chance variation within the individual providers. These distinctions are relevant because the reliability of the ranking system will depend on both the within-provider and between-provider uncertainty.

One way to combine these two sources of uncertainty is to measure the ‘rankability’, defined as the ratio of between-hospital variation to the sum of between-hospital and within-hospital variation, multiplied by 100.10 This percentage describes the share of variation that is due to true hospital differences, as opposed to random noise. Low values imply that variation in performance across hospitals largely reflects chance, not true differences in performance. Referring to this situation as having low rankability conveys the idea that hospital rankings are unstable: chance variation could just as easily have produced quite different rankings. By contrast, high values for rankability mean that most observed variation in performance reflects real differences between hospitals; any given ranking is thus quite stable.
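The definition above can be written as a short calculation. The Python sketch below is illustrative only: the function name `rankability` and the example variances are hypothetical and are not taken from Hofstede et al, and the sketch assumes that the between-hospital and within-hospital variances have already been estimated elsewhere.

```python
def rankability(between_var: float, within_var: float) -> float:
    """Return rankability as a percentage.

    Implements the ratio described in the text:
    100 * between / (between + within).
    Both inputs are variances and are assumed to be estimated already.
    """
    if between_var < 0 or within_var < 0:
        raise ValueError("variances must be non-negative")
    total = between_var + within_var
    if total == 0:
        raise ValueError("total variation is zero; rankability is undefined")
    return 100.0 * between_var / total


# Hypothetical variances, chosen only to illustrate the two extremes.
mostly_noise = rankability(between_var=0.25, within_var=0.75)   # 25.0
mostly_signal = rankability(between_var=3.0, within_var=1.0)    # 75.0
print(mostly_noise, mostly_signal)
```

In the first hypothetical case only a quarter of the observed variation reflects true differences between hospitals, so a ranking built on it would be unstable; in the second, three-quarters does.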

Hofstede et al 8 examine whether it is possible to improve the reliability of rankings based on quality measures. Two strategies are assessed: combining indicator data across several years to increase the number of events (eg, reporting readmission rates based on the number of admissions occurring over a multi-year period rather than a single year) or generating a composite measure by combining information from two or more quality indicators. Both approaches might improve rankability, but at some cost to the usefulness of the quality indicators, as we discuss later.

Hofstede and colleagues make use of Dutch National Medical Registration data for over half a million patients treated in 95 hospitals, containing indicators for in-hospital mortality, length of stay and 30-day readmission rates across 12 years. The authors considered a rankability ratio below 50% as low, between 50% and 75% as moderate and above 75% as high. Findings from the analysis show that both strategies (collecting individual indicators over multiple years or combining multiple indicators into a single composite) can significantly improve rankability compared with the use of any single outcome measure. Yet composite measures showed the greatest reliability of rankings in this study, and the authors conclude that composite measures provide more information and more reliable rankings than combining multiple years of individual indicators. There are, however, other considerations, which we now address.
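The three bands used by the authors can be expressed as a simple lookup. This is a minimal sketch rather than the authors' code: the function name `classify_rankability` is hypothetical, and the treatment of the boundary values is an assumption, since the text does not say which band exactly 50% or exactly 75% falls into (here both are counted as moderate).

```python
def classify_rankability(pct: float) -> str:
    """Map a rankability percentage to the bands described in the text.

    Assumption: the boundary values 50 and 75 are treated as 'moderate',
    because the source does not specify how ties are handled.
    """
    if not 0.0 <= pct <= 100.0:
        raise ValueError("rankability must lie between 0 and 100")
    if pct < 50.0:
        return "low"
    if pct <= 75.0:
        return "moderate"
    return "high"


# Hypothetical values, for illustration only.
for value in (30.0, 60.0, 90.0):
    print(value, classify_rankability(value))
```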

What are the benefits of using composite measures?

The focus on composite quality measures is timely because they are being used in many health systems: the Centers for Medicare & Medicaid Services, for example, has introduced star ratings to measure the performance of Medicare Advantage Plans and Part D plans. Star ratings are available for five categories, covering aspects such as patient experience and access, while overall star ratings for drug plans are assigned across four categories, covering aspects such as drug safety.11 In Germany, overall ratings are made publicly available for residential and domiciliary care homes, covering 59 and 34 single criteria across multiple quality dimensions, respectively.12 13

The rationale for the adoption of composite measures is simple. Over the years, administrative data have become available and have been complemented by electronic medical records as well as disease-specific data from audits and registries. The result has been a proliferation of outcome measures, which can lead to information overload. Composite measures can help condense this vast amount of information into a single indicator, which is easy to use and promises an overview of performance.14 Composite measures provide information that summarises a range of quality dimensions. This might be particularly helpful for patients, who tend to place great importance on several different aspects of quality: they want care that is effective, safe, patient centred and delivered compassionately.

Limitations of composite measures

The potential benefits of composite measures might be outweighed by their substantial limitations (table 1). An independent review by the Health Foundation about approaches to measuring the quality of general practice in England discouraged the development and dissemination of composite scores.15 One problem is that composite measures can lack the ability to signal changes in care quality that are specific enough to be the target of improvement projects. Quality improvement efforts are often directed towards a specific problem with care delivery and measured through a precisely defined set of indicators. Improvements against these indicators might not translate into changes in composite measures that also include information in other quality domains.

Table 1 Advantages and disadvantages of composite quality measures

Another problem is that composite measures might pick up potential spillover effects. For example, a reduction in mortality can lead to a subsequent increase in hospital readmissions, since a greater proportion of patients now survive the initial hospital stay who would otherwise have died. If a composite measure were formed by combining data on mortality and readmission rates, then the two effects might cancel each other out. Another possible type of spillover effect occurs when improvements to one area of care come at the cost of deteriorations elsewhere, for example, due to limited resources. Although composite measures purport to offer a comprehensive and balanced view of quality across several domains, this is only possible if the requisite data are available. If data on some domains are missing, then those domains will not be reflected as well as they should be in the composite score, which is potentially misleading.

Of course, individuals and stakeholder groups might differ in their assessment of the relative importance of the constituent measures. For example, patients place a great value on receiving care that is delivered compassionately and in a timely manner,16 while clinicians might sometimes place greater emphasis on the delivery of effective treatments. A key challenge in the use of composite measures is therefore the weighting of selected single outcome measures to reflect individual preferences,17 with different weighting methods being used, such as equal, numerator-based and opportunity-based weighting, or weighting based on expert judgement.18 Importantly, to ensure the usability of composite measures, their construction and the selection of outcome measures have to be guided by the overall purpose of their use and tailored towards the end user. Composite measures can be misleading when data on certain domains relevant to the end user are not available. Also, it might be challenging to adjust composite measures for confounders that can differ from one quality indicator to the next.
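To show how the choice of weights can change a composite score, the following Python sketch compares three of the weighting schemes named above. It is illustrative only and rests on stated assumptions: each indicator is a simple rate (numerator divided by denominator); equal weighting gives every indicator the same weight; opportunity-based weighting is taken to weight each indicator by its denominator (the number of eligible patients); and numerator-based weighting is taken to weight each indicator by its numerator. These readings follow common usage but are not defined in this article, and the function name `composite_score` and the example counts are hypothetical. Weighting by expert judgement is not shown.

```python
from typing import Sequence


def composite_score(
    numerators: Sequence[float],
    denominators: Sequence[float],
    scheme: str = "equal",
) -> float:
    """Weighted average of indicator rates under an assumed weighting scheme."""
    if not numerators or len(numerators) != len(denominators):
        raise ValueError("need one numerator and one denominator per indicator")
    if any(d <= 0 for d in denominators):
        raise ValueError("denominators must be positive")

    rates = [n / d for n, d in zip(numerators, denominators)]

    if scheme == "equal":
        weights = [1.0] * len(rates)        # every indicator counts the same
    elif scheme == "opportunity":
        weights = list(denominators)        # assumed: weight by eligible patients
    elif scheme == "numerator":
        weights = list(numerators)          # assumed: weight by numerator counts
    else:
        raise ValueError(f"unknown scheme: {scheme}")

    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(w * r for w, r in zip(weights, rates)) / total


# Hypothetical provider: indicator A met for 90 of 100 patients,
# indicator B met for 20 of 40 patients.
for scheme in ("equal", "opportunity", "numerator"):
    print(scheme, round(composite_score([90, 20], [100, 40], scheme), 3))
```

The same underlying data yield a different composite under each scheme, which is why the choice of weights needs to be transparent and matched to the intended user.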


Clinicians, healthcare managers and policymakers depend on reliable information to make judgements about the impact of past initiatives on quality, and to guide future improvements. Composite measures are a good idea in theory as they can provide a way to make sense of the growing number of measures on various aspects of care quality. The companion paper also found that a composite measure of in-hospital mortality, 30-day readmission and prolonged length of stay showed better rankability than did individual indicators for some important medical and surgical examples commonly subjected to performance measurement. Indeed, rankability, which describes the proportion of performance variation due to true differences rather than chance, represents an important technical consideration for any performance measure.

In practice, however, composite measures suffer from significant limitations because of missing data, complex causalities and difficulties in setting the right weights to reflect individual preferences. Unless these limitations are addressed, for instance, by improving transparency around the composite measures’ inherent aims and limitations, or by allowing users to adapt composites to reflect individual preferences, which could be aided by data visualisation tools,19 their main applications are likely to be about helping patients to decide where to go for care, rather than quality improvement. Producers of performance rankings might be better advised to combine data across multiple years to make impact assessments. Ultimately, though, as with any evaluation, the purpose of the quality measurement should determine the selection of the measure.



  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent Not required.

  • Provenance and peer review Commissioned; internally peer reviewed.
