Just over 50 years ago, Avedis Donabedian published his seminal paper, which sought to define and specify the ‘quality of health care’, articulating the now paradigmatic triad of structure, process and outcome for measuring healthcare quality.1 In recent years, we have seen the rapid expansion of increasingly inexpensive information technology capability and capacity, facilitating the collection and analysis of large healthcare data sets. These technological advances fuel the current proliferation of performance measurement in healthcare.2 Increasingly, in an effort to improve care, many cancer health systems, including those in England,3 the USA4 and Canada,5 6 are publicly reporting performance indicators, generally derived from these large data sets. Not surprisingly, differences in prevention, early detection and/or treatment of cancer are often used to explain the observed differences in performance across jurisdictions.6–9
Given the considerable effort and resource invested in performance measurement as well as potential adverse consequences if done poorly,10 it is important to get it right. Determining the effectiveness of healthcare performance measurement is challenging,11 particularly at the health system level. Often, performance measurement is implemented uniformly across an entire system, making well-designed controlled analysis less feasible or impossible12 13 and leaving evaluations vulnerable to secular trends.14 At the physician level, audit and feedback studies report variable results: meta-analyses show a modest benefit overall,15–17 but an important proportion of interventions were ineffective or minimally effective with a few studies suggesting a negative effect on performance.16 Likely, this heterogeneity is due to the complexity of the endeavour and its many moving parts, which include the behaviour targeted, the recipients of the feedback, their environment, the use of cointerventions and the components of the audit and feedback intervention itself.18 The latter generally comprises performance indicators, often derived from large healthcare data sets; however, who reports these indicators and how they are reported are also critical components of audit and feedback.15 16 19
In this issue of BMJ Quality & Safety, Abel et al20 illustrate empirically some of the complexities of measuring performance. They examine the properties of 16 primary care ‘diagnostic activity indicators’ related to cancer (ie, performance indicators for diagnosing cancer) among 7000+ large general practices in England. Performance indicators are used to assess the quality of healthcare delivery, sometimes by examining outcomes such as survival or recovery of function and at other times by measuring processes, that is, the intermediate steps that have been shown or are felt to be important in achieving the desired outcomes. In this interesting analysis, the authors challenge the assumption that variation in indicators necessarily reflects underlying differences in the quality of cancer care. Using mixed models, they parse the aetiologies of the observed variation in the cancer diagnostic activity indicators across the practices in their cohort. They then assess the reliability or ‘rankability’ of the indicators, that is, whether a practice can be meaningfully distinguished from others using a particular indicator.
This study used data from the Cancer Service Public Health Profile,3 which is one of 30+ thematic National Public Health Profiles published in an interactive web format by Public Health England. The Cancer Service Profile reports cancer services-related indicators at the general practice level for practices of at least 1000 patients as well as at the health region level. The Profile is intended to assist health system planners to make decisions about services and to stimulate ‘reflective practice’ among providers. While use of these data in this way is supported by links between some of the indicators and outcomes such as cancer survival,21 22 there have been concerns regarding small sample sizes and the impact of underlying case mix as causes of the observed variation.23 This is an increasingly familiar refrain in the ‘big data’ era—similar concerns have been articulated in other contexts and health systems.2 11
In their study,20 Abel and colleagues found that an important proportion of the observed variation in indicators across practices is related to factors other than quality of healthcare. Depending on the indicator, chance alone accounted for 7%–85% of the observed differences in practice. They then examined the role of case mix, finding that age and sex differences explained an additional 5%–75% of the observed variation across practices beyond the role of chance. Chance played a larger role for indicators that were identified a priori as outcome indicators than for those considered process indicators. Outcome indicators also tended to be less rankable than process indicators. As the authors point out, the findings for the outcome indicators may have resulted from smaller sample sizes. However, these findings might also be expected on the basis of fundamental differences in the nature of these two types of indicators. As has been noted,1 24 25 clinicians and the health system influence healthcare processes to a greater extent than they do patient outcomes, as the latter are subject to factors other than the quality of medical care (and this may be reflected as chance). Put simply, the hospital that a patient attends has a greater impact on the tests and treatments they receive than on whether they live or die.
Process measures are touted by some as more sensitive and less vulnerable to differences in case mix than outcome measures.24 The paper by Abel et al20 demonstrates empirically that process measures can also be affected by chance and case mix, although to a lesser extent than outcome measures. This finding contributes to what is already known about the limitations of process measures,26–29 including the need for evidence linking processes to outcomes, challenges introduced when defining eligible patient populations and lack of comprehensiveness. In the case of the Cancer Service Profile used by Abel et al, only a few of the process indicators in the Profile have been shown to be linked to outcomes such as survival.20 Variation in process indicators, unrelated to underlying quality of care, can be introduced during measurement if exclusions are applied variably when defining eligible populations.29 Finally, many existing process indicators are limited to measuring a specific step in the care process; as such, they may not comprehensively reflect the full process that is tied to outcomes of interest.28 This issue is illustrated in the study by Abel et al, where process indicators are limited to practice screening coverage, rates of endoscopy, and referral, and as such do not reflect all the care steps leading to diagnosis of or survival from cancer.
Abel and colleagues’ work does have some limitations. Although they were able to quantify the proportion of the variation that could be attributed to case mix, they used age and sex only. Other aspects of case mix, such as comorbidity and disease severity, were not included and could also be a cause of the observed variation across practices. For instance, clinicians might understandably not pursue a new cancer diagnosis as assiduously in a patient who already has advanced medical conditions as in a patient of the same age and gender who has no serious medical problems. In addition, social determinants of health could also contribute to the observed variations in care. Although the analysis did not account for these factors, the underlying message would remain the same if it did: much of the observed variation in Cancer Service Profile performance indicators (both outcomes and processes) is unrelated to the underlying quality of healthcare.
A previous editorial30 on hospital standardised mortality ratios suggested that we should consider appending cautionary notes to performance indicators, just as we publish warnings for patients on medication labels. Inappropriate use or interpretation of performance indicators might be considered less worrisome than some medication side effects; however, as pointed out in the editorial, resources could be directed towards non-existing problems and complacency induced among providers or hospitals that do have problems. The study by Abel et al20 adds to the list of caveats needed and helps us understand the properties of different types of indicators in a nuanced way. In general, process indicators for diagnostic performance can be used to reliably discern between primary care practices while outcome indicators cannot. An important reason for this is that chance plays too great a role in the variation we see in outcome indicators. While chance plays this role more commonly with outcome indicators, it is not always the rule; chance variation can be an issue for any type of performance indicator in primary care if the number of cases per practice remains relatively small.
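To see why a small number of cases per practice lets chance dominate, consider a minimal sketch. It is not the mixed-model analysis used by Abel et al; it is a simplified illustration that assumes a binary indicator (a proportion), a common average rate across practices, and binomial sampling noise. The function names and the example values (a mean rate of 0.5 and a true between-practice standard deviation of 0.05) are hypothetical and chosen only for illustration.

```python
def sampling_variance(mean_rate: float, cases: int) -> float:
    """Approximate binomial sampling variance of an observed proportion."""
    return mean_rate * (1.0 - mean_rate) / cases


def chance_share(true_sd: float, mean_rate: float, cases: int) -> float:
    """Share of the expected observed variance that is due to chance alone.

    true_sd   -- assumed standard deviation of the true practice-level rates
    mean_rate -- assumed average rate across practices
    cases     -- assumed number of eligible cases per practice
    """
    between = true_sd ** 2                      # real differences between practices
    noise = sampling_variance(mean_rate, cases)  # chance variation
    return noise / (between + noise)


def reliability(true_sd: float, mean_rate: float, cases: int) -> float:
    """Share of the observed variance reflecting real practice differences."""
    return 1.0 - chance_share(true_sd, mean_rate, cases)


if __name__ == "__main__":
    # Same true variation between practices, different numbers of cases.
    for cases in (25, 100, 400):
        share = chance_share(true_sd=0.05, mean_rate=0.5, cases=cases)
        print(f"{cases} cases per practice: chance share = {share:.2f}")
```

Under these assumptions, the share of variation attributable to chance falls as the number of cases per practice rises, even though the true differences between practices are held fixed. This is consistent with the authors' suggestion that smaller sample sizes may explain the poorer rankability of the outcome indicators.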
As Abel and colleagues20 suggest, further empirical work may address some of the specific shortcomings of these indicators. Some of what we call chance variation today will likely turn out to involve identifiable characteristics of patients, providers or health systems. But it will probably also remain the case that, just as Donabedian proposed some 50 years ago, no single type of indicator will capture the entirety of the quality of care. We will always need a balanced set of indicators involving structure, processes and outcomes.
Competing interests None declared.
Provenance and peer review Commissioned; internally peer reviewed.