Interest in the utility of measuring preventable hospital deaths to drive improvement dates back to Florence Nightingale's first intimation that variations in mortality between London hospitals might reflect differences in quality of care.1 In 1999, the US Institute of Medicine's report ‘To Err is Human’ published the frequently quoted estimate of 44 000–98 000 preventable deaths annually in US hospitals, claiming that medical error represented the eighth most common cause of death in the country.2
This claim has fuelled ongoing and vigorous debate over the actual numbers in many countries. Following well-publicised failures at Bristol Royal Infirmary3 and Mid Staffordshire NHS Foundation Trust,4,5 in England, the problem of preventable deaths has come to the wider attention of politicians and the public alike. Of late, politicians in England have developed a myopic focus on tackling preventable deaths as the key to raising performance across the NHS, and look to single out discrete measures for benchmarking purposes, despite such measures clearly not representing the complexity of modern-day hospitals.6
Acknowledgement of the existence of preventable hospital deaths is helpful in raising interest in the scale and burden of healthcare-related harm and encouraging commitment to improvement among clinicians and hospital managers. However, using preventable deaths as a comparative measure of quality between hospitals, if the measures are not robust and fair, may overestimate the size of the problem and the risk to patients, inducing unjustified levels of anxiety and fear, and may have a powerful stigmatising effect on hospitals identified as ‘high death rate’ outliers. Conversely, underestimation may lead to complacency and failure to acknowledge ongoing risks to patients. A thorough understanding of the problems associated with both the concept and the different approaches to measurement is needed to determine the role of preventable deaths in quality improvement.
What could be wrong with ‘preventable deaths’ as a measure of quality?
Failing to prevent an avoidable death or, worse, contributing to its occurrence, has obvious intuitive appeal as a basic quality problem. Despite this appeal, there are significant limitations to using preventable deaths to gauge the quality and safety of healthcare. First, death is an uncommon outcome in many specialties, including obstetrics, psychiatry and surgical specialties such as ophthalmology. Gauging performance with any indicator related to death serves little purpose in these settings and draws attention away from the much larger pool of failures leading to harm that affect patients discharged alive. The relatively small numbers of deaths in many specialties also mean that random variation can have a large influence on trends or differences across organisations. Second, nearly a quarter of all NHS hospital admissions are of patients aged over 75 years, and more than 40% of deaths occur in those older than 80 years.7 Moreover, half the UK population end their lives in hospital,8 with the actual proportion varying substantially between hospitals depending on local alternatives for the provision of end of life care. Expected deaths resulting from underlying disease therefore account for a large proportion of mortality, making it difficult to identify a signal of deaths that were preventable because of problems with care. Even when errors of commission or omission do occur, it is difficult to establish the degree to which healthcare contributed to the death of a very elderly, frail patient with serious illness and multiple comorbidities, towards the end of their natural lifespan and with just days or hours to live.
In summary, the vast majority of deaths do not involve quality problems, preventability of death is often difficult to determine and mortality occurs rarely in many specialties and settings. The persistent attention afforded to preventable deaths over the last decade, despite numerous statements of these problems,9,10 stands as something of a mystery.
Problems with measurement using retrospective case record review
Two main approaches to measuring preventable deaths have been adopted in many countries: retrospective case record review (RCRR) and the use of routine data derived from hospital administrative systems. Understanding the strengths and limitations of both approaches is key if we are to persist in their use.
Determination of the preventability of hospital deaths by trained reviewers undertaking RCRR has clinical credibility, in that it takes account of the complexity of patients’ conditions and care and indicates whether or not poor care was responsible for any death. The majority of large-scale RCRRs to date identify the proportion of adverse events occurring in patients who subsequently die, but do not specifically ask reviewers to judge whether a death was preventable.11 A similar approach is used by the Global Trigger Tool.12 The few RCRRs that have established preventability find similar rates of preventable deaths, between 3% and 6%.13–15
Using RCRR to measure preventable deaths at a hospital level is problematic. Preventability is usually measured on a 1 to 6 Likert scale, with preventable deaths defined as those scoring 4 and above (probably preventable (more than 50:50); strong evidence of preventability; definitely preventable). This definition has been used in the majority of RCRR studies and derives from the legal definition of negligence.16 The largest RCRR study of deaths in England identified a preventable death rate of 3.6% and no significant variation in the proportion of preventable deaths between hospitals.17 The small numbers of deaths occurring in each hospital will inevitably result in large random error around the measure. Variations in the intensity of treatment delivered to the growing population of elderly, frail, multicomorbid patients have the potential to affect the opportunity for errors of commission or omission,18 and may create spurious differences in preventable death rates between hospitals.
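The scale of this random error can be illustrated with a simple normal-approximation confidence interval around a hospital's preventable-death proportion; the sample sizes below are invented for illustration, not taken from the cited study.

```python
import math

# Rough normal-approximation 95% CI for a hospital's preventable-death
# proportion, to show how small samples produce wide random error.
# The review sample sizes (n) are illustrative assumptions.
def approx_ci(p: float, n: int) -> tuple[float, float]:
    se = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - 1.96 * se), p + 1.96 * se

p = 0.036  # national preventable-death proportion reported by RCRR
for n in (50, 100, 1000):
    lo, hi = approx_ci(p, n)
    print(f"n={n}: 95% CI roughly {lo:.1%} to {hi:.1%}")
```

With 100 reviewed deaths, the interval spans from zero to roughly double the national rate, so apparent differences between individual hospitals are largely indistinguishable from chance.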
Reviewer agreement (or lack thereof), usually determined using either Cohen’s kappa statistic or the intraclass correlation coefficient, constitutes the well-known Achilles heel of RCRR, reflecting the subjective element in judgements of preventability.19 The fact that harsh judges have more influence on decisions than lenient ones,13 combined with the problem of hindsight bias,20 may lead to overestimation of preventability. On the other hand, the completeness of records affects the capacity of reviewers to make judgements, leading to underestimation; clinicians do not always make comprehensive notes when adverse events occur.21 Even with training and standardised data collection, two reviewers will only achieve moderate agreement beyond that expected by chance.14 At least five reviewers are needed to achieve reliability of about 90%,22,23 an unaffordable cost for what is already a resource-intensive approach that identifies a relatively low proportion of healthcare-related harms. Moderate reliability has the potential to compound the risk of random error. Creating a measure that also includes deaths with a preventable element falling below the traditional Likert scale cut-off could increase measurement precision—two reviewers are more likely to agree that there was at least, say, a 25% chance that death was preventable than they are both to judge that the death had at least a 50% chance of being preventable. But this increase in precision would come at the cost of labelling more deaths as preventable than many reviewers would accept. The higher false-positive rate would undermine the use of preventable deaths as a performance measure, but might support internal improvement efforts.
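The relationship between the number of reviewers and overall reliability can be sketched with the Spearman–Brown prophecy formula; the single-reviewer reliability figure below is an illustrative assumption chosen to represent "moderate agreement", not a value reported in the cited studies.

```python
def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Projected reliability of the mean judgement of n_raters, given the
    reliability of a single rater (Spearman-Brown prophecy formula)."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)

# Assumed single-reviewer reliability of 0.65 ("moderate agreement").
for n in (1, 2, 3, 5):
    print(f"{n} reviewer(s): projected reliability {spearman_brown(0.65, n):.2f}")
```

Under this assumption, five pooled reviewers reach a projected reliability of about 0.90, consistent with the panel sizes the literature suggests are needed, and showing why robust preventability judgements are so resource intensive.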
In this context of measuring preventable deaths to stimulate improvement rather than to compare institutional performance, the NHS in England and Wales has drawn on work by the US Institute for Healthcare Improvement,24 to advocate the use of RCRR for the local identification of preventable deaths. Within an institution, a focus on deaths can act as a rallying call for clinicians. Virtually all hospitals now have a hospital-wide mortality review process in place, some using mixed reviewing teams of doctors and nurses or near real-time review to maximise the number of problems in healthcare identified.25 How often preventability is formally assessed varies; many clinicians believe that setting the bar too high limits the targeting of improvement initiatives by identifying only a small number of heterogeneous problems, so for internal use a lower threshold may be more productive. The NHS needs assurance that these local programmes are robust and capable of achieving change. In this context, suitable markers would be the quality of the review process, the mechanisms for translating findings into actions and the number of improvements made as a result of the reviews.
Problems with using routine hospital administrative data
Using routine data such as hospital episode statistics to measure preventable deaths has several appealing features. These data are easy to access and can generate mortality measures with relatively little expense. The larger numbers of patients involved potentially generate more robust comparisons between providers and a better steer as to where to direct improvement efforts across health systems.
Following the Bristol Inquiry, the NHS put much faith in the power of publicly available comparative data on hospital mortality to identify quality outliers. Hospital-wide standardised mortality ratios (SMRs), case-mix adjusted ratios of observed to expected hospital deaths calculated using hospital administrative data, were developed for this purpose.26 A number of different versions of the measure exist, based on differing inclusion criteria for deaths (for instance, in-hospital deaths only versus hospital deaths plus those within 30 days of discharge) and differing case-mix adjustment algorithms. The measures all compare each hospital with a standard ratio derived from the mean of all the ratios in the sample, which changes from year to year. These statistics have been used to infer that hospitals towards the higher end of the ratio distribution have higher levels of preventable deaths; politicians are particularly prone to propagating this interpretation.27 This is concerning given that four such measures, when applied to the same group of 83 hospitals in Massachusetts, were shown to produce radically different rankings.28
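The basic arithmetic of an indirectly standardised mortality ratio can be sketched as follows; the strata, reference rates and hospital counts are entirely hypothetical, and real SMR implementations use far richer case-mix models.

```python
# Minimal sketch of an indirectly standardised mortality ratio (SMR):
# observed deaths divided by the deaths "expected" if the hospital's
# patients died at reference (e.g. national) stratum-specific rates.
# All strata, rates and counts below are invented for illustration.
reference_rates = {"low_risk": 0.005, "medium_risk": 0.03, "high_risk": 0.15}

hospital = {  # one hospital's admissions and deaths per case-mix stratum
    "low_risk":    {"admissions": 8000, "deaths": 45},
    "medium_risk": {"admissions": 2500, "deaths": 90},
    "high_risk":   {"admissions": 500,  "deaths": 80},
}

observed = sum(s["deaths"] for s in hospital.values())
expected = sum(s["admissions"] * reference_rates[k] for k, s in hospital.items())
smr = observed / expected
print(f"observed={observed}, expected={expected:.0f}, SMR={smr:.2f}")
```

An SMR above 1 (often reported scaled to 100) indicates more deaths than expected given the case mix, but, as the studies above show, the value is highly sensitive to the adjustment algorithm and inclusion criteria chosen.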
Despite international concerns about the value of hospital-wide SMRs,9,29–31 many countries have adopted the measure and continue to use it. Concerns focus on the multiple factors, other than the quality of hospital care, that influence values: chance, adequacy of case mix adjustment, depth and breadth of disease coding (primary diagnosis and comorbidity), differential admission and discharge policies, patient exclusions and local service configurations. Advocates argue that these concerns are overstated and that the measure can be an effective signal of poor performance, alerting hospitals and external regulatory agencies to the need for further investigation.32 Girling et al29 estimated that 91% of hospitals investigated for high hospital-wide standardised mortality ratios are likely to be false alarms, because actual preventable deaths are too few relative to other sources of noise in the system. A recent study confirmed a lack of strong association with preventable deaths identified by RCRR, thereby weakening claims that hospital-wide SMRs are a good screening test.17 False alarms are difficult to investigate given that the measure provides few clues as to where to look for problems. One way to deal with these limitations is to study variation in SMRs for specific groups of patients in whom death is a frequent outcome (such as those in critical care or undergoing high-risk major surgery) and for whom high-quality clinical data are available to allow adequate risk adjustment and to identify where process failures are occurring.
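How a false-alarm rate of this magnitude arises can be shown with standard screening-test arithmetic; the prevalence, sensitivity and specificity values below are illustrative assumptions chosen to show the mechanism, not figures from Girling et al.

```python
def false_alarm_rate(prevalence: float, sensitivity: float,
                     specificity: float) -> float:
    """Proportion of 'alerts' that are false alarms (1 minus the
    positive predictive value of the screening signal)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return false_pos / (true_pos + false_pos)

# Illustrative assumptions: 2% of hospitals truly have excess preventable
# deaths, and a high SMR flags them with 70% sensitivity, 85% specificity.
rate = false_alarm_rate(prevalence=0.02, sensitivity=0.70, specificity=0.85)
print(f"{rate:.0%} of alerts are false alarms")
```

Because genuinely poor-performing hospitals are rare, even a moderately specific signal generates mostly false positives, which is why low prevalence dominates the measure's screening performance.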
Alternative approaches to measuring preventable harm associated with death
One alternative approach is to look at the major causes of preventable healthcare-related harm and estimate associated increased mortality. Given the larger populations contributing to such measures, compared with the limited numbers that can be assessed using individual case record reviews, case-mix adjusted variations in mortality attributable to harms such as venous thromboembolism or hospital-acquired infections may provide a better indication of where failures may be found within hospitals.
Prospective surveillance systems already exist for some common harms associated with preventable deaths, including healthcare-acquired infections and surgical complications.33,34 In England, these systems usually rely on specially collected data and can be highly developed, with robust definitions for events (numerators) and populations at risk (denominators) and detailed protocols for operationalising the surveillance. Surveillance systems have the advantage over retrospective approaches of being able to determine prevalence and incidence. There is potential to use prospective surveillance to measure a broader range of harms. However, understanding the impact of surveillance bias, whereby hospitals that are more effective in identifying harms appear as worse performers, will be important.35 Any perceived negative consequences associated with identifying harms are likely to further decrease detection and reporting levels in some hospitals.
Combining outcome with process measures is one way of increasing specificity when identifying preventable deaths, for instance measuring pulmonary embolism in patients who die and who did not receive adequate venous thromboembolism measures. In addition, this approach can establish the link between the harm event and death and also reduce the opportunity for surveillance bias.36
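The combined outcome-and-process criterion described above amounts to a simple conjunction over death records; the field names and records below are hypothetical, standing in for whatever a real surveillance dataset would provide.

```python
# Sketch of combining an outcome with a process measure: count deaths with
# pulmonary embolism where adequate VTE prophylaxis was not given.
# Records and field names are hypothetical illustrations.
deaths = [
    {"id": 1, "pulmonary_embolism": True,  "adequate_vte_prophylaxis": False},
    {"id": 2, "pulmonary_embolism": True,  "adequate_vte_prophylaxis": True},
    {"id": 3, "pulmonary_embolism": False, "adequate_vte_prophylaxis": False},
]

# Only deaths where the harm coincides with a demonstrable care failure
# are flagged, which is what gives the combined measure its specificity.
flagged = [d for d in deaths
           if d["pulmonary_embolism"] and not d["adequate_vte_prophylaxis"]]
print(len(flagged))
```

Requiring both conditions excludes deaths where prophylaxis was given (the harm may not have been preventable) and failures that caused no harm, linking the event to the death while limiting surveillance bias.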
With politicians and the public demanding accountability and safer outcomes, we find ourselves without a single reliable measure with which to compare preventable deaths between hospitals. Current approaches carry an unacceptable degree of error, which further refinement is unlikely to reduce. Furthermore, these measures do not capture the broad scope of hospital quality, or even the vast majority of harms occurring in hospitals. Unpalatable as this argument may be, it is important to resist political imperatives to rely on any single metric. Given the complexity of modern-day healthcare, it would be naive to imagine that one measure could summarise the quality of organisations as complex as hospitals. Combining multiple measures, with an appreciation of their strengths and weaknesses, will provide the highest yield of quality issues and the lowest false-positive rates. Furthermore, progress needs to be made on capturing harm across the whole healthcare system—hospitals, outpatient clinics, chronic care facilities and home care—if we are to fully understand its scale and untangle its root causes. Preventable deaths might be included as part of an overall multifaceted measurement approach, but serious thought needs to be given to whether such investigations are best kept as a tool for stimulating local quality improvement programmes, rather than for gauging the performance of healthcare providers.
Competing interests None declared.
Provenance and peer review Commissioned; internally peer reviewed.