Overestimation of clinical diagnostic performance caused by low necropsy rates
  1. K G Shojania1,
  2. E C Burton2,
  3. K M McDonald3,
  4. L Goldman1
  1. 1Department of Medicine, University of California San Francisco, CA, USA
  2. 2Department of Pathology and Laboratory Medicine, Baylor Health Care System, USA
  3. 3Center for Primary Care and Outcomes Research, Stanford University, CA, USA
  1. Correspondence to:
 Dr K G Shojania
 The Ottawa Hospital - Civic Campus, 1053 Carling Avenue, Room C403, Box 693, Ottawa, ON, Canada K1Y 4E9; kshojania{at}ohri.ca

Abstract

Background: Diagnostic sensitivity is calculated as the number of correct diagnoses divided by the sum of correct diagnoses plus the number of missed or false negative diagnoses. Because missed diagnoses are generally detected during clinical follow up or at necropsy, the low necropsy rates seen in current practice may result in overestimates of diagnostic performance. Using three target conditions (aortic dissection, pulmonary embolism, and active tuberculosis), the prevalence of clinically missed cases among necropsied and non-necropsied deaths was estimated and the impact of low necropsy rates on the apparent sensitivity of antemortem diagnosis determined.

Methods: After reviewing case series for each target condition, the most recent study that included cases first detected at necropsy was selected and the reported sensitivity of clinical diagnosis adjusted by estimating the total number of cases that would have been detected had all decedents undergone necropsy. These estimates were based on available data for necropsy rates, time period, country (US v non-US), and case mix.

Results: For all three target diagnoses, adjusting for the estimated prevalence of clinically missed cases among non-necropsied deaths produced sensitivity values outside the 95% confidence interval for the originally reported values, and well below sensitivities reported for the diagnostic tests that are usually used to detect these conditions. For active tuberculosis the sensitivity of antemortem diagnosis decreased from an apparent value of 96% to a corrected value of 83%, with a plausible range of 42–91%; for aortic dissection the sensitivity decreased from 86% to 74%; and for pulmonary embolism the sensitivity fell only modestly from 97% to 91% but was still lower than generally reported values of 98% or more.

Conclusions: Failure to adjust for the prevalence of missed cases among non-necropsied deaths may substantially overstate the performance of diagnostic tests and antemortem diagnosis in general, especially for conditions with high early case fatality.

  • diagnostic errors
  • necropsy
  • tuberculosis
  • aortic dissection
  • pulmonary embolism

Diagnostic errors have received surprisingly little attention in the literature on medical errors and patient safety,1 apart from specific issues such as errors in radiograph interpretation2 and discussions of the psychological underpinnings of common pitfalls in diagnostic reasoning.3 One area with a long history of studying errors in clinical diagnosis is the literature on necropsies. Numerous studies document substantial rates of major clinically unsuspected diagnoses detected at necropsy, including missed diagnoses that probably affected outcome.4–10

Clinicians have generally attributed these persistent and roughly unchanged6,8,9 discrepancies between antemortem and postmortem diagnoses to selection bias, arguing that cases sent for necropsy are precisely those in which there is diagnostic uncertainty. Despite its plausibility, this view is not supported by the available evidence.6,11,12 If rates of clinically important diagnoses first detected at necropsy largely reflected case selection by clinicians, one would expect studies with high necropsy rates to report substantially lower error rates. However, the literature bears out this expectation only weakly. For instance, among recent studies in intensive care settings, a British study in which only 8% of decedents underwent necropsy reported clinically important missed diagnoses in 39% of cases13; a French study from an intensive care unit with a necropsy rate of 53%,10 over six times higher than in the British study, reported clinically important missed diagnoses first detected at necropsy at a slightly lower rate of 32%; and a study from a Belgian intensive care unit reported detecting such diagnoses in 26% of necropsies, despite having a necropsy rate of 93%.14 Thus, even large increases in necropsy rates were associated with only modest decreases in the rates at which necropsy revealed clinically important missed diagnoses.

We formally confirmed this weak relationship between clinical selection, as measured by the percentage of decedents undergoing necropsy, and the rates at which necropsy detects clinically important missed diagnoses in a systematic review of the necropsy literature over a 40 year period.11 In addition to our analysis, several prospective necropsy studies have found clinicians to have little4,5,15–19 or no7 ability to predict cases in which necropsy will reveal important diagnoses that escaped clinical detection.

Despite these findings, there exists a general perception that necropsies are no longer necessary as antemortem diagnosis identifies the principal cause of death and other clinically significant diagnoses in the vast majority of cases. This perception has undoubtedly contributed to the progressive decline in necropsy rates over the past 30–40 years. In most jurisdictions, including the UK,20 Australia,21 France,22 and the United States,23 10% or fewer of all natural deaths undergo necropsy. These average rates reflect a wide range of institutional necropsy rates, with a small number of institutions continuing to perform relatively frequent necropsies but many performing almost none. A survey of hospitals in Louisiana revealed that 63% of hospitals performed no necropsies in a single year.24 Consequently, many clinicians, radiologists, and others involved in the antemortem diagnostic process never have the opportunity to learn of major missed diagnoses among patients who died under their care, further contributing to the perception that necropsies are no longer necessary.

In addition to affecting perceptions about antemortem diagnostic performance, low necropsy rates affect formal assessments of diagnostic performance. Even assuming that necropsied deaths are more likely to include missed diagnoses than non-necropsied deaths (an assumption only weakly borne out by the available evidence), the non-necropsied deaths outnumber necropsied deaths by a factor of 10–20. Consequently, the number of important missed diagnoses among non-necropsied deaths could approximate or even exceed the number observed at necropsy. In conditions where misdiagnosis confers substantial short term mortality, conventional estimates of diagnostic performance (for individual tests or for the entirety of antemortem diagnosis) may be substantially too high because they do not take into account the possibility of missed cases among non-necropsied deaths. We explore this possibility using the results of our previous analysis11 and published estimates of the sensitivity of antemortem diagnosis for acute aortic dissection, pulmonary embolism, and active tuberculosis, three conditions of major clinical significance known to escape clinical detection until necropsy.6–9,12,18,25–37

METHODS

Definitions

The necropsy literature defines “major errors” as clinically missed diagnoses involving a principal underlying disease or primary cause of death.6,12 Class I errors are defined as clinically missed diagnoses which, if detected during life, “would”, “could”, “possibly”, or “might” have affected patient prognosis or outcome (at a minimum, discharge from the hospital alive). The studies included in the previous systematic review11 all used this classification system or reported data in such a manner that we could reclassify cases in this way. However, we avoid referring to these cases as “diagnostic errors” in the present paper as we do not know the extent to which they reflect atypical presentations or acceptable limitations to current diagnosis. We refer instead to rates of diagnostic discrepancies, meaning discrepancies between antemortem and postmortem diagnoses. Removing judgments about errors and referring only to diagnostic discrepancies also reflects the recognition that, like any diagnostic test, the necropsy has an error rate of its own.38

Search strategy and study selection

We supplemented our previous systematic review11 with an extensive search for studies reporting data relevant to calculating the sensitivity of antemortem diagnosis. Eligible studies reported case series of patients with acute aortic dissection, pulmonary embolism, or active tuberculosis in which investigators explicitly identified clinically missed cases detected at necropsy. For each target condition we selected the most recent eligible study to emphasize assessments of contemporary diagnostic performance.

Estimating clinical diagnostic sensitivity

The sensitivity of antemortem diagnosis for a particular target condition equals the ratio of true positive diagnoses to the sum of true positives and false negatives. Most studies estimate false negatives using subsequent clinical follow up and/or cases first detected at necropsy, ignoring the possibility of clinically missed cases among non-necropsied deaths.

Although the prevalence of undetected diagnoses among non-necropsied deaths cannot be calculated with certainty, our previous analysis11 permits estimation of the total number of clinically missed diagnoses among all deaths—not just necropsied deaths— if the necropsy rate, time period, and case mix are known. Thus, a conservative estimate of the sensitivity of clinical diagnosis depends only on adding to the above denominator a term for clinically missed cases among non-necropsied deaths. We deem this estimate conservative because it assumes all clinical diagnoses are true positives and ignores clinically missed diagnoses that have been lost to follow up.
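The adjustment described above can be written compactly. The notation is introduced here for illustration only and is not taken from the cited studies: TP denotes cases diagnosed before death, FN_nec denotes clinically missed cases found at necropsy or on follow up, and the hatted FN_non denotes the estimated number of clinically missed cases among non-necropsied deaths.

```latex
\[
\mathrm{Se}_{\text{apparent}} = \frac{TP}{TP + FN_{\text{nec}}},
\qquad
\mathrm{Se}_{\text{corrected}} = \frac{TP}{TP + FN_{\text{nec}} + \widehat{FN}_{\text{non}}}
\]
```

Because the added term cannot be negative, the corrected sensitivity can never exceed the apparent value.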

Estimating missed cases among non-necropsied deaths

If necropsies occurred independently of the chance of detecting clinically missed diagnoses, the total number of clinically missed cases among patients who died would equal the number of clinically missed cases detected at necropsy divided by the necropsy rate. This result would overestimate clinical false negatives as it assumes that missed cases are distributed evenly between necropsied and non-necropsied deaths—that is, the absence of any selection by clinicians. Because the relationship between necropsy rates and detection rates for missed diagnoses is roughly linear (fig 1), we can correct for this overestimate using the ratio of the diagnostic error rate expected if the institution had a necropsy rate of 100% to the error rate observed with its actual necropsy rate. This correction assumes that the analysis of diagnostic errors as a group applies to errors involving specific target conditions such as aortic dissection, pulmonary embolism, and tuberculosis. This assumption seems reasonable because these diagnoses commonly appear in the necropsy studies that generated these estimates in the first place.39

Figure 1

 Graph showing major diagnostic errors as a function of necropsy rate, adjusting for case mix, study period, and country.13 The parameters of the curve correspond to those in the larger of the two aortic dissection studies discussed in the paper41 (case mix = general hospital necropsies, year = 1990, country = US). The calculation to estimate the corrected sensitivity of antemortem diagnosis is illustrated below using the data from this study.41 The study reported 77 cases of aortic dissection detected antemortem and six cases first diagnosed at necropsy41 giving an apparent sensitivity of 77/(77 + 6) = 93%. The denominator in this estimate should include all clinically missed cases (i.e. necropsied and non-necropsied cases). Given the institutional necropsy rate of 14% (indicated on the figure), a rough estimate of the total number of clinically missed cases would be 6 divided by 0.14 = 43 cases. As indicated by the horizontal dashed line, this result overestimates clinical false negatives because it assumes that the same proportion of errors would have been observed if the necropsy rate were 100%. Because the relationship between necropsy rates and major error rates is roughly linear, the correction for the initial estimate of 43 missed cases can be approximated by the ratio of the diagnostic error rate expected if the institution had a necropsy rate of 100% (in this case 10%) to the error rate observed with its actual necropsy rate (in this case 26%). Thus, the correction for this case series would be 0.10/0.26 = 0.38, making the estimated total number of clinically missed cases 0.38*43 = 16. The corrected estimate of the sensitivity of clinical diagnosis in this example would therefore be 77/(77+16)  = 82%.
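The arithmetic in the legend can be reproduced with a short script. This is a minimal sketch rather than the code used for the analysis: the function and parameter names are hypothetical, and the two error rates (10% and 26%) are simply the values read from the curve as quoted in the legend.

```python
def corrected_sensitivity(true_pos, missed_at_necropsy, necropsy_rate,
                          error_rate_at_100, error_rate_observed):
    """Return (apparent, corrected) sensitivity of antemortem diagnosis.

    All names are illustrative; the inputs are the figures quoted in the legend.
    """
    apparent = true_pos / (true_pos + missed_at_necropsy)
    # Naive scale-up: assumes necropsy was performed independently of
    # diagnostic uncertainty, so it overestimates missed cases.
    naive_missed = missed_at_necropsy / necropsy_rate
    # Shrink the naive figure by the ratio of the error rate expected at a
    # 100% necropsy rate to the error rate at the actual necropsy rate.
    correction = error_rate_at_100 / error_rate_observed
    estimated_missed = correction * naive_missed
    corrected = true_pos / (true_pos + estimated_missed)
    return apparent, corrected


apparent, corrected = corrected_sensitivity(77, 6, 0.14, 0.10, 0.26)
print(f"apparent {apparent:.1%}, corrected {corrected:.1%}")
```

Carrying unrounded intermediate values gives a corrected sensitivity of about 82.4%, whereas rounding the intermediate counts to 43 and 16, as in the legend, gives 77/93, or about 82.8%.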

Sources of relevant necropsy rates

Because none of the clinical studies (table 1) reported necropsy rates, we drew relevant data from other sources.23,37,44,45 For the most recent study of aortic dissection40 a study from the same institution reported a necropsy rate for 1994, the mid point of the aortic dissection study period. For the second aortic dissection study41 (included because of the small size of the most recent study) we contacted the study’s first author to ascertain the necropsy rate during the study period. We used 14%, the rate from the study mid point, as the best estimate of the relevant necropsy rate and conducted a sensitivity analysis using the rates from the beginning and end of the 10 year study period (table 2). Necropsy rates for the pulmonary embolism study42 were neither reported nor available from the investigators due to the large number of participating institutions. We therefore used median necropsy rates obtained from a survey of 410 institutions in the US and Canada44 during a comparable time period.

Table 1

 Studies and data sources for target diagnoses

Table 2

 Relevant necropsy rates and correction factors for estimating clinically missed diagnoses among non-necropsied deaths

For the study of tuberculosis detected after death43 we approximated necropsy rates for hospitals in the San Francisco area using national data from the same period. In 1990, the mid point of the study’s observation period, 7% of natural deaths underwent necropsy.23 In choosing a lower bound we considered the degree to which average necropsy rates tend to be skewed by the high rates achieved by a minority of institutions. For instance, the previously mentioned survey of 410 necropsy departments in the US and Canada44 reported a very skewed distribution with a small number of institutions reporting very high necropsy rates. Even more dramatically, a recent survey of Louisiana hospitals reported that 63% of hospitals performed no necropsies in a single year.24 This study reported a mean necropsy rate of 5% but a median rate of 2%. Using this same ratio of mean to median necropsy rates, we conducted our analysis of the San Francisco tuberculosis study using 3% as the lower bound of the relevant necropsy rate (table 2). For the upper bound of the range of relevant necropsy rates we used the necropsy rate at San Francisco General Hospital which contributed the majority of detected cases (Kathryn DeRiemer, personal communication). In 1990, the mid point of the study period, San Francisco General Hospital had a necropsy rate of 17% (Walter Finkbeiner, personal communication). This value was counted as an upper bound because other hospitals in the San Francisco area would be unlikely to have a necropsy rate higher than a large teaching hospital that is also the only level 1 trauma center in the city.
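One reading of the lower bound calculation is a simple rescaling of the national mean rate by the median to mean ratio from the Louisiana survey. The sketch below assumes that reading; the function name is hypothetical.

```python
def scale_mean_to_median(mean_rate, reference_mean, reference_median):
    """Rescale a mean necropsy rate by a reference median-to-mean ratio."""
    return mean_rate * reference_median / reference_mean


# National mean of 7% (1990); Louisiana survey mean 5% and median 2%.
lower_bound = scale_mean_to_median(0.07, 0.05, 0.02)
print(f"{lower_bound:.1%}")
```

Under this assumption the rescaled rate is 2.8%, which rounds to the 3% lower bound used in the analysis.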

RESULTS

Identified studies of clinical diagnostic performance

Table 1 presents the most recent study meeting our inclusion criteria for each of the three target diagnoses.40–43 The most recent included aortic dissection study involved 43 patients initially evaluated in the emergency departments of three University of Pittsburgh hospitals from 1992–6 with six cases first detected at necropsy.40 Because the most recent study of aortic dissection was relatively small, we also included the next most recent eligible study.41 This study of aortic dissection also reported six cases first detected at necropsy, but with a total of 77 cases diagnosed before death. The International Cooperative Pulmonary Embolism Registry42 included 2454 patients with suspected or confirmed acute pulmonary embolism from 52 hospitals in seven countries. In 61 cases detection first occurred at necropsy. We confirmed the completeness of the registry with the study’s first author who estimated that the Registry had missed no more than 5% of eligible patients at participating institutions (Samuel Z Goldhaber, personal communication). The most recent study of clinically missed active tuberculosis reviewed all cases reported to the San Francisco Department of Health from 1986 to 1995.43 Among 3102 reported cases of tuberculosis, 120 (3.9%) met the definition for diagnosis after death. We considered discarding this study because of the concern that the prevalence of AIDS in San Francisco might limit generalisability. However, a study at the national level reported approximately the same result, with 5.1% of cases meeting the same definition for diagnosis after death.46 Given this close agreement and because we had access to key data from the investigators in the San Francisco study,43 we chose to keep this study instead of the national one.46

Corrected estimates of clinical sensitivity

The contribution of missed cases among non-necropsied deaths substantially lowered the sensitivity of clinical diagnosis for all three target conditions (table 3). With the exception of the smaller study of aortic dissection,40 the corrected sensitivities were outside the 95% confidence intervals associated with the apparent sensitivities implied by the original studies.41–43 For tuberculosis the adjusted sensitivity associated with the lower bound of our estimate for the underlying necropsy rate is strikingly low at 42%. Even using 5% (instead of 3%) as the lower bound for the underlying necropsy rate, the adjusted sensitivity for detecting tuberculosis before death would be just 68%.
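The comparison with a 95% confidence interval can be illustrated for the larger aortic dissection study, using the counts and the 82% corrected value quoted in the legend to fig 1. The text does not state which interval method was used, so the Wilson score interval below is an assumption chosen for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# 77 cases detected antemortem out of 83 known cases (77 + 6 found at necropsy).
low, high = wilson_interval(77, 83)
corrected = 0.82  # corrected sensitivity quoted in the fig 1 legend
print(f"95% CI {low:.3f} to {high:.3f}; corrected below interval: {corrected < low}")
```

With these inputs the interval runs from roughly 85% to 97%, so the corrected value falls below its lower limit. A different interval method would shift the limits slightly.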

Table 3

 Reported diagnostic sensitivity and estimates corrected for missed cases among non-necropsied deaths

In addition to the question of the appropriate range for the underlying necropsy rate, the analysis of clinically missed cases of tuberculosis involved two issues that did not apply to the studies of aortic dissection and pulmonary embolism. Firstly, case identification relied on Public Health Department records which may underestimate clinically detected cases by almost 20%.47 We repeated our analysis correcting for this potential underestimation of clinically detected cases, but the correction exerted minimal impact given the difference in magnitude between the numbers of clinically detected cases and cases detected after death. Secondly, some of the clinically missed cases may have been identified by antemortem culture results that became positive after death, independent of a necropsy. The first author of the study confirmed that 13% of cases detected after death occurred on the basis of antemortem culture results (Kathryn DeRiemer, personal communication). Including cases detected by antemortem culture results among the cases detected after death can be regarded as increasing the effective necropsy rate—that is, with respect to diagnosing tuberculosis, sending cultures is equivalent to performing more necropsies. Since such a small percentage of cases was diagnosed in this manner, the range of values we considered for the underlying necropsy rate (3–17%) captured this effect.

DISCUSSION

Adjusting for clinically missed cases among non-necropsied deaths substantially lowered the sensitivity of clinical diagnosis for tuberculosis and aortic dissection. While not as dramatic, the decreased sensitivity for acute pulmonary embolism from 97% to 94% (with a plausible range extending as low as 90%) is still noteworthy because the corrected value lies outside the generally accepted miss rate of 4%, based on the reported false negative rate for normal lung scintigraphy.48 Certainly, a 90–94% sensitivity is well below the approximately 99% sensitivity reported for a clinical algorithm for the investigation of pulmonary embolism49 and for helical computed tomography.50

The major limitation of the analysis concerns the number of assumptions required to generate the corrected estimates for the performance of antemortem diagnosis. Given the need for these assumptions, the conservative nature of our analysis deserves special attention. Firstly, we ignored false positive clinical diagnoses so that cases of aortic dissection, pulmonary embolism, and tuberculosis were counted as “true positives” even though clinical overdiagnosis clearly occurs for pulmonary embolism and tuberculosis.48,51 Secondly, we ignored clinically missed cases lost to follow up such as those resulting in outpatient death or admission to another institution. Thirdly, we counted as “clinically detected” all cases diagnosed any time before death, thus ignoring the problem of delayed diagnosis.52–55 We also counted serendipitous diagnoses as clinically detected cases. Studies of aortic dissection specifically noted that a substantial proportion of cases were detected as a result of investigations intended to diagnose other conditions.40,41,56 Thus, while the details of the calculations used to generate the corrected estimates of antemortem diagnostic performance involve some speculative assumptions, the overall framework of the analysis heavily favours antemortem diagnosis. Our results are therefore unlikely to represent an overstatement of the problem.

One limitation of the calculations we employed concerns the instability of our estimates of corrected diagnostic performance with very low necropsy rates. For example, in the study of tuberculosis from San Francisco area hospitals, changing the lower bound for the underlying necropsy rate from 3% to 5% increased the corrected sensitivity from 42% to 68%. We would suggest, however, that this instability provides an argument for maintaining necropsy rates higher than seen in contemporary hospitals. In other words, necropsy rates in the 0–10% range simply do not permit reliable assessment of the performance of clinical diagnosis.
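The source of this instability is that the estimated number of missed cases scales with the reciprocal of the necropsy rate. The sketch below illustrates the effect using the counts from the fig 1 example (77 detected antemortem, six found at necropsy). It deliberately omits the selection correction, which depends on the fitted curve, so it is not expected to reproduce the values in table 3; the function name is hypothetical.

```python
def naive_sensitivity(true_pos, missed_at_necropsy, necropsy_rate):
    """Sensitivity after scaling necropsy-detected misses by 1 / necropsy rate.

    No selection correction is applied, so this understates sensitivity.
    """
    estimated_missed = missed_at_necropsy / necropsy_rate
    return true_pos / (true_pos + estimated_missed)


rates = [0.02, 0.05, 0.10, 0.14, 0.20]
estimates = [naive_sensitivity(77, 6, r) for r in rates]
for rate, estimate in zip(rates, estimates):
    print(f"necropsy rate {rate:.0%}: naive sensitivity {estimate:.0%}")
```

In this uncorrected illustration, moving the assumed necropsy rate from 2% to 5% nearly doubles the estimate (from about 20% to about 39%), whereas moving from 14% to 20% changes it far less.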

Another limitation relates to the age of the studies, which reported data collected approximately 10–15 years ago. Antemortem diagnosis may have improved since then, especially in the case of pulmonary embolism, given the technological advances with helical computed tomography. On the other hand, of the three conditions, antemortem diagnosis of pulmonary embolism may be the one most affected by false positive clinical diagnoses. With the advent of helical computed tomography as the primary mode of evaluation for patients with suspected pulmonary embolism, clinicians tend to treat virtually all positive results without further confirmatory testing.57 Preliminary results from Prospective Investigation of Pulmonary Embolism Diagnosis (PIOPED) II confirm the problem of false positive diagnoses among patients who did not have a high pretest probability of disease, which represents the majority of patients referred for testing.58 Thus, while technological advances may have occurred, the high yield of antemortem diagnosis for pulmonary embolism probably includes a substantial number of false positive cases, inflating the apparent sensitivity of antemortem performance.

An additional limitation of our analysis relates to the necropsy itself. Like most complex procedures involving multiple observational and cognitive elements, the necropsy almost certainly has an error rate of its own, although this issue has received relatively little attention.38 Few data address the properties of the necropsy as a diagnostic test, such as inter-rater reliability.59 Future research addressing this issue would be very valuable. However, the three conditions analysed in the present work can be diagnosed at necropsy with little ambiguity. Pulmonary embolism offers some room for error as clotting can occur after death. However, specific macroscopic and microscopic characteristics reliably distinguish antemortem and postmortem thromboemboli.60–63

As mentioned earlier, the necropsy literature frequently refers to clinically missed diagnoses as “diagnostic errors”. We have avoided this term because many so-called “errors” undoubtedly represent atypical presentations or acceptable limitations to current diagnosis. On the other hand, increased awareness of the full spectrum of clinical presentations associated with a given condition can prompt revision of what constitutes an atypical presentation. For example, abrupt onset of pain, the presence of any pulse deficit, or the presence of a murmur of aortic regurgitation—three classic signs of acute aortic dissection—are each significantly less likely to occur in patients over 70 years of age with aortic dissection than in younger patients presenting with acute dissection.64

Part of the reduction in sensitivity we have illustrated probably reflects discrepancies between effectiveness and efficacy. Just as therapeutic outcomes obtained in routine practice can diverge from the results reported in clinical trials,65–67 diagnostic tests may exhibit important discrepancies between efficacy and effectiveness. For example, despite the impressive results in formal evaluations of the sensitivity and specificity of computed tomography in diagnosing appendicitis,68 a population level analysis indicated no change in relevant outcomes such as finding a normal or ruptured appendix at laparotomy.69 Suboptimal diagnostic performance, whether for a particular test or the entirety of antemortem diagnosis, may reflect a variety of process failures from the adequacy of clinical examination41 to the appropriateness of test ordering, adequacy of test interpretation, and even eroded performance of diagnostic tests in patient populations typically excluded from formal evaluations of diagnostic tests.70,71

Regardless of their causes, clinically significant diagnoses escape detection at a much greater rate than generally appreciated, and clinicians must entertain the possibility of missed causes of death when considering whether or not to request a necropsy. Equally important, investigators who assess diagnostic tests by including clinically missed cases detected only at necropsy overestimate diagnostic sensitivity, because low necropsy rates lead to underestimation of the false negative rate. We have shown that, given the low necropsy rates in both teaching and non-teaching hospitals, including necropsy detected cases by itself is insufficient. The assessment of sensitivity for any diagnostic process or test must adjust for the substantial prevalence of clinically missed cases among non-necropsied deaths. Interventions to increase necropsy rates will be required if accurate measures of diagnostic sensitivity are to be calculated for key clinical diagnoses.

REFERENCES

Footnotes

  • This article is based in part on work performed by the UCSF-Stanford Evidence-based Practice Center under contract to the Agency for Healthcare Research and Quality (Contract No. 290-970013), Rockville, MD. The authors are responsible for the contents of this article. No statement in this article should be construed as an official position of the Agency for Healthcare Research and Quality or of the US Department of Health and Human Services.

  • Dr Shojania holds a Canada Research Chair in Patient Safety and Quality Improvement.
