Original Article
Combining ratings from multiple physician reviewers helped to overcome the uncertainty associated with adverse event classification

https://doi.org/10.1016/j.jclinepi.2006.11.019

Abstract

Objectives

Adverse events (AEs) are poor patient outcomes resulting from medical care. We performed this study to quantify the misclassification rate obtained using current AE detection methods and to evaluate the effect of combining physician AE ratings.

Study Design and Setting

Three physicians independently rated poor patient outcomes. We used latent class analysis to obtain estimates for AE prevalence and reviewer accuracy. These estimates were used as a base case for four simulations of 10,000 cases rated independently by five reviewers. We assessed the effect of AE prevalence, reviewer accuracy, and the number of agreeing reviewers on the probability that cases were correctly classified as an AE.

Results

Reviewer sensitivity and specificity for AE classification were 0.86 and 0.94, respectively. When prevalence was 3%, the positive predictive value of a single reviewer's AE classification was 31%, whereas when two of three reviewers classified the case as an AE it was 51%. The positive predictive value of ratings for AE occurrence increased with AE prevalence, reviewer accuracy, and the number of agreeing reviewers.

Conclusion

Current methods of AE detection overestimate the risk of AEs. Uncertainty regarding the presence of an AE can be overcome by increasing the number of reviews.

Introduction

Patient safety is an important component of health care quality [1]. “Adverse events” (AEs), defined as poor health outcomes attributable to medical care, and the subset of them that are preventable, are important outcomes commonly used to measure patient safety [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. The Institute of Medicine and patient safety researchers adopted these definitions from the Harvard Medical Practice Study, which was performed to estimate the incidence of compensable medical injuries in New York state [4]. Because patient safety research is focused on a broader definition of poor outcomes than those eligible for compensation through litigation, the initial validation studies on AE measurement may not be relevant to their current use [15].

AEs are generally identified using case review [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. Using implicit criteria, physicians judge the cause of poor outcomes by reviewing medical records. If the physician reviewer thinks it was more likely that an outcome was caused by medical care than by the patient's underlying disease, then the outcome is rated as an AE. Given the subjective nature of these assessments, it is not surprising that reviewer judgments in the major AE studies demonstrate only moderate inter-rater reliability, with κ scores ranging between 0.39 and 0.61 [2], [4], [5], [10], [11]. This level of agreement suggests that a substantial proportion of cases is misclassified. According to one prominent AE investigator, the poor reliability of reviewers will preclude improvements in health care safety [16].

We performed this study to estimate the misclassification rates associated with standard AE review methodology. We used a latent class analysis (LCA) model to estimate reviewer sensitivity and specificity using data from a previously published study [7]. Using the estimates of reviewer sensitivity and specificity obtained from this model, we then created a computer simulation to demonstrate the effect of AE prevalence, the number of agreeing reviewers, and reviewer accuracy on “posterior probability of AEs.” The results of these analyses can help demonstrate the inaccuracy of existing AE studies for estimating AE prevalence. They can also establish the feasibility of requiring multiple reviewers to agree before a case is deemed an AE.
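The simulation described above can be sketched in a few lines. This is an illustrative reimplementation, not the authors' original program: the base-case values (10,000 cases, five reviewers, 3% prevalence, sensitivity 0.86, specificity 0.94) are taken from the paper, but the random seed and code structure are assumptions.

```python
import random
from collections import Counter

random.seed(42)

N_CASES = 10_000         # simulated poor outcomes, as in the paper's simulations
N_REVIEWERS = 5          # independent reviewers per case
PREVALENCE = 0.03        # base-case AE prevalence
SENS, SPEC = 0.86, 0.94  # reviewer accuracy from the LCA base case

# For each case, draw the true AE status, then each reviewer's imperfect rating.
tally = Counter()  # (number of positive ratings, true status) -> case count
for _ in range(N_CASES):
    is_ae = random.random() < PREVALENCE
    p_positive = SENS if is_ae else (1 - SPEC)
    n_pos = sum(random.random() < p_positive for _ in range(N_REVIEWERS))
    tally[(n_pos, is_ae)] += 1

# Empirical posterior probability of a true AE given k of 5 positive ratings.
for k in range(N_REVIEWERS + 1):
    pos, neg = tally[(k, True)], tally[(k, False)]
    if pos + neg:
        print(f"{k}/5 reviewers rate AE: P(true AE) ~ {pos / (pos + neg):.2f} "
              f"({pos + neg} cases)")
```

With these parameters the posterior probability of a true AE climbs steeply with the number of agreeing reviewers, which is the qualitative pattern the simulations are designed to demonstrate.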

Section snippets

AE review as a diagnostic test

In this study, we viewed the physician review for AE as a qualitative diagnostic test. A qualitative diagnostic test, such as a chest radiograph interpretation, weighs several pieces of information to determine if a particular diagnosis, such as pneumonia, is present. Similarly, to determine if an AE has occurred, physicians review clinical information and decide whether the data support the “diagnosis” of an AE. Physician reviewers decide the cause of a poor health outcome in a manner
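Treating a review as a diagnostic test makes the abstract's predictive values easy to check with Bayes' theorem. A minimal sketch, using the paper's base case (3% prevalence, sensitivity 0.86, specificity 0.94) and, as a simplifying assumption, identical accuracy for all reviewers:

```python
from math import comb

prev, sens, spec = 0.03, 0.86, 0.94  # base case from the paper

def ppv_exactly_k_of_n(k, n):
    """P(true AE | exactly k of n independent reviewers rate the case an AE)."""
    p_if_ae = comb(n, k) * sens**k * (1 - sens)**(n - k)
    p_if_not = comb(n, k) * (1 - spec)**k * spec**(n - k)
    return prev * p_if_ae / (prev * p_if_ae + (1 - prev) * p_if_not)

print(f"PPV, 1/1 reviewer positive:  {ppv_exactly_k_of_n(1, 1):.2f}")
print(f"PPV, 2/3 reviewers positive: {ppv_exactly_k_of_n(2, 3):.2f}")
```

The single-reviewer value reproduces the abstract's 31%. Exactly two of three agreeing reviewers gives roughly 49% under the identical-accuracy simplification, close to the 51% reported from the full model, which allowed reviewer accuracy to vary.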

Observed AE ratings

Table 1 describes the number of ratings with agreement and disagreement among the three reviewers for the 269 outcomes. The kappa statistic for the three reviewers' ratings was 0.63. This indicates good agreement among reviewers and suggests that the agreement was greater than expected by chance alone (P < 0.05).
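Agreement among three raters on a binary judgment is conventionally summarized with Fleiss' kappa, which can be computed from scratch as below. The example counts are hypothetical, since the cell values of Table 1 are not reproduced here.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning subject i to category j;
    every subject must be rated by the same number of raters."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_total = n_subjects * n_raters
    # Observed agreement: mean proportion of agreeing rater pairs per subject.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement from the marginal category proportions.
    p_e = sum(
        (sum(row[j] for row in counts) / n_total) ** 2
        for j in range(len(counts[0]))
    )
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical [AE, not-AE] rating counts for six outcomes, three raters each.
example = [[3, 0], [2, 1], [0, 3], [0, 3], [1, 2], [3, 0]]
print(f"kappa = {fleiss_kappa(example):.2f}")
```

A value of 1 corresponds to perfect agreement and 0 to agreement no better than chance, so the study's 0.63 sits in the range usually described as good.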

Test characteristics of physician reviewers

Table 2 describes the estimates from our two LCA models (model 1 assumed common reviewer sensitivity and specificity, whereas model 2 allowed these to vary). In
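Model 1 (common reviewer sensitivity and specificity) can be fit with a short expectation-maximization loop over the eight possible three-reviewer rating patterns. This is a from-scratch sketch of the general technique, not the authors' software; the demonstration data are noise-free expected pattern counts generated from assumed parameters, not the study's observed ratings.

```python
from itertools import product

PATTERNS = list(product((0, 1), repeat=3))  # all 3-reviewer rating patterns

def pattern_prob(y, prev, sens, spec):
    """Return (marginal P(pattern y), joint P(y and true AE)) under the model,
    assuming conditionally independent raters with common accuracy."""
    p_if_ae = 1.0
    p_if_not = 1.0
    for r in y:
        p_if_ae *= sens if r else (1 - sens)
        p_if_not *= (1 - spec) if r else spec
    return prev * p_if_ae + (1 - prev) * p_if_not, prev * p_if_ae

def fit_lca(counts, prev=0.2, sens=0.7, spec=0.9, n_iter=5000):
    """EM estimation of AE prevalence and common reviewer sens/spec."""
    n = sum(counts.values())
    for _ in range(n_iter):
        # E-step: posterior P(true AE | pattern) under current parameters.
        post = {}
        for y in PATTERNS:
            total, joint_ae = pattern_prob(y, prev, sens, spec)
            post[y] = joint_ae / total
        # M-step: re-estimate parameters from expected class memberships.
        w_ae = sum(counts[y] * post[y] for y in PATTERNS)
        w_no = n - w_ae
        prev = w_ae / n
        sens = sum(counts[y] * post[y] * sum(y) for y in PATTERNS) / (3 * w_ae)
        spec = sum(counts[y] * (1 - post[y]) * (3 - sum(y))
                   for y in PATTERNS) / (3 * w_no)
    return prev, sens, spec

# Demonstration: expected counts for 269 outcomes from assumed true parameters.
TRUE = (0.10, 0.85, 0.95)
counts = {y: 269 * pattern_prob(y, *TRUE)[0] for y in PATTERNS}
print("recovered (prev, sens, spec):",
      tuple(round(v, 3) for v in fit_lca(counts)))
```

Because the latent "true AE" status is never observed, the model infers prevalence and reviewer accuracy jointly from the pattern of agreements and disagreements alone, which is exactly what makes LCA suitable when no gold standard exists.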

Discussion

This analysis has several important findings. First, we found that physician reviewers exhibited reasonably good test characteristics. Using LCA, we found that sensitivity and specificity of their ratings were 86% and 94%, respectively. With a positive likelihood ratio of 14.3, these test characteristics compare favorably with many diagnostic tests used in clinical medicine. However, even with these favorable test characteristics, a single positive review provided reasonable certainty about AEs
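The likelihood-ratio arithmetic behind these figures is easy to verify in the odds form of Bayes' theorem; the 3% pretest probability below is the paper's base-case prevalence.

```python
sens, spec = 0.86, 0.94
lr_pos = sens / (1 - spec)  # positive likelihood ratio
print(f"LR+ = {lr_pos:.1f}")

# Odds-form Bayes: post-test probability after a single positive review.
pretest = 0.03
pretest_odds = pretest / (1 - pretest)
posttest_odds = pretest_odds * lr_pos
posttest = posttest_odds / (1 + posttest_odds)
print(f"post-test probability = {posttest:.2f}")
```

This reproduces both the reported LR+ of 14.3 and the 31% post-test probability: even a test that would be considered strong in clinical medicine leaves substantial uncertainty when the outcome is rare.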

References (39)

  • A.J. Forster et al. Adverse events affecting medical patients following discharge from hospital. CMAJ (2004).
  • A.J. Forster et al. Ottawa Hospital Patient Safety Study: incidence and timing of adverse events in patients admitted to a Canadian teaching hospital. CMAJ (2004).
  • P. Michel et al. Comparison of three methods for estimating rates of adverse events and rates of preventable adverse events in acute care hospitals. BMJ (2004).
  • E.J. Thomas et al. Incidence and types of adverse events and negligent care in Utah and Colorado. Med Care (2000).
  • C. Vincent et al. Adverse events in British hospitals: preliminary retrospective record review. BMJ (2001).
  • T.K. Gandhi et al. Adverse drug events in ambulatory care. N Engl J Med (2003).
  • J.H. Gurwitz et al. Incidence and preventability of adverse drug events among older persons in the ambulatory setting. JAMA (2003).
  • T.A. Brennan et al. Reliability and validity of judgments concerning adverse events suffered by hospitalized patients. Med Care (1989).
  • T.A. Brennan. The Institute of Medicine report on medical errors—could it do harm? N Engl J Med (2000).