Original Article
Combining ratings from multiple physician reviewers helped to overcome the uncertainty associated with adverse event classification

https://doi.org/10.1016/j.jclinepi.2006.11.019

Abstract

Objectives

Adverse events (AEs) are poor patient outcomes resulting from medical care. We performed this study to quantify the misclassification rate obtained using current AE detection methods and to evaluate the effect of combining physician AE ratings.

Study Design and Setting

Three physicians independently rated poor patient outcomes. We used latent class analysis to obtain estimates for AE prevalence and reviewer accuracy. These estimates were used as a base case for four simulations of 10,000 cases rated independently by five reviewers. We assessed the effect of AE prevalence, reviewer accuracy, and the number of agreeing reviewers on the probability that cases were correctly classified as an AE.

Results

Reviewer sensitivity and specificity for AE classification were 0.86 and 0.94, respectively. When prevalence was 3%, the positive predictive value of a single reviewer's AE classification was 31%, whereas when two of three reviewers classified the case as an AE it was 51%. The positive predictive value of ratings for AE occurrence increased with AE prevalence, reviewer accuracy, and the number of agreeing reviewers.

Conclusion

Current methods of AE detection overestimate the risk of AEs. Uncertainty regarding the presence of an AE can be overcome by increasing the number of reviews.

Introduction

Patient safety is an important component of health care quality [1]. “Adverse events” (AEs), defined as poor health outcomes attributable to medical care, and the subset of them that are preventable, are important outcomes commonly used to measure patient safety [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. The Institute of Medicine and patient safety researchers adopted these definitions from the Harvard Medical Practice Study, which was performed to estimate the incidence of compensable medical injuries in New York state [4]. Because patient safety research is focused on a broader definition of poor outcomes than those eligible for compensation through litigation, the initial validation studies on AE measurement may not be relevant to their current use [15].

AEs are generally identified using case review [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. Using implicit criteria, physicians judge the cause of poor outcomes by reviewing medical records. If the physician reviewer thinks it was more likely that an outcome was caused by medical care than by the patient's underlying disease, then the outcome is rated as an AE. Given the subjective nature of these assessments, it is not surprising that reviewer judgments in the major AE studies demonstrate only moderate inter-rater reliability, with κ scores ranging between 0.39 and 0.61 [2], [4], [5], [10], [11]. This level of agreement suggests that a substantial proportion of cases is misclassified. According to one prominent AE investigator, the poor reliability of reviewers will preclude improvements in health care safety [16].

We performed this study to estimate the misclassification rates associated with standard AE review methodology. We used a latent class analysis (LCA) model to estimate reviewer sensitivity and specificity using data from a previously published study [7]. Using the estimates of reviewer sensitivity and specificity obtained from this model, we then created a computer simulation to demonstrate the effect of AE prevalence, the number of agreeing reviewers, and reviewer accuracy on “posterior probability of AEs.” The results of these analyses can help demonstrate the inaccuracy of existing AE studies for estimating AE prevalence. They can also establish the feasibility of requiring multiple reviewers to agree before a case is deemed an AE.
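The simulation described above can be sketched in a few lines. This is an illustrative reimplementation, not the authors' original program: the base-case values (10,000 cases, five reviewers, 3% prevalence, sensitivity 0.86, specificity 0.94) are taken from the paper, but the random seed and code structure are assumptions.

```python
import random
from collections import Counter

random.seed(42)

N_CASES = 10_000         # simulated poor outcomes, as in the paper's simulations
N_REVIEWERS = 5          # independent reviewers per case
PREVALENCE = 0.03        # base-case AE prevalence
SENS, SPEC = 0.86, 0.94  # reviewer accuracy from the LCA base case

# For each case, draw the true AE status, then each reviewer's imperfect rating.
tally = Counter()  # (number of positive ratings, true status) -> case count
for _ in range(N_CASES):
    is_ae = random.random() < PREVALENCE
    p_positive = SENS if is_ae else (1 - SPEC)
    n_pos = sum(random.random() < p_positive for _ in range(N_REVIEWERS))
    tally[(n_pos, is_ae)] += 1

# Empirical posterior probability of a true AE given k of 5 positive ratings.
for k in range(N_REVIEWERS + 1):
    pos, neg = tally[(k, True)], tally[(k, False)]
    if pos + neg:
        print(f"{k}/5 reviewers rate AE: P(true AE) ~ {pos / (pos + neg):.2f} "
              f"({pos + neg} cases)")
```

With these parameters the posterior probability of a true AE climbs steeply with the number of agreeing reviewers, which is the qualitative pattern the simulations are designed to demonstrate.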

Section snippets

AE review as a diagnostic test

In this study, we viewed the physician review for AE as a qualitative diagnostic test. A qualitative diagnostic test, such as a chest radiograph interpretation, weighs several pieces of information to determine if a particular diagnosis, such as pneumonia, is present. Similarly, to determine if an AE has occurred, physicians review clinical information and decide whether the data support the “diagnosis” of an AE. Physician reviewers decide the cause of a poor health outcome in a manner
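Treating a review as a diagnostic test makes the abstract's predictive values easy to check with Bayes' theorem. A minimal sketch, using the paper's base case (3% prevalence, sensitivity 0.86, specificity 0.94) and, as a simplifying assumption, identical accuracy for all reviewers:

```python
from math import comb

prev, sens, spec = 0.03, 0.86, 0.94  # base case from the paper

def ppv_exactly_k_of_n(k, n):
    """P(true AE | exactly k of n independent reviewers rate the case an AE)."""
    p_if_ae = comb(n, k) * sens**k * (1 - sens)**(n - k)
    p_if_not = comb(n, k) * (1 - spec)**k * spec**(n - k)
    return prev * p_if_ae / (prev * p_if_ae + (1 - prev) * p_if_not)

print(f"PPV, 1/1 reviewer positive:  {ppv_exactly_k_of_n(1, 1):.2f}")
print(f"PPV, 2/3 reviewers positive: {ppv_exactly_k_of_n(2, 3):.2f}")
```

The single-reviewer value reproduces the abstract's 31%. Exactly two of three agreeing reviewers gives roughly 49% under the identical-accuracy simplification, close to the 51% reported from the full model, which allowed reviewer accuracy to vary.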

Observed AE ratings

Table 1 describes the number of ratings with agreement and disagreement among the three reviewers for the 269 outcomes. The kappa statistic for the three reviewers' ratings was 0.63. This indicates good agreement among reviewers and suggests that the agreement was greater than expected by chance alone (P < 0.05).
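Agreement among three raters on a binary judgment is conventionally summarized with Fleiss' kappa, which can be computed from scratch as below. The example counts are hypothetical, since the cell values of Table 1 are not reproduced here.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning subject i to category j;
    every subject must be rated by the same number of raters."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_total = n_subjects * n_raters
    # Observed agreement: mean proportion of agreeing rater pairs per subject.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement from the marginal category proportions.
    p_e = sum(
        (sum(row[j] for row in counts) / n_total) ** 2
        for j in range(len(counts[0]))
    )
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical [AE, not-AE] rating counts for six outcomes, three raters each.
example = [[3, 0], [2, 1], [0, 3], [0, 3], [1, 2], [3, 0]]
print(f"kappa = {fleiss_kappa(example):.2f}")
```

A value of 1 corresponds to perfect agreement and 0 to agreement no better than chance, so the study's 0.63 sits in the range usually described as good.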

Test characteristics of physician reviewers

Table 2 describes the estimates from our two LCA models (model 1 assumed common reviewer sensitivity and specificity, whereas model 2 allowed these to vary). In
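Model 1 (common reviewer sensitivity and specificity) can be fit with a short expectation-maximization loop over the eight possible three-reviewer rating patterns. This is a from-scratch sketch of the general technique, not the authors' software; the demonstration data are noise-free expected pattern counts generated from assumed parameters, not the study's observed ratings.

```python
from itertools import product

PATTERNS = list(product((0, 1), repeat=3))  # all 3-reviewer rating patterns

def pattern_prob(y, prev, sens, spec):
    """Return (marginal P(pattern y), joint P(y and true AE)) under the model,
    assuming conditionally independent raters with common accuracy."""
    p_if_ae = 1.0
    p_if_not = 1.0
    for r in y:
        p_if_ae *= sens if r else (1 - sens)
        p_if_not *= (1 - spec) if r else spec
    return prev * p_if_ae + (1 - prev) * p_if_not, prev * p_if_ae

def fit_lca(counts, prev=0.2, sens=0.7, spec=0.9, n_iter=5000):
    """EM estimation of AE prevalence and common reviewer sens/spec."""
    n = sum(counts.values())
    for _ in range(n_iter):
        # E-step: posterior P(true AE | pattern) under current parameters.
        post = {}
        for y in PATTERNS:
            total, joint_ae = pattern_prob(y, prev, sens, spec)
            post[y] = joint_ae / total
        # M-step: re-estimate parameters from expected class memberships.
        w_ae = sum(counts[y] * post[y] for y in PATTERNS)
        w_no = n - w_ae
        prev = w_ae / n
        sens = sum(counts[y] * post[y] * sum(y) for y in PATTERNS) / (3 * w_ae)
        spec = sum(counts[y] * (1 - post[y]) * (3 - sum(y))
                   for y in PATTERNS) / (3 * w_no)
    return prev, sens, spec

# Demonstration: expected counts for 269 outcomes from assumed true parameters.
TRUE = (0.10, 0.85, 0.95)
counts = {y: 269 * pattern_prob(y, *TRUE)[0] for y in PATTERNS}
print("recovered (prev, sens, spec):",
      tuple(round(v, 3) for v in fit_lca(counts)))
```

Because the latent "true AE" status is never observed, the model infers prevalence and reviewer accuracy jointly from the pattern of agreements and disagreements alone, which is exactly what makes LCA suitable when no gold standard exists.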

Discussion

This analysis has several important findings. First, we found that physician reviewers exhibited reasonably good test characteristics. Using LCA, we found that sensitivity and specificity of their ratings were 86% and 94%, respectively. With a positive likelihood ratio of 14.3, these test characteristics compare favorably with many diagnostic tests used in clinical medicine. However, even with these favorable test characteristics, a single positive review provided reasonable certainty about AEs
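The likelihood-ratio arithmetic behind these figures is easy to verify in the odds form of Bayes' theorem; the 3% pretest probability below is the paper's base-case prevalence.

```python
sens, spec = 0.86, 0.94
lr_pos = sens / (1 - spec)  # positive likelihood ratio
print(f"LR+ = {lr_pos:.1f}")

# Odds-form Bayes: post-test probability after a single positive review.
pretest = 0.03
pretest_odds = pretest / (1 - pretest)
posttest_odds = pretest_odds * lr_pos
posttest = posttest_odds / (1 + posttest_odds)
print(f"post-test probability = {posttest:.2f}")
```

This reproduces both the reported LR+ of 14.3 and the 31% post-test probability: even a test that would be considered strong in clinical medicine leaves substantial uncertainty when the outcome is rare.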

References (39)

  • A.J. Forster et al. Adverse events affecting medical patients following discharge from hospital. CMAJ (2004).
  • A.J. Forster et al. Ottawa Hospital Patient Safety Study: incidence and timing of adverse events in patients admitted to a Canadian teaching hospital. CMAJ (2004).
  • P. Michel et al. Comparison of three methods for estimating rates of adverse events and rates of preventable adverse events in acute care hospitals. BMJ (2004).
  • E.J. Thomas et al. Incidence and types of adverse events and negligent care in Utah and Colorado. Med Care (2000).
  • C. Vincent et al. Adverse events in British hospitals: preliminary retrospective record review. BMJ (2001).
  • T.K. Gandhi et al. Adverse drug events in ambulatory care. N Engl J Med (2003).
  • J.H. Gurwitz et al. Incidence and preventability of adverse drug events among older persons in the ambulatory setting. JAMA (2003).
  • T.A. Brennan et al. Reliability and validity of judgments concerning adverse events suffered by hospitalized patients. Med Care (1989).
  • T.A. Brennan. The Institute of Medicine report on medical errors—could it do harm? N Engl J Med (2000).