The inter-rater agreement of retrospective assessments of adverse events does not improve with two reviewers per patient record

doi:10.1016/j.jclinepi.2009.03.004

Journal of Clinical Epidemiology

Volume 63, Issue 1, January 2010, Pages 94-102

https://doi.org/10.1016/j.jclinepi.2009.03.004

Abstract

Objective

To evaluate the inter-rater agreement of the record review process of the Dutch Adverse Event study, which we aimed to improve by involving two independent physician reviewers per record instead of one, with a consensus procedure in case of disagreement.

Methods

To evaluate the record review process with two physicians and a consensus procedure, inter-rater agreement was measured both within pairs of physicians (independent reviews by physician A and physician B) and between pairs of physicians (independent reviews by pair A+B and pair C+D), using 4,272 and 119 records, respectively.

Results

The inter-rater agreement within pairs of physicians was substantial for the determination of adverse events (AEs) with a kappa of 0.64 (95% confidence interval [CI]: 0.61, 0.68). The inter-rater agreement between pairs of physicians was fair for the determination of AEs with a kappa of 0.25 (95% CI: 0.05, 0.45).

Conclusion

A record review process with two physicians per record including a consensus procedure to assess AEs is not more reliable than a record review process with one physician. Retrospective estimates of incidence of AEs from record review studies should be interpreted with caution. Improvement of the method is necessary for monitoring incidence of AEs over time at a national level.

Introduction

Patient record review of hospital admissions is by far the most widely applied and thoroughly studied method for measurement of patient safety [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. It is a standard method by which adverse events (AEs) of clinical care and their degree of preventability are measured and it forms the basis for patient safety policy in several countries [11]. This method was proven valid to identify AEs and estimate their incidence in hospitals nationwide [2]. However, previous AE studies showed poor to moderate inter-rater agreement for the determination of AEs and their preventability [1], [2], [3], [5], [7], [8], [9], [10]. Therefore, standing on the shoulders of our predecessors and keeping the method and instruments maximally comparable, we have tried to improve the inter-rater agreement of the measurement of AEs and their preventability within the Dutch Adverse Event study.

Inter-rater agreement refers to the consistency of ratings or to the ability of various raters to reach the same conclusion about a specific case [2], [12]. Strategies to enhance inter-rater agreement include standardization of the measurement and a consensus procedure between the reviewers [12], [13]. To improve the inter-rater agreement for the assessment of AEs in the Dutch Adverse Event study, all records were independently reviewed by two physicians instead of one, and in case of disagreement the two physicians discussed and reconsidered their reviews to reach consensus. We hypothesized that the involvement of two physicians per patient record, including a consensus procedure, would give a more reliable assessment of AEs and their preventability. Within the Dutch Adverse Event study, a reliability study was conducted to evaluate the inter-rater agreement of the patient record review. The objective was twofold. The first aim was to examine the inter-rater agreement of the original review by two independent physician reviewers before the consensus procedure. This is called the inter-rater agreement within pairs of physicians (physician A vs. B). The second aim was to examine the inter-rater agreement of the complete record review process, including the consensus procedure, by comparison with a second pair of physicians. This is called the inter-rater agreement between pairs of physicians (physician A+B vs. C+D). The Harvard Medical Practice Study in the United States and the Australian study on the occurrence of AEs also involved two physician reviewers, and the Australian study also used a consensus procedure in case of disagreement between the two physicians [3], [10]. However, these studies evaluated only the inter-rater agreement of the original review within pairs of physicians (physician A and B) and not that of the final decisions made by each pair of physicians.
To gain insight into the reliability of the record review procedure with two physicians per patient record, including a consensus procedure in case of disagreement, the inter-rater agreement between pairs of physicians is more relevant and has not yet been studied thoroughly.
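The agreement statistic reported in the Results is kappa. As a reminder, and assuming the standard two-rater (Cohen's) form, which this excerpt does not spell out, kappa compares the observed proportion of agreement with the proportion expected by chance. The symbols p_o and p_e are introduced here for illustration only:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \text{observed proportion of records on which both raters agree},
\qquad
p_e = \text{proportion of agreement expected by chance from the raters' marginal rates}
```

A kappa of 1 indicates perfect agreement, and a kappa of 0 indicates agreement no better than chance.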

Section snippets

Study design and setting

A retrospective patient record review study was conducted to determine the incidence and preventability of AEs among hospitalized patients in the Netherlands [14]. The method of this study was based on a protocol and instruments originally developed by the Harvard Medical Practice Study. They studied the incidence of AEs in New York state hospitals in 1984, based on analysis of information in patient records [3], [15]. This method, with modifications, was used in subsequent studies in

Inter-rater agreement within pairs of physicians

The inter-rater agreement within pairs of physicians (physician A and B) was determined for 2,757 (65%) records of deceased patients and for 1,515 (35%) records of discharged patients. The inter-rater agreement for the determination of AEs was substantial (κ = 0.64, 95% CI: 0.61, 0.68). Also for the determination of their preventability the inter-rater agreement was substantial (κ = 0.72, 95% CI: 0.66, 0.79) (Table 2).
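The kappa values and confidence intervals above can be illustrated with a minimal Python sketch. It assumes a 2x2 table of two raters' AE/no-AE judgments, uses Cohen's simple large-sample standard error (not necessarily the method the authors used), and maps kappa to verbal labels with the Landis and Koch scale, which is assumed here because it matches the terms "fair" and "substantial" in the text. The function names and the example counts are hypothetical and are not study data.

```python
import math


def cohen_kappa_2x2(both_yes, a_only, b_only, both_no):
    """Cohen's kappa and an approximate 95% CI for two raters and a yes/no rating.

    Arguments are cell counts of the 2x2 agreement table (hypothetical layout):
    both_yes = both raters found an AE, a_only = only rater A did,
    b_only = only rater B did, both_no = neither did.
    """
    n = both_yes + a_only + b_only + both_no
    p_o = (both_yes + both_no) / n               # observed agreement
    a_yes = (both_yes + a_only) / n              # rater A's AE rate
    b_yes = (both_yes + b_only) / n              # rater B's AE rate
    p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    # Simple large-sample approximation of the standard error (an assumption).
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, (kappa - 1.96 * se, kappa + 1.96 * se)


def landis_koch_label(kappa):
    """Verbal label on the Landis and Koch scale (assumed, not stated in the text)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical counts for illustration only; these are NOT the study's data.
kappa, (ci_low, ci_high) = cohen_kappa_2x2(both_yes=40, a_only=10,
                                           b_only=10, both_no=40)
print(f"kappa = {kappa:.2f}, 95% CI: {ci_low:.2f}, {ci_high:.2f}")

# Labels for the kappa values reported in the text.
for reported in (0.64, 0.72, 0.25):
    print(reported, landis_koch_label(reported))
```

On this assumed scale, the reported within-pair values (0.64 and 0.72) fall in the "substantial" band and the between-pair value from the abstract (0.25) falls in the "fair" band.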

Physician A and physician B separately determined 592 and 621 AEs before a

Discussion

We hypothesized that the involvement of two physicians per patient record including a consensus procedure in case of disagreement between their reviews would improve the reliability of the review process to assess AEs. However, the inter-rater agreement of the complete medical review process (inter-rater agreement between pairs of physicians), including the consensus procedure, was only fair, although the inter-rater agreement within pairs of physicians was substantial.

More consensus procedures

Conclusion

Although judging the presence of AEs is difficult, retrospective patient record studies currently offer the best method available to assess the incidence of AEs and their preventability, nature, and types [6]. The results of record review studies provide urgently needed insight into the current state of patient safety and the possibilities for improving it, and are therefore generally highly appreciated.

Involvement of two physicians per patient record and consensus procedure in case

Acknowledgments

The authors thank everyone who contributed to the study: the physicians who reviewed the patient records, the researchers who coordinated the data collection, and the 21 participating hospitals and their staff, who made the patient records available.

Funding: The Dutch Patient Safety Research Program has been initiated by the Dutch Society of Medical Specialists (in Dutch: Orde van Medisch Specialisten) and the Dutch Institute for Health care Improvement (CBO) with financial support from the

References (27)

H.C.W. de Vet et al. Current challenges in clinimetrics. J Clin Epidemiol (2003)
A.R. Feinstein et al. High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol (1990)
G.R. Baker et al. The Canadian Adverse Events Study: the incidence of adverse events among hospital patients in Canada. CMAJ (2004)
T.A. Brennan et al. Reliability and validity of judgments concerning adverse events suffered by hospitalized patients. Med Care (1989)
T.A. Brennan et al. Incidence of adverse events and negligence in hospitalized patients. Results of the Harvard Medical Practice Study I. N Engl J Med (1991)
P. Davis et al. Adverse events regional feasibility study: methodological results. N Z Med J (2001)
P. Davis et al. Adverse events in New Zealand public hospitals I: occurrence and impact. N Z Med J (2002)
R.J. Lilford et al. The measurement of active errors: methodological issues. Qual Saf Health Care (2003)
A.R. Localio et al. Identifying adverse events caused by medical care: degree of physician agreement in a retrospective chart review. Ann Intern Med (1996)
E.J. Thomas et al. Incidence and types of adverse events and negligent care in Utah and Colorado. Med Care (2000)
E.J. Thomas et al. The reliability of medical record review for estimating adverse event rates. Ann Intern Med (2002)
R.M. Wilson et al. The Quality in Australian Health Care Study. Med J Aust (1995)
G.R. Baker. Commentary. Harvard Medical Practice Study. Qual Saf Health Care (2004)
