Original Article
The inter-rater agreement of retrospective assessments of adverse events does not improve with two reviewers per patient record
Introduction
Patient record review of hospital admissions is by far the most widely applied and thoroughly studied method for measuring patient safety [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. It is a standard method for measuring adverse events (AEs) of clinical care and their degree of preventability, and it forms the basis for patient safety policy in several countries [11]. The method has been shown to be valid for identifying AEs and estimating their incidence in hospitals nationwide [2]. However, previous AE studies showed poor to moderate inter-rater agreement for the determination of AEs and their preventability [1], [2], [3], [5], [7], [8], [9], [10]. Therefore, building on the work of our predecessors and keeping the method and instruments as comparable as possible, we tried to improve the inter-rater agreement of the measurement of AEs and their preventability within the Dutch Adverse Event study.
Inter-rater agreement refers to the consistency of ratings, or to the ability of various raters to reach the same conclusion about a specific case [2], [12]. Strategies to enhance inter-rater agreement are standardization of the measurement and a consensus procedure between the reviewers [12], [13]. To improve the inter-rater agreement for the assessment of AEs in the Dutch Adverse Event study, all records were independently reviewed by two physicians instead of one, and in case of disagreement the two physicians discussed and reconsidered their reviews to reach consensus. We hypothesized that the involvement of two physicians per patient record, including a consensus procedure, would give a more reliable assessment of AEs and their preventability. Within the Dutch Adverse Event study, a reliability study was conducted to evaluate the inter-rater agreement of the patient record review. The objective was twofold. The first aim was to examine the inter-rater agreement of the original review by two independent physician reviewers before the consensus procedure. This is called the inter-rater agreement within pairs of physicians (physician A vs. B). The second aim was to examine the inter-rater agreement of the complete record review process, including the consensus procedure, with a second pair of physicians. This is called the inter-rater agreement between pairs of physicians (physician A+B vs. C+D). The Harvard Medical Practice Study in the United States and the Australian study on the occurrence of AEs also involved two physician reviewers, and the Australian study also used a consensus procedure in case of disagreement between the two physicians [3], [10]. However, these studies evaluated only the inter-rater agreement of the original review within pairs of physicians (physician A and B) and not that of the ultimate decisions made by pairs of physicians.
To gain insight into the reliability of the record review procedure with two physicians per patient record, including a consensus procedure in case of disagreement, the inter-rater agreement between pairs of physicians is the more relevant measure, and it has not yet been studied thoroughly.
Study design and setting
A retrospective patient record review study was conducted to determine the incidence and preventability of AEs among hospitalized patients in the Netherlands [14]. The method of this study was based on a protocol and instruments originally developed by the Harvard Medical Practice Study. They studied the incidence of AEs in New York state hospitals in 1984, based on analysis of information in patient records [3], [15]. This method, with modifications, was used in subsequent studies in
Inter-rater agreement within pairs of physicians
The inter-rater agreement within pairs of physicians (physician A and B) was determined for 2,757 (65%) records of deceased patients and for 1,515 (35%) records of discharged patients. The inter-rater agreement for the determination of AEs was substantial (κ = 0.64, 95% CI: 0.61, 0.68). The inter-rater agreement for the determination of their preventability was also substantial (κ = 0.72, 95% CI: 0.66, 0.79) (Table 2).
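For readers less familiar with the κ statistic reported here, the following is a minimal sketch of how Cohen's kappa and an approximate 95% confidence interval can be computed from a 2×2 agreement table. The function name and the counts in the usage line are hypothetical and are not taken from the study, and the standard error is a simple large-sample approximation, which may differ from the method the authors used.

```python
import math


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for two raters and a yes/no judgment.

    a: both raters say "AE present"
    b: rater 1 says present, rater 2 says absent
    c: rater 1 says absent, rater 2 says present
    d: both raters say "AE absent"
    Returns (kappa, (ci_low, ci_high)).
    """
    n = a + b + c + d
    # Observed proportion of agreement.
    p_o = (a + d) / n
    # Agreement expected by chance, from the marginal totals.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    # Simple large-sample standard error (an assumption of this sketch).
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, (kappa - 1.96 * se, kappa + 1.96 * se)


# Hypothetical counts for illustration only, not data from the study.
kappa, (ci_low, ci_high) = cohen_kappa(40, 10, 10, 40)
print(f"kappa = {kappa:.2f}, 95% CI: {ci_low:.2f}, {ci_high:.2f}")
```

Because kappa corrects the observed agreement for the agreement expected by chance, it depends on the marginal totals as well as on the raw percentage of matching judgments.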
Physician A and physician B separately determined 592 and 621 AEs before a
Discussion
We hypothesized that the involvement of two physicians per patient record including a consensus procedure in case of disagreement between their reviews would improve the reliability of the review process to assess AEs. However, the inter-rater agreement of the complete medical review process (inter-rater agreement between pairs of physicians), including the consensus procedure, was only fair, although the inter-rater agreement within pairs of physicians was substantial.
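The labels "fair" and "substantial" are consistent with the commonly used Landis and Koch bands for interpreting kappa. This excerpt does not state which scale the authors applied, so the mapping below is an assumption, and the function name is hypothetical.

```python
def landis_koch_label(kappa):
    """Map a kappa value to a verbal label (Landis and Koch bands, assumed)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# The two within-pair kappa values reported in the results.
for value in (0.64, 0.72):
    print(value, landis_koch_label(value))
```

Under this scale, both within-pair values (κ = 0.64 and κ = 0.72) fall in the "substantial" band, while a "fair" result corresponds to a kappa between 0.21 and 0.40.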
More consensus procedures
Conclusion
Although judging the presence of AEs is difficult, retrospective patient record studies currently offer the best method available to assess the incidence of AEs and their preventability, nature, and types [6]. The results of record review studies provide urgently needed insight into the current state of patient safety and the possibilities for improving it, and are therefore generally highly appreciated.
Involvement of two physicians per patient record and consensus procedure in case
Acknowledgments
The authors thank everyone who contributed to the study: the physicians who reviewed the patient records; the researchers who coordinated the data collection; and the 21 participating hospitals and their staff, who facilitated access to the patient records.
Funding: The Dutch Patient Safety Research Program has been initiated by the Dutch Society of Medical Specialists (in Dutch: Orde van Medisch Specialisten) and the Dutch Institute for Health care Improvement (CBO) with financial support from the
References (27)
- et al. Current challenges in clinimetrics. J Clin Epidemiol (2003)
- et al. High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol (1990)
- et al. The Canadian Adverse Events Study: the incidence of adverse events among hospital patients in Canada. CMAJ (2004)
- et al. Reliability and validity of judgments concerning adverse events suffered by hospitalized patients. Med Care (1989)
- et al. Incidence of adverse events and negligence in hospitalized patients. Results of the Harvard Medical Practice Study I. N Engl J Med (1991)
- et al. Adverse events regional feasibility study: methodological results. N Z Med J (2001)
- et al. Adverse events in New Zealand public hospitals I: occurrence and impact. N Z Med J (2002)
- et al. The measurement of active errors: methodological issues. Qual Saf Health Care (2003)
- et al. Identifying adverse events caused by medical care: degree of physician agreement in a retrospective chart review. Ann Intern Med (1996)
- et al. Incidence and types of adverse events and negligent care in Utah and Colorado. Med Care (2000)
- The reliability of medical record review for estimating adverse event rates. Ann Intern Med
- The Quality in Australian Health Care Study. Med J Aust
- Commentary. Harvard Medical Practice Study. Qual Saf Health Care