Original Article
The inter-rater agreement of retrospective assessments of adverse events does not improve with two reviewers per patient record

https://doi.org/10.1016/j.jclinepi.2009.03.004

Abstract

Objective

To evaluate the inter-rater agreement of the record review process of the Dutch Adverse Event study. We aimed to improve this agreement by having two independent physician reviewers per record instead of one, with a consensus procedure in case of disagreement.

Methods

The inter-rater agreement within pairs of physicians (independent review by physicians A and B) and between pairs of physicians (independent review by pair A+B and pair C+D) was measured in 4,272 and 119 records, respectively, to evaluate the record review process with two physicians per record including a consensus procedure.

Results

The inter-rater agreement within pairs of physicians was substantial for the determination of adverse events (AEs) with a kappa of 0.64 (95% confidence interval [CI]: 0.61, 0.68). The inter-rater agreement between pairs of physicians was fair for the determination of AEs with a kappa of 0.25 (95% CI: 0.05, 0.45).
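Kappa values like these are computed from an agreement table that cross-tabulates two reviewers' judgments (AE vs. no AE). The sketch below computes Cohen's kappa with a simple large-sample approximate 95% confidence interval and maps the result to the Landis and Koch (1977) benchmarks, which the labels "substantial" and "fair" appear to follow. The confidence-interval formula is an assumption (the abstract does not state which method the study used), and the example counts are hypothetical, not study data.

```python
import math


def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of records rated category i by reviewer 1 and
    category j by reviewer 2 (e.g. 0 = no AE, 1 = AE).
    Returns (kappa, (ci_low, ci_high)).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("empty agreement table")
    # Observed agreement: proportion of records on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance-expected agreement from the two reviewers' marginal totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    kappa = (p_o - p_e) / (1 - p_e)
    # Simple large-sample standard error (an approximation; assumed here,
    # not necessarily the method used in the study).
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, (kappa - 1.96 * se, kappa + 1.96 * se)


def landis_koch_label(kappa):
    """Benchmark label for a kappa value (Landis and Koch, 1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical 2x2 table: rows = reviewer 1 (no AE, AE), columns = reviewer 2.
kappa, ci = cohens_kappa([[40, 10], [5, 45]])
label = landis_koch_label(kappa)
```

For fixed observed and expected agreement, this standard error shrinks roughly in proportion to 1/√n, which is one reason a kappa estimated on 119 records carries a much wider interval than one estimated on 4,272 records.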

Conclusion

A record review process with two physicians per record including a consensus procedure to assess AEs is not more reliable than a record review process with one physician. Retrospective estimates of incidence of AEs from record review studies should be interpreted with caution. Improvement of the method is necessary for monitoring incidence of AEs over time at a national level.

Introduction

Patient record review of hospital admissions is by far the most widely applied and thoroughly studied method for measuring patient safety [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. It is a standard method for measuring adverse events (AEs) of clinical care and their degree of preventability, and it forms the basis for patient safety policy in several countries [11]. This method has been shown to be valid for identifying AEs and estimating their incidence in hospitals nationwide [2]. However, previous AE studies showed poor to moderate inter-rater agreement for the determination of AEs and their preventability [1], [2], [3], [5], [7], [8], [9], [10]. Therefore, standing on the shoulders of our predecessors and keeping the method and instruments maximally comparable, we tried to improve the inter-rater agreement of the measurement of AEs and their preventability within the Dutch Adverse Event study.

Inter-rater agreement refers to the consistency of ratings, or the ability of different raters to reach the same conclusion about a specific case [2], [12]. Strategies to enhance inter-rater agreement include standardization of the measurement and a consensus procedure between the reviewers [12], [13]. To improve the inter-rater agreement for the assessment of AEs in the Dutch Adverse Event study, all records were independently reviewed by two physicians instead of one; in case of disagreement, the two physicians discussed and reconsidered their reviews to reach consensus. We hypothesized that involving two physicians per patient record, including a consensus procedure, would give a more reliable assessment of AEs and their preventability. Within the Dutch Adverse Event study, a reliability study was conducted to evaluate the inter-rater agreement of the patient record review. The objective was twofold. First, to examine the inter-rater agreement of the original review by two independent physician reviewers before the consensus procedure. This is called the inter-rater agreement within pairs of physicians (physician A vs. B). Second, to examine the inter-rater agreement of the complete record review process, including the consensus procedure, with a second pair of physicians. This is called the inter-rater agreement between pairs of physicians (physicians A+B vs. C+D). The Harvard Medical Practice Study in the United States and the Australian study on the occurrence of AEs also involved two physician reviewers, and the Australian study also used a consensus procedure in case of disagreement between the two physicians [3], [10]. However, these studies only evaluated the inter-rater agreement of the original review within pairs of physicians (physicians A and B), not that of the final decisions made by the pairs of physicians.
To gain insight into the reliability of the record review procedure with two physicians per patient record and a consensus procedure in case of disagreement, the inter-rater agreement between pairs of physicians is the more relevant measure, and it has not yet been studied thoroughly.
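The difference between the two agreement measures can be made concrete with a small sketch. Within-pair agreement compares the independent judgments of physicians A and B; between-pair agreement compares the final decision of pair A+B (after any consensus discussion) with the final decision of pair C+D. Because the consensus discussion itself cannot be computed, its outcome is simply supplied as input. The function names and the six-record data are hypothetical illustrations, not the study's data or software.

```python
from typing import Dict, List, Sequence


def pair_decisions(r1: Sequence[int], r2: Sequence[int],
                   consensus: Dict[int, int]) -> List[int]:
    """Final AE judgment (1 = AE, 0 = no AE) of one reviewer pair per record.

    Where the two physicians agree, their shared judgment stands; where they
    disagree, the outcome of their consensus discussion (hypothetical input,
    keyed by record index) is used instead.
    """
    final = []
    for i, (x, y) in enumerate(zip(r1, r2)):
        if x == y:
            final.append(x)
        elif i in consensus:
            final.append(consensus[i])
        else:
            raise ValueError(f"record {i}: disagreement without a consensus decision")
    return final


def kappa_binary(r1: Sequence[int], r2: Sequence[int]) -> float:
    """Cohen's kappa for two equal-length binary rating vectors."""
    n = len(r1)
    p_o = sum(x == y for x, y in zip(r1, r2)) / n
    p1, p2 = sum(r1) / n, sum(r2) / n  # each rater's proportion of "AE" calls
    p_e = p1 * p2 + (1 - p1) * (1 - p2)  # chance-expected agreement
    if p_e == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical independent judgments of four physicians on six records.
A = [1, 0, 1, 0, 1, 0]
B = [1, 0, 0, 0, 1, 1]
C = [1, 1, 0, 0, 0, 0]
D = [1, 0, 0, 0, 1, 0]

within_ab = kappa_binary(A, B)                        # within pair: A vs. B
final_ab = pair_decisions(A, B, consensus={2: 1, 5: 0})
final_cd = pair_decisions(C, D, consensus={1: 0, 4: 1})
between = kappa_binary(final_ab, final_cd)            # between pairs: A+B vs. C+D
```

The key design point is that the between-pair comparison operates on each pair's final decisions, so it reflects the whole review process including the consensus step, whereas the within-pair comparison only sees the reviewers' first, independent judgments.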


Study design and setting

A retrospective patient record review study was conducted to determine the incidence and preventability of AEs among hospitalized patients in the Netherlands [14]. The method of this study was based on a protocol and instruments originally developed by the Harvard Medical Practice Study, which studied the incidence of AEs in New York State hospitals in 1984 based on analysis of information in patient records [3], [15]. This method, with modifications, was used in subsequent studies in

Inter-rater agreement within pairs of physicians

The inter-rater agreement within pairs of physicians (physicians A and B) was determined for 2,757 (65%) records of deceased patients and 1,515 (35%) records of discharged patients. The inter-rater agreement for the determination of AEs was substantial (κ = 0.64, 95% CI: 0.61, 0.68). The inter-rater agreement for the determination of their preventability was also substantial (κ = 0.72, 95% CI: 0.66, 0.79) (Table 2).

Physician A and physician B separately determined 592 and 621 AEs before a

Discussion

We hypothesized that involving two physicians per patient record, with a consensus procedure in case of disagreement between their reviews, would improve the reliability of the review process for assessing AEs. However, the inter-rater agreement of the complete record review process including the consensus procedure (inter-rater agreement between pairs of physicians) was only fair, even though the inter-rater agreement within pairs of physicians was substantial.

More consensus procedures

Conclusion

Although judging the presence of AEs is difficult, retrospective patient record studies currently offer the best available method to assess the incidence of AEs and their preventability, nature, and types [6]. The results of record review studies provide urgently needed insight into the current state of patient safety and the opportunities for improving it, and are therefore generally highly appreciated.

Involvement of two physicians per patient record and consensus procedure in case

Acknowledgments

The authors thank everyone who contributed to the study: the physicians who reviewed the patient records; the researchers for the coordination of the data collection; and the 21 participating hospitals and their staff who facilitated access to the patient records.

Funding: The Dutch Patient Safety Research Program has been initiated by the Dutch Society of Medical Specialists (in Dutch: Orde van Medisch Specialisten) and the Dutch Institute for Health care Improvement (CBO) with financial support from the

References (27)

  • H.C.W. de Vet et al. Current challenges in clinimetrics. J Clin Epidemiol (2003)
  • A.R. Feinstein et al. High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol (1990)
  • G.R. Baker et al. The Canadian Adverse Events Study: the incidence of adverse events among hospital patients in Canada. CMAJ (2004)
  • T.A. Brennan et al. Reliability and validity of judgments concerning adverse events suffered by hospitalized patients. Med Care (1989)
  • T.A. Brennan et al. Incidence of adverse events and negligence in hospitalized patients. Results of the Harvard Medical Practice Study I. N Engl J Med (1991)
  • P. Davis et al. Adverse events regional feasibility study: methodological results. N Z Med J (2001)
  • P. Davis et al. Adverse events in New Zealand public hospitals I: occurrence and impact. N Z Med J (2002)
  • R.J. Lilford et al. The measurement of active errors: methodological issues. Qual Saf Health Care (2003)
  • A.R. Localio et al. Identifying adverse events caused by medical care: degree of physician agreement in a retrospective chart review. Ann Intern Med (1996)
  • E.J. Thomas et al. Incidence and types of adverse events and negligent care in Utah and Colorado. Med Care (2000)
  • E.J. Thomas et al. The reliability of medical record review for estimating adverse event rates. Ann Intern Med (2002)
  • R.M. Wilson et al. The Quality in Australian Health Care Study. Med J Aust (1995)
  • G.R. Baker. Commentary. Harvard Medical Practice Study. Qual Saf Health Care (2004)