Interrater agreement in the evaluation of discrepant imaging findings with the Radpeer system

Leila C Bender; Ken F Linnau; Eric N Meier; Yoshimi Anzai; Martin L Gunn

doi:10.2214/AJR.12.8972

AJR Am J Roentgenol. 2012 Dec;199(6):1320-7. doi: 10.2214/AJR.12.8972.

Authors

Leila C Bender¹, Ken F Linnau, Eric N Meier, Yoshimi Anzai, Martin L Gunn

Affiliation

¹ Department of Radiology, University of Washington, Box 359728, 325 9th Ave, Seattle, WA 98104, USA.

PMID: 23169725

Abstract

Objective: The Radpeer system is central to the quality assurance process in many radiology practices. Previous studies have shown poor agreement between physicians in the evaluation of their peers. The purpose of this study was to assess the reliability of the Radpeer scoring system.

Materials and methods: A sample of 25 discrepant cases was extracted from our quality assurance database. Images were made anonymous; associated reports and identities of interpreting radiologists were removed. Indications for the studies and descriptions of the discrepancies were provided. Twenty-one subspecialist attending radiologists rated the cases using the Radpeer scoring system. Multirater kappa statistics were used to assess interrater agreement, both with the standard scoring system and with dichotomized scores to reflect the practice of further review for cases rated 3 and 4. Subgroup analyses were conducted to assess subspecialist evaluation of cases.
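
The abstract does not say which multirater kappa statistic was used. As an illustration only, the sketch below assumes Fleiss' kappa, a common choice when every case is scored by the same number of raters. It also shows one way to dichotomize the scores, grouping 3 and 4 (the scores that prompt further review) against 1 and 2. The function names and the small ratings matrix are hypothetical and are not study data.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of cases, each a list with one score per rater.

    Assumes every case was scored by the same number of raters (at least 2).
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(case) for case in ratings]

    # Observed agreement: proportion of agreeing rater pairs, averaged over cases.
    per_case = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement: based on the overall share of ratings in each category.
    shares = [
        sum(c[cat] for c in counts) / (n_cases * n_raters) for cat in categories
    ]
    p_chance = sum(s ** 2 for s in shares)

    if p_chance == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_chance) / (1 - p_chance)


def dichotomize(ratings):
    """Map scores 3 and 4 to 1 (further review) and scores 1 and 2 to 0."""
    return [[1 if score >= 3 else 0 for score in case] for case in ratings]


# Hypothetical example: 4 cases, each scored 1-4 by 3 raters (not study data).
example = [[1, 1, 1], [2, 3, 2], [3, 4, 4], [1, 2, 3]]
print(fleiss_kappa(example, categories=[1, 2, 3, 4]))
print(fleiss_kappa(dichotomize(example), categories=[0, 1]))
```

In the study design described above, the input would have 25 rows (cases) and 21 columns (raters). Collapsing four categories into two changes both the observed and the chance agreement, so the two kappa values are not directly comparable in magnitude.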

Results: Interrater agreement was slight to fair compared with that expected by chance. For the group of 21 raters, the kappa values were 0.11 (95% CI, 0.06-0.16) with the standard scoring system and 0.20 (95% CI, 0.13-0.27) with dichotomized scores. There was disagreement about whether a discrepancy had occurred in 20 cases. Subgroup analyses did not reveal significant differences in the degree of interrater agreement.
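
The labels "slight" and "fair" match the widely used Landis and Koch bands for kappa, although the abstract does not name the scale it follows. Assuming those bands, the hypothetical helper below maps a kappa value to its verbal label, which makes it easy to see where the reported point estimates and confidence limits fall.

```python
# Landis and Koch (1977) bands; their use here is an assumption, since the
# abstract does not state which interpretation scale was applied.
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def landis_koch_label(kappa):
    """Return the Landis and Koch verbal label for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, label in BANDS:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Values reported in the Results: point estimates and the upper CI limit
# for the dichotomized scores.
for value in (0.11, 0.20, 0.27):
    print(value, landis_koch_label(value))
```

Under this assumed scale, both point estimates fall in the "slight" band, and only the upper confidence limit for the dichotomized scores reaches "fair".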

Conclusion: The identification of discrepant interpretations is valuable for the education of individual radiologists and for larger-scale quality assurance and quality improvement efforts. Our results show that a ratings-based peer review system is unreliable and subjective for the evaluation of discrepant interpretations. Resources should be devoted to developing more robust and objective assessment procedures, particularly those with clear quality improvement goals.

MeSH terms

Clinical Competence*
Diagnostic Errors / statistics & numerical data
Diagnostic Imaging*
Humans
Peer Review, Health Care*
Quality Assurance, Health Care*
Radiology / standards*
Reproducibility of Results