A systematic review of the reliability of objective structured clinical examination scores

Michael T Brannick; H Tugba Erol-Korkmaz; Matthew Prewett

doi:10.1111/j.1365-2923.2011.04075.x

Med Educ. 2011 Dec;45(12):1181-9. doi: 10.1111/j.1365-2923.2011.04075.x. Epub 2011 Oct 11.

Authors

Michael T Brannick¹, H Tugba Erol-Korkmaz, Matthew Prewett

Affiliation

¹ Department of Psychology, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620-7200, USA. mbrannick@usf.edu

PMID: 21988659

Abstract

Context: The objective structured clinical examination (OSCE) comprises a series of simulations used to assess the skill of medical practitioners in the diagnosis and treatment of patients. It is often used in high-stakes examinations, so it is important to assess its reliability and validity.

Methods: The published literature was searched (PsycINFO, PubMed) for OSCE reliability estimates (coefficient alpha and generalisability coefficients) computed either across stations or across items within stations. Coders independently recorded information about each study. A meta-analysis of the available literature was conducted and sources of systematic variance in the estimates were examined.
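
For readers unfamiliar with the statistic being pooled, the following is a minimal sketch of how coefficient alpha across stations is conventionally computed from a table of examinee scores. It is not taken from the article: the function name `cronbach_alpha` and the toy score tables are hypothetical, and the sketch assumes complete data with one numeric score per examinee per station.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a table of scores.

    scores: list of rows, one row per examinee, each row holding that
    examinee's score on each of k stations (hypothetical layout).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two stations and two examinees")
    # Sample variance of each station's scores across examinees.
    station_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each examinee's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(station_vars) / total_var)


# Toy data (invented for illustration): two stations that rank
# examinees identically, then two stations that are unrelated.
consistent = [[1, 2], [2, 3], [3, 4]]
unrelated = [[1, 1], [1, 2], [2, 1], [2, 2]]
print(cronbach_alpha(consistent))
print(cronbach_alpha(unrelated))
```

The same calculation applies within a station by treating each checklist item as a column instead of each station.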

Results: A total of 188 alpha values from 39 studies were coded. The overall (summary) alpha across stations was 0.66 (95% confidence interval [CI] 0.62-0.70); the overall alpha within stations across items was 0.78 (95% CI 0.73-0.82). Better-than-average reliability was associated with a greater number of stations and a higher number of examiners per station. Compared with clinical skills, interpersonal skills were evaluated less reliably across stations and more reliably within stations.
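
The association between reliability and number of stations is what classical test theory would predict. As a hedged illustration only, the Spearman-Brown formula projects how reliability changes when a test is lengthened with comparable components. The abstract does not say the authors used this formula; the function name `spearman_brown` is hypothetical, the formula assumes added stations are parallel to the existing ones, and applying it to the pooled value of 0.66 is purely illustrative because that value summarises OSCEs of different lengths.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor.

    Assumes the added stations behave like the existing ones
    (the parallel-components assumption of classical test theory).
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Illustrative projection from the pooled across-station alpha of 0.66.
for factor in (1, 1.5, 2, 3):
    print(factor, round(spearman_brown(0.66, factor), 2))
```

Under these assumptions, doubling the number of stations would raise a reliability of 0.66 to roughly 0.80, which is one reason longer OSCEs tend to yield more dependable overall scores.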

Conclusions: Overall scores on the OSCE are often not very reliable. It is more difficult to reliably assess communication skills than clinical skills when considering both as general traits that should apply across situations. It is generally helpful to use two examiners and large numbers of stations, but some OSCEs appear more reliable than others for reasons that are not yet fully understood.

Publication types

Meta-Analysis
Review
Systematic Review

MeSH terms

Clinical Competence / standards*
Education, Medical
Education, Medical, Undergraduate / standards
Educational Measurement / methods*
Educational Measurement / standards*
Humans
Medical History Taking / standards
Reproducibility of Results