Validity evidence for Quality Improvement Knowledge Application Tool Revised (QIKAT-R) scores: consequences of rater number and type using neurology cases
  1. Charles Kassardjian1,
  2. Yoon Soo Park2,
  3. Sherri Braksick3,
  4. Jeremy Cutsforth-Gregory4,
  5. Carrie Robertson4,
  6. Nathan Young4,5,
  7. Andrea Leep Hunderfund4
  1. Department of Neurology, University of Toronto, Toronto, Ontario, Canada
  2. Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois, USA
  3. Department of Neurology, University of Kansas Medical Center, Kansas City, Kansas, USA
  4. Department of Neurology, Mayo Clinic, Rochester, Minnesota, USA
  5. Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA

  Correspondence to Dr Andrea Leep Hunderfund, Department of Neurology, Mayo Clinic, Rochester, Minnesota 55905, USA; leep.andrea{at}


Objectives To develop neurology scenarios for use with the Quality Improvement Knowledge Application Tool Revised (QIKAT-R), gather and evaluate validity evidence, and project the impact of scenario number, rater number and rater type on score reliability.

Methods Six neurological case scenarios were developed. Residents were randomly assigned three scenarios before and after a quality improvement (QI) course in 2015 and 2016. For each scenario, residents crafted an aim statement, selected a measure and proposed a change to address a quality gap. Responses were scored by six faculty raters (two with and four without QI expertise) using the QIKAT-R. Validity evidence from content, response process, internal structure, relations to other variables and consequences was collected. A generalisability (G) study examined sources of score variability, and decision analyses estimated projected reliability for different numbers of raters and scenarios and raters with and without QI expertise.

Results Raters scored 163 responses from 28 residents. The mean QIKAT-R score was 5.69 (SD 1.06). G-coefficient and Phi-coefficient were 0.65 and 0.60, respectively. Interrater reliability was fair for raters without QI expertise (intraclass correlation = 0.53, 95% CI 0.30 to 0.72) and acceptable for raters with QI expertise (intraclass correlation = 0.66, 95% CI 0.02 to 0.88). Postcourse scores were significantly higher than precourse scores (6.05, SD 1.48 vs 5.22, SD 1.5; p < 0.001). Sufficient reliability for formative assessment (G-coefficient > 0.60) could be achieved by three raters scoring six scenarios or two raters scoring eight scenarios, regardless of rater QI expertise.
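The decision analyses above project how score reliability changes with the number of raters and scenarios. A minimal sketch of such a projection, using the standard generalisability-theory formula for a fully crossed persons × raters × scenarios design, is shown below. The variance components are hypothetical placeholders for illustration only; the abstract reports the resulting coefficients, not the underlying variance estimates.

```python
# Decision (D) study projection for a fully crossed
# persons (p) x raters (r) x scenarios (s) design.
# All variance components below are HYPOTHETICAL, chosen for
# illustration; they are not the study's actual estimates.

def projected_g(var_p, var_pr, var_ps, var_prs_e, n_raters, n_scenarios):
    """Relative G-coefficient for scores averaged over raters and scenarios.

    G = var_p / (var_p + relative error variance), where the error
    variance shrinks as raters and scenarios are added.
    """
    rel_error = (var_pr / n_raters
                 + var_ps / n_scenarios
                 + var_prs_e / (n_raters * n_scenarios))
    return var_p / (var_p + rel_error)

# Hypothetical variance components: person, person x rater,
# person x scenario, and residual (person x rater x scenario, error).
vp, vpr, vps, vprse = 0.30, 0.10, 0.25, 0.60

# Project reliability for a few rater/scenario combinations.
for n_r, n_s in [(2, 3), (3, 6), (2, 8)]:
    g = projected_g(vp, vpr, vps, vprse, n_r, n_s)
    print(f"{n_r} raters x {n_s} scenarios: G = {g:.2f}")
```

Under these illustrative components, adding scenarios or raters pushes the projected G-coefficient above the 0.60 threshold the authors use for formative assessment, mirroring the pattern reported in the Results.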

Conclusions Validity evidence was sufficient to support the use of the QIKAT-R with multiple scenarios and raters to assess resident QI knowledge application for formative or low-stakes summative purposes. The results provide practical information for educators to guide implementation decisions.

  • graduate medical education
  • medical education
  • quality improvement



  • Contributors CDK contributed to study concept, data collection, data analysis and drafting of the manuscript. YSP contributed to the study design, data analysis and critical revision of the manuscript. SAB, JKC-G, CER and NPY contributed to data collection and critical revision of the manuscript. ANL contributed to the study concept, study design, data collection, data analysis and drafting of the manuscript. All authors approve the submission and agree to be accountable for all aspects of the work.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.