Table 4

Reliability of assessment of insightful practice (AIP) questions 1–5

| Raters | AIP questions 1–3 (engagement, insight and action; 1–7 scale): internal consistency (G) | AIP questions 1–3: inter-rater reliability (G) | AIP question 4 (global assessment; 1–7 scale): inter-rater reliability (G) (ICC)* | AIP question 4: 95% CI* | AIP question 5 (binary yes/no recommendation on revalidation): inter-rater reliability (G) (ICC)* | AIP question 5: 95% CI* |
|---|---|---|---|---|---|---|
| 1 | **0.94** | 0.71 | 0.66 | – | 0.54 | – |
| 2 | **0.96** | **0.83** | 0.79 | 0.68 to 0.88 | 0.70 | 0.54 to 0.83 |
| 3 | **0.96** | **0.88** | **0.85** | 0.78 to 0.91 | 0.78 | 0.69 to 0.86 |
| 4 | **0.97** | **0.91** | **0.89** | 0.84 to 0.93 | **0.83** | 0.75 to 0.89 |
| 5 | **0.97** | **0.92** | **0.91** | 0.87 to 0.94 | **0.86** | 0.80 to 0.91 |
| 6 | **0.97** | **0.94** | **0.92** | 0.89 to 0.95 | **0.88** | 0.83 to 0.92 |
  • Reliabilities greater than 0.8, as required for high-stakes assessment, are given in bold.9

  • * Intraclass correlation coefficients (ICCs) are equivalent to G coefficients in a one-facet (rater) design.

  • Inter-rater reliability is the extent to which one rater's assessments (or, when based on multiple raters, the average of the raters' assessments) predict another rater's assessments.

  • 95% CIs for reliabilities (ICCs) were calculated using Fisher's Z transformation; its variance formula has (k−1), where k is the number of raters, in the denominator, so a CI cannot be calculated when there is only one rater.9
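The CI calculation described in the footnote can be sketched as follows. This is a hedged illustration, not the authors' exact code: it uses one common form of Fisher's Z transformation for an ICC with k raters, in which the variance term has (k−1) in its denominator, matching the footnote's point that no CI exists for a single rater. The sample size `n` is hypothetical, since the table does not report it.

```python
import math

def icc_fisher_ci(icc, k, n, z_crit=1.96):
    """Approximate 95% CI for an ICC via Fisher's Z transformation.

    icc    : point estimate of the intraclass correlation
    k      : number of raters (must be >= 2; the variance term has
             k - 1 in its denominator, so no CI exists for one rater)
    n      : number of subjects rated (hypothetical; not in the table)
    z_crit : normal critical value, 1.96 for a 95% interval
    """
    if k < 2:
        raise ValueError("CI undefined for a single rater (k - 1 = 0)")
    # Fisher's transformation generalised to an ICC with k raters
    z = 0.5 * math.log((1 + (k - 1) * icc) / (1 - icc))
    # Large-sample standard error of z (one common form; an assumption here)
    se = math.sqrt(k / (2 * (k - 1) * (n - 2)))
    lo_z, hi_z = z - z_crit * se, z + z_crit * se
    # Back-transform each bound from the z scale to the ICC scale
    back = lambda t: (math.exp(2 * t) - 1) / (math.exp(2 * t) + k - 1)
    return back(lo_z), back(hi_z)
```

For example, `icc_fisher_ci(0.79, k=2, n=50)` brackets the two-rater point estimate with an interval of roughly the width seen in the table's second row, though the exact published bounds depend on the study's actual n and variance formula.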