Table 5

Inter-rater reliability between criterion-based scores (proportion of criteria stated as being met) for the same record by different reviewers

Reviewer pairs	Condition	Site^*	No. of paired reviews	ICC between scores (95% CI)	Weighted mean ICC^† (95% CI)
Doctor vs doctor	Heart failure	F	14	0.96 (0.87 to 0.99)	0.88 (0.83 to 0.93)
	COPD	G	50	0.65 (0.46 to 0.79)
	Heart failure	B	46	0.65 (0.50 to 0.77)
	Heart failure	E^‡	12	0.64 (0.13 to 0.88)
Nurse/clinical vs nurse/clinical	COPD	J	25	0.86 (0.71 to 0.94)	0.74 (0.66 to 0.82)
	COPD	D	48	0.70 (0.52 to 0.82)
	Heart failure	D	21	0.69 (0.38 to 0.86)
	Heart failure	H	50	0.27 (0.00 to 0.51)
Non-clinical audit staff vs non-clinical audit staff	COPD	E	40	0.69 (0.49 to 0.82)	0.61 (0.47 to 0.76)
	COPD	A	29	0.33 (−0.04 to 0.61)

COPD, chronic obstructive pulmonary disease; ICC, intraclass correlation.
* Only sites with more than one reviewer are included in the reliability analysis; therefore, some sites do not appear in this table.
† Mean ICC per staff type, weighted by inverse variances to account for differing numbers of paired reviews. A single ICC was calculated for the three doctors at site B and this was combined with the other doctor pairs in the weighted mean ICC.
‡ Non-specialist doctors.
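
The inverse-variance weighting described in footnote † can be illustrated with a minimal sketch. It is not necessarily the authors' exact procedure: the table does not report variances, so the sketch assumes each ICC's standard error can be approximated from the width of its 95% CI (treating the interval as symmetric and normal), and the function name `inverse_variance_mean` is hypothetical. Under these assumptions the doctor rows give a pooled value close to the reported 0.88 (0.83 to 0.93); the other staff groups agree only approximately.

```python
import math

def inverse_variance_mean(estimates, z=1.96):
    """Pool (icc, ci_low, ci_high) tuples by inverse-variance weighting.

    Assumption: the standard error of each ICC is approximated as
    (ci_high - ci_low) / (2 * z), i.e. a symmetric normal-theory CI.
    """
    weights = []
    for _icc, low, high in estimates:
        se = (high - low) / (2 * z)
        weights.append(1.0 / se ** 2)  # weight = 1 / variance

    total_weight = sum(weights)
    pooled = sum(w * icc for w, (icc, _, _) in zip(weights, estimates)) / total_weight

    # The variance of an inverse-variance weighted mean is 1 / sum(weights).
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se

# Doctor vs doctor rows from Table 5: (ICC, lower 95% limit, upper 95% limit)
doctor_pairs = [
    (0.96, 0.87, 0.99),  # heart failure, site F
    (0.65, 0.46, 0.79),  # COPD, site G
    (0.65, 0.50, 0.77),  # heart failure, site B
    (0.64, 0.13, 0.88),  # heart failure, site E
]

mean_icc, ci_low, ci_high = inverse_variance_mean(doctor_pairs)
print(f"{mean_icc:.2f} ({ci_low:.2f} to {ci_high:.2f})")
```

Because narrow intervals receive much larger weights, the site F estimate (0.96) dominates the pooled doctor value even though it is based on only 14 paired reviews.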