Table 4

Summary of research into the reliability of adverse event measures of healthcare quality

| Study | Dimension of reliability | Methods | Results and conclusions |
| --- | --- | --- | --- |
| Panniers and Newlander (1986)30 | Inter-rater reliability | Used 2 raters to apply a modified form of the adverse patient occurrence inventory to a sample of 200 cases from 426 patients with myocardial infarction | Raw agreement of 99–100% for 10 items of the adverse patient occurrence inventory; the other 5 items ranged 72–96% (κ 0.29 to 0.83). Concluded the adverse patient occurrence inventory was reliable |
| Schumacher et al (1987)32 | Inter-rater reliability | Used 7 raters to apply the adverse patient occurrence inventory to 752 cases (each reviewed 2 or 3 times) drawn from 7 hospitals | Pearson correlation coefficients cited, measuring association between raters. Mean correlation for the adverse patient occurrence score was 0.33 (range −0.05 to 0.58). Concluded the adverse patient occurrence inventory was insufficiently reliable |
| Richards et al (1988)28 | Inter-rater reliability | Used multiple raters to apply the adverse patient occurrence inventory to 516 cases drawn from 5 hospitals, each case reviewed by 2 raters | κ statistics for adverse patient occurrence numerator items had a mean of 0.33 (range −0.18 to 0.73); for denominator items the mean was 0.50 (range 0.28 to 0.83). For the adverse patient occurrence score, within-patient variability was much less than overall variability. ANOVA showed differences between raters accounted for 2% of adverse patient occurrence score variability. Concluded the adverse patient occurrence inventory was “at best moderately reliable” |
| Harvard Medical Practice Study (1990)10 | Inter-rater reliability | Used multiple raters to apply the study's own adverse event measure to 282 cases (a random 1% sample of the total study), each reviewed by 2 raters | Raw agreement on the presence or absence of an adverse event in each case of 93.6%, κ of 0.85. Concluded the measure was sufficiently reliable for use in the study |
| Walshe (1998)15 | Inter-rater reliability | Used multiple raters to apply an adverse event measure to 374 admissions across three specialties, each reviewed by 2 raters | Overall κ statistics of 0.46, 0.63, and 0.65 in the three specialties, suggesting “moderate to good reliability”, but much depended on rater training |
| Walshe (1998)15 | Intra-rater reliability | Used a single rater to apply an adverse event measure to 110 admissions in obstetrics, then rescreened the same records 4 months later | Overall κ statistic of 0.56, suggesting moderate reliability. Significantly more adverse events were found on the second screening, when the rater was aware the study was being undertaken |
| Walshe (1998)15 | Inter-rater reliability | Observational study of 6095 admissions in 8 specialties screened by 4 different raters for adverse events | Significant differences in the rates of adverse events detected by different raters were found in 6 of the specialties |
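
Several of the studies above report high raw agreement alongside much lower κ values (for example, 99–100% agreement but κ as low as 0.29 in Panniers and Newlander). The sketch below is a purely hypothetical illustration, not data from any of the cited studies: it computes Cohen's κ as (observed agreement − chance agreement) / (1 − chance agreement), showing how two raters can agree on almost every case yet obtain only moderate κ when one category (no adverse event) dominates.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same set of cases."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: proportion of cases on which the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected agreement from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(rater_a) | set(rater_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: 200 case reviews in which adverse events are rare.
# The raters disagree on only 5 cases, so raw agreement is 97.5%, but kappa
# is far lower once agreement expected by chance is discounted.
rater_a = ["AE"] * 10 + ["no AE"] * 190
rater_b = ["AE"] * 5 + ["no AE"] * 195
print(cohens_kappa(rater_a, rater_b))  # raw agreement 0.975, kappa ≈ 0.66
```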