Summary of research into the reliability of adverse event measures of healthcare quality
Study | Dimension of reliability | Methods | Results and conclusions |
---|---|---|---|
Panniers and Newlander (1986)30 | Inter-rater reliability | Used 2 raters to apply modified form of adverse patient occurrence inventory to sample of 200 cases from 426 patients with myocardial infarction | Raw agreement of 99–100% for 10 items of adverse patient occurrence inventory; agreement for the other 5 items ranged 72–96% (κ 0.29 to 0.83). Concluded adverse patient occurrence inventory was reliable |
Schumacher et al (1987)32 | Inter-rater reliability | Used 7 raters to apply adverse patient occurrence inventory to 752 cases (each reviewed 2 or 3 times) drawn from 7 hospitals | Pearson correlation coefficients used to measure association between raters. Mean correlation for adverse patient occurrence score was 0.33 (ranged from −0.05 to 0.58). Concluded adverse patient occurrence inventory was insufficiently reliable |
Richards et al (1988)28 | Inter-rater reliability | Used multiple raters to apply adverse patient occurrence inventory to 516 cases drawn from 5 hospitals, each reviewed by 2 raters | κ statistics for adverse patient occurrence numerator items had mean of 0.33 (ranged from −0.18 to 0.73); for adverse patient occurrence denominator items mean was 0.50 (ranged from 0.28 to 0.83). For adverse patient occurrence score, found within-patient variability much less than overall variability. ANOVA showed differences between raters responsible for 2% of adverse patient occurrence score variability. Concluded adverse patient occurrence inventory “at best moderately reliable” |
Harvard Medical Practice Study (1990)10 | Inter-rater reliability | Used multiple raters to apply the study’s own adverse event measure to 282 cases (random 1% sample of total study), each reviewed by 2 raters | Raw agreement on presence/absence of adverse event in each case of 93.6%, κ of 0.85. Concluded measure was sufficiently reliable for use in study |
Walshe (1998)15 | Inter-rater reliability | Used multiple raters to apply adverse event measure to 374 admissions across 3 specialties, each reviewed by 2 raters | Overall κ statistics of 0.46, 0.63, and 0.65 in the 3 specialties, suggesting “moderate to good reliability”, though reliability depended heavily on rater training |
Walshe (1998)15 | Intra-rater reliability | Used a single rater to apply adverse event measure to 110 admissions in obstetrics, then rescreened the same records 4 months later | Overall κ statistic of 0.56, suggesting moderate reliability. Significantly more adverse events found on second screening, when the rater was aware the study was being undertaken |
Walshe (1998)15 | Inter-rater reliability | Observational study of 6095 admissions in 8 specialties screened for adverse events by 4 different raters | Significant differences between raters in rates of adverse events detected were found in 6 of the 8 specialties |
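
Most of the rows above report raw agreement and the κ (Cohen's kappa) statistic as their measures of inter-rater reliability. The sketch below, using hypothetical judgements rather than data from any of the cited studies, illustrates how these two quantities are computed for two raters deciding whether an adverse event is present in each case.

```python
# Illustrative sketch only: hypothetical data, not taken from any study above.
# Shows how raw agreement and Cohen's kappa are computed for two raters
# judging presence/absence of an adverse event.
from collections import Counter

def raw_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters give the same judgement."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = raw_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each judged independently at their own marginal rates.
    p_expected = sum((counts_a[c] / n) * (counts_b[c] / n)
                     for c in set(rater_a) | set(rater_b))
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical judgements for 10 cases (1 = adverse event present, 0 = absent)
a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
b = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
print(f"raw agreement = {raw_agreement(a, b):.2f}")  # 0.80
print(f"kappa         = {cohens_kappa(a, b):.2f}")   # 0.57
```

Because κ discounts the agreement expected by chance, it can sit well below raw agreement when one judgement (typically “no adverse event”) dominates; this is consistent with the table above, where raw agreement in the 72–96% range corresponds to κ values as low as 0.29.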