TY - JOUR
T1 - Peer review of quality of care: methods and metrics
JF - BMJ Quality & Safety
JO - BMJ Qual Saf
SP - 1
LP - 5
DO - 10.1136/bmjqs-2022-014985
VL - 32
IS - 1
AU - Julian Bion
AU - Joseph Edward Alderman
Y1 - 2023/01/01
UR - http://qualitysafety.bmj.com/content/32/1/1.abstract
N2 - The privilege of professional self-regulation rests on clinical peer review, a long-established method for assuring quality of care, training, management and research. In clinical peer review, healthcare professionals evaluate each other’s clinical performance. Based originally on the personal experience and expertise (and prejudices and biases) of one’s peers, the process has gradually been formalised by the development of externally verifiable standards of practice, audit of care processes and outcomes and benchmarking of individual, group and organisational performance and patient outcomes. The spectrum of clinical peer review ranges from local quality improvement activities such as morbidity and mortality reviews, to medical opinion offered in courts of law. Peer review can therefore have different purposes ranging from collaborative reflective learning to identification of malpractice. Given the ubiquity and importance of clinical peer review, it would be reasonable to expect some evidence of reliability of judgements made by different reviewers. And yet the literature tells a rather different story. A systematic review [1] of the inter-rater reliability of audited case records reported mean kappa values ranging from 0.32 to 0.7, with higher reliability when reviewers employed explicit criteria.
Reviewers may give inconsistent judgements, change their opinions over time [2] and be susceptible to a variety of biases including implicit [3], cognitive [4] and outcome or hindsight bias [5]. To some extent, this may be mitigated and reliability improved by using a combination of both criterion-based and implicit (global) assessment [6] combined with structured judgement templates [7, 8], or when a smaller group of reviewers is employed to detect well-characterised signals such as adverse events [9]. In a comparison of weekend and weekday quality of care across two epochs of time, using a combination of structured judgement and global (implicit) reviews of case records [10], we found modest levels of agreement between reviewers examining …
ER -