Measurement of patient safety: a systematic review of the reliability and validity of adverse event detection with record review

Mirelle Hanskamp-Sebregts; Marieke Zegers; Charles Vincent; Petra J van Gurp; Henrica C W de Vet; Hub Wollersheim

doi:10.1136/bmjopen-2016-011078

Mirelle Hanskamp-Sebregts1,
Marieke Zegers2,
Charles Vincent3,
Petra J van Gurp1,
Henrica C W de Vet3,4,
Hub Wollersheim2

¹Radboud University Medical Center, Institute of Quality Assurance and Patient Safety, Nijmegen, The Netherlands
²Radboud University Medical Center, Radboud Institute for Health Sciences, IQ healthcare, Nijmegen, The Netherlands
³Department of Experimental Psychology, University of Oxford, Oxford, UK
⁴Department of Epidemiology and Biostatistics, EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands

Correspondence to Mirelle Hanskamp-Sebregts; Mirelle.Hanskamp-Sebregts{at}radboudumc.nl

Abstract

Objectives Record review is the most used method to quantify patient safety. We systematically reviewed the reliability and validity of adverse event detection with record review.

Design A systematic review of the literature.

Methods We searched PubMed, EMBASE, CINAHL, PsycINFO and the Cochrane Library from their inception through February 2015. We included all studies that aimed to describe the reliability and/or validity of record review. Two reviewers conducted data extraction. We pooled κ values (κ) and analysed the differences in subgroups according to number of reviewers, reviewer experience and training level, adjusted for the prevalence of adverse events.

Results In 25 studies, the psychometric data of the Global Trigger Tool (GTT) and the Harvard Medical Practice Study (HMPS) were reported and 24 studies were included for statistical pooling. The inter-rater reliability of the GTT and HMPS showed a pooled κ of 0.65 and 0.55, respectively. The inter-rater agreement was statistically significantly higher when the group of reviewers within a study consisted of a maximum of five reviewers. We found no studies reporting on the validity of the GTT and HMPS.

Conclusions The reliability of record review is moderate to substantial and improved when a small group of reviewers carried out record review. The validity of the record review method has never been evaluated, while clinical data registries, autopsy or direct observations of patient care are potential reference methods that can be used to test concurrent validity.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

Strengths and limitations of this study

We reviewed ∼4000 articles across five databases for psychometric data on record review as a method to detect adverse events.
We evaluated the methodological quality of the included studies on measurement properties with the validated COSMIN checklist.
Two instruments for record review, the Global Trigger Tool and the Harvard Medical Practice Study, were extensively tested for their reliability, but data regarding the validity of these instruments are completely lacking.
The subgroup analyses were limited to the variables that were reported by the authors in the studies that were included in our systematic review.

Introduction

Healthcare professionals are faced with the challenge of improving patient safety by detecting, preventing and mitigating the occurrence of adverse events (AEs).1,2 An AE is defined as an injury that is caused by healthcare management (rather than the underlying disease) and results in prolonged hospitalisation, disability at the time of discharge or even the patient's death.3 Besides improving patient safety, transparency with reliable and valid data is necessary for accountability purposes.4,5 Invalid or unreliable instruments for quantifying patient safety can lead to inadequate diagnosis of patient safety problems and subsequently to the implementation of inadequate patient safety improvement interventions.

Patient record review is the most thoroughly studied method used to measure the prevalence of AEs.6 Incident, complaint and claim reporting systems are less suitable for counting AEs, because the number of AEs reported strongly depends on the willingness of healthcare providers and patients to report them. Only 3–5% of the AEs detected in patient records are reported by healthcare providers in hospitals.7–11 In addition, the denominator, the related number of patients, is difficult to determine. These systems are therefore inadequate to count the actual number of incidents.12–14

Although record review is widely accepted as the method for quantifying AEs, data about the psychometric aspects of this method reported in previous literature reviews are limited12,13,15 or outdated.16 Therefore, we systematically reviewed the reliability and validity of record review and examined which factors are associated with these psychometric measures. We hypothesised that the inter-rater reliability of record review would be higher for studies with a small number of reviewers, more reviewer experience and a higher training level.

Methods

Search strategy and databases

Our literature search strategy was prespecified and aligned with the recommendations outlined in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement.17 We included the study protocol in online supplementary appendix 1.

We searched for full-text studies published until October 2013 and updated our search in February 2015 using the following databases: PubMed (including MEDLINE), EMBASE, CINAHL, PsycINFO and the Cochrane Library. The references of the included studies were manually checked, and the authors' personal files and bibliographies of previously published related reviews were searched to identify additional relevant studies (snowballing). There were no language restrictions. Online supplementary appendix 2 provides a detailed listing of search strings.

Selection criteria and process

Two researchers (MH-S and MZ) independently screened the titles and abstracts of all studies identified by the search strategy for their eligibility. Studies were included if (1) the record review method was described in detail, (2) AEs were measured in a wide variety of patient groups and (3) data about reliability and validity were reported. Studies not available in full-text were excluded.

When the title and abstract did not clearly indicate whether the inclusion criteria were met, the full text (meaning the complete article) was obtained and reviewed by two researchers (MH-S and MZ). The previously described inclusion criteria were applied again, and a final set of studies was identified for data extraction. Disagreement about inclusion was resolved by discussion. When no consensus could be achieved, a third researcher (HW) made the final decision.

Terminology and definitions

Different types of reliability and validity of measurement instruments can be distinguished. The focus of our systematic review was on the inter-rater reliability, content (face) validity and concurrent validity of record review. Definitions are given in table 1.

Table 1. Definitions of reliability and validity in the context of record review

Quality assessment

Assessment of the methodological quality of the selected studies was carried out using the COSMIN checklist.20 The COSMIN checklist facilitates a separate judgement of the methodological quality of the included studies and their results.21 The COSMIN checklist consists of nine boxes with methodological standards for how each measurement property should be assessed. Three of the nine boxes were relevant for this systematic review regarding inter-rater reliability, content validity and concurrent validity. There are no standards for assessing face validity, because face validity requires a subjective judgement of experts.22 Each item in these relevant boxes was scored on a four-point rating scale (ie, ‘poor’, ‘fair’, ‘good’ or ‘excellent’).20,21 An overall score for the methodological quality of a study was determined by taking the lowest rating of any of the items in a box. The methodological quality of a study was assessed per measurement property by MH-S, and 10% of the studies were assessed independently by MZ. In cases of disagreement, a third reviewer (HW) was consulted for a final decision.
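The "lowest rating counts" rule described above can be illustrated with a short Python sketch. It is not part of the COSMIN materials; the function name box_score and the example ratings are hypothetical and serve only to show how one poorly rated item determines the overall score of a box.

```python
# Ratings ordered from worst to best, following the four-point scale in the text.
RATINGS = ["poor", "fair", "good", "excellent"]


def box_score(item_ratings):
    """Return the overall rating of one box: the lowest rating of any item."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    # RATINGS.index raises ValueError for a label that is not on the scale.
    return min(item_ratings, key=RATINGS.index)


# Hypothetical item ratings for one box (not taken from the review).
example_items = ["excellent", "good", "fair", "good"]
print(box_score(example_items))
```

In this hypothetical example a single ‘fair’ item determines the score of the whole box, even though all other items were rated ‘good’ or ‘excellent’.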

Data extraction

Each article that met the study eligibility criteria was abstracted by one reviewer (MH-S), and a second reviewer (MZ) crosschecked the data extraction of the first reviewer. Both reviewers used a standardised form, which comprised a description of objectives, study population, design and methods used and the results of the analysis of the reliability and validity, including statistical parameters (see online supplementary appendix 1).

Data synthesis and analysis

We tabulated study characteristics and outcomes such as setting, number of records, percentage AEs and data about reliability and validity of record review. In some studies, percentage agreement was calculated from source data by MH-S and confirmed by MZ. To be able to rate the reliability of record review, we classified the κ values as ‘slight’ (κ=0.00–0.20), ‘fair’ (κ=0.21–0.40), ‘moderate’ (κ=0.41–0.60), ‘substantial’ (κ=0.61–0.80) and ‘almost perfect’ (κ=0.81–1.00).23
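The classification above can be written as a simple lookup. The following Python sketch is illustrative only: the function name classify_kappa is hypothetical, and rejecting values outside 0–1 is an assumption, because the bands quoted in the text start at 0.

```python
def classify_kappa(kappa):
    """Map a kappa value to the verbal label used in the text (reference 23)."""
    if not 0.0 <= kappa <= 1.0:
        # Assumption: the bands in the text cover 0.00-1.00 only.
        raise ValueError("kappa outside the 0-1 range covered by the bands")
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# The pooled values reported in this review for the GTT and the HMPS.
for name, value in [("GTT", 0.65), ("HMPS", 0.55)]:
    print(name, value, classify_kappa(value))
```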

We pooled the outcomes statistically by calculating the mean percentage agreement and the mean and pooled κ on the presence of AEs to draw conclusions about the reliability of record review. We used the number of records on which the κ value was calculated as the weighting factor in the statistical pooling, as a proxy for precision, since the 95% CIs of the κ values were not available for all included studies.
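A minimal Python sketch of this kind of pooling is given below, assuming that the pooled κ is a mean weighted by the number of records per study. The function names and the example values are hypothetical and are not data from the included studies.

```python
def mean_kappa(studies):
    """Unweighted mean of the kappa values; studies is a list of (kappa, n_records)."""
    if not studies:
        raise ValueError("no studies supplied")
    return sum(kappa for kappa, _ in studies) / len(studies)


def pooled_kappa(studies):
    """Mean kappa weighted by the number of records each kappa is based on."""
    total_records = sum(n for _, n in studies)
    if total_records <= 0:
        raise ValueError("total number of records must be positive")
    return sum(kappa * n for kappa, n in studies) / total_records


# Hypothetical studies: (kappa, number of records).
example = [(0.40, 100), (0.80, 300)]
print(round(mean_kappa(example), 2), round(pooled_kappa(example), 2))
```

In this made-up example the larger study pulls the pooled value above the unweighted mean, which is the intended effect of weighting by the number of records.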

To examine differences in κ values depending on the number of reviewers, reviewer experience and reviewer training, we present descriptive statistics per subgroup (mean with SD or median with IQR for non-normal distributions, minimum and maximum). In order to better interpret the results, we classified the number of reviewers per study, reviewer experience and reviewer training into three proportional classes each: maximum 5 reviewers, >5–20 reviewers, >20 reviewers; <100 records per reviewer, 100–300 records per reviewer, >300 records per reviewer; and <1 day training, 1 day training, >1 day training, respectively. We used the non-parametric Kruskal-Wallis test for the group characteristics that were not normally distributed and an ANOVA for the group characteristics with a normal distribution. We checked whether the assumptions for ANCOVA were met. It was not possible to incorporate all variables (the number of reviewers, reviewer experience and reviewer training) in one ANCOVA, because the number of studies in our analyses was limited (n=20). Therefore, we performed three separate ANCOVAs, with prevalence of AEs as covariate. We adjusted for prevalence of AEs, since a previous study by Lilford et al16 showed a correlation between prevalence and κ. Additionally, we studied the influence of the aim of the study and the type of instrument (Global Trigger Tool (GTT) vs Harvard Medical Practice Study (HMPS)) on κ with two separate ANCOVAs adjusted for prevalence. A p value of <0.05 was regarded as statistically significant. Statistical software IBM SPSS V.22 was used for all statistical analyses and data processing.
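The grouping step described above can be sketched in Python as follows. This covers only the assignment of studies to the three classes per variable, not the Kruskal-Wallis, ANOVA or ANCOVA analyses, which were run in SPSS. The function names and class labels are hypothetical, and expressing training in days is an assumption made for the example.

```python
def reviewer_class(n_reviewers):
    """Class for the number of reviewers per study."""
    if n_reviewers < 1:
        raise ValueError("a study needs at least one reviewer")
    if n_reviewers <= 5:
        return "maximum 5 reviewers"
    if n_reviewers <= 20:
        return ">5-20 reviewers"
    return ">20 reviewers"


def experience_class(records_per_reviewer):
    """Class for reviewer experience, measured as records per reviewer."""
    if records_per_reviewer < 100:
        return "<100 records per reviewer"
    if records_per_reviewer <= 300:
        return "100-300 records per reviewer"
    return ">300 records per reviewer"


def training_class(training_days):
    """Class for reviewer training (assumption: expressed in days)."""
    if training_days < 1:
        return "<1 day training"
    if training_days == 1:
        return "1 day training"
    return ">1 day training"


# Hypothetical study: 4 reviewers, 213 records per reviewer, 0.75 days of training.
print(reviewer_class(4), experience_class(213), training_class(0.75), sep="; ")
```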

Results

Results of the literature search

Our literature search yielded 3915 citations (see online supplementary appendix 3, flow chart), of which 1790 were in PubMed, 1153 were in EMBASE, 515 were in CINAHL, 30 were in PsycINFO and 427 were in the Cochrane Library. After removing duplicates, 3415 studies remained, of which 148 were selected for full-text review. A total of 137 studies were excluded after reading the full text, because these studies did not meet the inclusion criteria, including studies that did not focus on the reliability or validity of record review,24–26 did not have AEs as outcome27 or reported a different method than retrospective reviewing of medical records.28,29 We collected eight additional articles through manual searching of articles' bibliographies. In February 2015, we updated our search and found six additional studies. The final set consisted of 25 record review studies; 24 studies were used for calculating the mean κ, and 20 studies were appropriate for the subgroup analysis. Five studies were excluded from the subgroup analysis because only the intraclass correlation coefficient was calculated,30 the prevalence was an outlier,31 the prevalence was not reported32,33 or the number of reviewers was not reported.3

Description of the GTT and the HMPS

We found two record review instruments for detecting AEs, namely, the GTT and the HMPS. Both instruments use an implicit review style, meaning that the AE assessment relies on expert judgement instead of using well-defined criteria on a checklist (explicit review style).6,16 The GTT and the HMPS consist of a two-stage review process conducted by nurses and physicians (table 2). The GTT is primarily used as a quality improvement tool for clinical practice and for estimating and tracking AE rates over time in a hospital or a clinic. The HMPS is commonly used to measure the prevalence rate of AEs on a national level. The GTT is not meant to identify every single AE in a patient record, and, therefore, assessments have a time limit of 20 min per record.34 The GTT consists of 47–55 triggers to identify potential AEs. Reviewing the preventability of AEs was originally not part of the GTT method, but has recently been included in the studies of Schildmeijer et al,35 Kennerly et al,36 Najjar et al37 and Hwang et al.38 In contrast, the HMPS consists of 16–18 screening criteria (triggers) and 27 leading questions for AE detection, of which three questions are crucial for AE determination: whether an injury is present; whether it resulted in prolongation of hospital stay, temporary or permanent disability or death; and whether it was caused by healthcare management. Determination of preventability of AEs is standard within the HMPS method. The HMPS is more time-consuming and labour-intensive in assessing AEs (stage 2) than the GTT, due to the number of questions.

Table 2. Description of the Global Trigger Tool and Harvard Medical Practice Study

Characteristics and methodological quality of included studies

Most of the identified studies were carried out in the USA, UK, Canada, Europe and Australia (see online supplementary appendices 4 and 5). In these studies, the GTT (n=10 studies) and HMPS (n=15 studies) were all tested in hospitals. The percentage AEs in GTT studies ranged from 7.2% to 27.0% (see online supplementary appendix 4). The total number of reviewers varied from 2 to 20 reviewers per study. Reviewers assessed 50 to 4043 records on average. The percentage AEs in HMPS studies ranged from 2.9% to 18.0%, and for preventable AEs they ranged from 1% to 8.6% (see online supplementary appendix 5). The total number of reviewers varied from 2 to 127 reviewers per study. Average records per reviewer ranged from 38 to 3872 records. The primary aim of most of the GTT studies included in this review was to examine the inter-rater reliability, whereas the primary aim of the HMPS studies reporting inter-rater reliability data was measuring AE rates.

The methodological quality of the included studies3,11,30–33,35–58 was good. In all these studies, the inter-rater reliability was evaluated. In one study, the face validity was evaluated.32

Reliability of the GTT

The percentage agreement between reviewers for AE assessment was reported in four studies,31,38,43,47 ranging from 83% to 94% with a mean of 87.5% (SD 4.8%) (see online supplementary appendix 4). One study showed fair inter-rater reliability (κ=0.34),47 two studies showed moderate inter-rater reliability (κ=0.45),35,43 five studies showed substantial inter-rater reliability (κ=0.62–0.74)31,36,38,45,46 and two studies showed almost perfect inter-rater reliability (κ=0.85–0.89).37,44 The mean κ and pooled κ are both 0.65 (SD 0.19), meaning that the overall inter-rater reliability of the GTT is substantial.23

Reliability of the HMPS

The percentage agreement of AE assessment was reported in 10 studies and ranged from 73% to 91% with a mean of 83% (SD 6.1%);3,11,39–42,49,50,52–54 percentage agreement for preventability of AE was assessed in six studies and ranged from 58% to 93% with a mean of 81% (SD 13%)3,11,39,40,49,54 (see online supplementary appendix 5).

Ten studies showed moderate inter-rater reliability for AE detection (κ=0.40–0.57)32,39,41,42,48–52,54 and in four studies the inter-rater reliability was substantial (κ=0.61–0.80).3,11,40,49 In 10 studies, the κ for assessing preventable AEs was reported and ranged from 0.19 to 0.76.3,11,32,39,40,48,49,51,53,54 One study showed slight inter-rater reliability (κ=0.19),53 three studies showed fair inter-rater reliability (κ=0.24–0.34),3,32,54 three studies showed moderate inter-rater reliability (κ=0.44–0.49)11,39,48 and three studies showed substantial inter-rater reliability (κ=0.69–0.76)40,49,51 for assessing preventable AEs. The mean κ and pooled κ of the HMPS for AE assessment are 0.54 (SD 0.10) and 0.55 (SD 0.07), respectively, and, for assessing preventability, they are 0.47 (SD 0.20) and 0.48 (SD 0.20), respectively. The inter-rater reliability of the HMPS is classified as moderate.23

Subgroup analysis inter-rater reliability

The numbers of GTT studies (n=9) and HMPS studies (n=11) were too small to perform the subgroup analysis for the methods separately. Therefore, we used the κ statistics of all studies (n=20) to carry out the subgroup analysis. The assumptions for ANCOVA were met. Prevalence was not statistically significantly associated with the κ values in any of the three ANCOVAs (p=0.069, p=0.189 and p=0.726, respectively). We found a statistically significant difference in the pooled κ values (p=0.006) among subgroups according to the number of reviewers (table 3). There were no differences in κ values between subgroups according to reviewer experience (p=0.062) and reviewer training (p=0.809). The group of maximum five reviewers detected more AEs (average 17.1%) in comparison with the other two groups of reviewers (table 4). This group received the least training (median 6 hours) and assessed the largest number of records (median 213 records). There was no significant difference in reviewer experience (p=0.351), reviewer training (p=0.317) or the prevalence of AEs (p=0.480) between the three groups of reviewers (maximum 5 reviewers, >5–20 reviewers and >20 reviewers).

Table 3. Differences in pooled κ values (n=20) among subgroups according to number of reviewers, reviewer experience and reviewer training

Table 4. The reviewer experience, reviewer training and the prevalence of AEs in the three groups of reviewers

The number of studies that reported the κ of preventable AEs (n=8) was too small for subgroup analysis. The aim of the study and the type of instrument (GTT vs HMPS) were not statistically significantly associated with κ (p=0.572 and p=0.086, respectively).

Validity

The face validity of the HMPS was reported in one study as being a valid method to identify AEs.32 We found no studies in which the concurrent validity of the GTT or HMPS has been studied.

Discussion

The inter-rater reliability of record review to detect AEs is moderate to substantial,23 with a pooled κ of 0.65 and 0.55 for the GTT method and the HMPS method, respectively. The pooled κ for preventability, measured with the HMPS method, is moderate (0.48). The fact that no studies have examined concurrent validity is alarming, given the statements that record review is accepted worldwide as the ‘best’ means of measuring incidence rates of AEs (even called ‘the gold standard’).15,59 Even if the inter-rater reliability of record review is acceptable, there is no evidence that record review really detects AEs. Possible reference methods to test the concurrent validity of record review are clinical data registries, autopsy or direct observations of patient care. Not a single study, not even a small one, has compared record review with these reference methods, although they capture valuable (real-time), accurate and precise patient data.13,60–63

We found statistically significantly higher inter-rater reliability in subgroups in which the group of reviewers consisted of five reviewers or fewer. An explanation for this difference is that when the group of reviewers is small, the assessment of the presence of an AE becomes more standardised.40,64 Having a small group of reviewers stimulates working more closely together, intentionally or not, resulting in less variation in the review methodology and more consensus about what constitutes harm in order to be counted as an AE. Additional advantages of having a small group of reviewers are that intensive review training can be organised, and the review process can be better monitored.40 In our review, however, the group of maximum five reviewers received fewer training hours. Possibly, they were better supervised or communicated better with each other during the study, which could have increased the inter-rater agreement.

Inter-rater reliability has been reported to be higher when reviewers assess a substantial number of records.40 We found no statistically significant differences between subgroups according to reviewer experience, even though the group of maximum five reviewers assessed a notably larger number of records than the groups of 6–20 reviewers or more than 20 reviewers.

From other studies, we know that training improves the performance of review teams and the application of record review.65,66 We found no evidence for this in our review. In fact, the group of maximum five reviewers had half the training hours compared to the group of 6–20 reviewers but achieved a higher inter-rater agreement.

The systematic review of Lilford et al16 showed that there was an association between κ and the prevalence of AEs. We found no statistically significant association between κ and the prevalence of AEs. The smaller range of the prevalence rate (2.9–27.0%) in our review compared to the review of Lilford et al16 (2.8–58.9%) could explain why we did not find an association between κ and the prevalence of AEs.

Our systematic review has some strengths and limitations. First, the strength of the evidence from the statistical pooling depends on the quality of the included studies. We used the validated COSMIN tool20 to evaluate the methodological quality of the included studies. Second, it was not possible to formally estimate the pooled κ statistics for the GTT and Medical Record Review (MRR) to assess between-study heterogeneity or to carry out analyses of the likelihood of publication bias, because CIs were lacking in approximately half of the reliability studies. Third, the subgroup analyses were limited to the variables that were reported by the authors of the studies included in our systematic review. Other factors that possibly influence the inter-rater agreement between reviewers, such as the level of cooperation between the reviewers during the review process, could therefore not be studied. Fourth, our review may have been influenced by publication bias, as studies reporting low reliability or validity may be less likely to be published than those with more positive results. Fifth, we statistically pooled the κ values. However, specific agreement on the presence of AEs, expressing the agreement separately for the positive and negative ratings, is recommended.67 After all, the question of interest is whether an AE found by one reviewer is also found by a second reviewer. Unfortunately, in most of the studies, information about the number of records for which there was agreement, presented in a 2×2 cross table, was missing. Therefore, we could not perform a statistical pooling of the proportion of specific agreement.
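To illustrate why a 2×2 cross table is needed for specific agreement, the following Python sketch computes overall agreement, positive and negative specific agreement and Cohen's κ from the four cell counts. It uses the standard textbook formulas and is not the authors' analysis; the function name, the cell labels a to d and the example counts are hypothetical.

```python
def agreement_stats(a, b, c, d):
    """Agreement measures for two reviewers rating the same records.

    a: both reviewers found an AE      b: only reviewer 1 found an AE
    c: only reviewer 2 found an AE     d: neither reviewer found an AE
    """
    n = a + b + c + d
    if n == 0 or (2 * a + b + c) == 0 or (2 * d + b + c) == 0:
        raise ValueError("agreement measures are undefined for this table")
    observed = (a + d) / n
    # Chance agreement expected from the marginal totals of both reviewers.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return {
        "overall": observed,
        "positive": 2 * a / (2 * a + b + c),
        "negative": 2 * d / (2 * d + b + c),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical table of 100 records, not data from any included study.
for name, value in agreement_stats(a=10, b=5, c=5, d=80).items():
    print(name, round(value, 2))
```

In this made-up table the reviewers agree on 90% of the records, yet the positive specific agreement is only about 67%, because most of the agreement concerns records without an AE.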

In conclusion, users of the record review method to assess (preventable) AEs should be aware that the inter-rater agreement between reviewers is moderate to substantial and increases when using a smaller group of reviewers. More studies are needed to explore which factors increase the inter-rater reliability of record review. Most importantly, concurrent validity should be tested; otherwise, record review remains an imperfect method whose validity has never been evaluated.

Acknowledgments

The authors thank Ir Reinier Akkermans, statistician, for his recommendations on the statistical pooling.

References

↵
1. Andermann A,
2. Wu AW,
3. Lashoher A, et al
. Case studies of patient safety research classics to build research capacity in low- and middle-income countries. Jt Comm J Qual Patient Saf 2013;39:553–60.
OpenUrl
↵
1. Duckers M,
2. Faber M,
3. Cruijsberg J, et al
. Safety and risk management interventions in hospitals: a systematic review of the literature. Med Care Res Rev 2009;66(6 Suppl):90S–119S. doi:10.1177/1077558709345870
OpenUrl Abstract/FREE Full Text
↵
1. Brennan TA,
2. Leape LL,
3. Laird NM, et al
. Incidence of adverse events and negligence in hospitalized patients. Results of the Harvard Medical Practice Study I. N Engl J Med 1991;324:370–6.
OpenUrl CrossRef PubMed Web of Science
↵
1. Denis J-L
. Accountability in healthcare organizations and systems. Healthc Policy 2014;10:8–11.
OpenUrl
↵
1. Werner RM,
2. Asch DA
. The unintended consequences of publicly reporting quality information. JAMA 2005;293:1239–44. doi:10.1001/jama.293.10.1239
OpenUrl CrossRef PubMed Web of Science
↵
1. Weingart SN,
2. Davis RB,
3. Palmer RH, et al
. Discrepancies between explicit and implicit review: physician and nurse assessments of complications and quality. Health Serv Res 2002;37:483–98. doi:10.1111/1475-6773.033
OpenUrl CrossRef PubMed Web of Science
↵
1. Kennerly DA,
2. Kudyakov R,
3. da Graca B, et al
. Characterization of adverse events detected in a large health care delivery system using an enhanced Global Trigger Tool over a five-year interval. Health Serv Res 2014;49:1407–25. doi:10.1111/1475-6773.12163
OpenUrl
↵
1. Rutberg H,
2. Borgstedt Risberg M,
3. Sjodahl R, et al
. Characterisations of adverse events detected in a university hospital: a 4-year study using the Global Trigger Tool method. BMJ Open 2014;4:e004879. doi:10.1136/bmjopen-2014-004879
OpenUrl Abstract/FREE Full Text
↵
1. Christiaans-Dingelhoff I,
2. Smits M,
3. Zwaan L, et al
. To what extent are adverse events found in patient records reported by patients and healthcare professionals via complaints, claims and incident reports? BMC Health Serv Res 2011;11:49. doi:10.1186/1472-6963-11-49
OpenUrl CrossRef PubMed
↵
1. Classen DC,
2. Resar R,
3. Griffin F, et al
. ‘Global Trigger Tool’ shows that adverse events in hospitals may be ten times greater than previously measured. Health Aff (Millwood) 2011;30:581–9. doi:10.1377/hlthaff.2011.0190
OpenUrl Abstract/FREE Full Text
↵
1. Sari AB,
2. Sheldon TA,
3. Cracknell A, et al
. Extent, nature and consequences of adverse events: results of a retrospective casenote review in a large NHS hospital. Qual Saf Health Care 2007;16:434–9. doi:10.1136/qshc.2006.021154
OpenUrl Abstract/FREE Full Text
↵
1. Vincent C,
2. Burnett S,
3. Carthey J
. The measurement and monitoring of safety. The Health Foundation, 2013.
↵
1. Thomas EJ,
2. Petersen LA
. Measuring errors and adverse events in health care. J Gen Intern Med 2003;18:61–7. doi:10.1046/j.1525-1497.2003.20147.x
OpenUrl CrossRef PubMed Web of Science
↵
1. Tsang C,
2. Aylin P,
3. Palmer W
. Patient safety indicators: a systematic review of the literature. London, UK: Dr. Foster Unit, Imperial College. October 2008.
↵
1. Murff HJ,
2. Patel VL,
3. Hripcsak G, et al
. Detecting adverse events for patient safety research: a review of current methodologies. J Biomed Inform 2003;36:131–43. doi:10.1016/j.jbi.2003.08.003
OpenUrl CrossRef PubMed Web of Science
↵
1. Lilford R,
2. Edwards A,
3. Girling A, et al
. Inter-rater reliability of case-note audit: a systematic review. J Health Serv Res Policy 2007;12:173–80. doi:10.1258/135581907781543012
OpenUrl Abstract/FREE Full Text
↵
1. Moher D,
2. Liberati A,
3. Tetzlaff J, et al
. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: the PRISMA statement. Ann Intern Med 2009;151:264–9. doi:10.7326/0003-4819-151-4-200908180-00135
OpenUrl CrossRef PubMed Web of Science
1. Mokkink LB,
2. Terwee CB,
3. Patrick DL, et al
. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol 2010;63:737–45. doi:10.1016/j.jclinepi.2010.02.006
OpenUrl CrossRef PubMed Web of Science
1. Vet HCWd,
2. Terwee CB,
3. Mokkink LB, et al
. Measurement in medicine. A practical guide. Cambridge: Cambridge University Press, 2011.
↵
Cosmin. Secondary. http://www.cosmin.nl/ (accessed 4 Dec 2015).
↵
1. Terwee CB,
2. Mokkink LB,
3. Knol DL, et al
. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res 2012;21:651–7. doi:10.1007/s11136-011-9960-1
OpenUrl CrossRef PubMed Web of Science
↵
1. Mokkink LB,
2. Terwee CB,
3. Patrick DL, et al
. The COSMIN checklist manual. Amsterdam: VU University Medical Centre, 2009.
↵
1. Landis JR,
2. Koch GG
. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 1977;33:363–74. doi:10.2307/2529786
OpenUrl CrossRef PubMed Web of Science
Flynn EA, Barker KN, Pepper GA, et al. Comparison of methods for detecting medication errors in 36 hospitals and skilled-nursing facilities. Am J Health Syst Pharm 2002;59:436–46.
Forster AJ, Taljaard M, Bennett C, et al. Reliability of the peer-review process for adverse event rating. PLoS One 2012;7:e41239. doi:10.1371/journal.pone.0041239
Forster AJ, O'Rourke K, Shojania KG, et al. Combining ratings from multiple physician reviewers helped to overcome the uncertainty associated with adverse event classification. J Clin Epidemiol 2007;60:892–901. doi:10.1016/j.jclinepi.2006.11.019
Nettleman MD, Nelson AP. Adverse occurrences during hospitalization on a general medicine service. Clin Perform Qual Health Care 1994;2:67–72.
Michel P, Quenon JL, Djihoud A, et al. French national survey of inpatient adverse events prospectively assessed with ward staff. Qual Saf Health Care 2007;16:369–77. doi:10.1136/qshc.2005.016964
Michel P, Quenon JL, de Sarasqueta AM, et al. Comparison of three methods for estimating rates of adverse events and rates of preventable adverse events in acute care hospitals. BMJ 2004;328:199. doi:10.1136/bmj.328.7433.199
Hayward RA, Hofer TP. Estimating hospital deaths due to medical errors: preventability is in the eye of the reviewer. JAMA 2001;286:415–20. doi:10.1001/jama.286.4.415
Classen DC, Lloyd RC, Provost L, et al. Development and evaluation of the Institute for Healthcare Improvement Global Trigger Tool. J Patient Saf 2008;4:169–77. doi:10.1097/PTS.0b013e318183a475
Brennan TA, Localio RJ, Laird NL. Reliability and validity of judgments concerning adverse events suffered by hospitalized patients. Med Care 1989;27:1148–58. doi:10.1097/00005650-198912000-00006
Hofer TP, Bernstein SJ, DeMonner S, et al. Discussion between reviewers does not improve reliability of peer review of hospital quality. Med Care 2000;38:152–61. doi:10.1097/00005650-200002000-00005
Griffin FA, Resar RK. IHI Global Trigger Tool for measuring adverse events. 2nd edn. IHI Innovation Series white paper. Cambridge, MA: Institute for Healthcare Improvement, 2009.
Schildmeijer K, Nilsson L, Arestedt K, et al. Assessment of adverse events in medical care: lack of consistency between experienced teams using the Global Trigger Tool. BMJ Qual Saf 2012;21:307–14. doi:10.1136/bmjqs-2011-000279
Kennerly DA, Saldana M, Kudyakov R, et al. Description and evaluation of adaptations to the Global Trigger Tool to enhance value to adverse event reduction efforts. J Patient Saf 2013;9:87–95. doi:10.1097/PTS.0b013e31827cdc3b
Najjar S, Hamdan M, Euwema MC, et al. The Global Trigger Tool shows that one out of seven patients suffers harm in Palestinian hospitals: challenges for launching a strategic safety plan. Int J Qual Health Care 2013;25:640–7. doi:10.1093/intqhc/mzt066
Hwang JI, Chin HJ, Chang YS. Characteristics associated with the occurrence of adverse events: a retrospective medical record review using the Global Trigger Tool in a fully digitalized tertiary teaching hospital in Korea. J Eval Clin Pract 2014;20:27–35. doi:10.1111/jep.12075
Baines RJ, Langelaan M, de Bruijne MC, et al. Changes in adverse event rates in hospitals over time: a longitudinal retrospective patient record review study. BMJ Qual Saf 2013;22:290–8. doi:10.1136/bmjqs-2012-001126
Zegers M, de Bruijne MC, Wagner C, et al. The inter-rater agreement of retrospective assessments of adverse events does not improve with two reviewers per patient record. J Clin Epidemiol 2010;63:94–102. doi:10.1016/j.jclinepi.2009.03.004
Thomas EJ, Studdert DM, Burstin HR, et al. Incidence and types of adverse events and negligent care in Utah and Colorado. Med Care 2000;38:261–71. doi:10.1097/00005650-200003000-00003
Localio AR, Weaver SL, Landis JR, et al. Identifying adverse events caused by medical care: degree of physician agreement in a retrospective chart review. Ann Intern Med 1996;125:457–64. doi:10.7326/0003-4819-125-6-199609150-00005
Mattsson TO, Knudsen JL, Lauritsen J, et al. Assessment of the Global Trigger Tool to measure, monitor and evaluate patient safety in cancer patients: reliability concerns are raised. BMJ Qual Saf 2013;22:571–9. doi:10.1136/bmjqs-2012-001219
Kirkendall ES, Kloppenborg E, Papp J, et al. Measuring adverse events and levels of harm in pediatric inpatients with the Global Trigger Tool. Pediatrics 2012;130:e1206–14. doi:10.1542/peds.2012-0179
Naessens JM, O'Byrne TJ, Johnson MG, et al. Measuring hospital adverse events: assessing inter-rater reliability and trigger performance of the Global Trigger Tool. Int J Qual Health Care 2010;22:266–74. doi:10.1093/intqhc/mzq026
Sharek PJ, Parry G, Goldmann D, et al. Performance characteristics of a methodology to quantify adverse events over time in hospitalized patients. Health Serv Res 2011;46:654–78. doi:10.1111/j.1475-6773.2010.01156.x
Matlow AG, Cronin CM, Flintoft V, et al. Description of the development and validation of the Canadian Paediatric Trigger Tool. BMJ Qual Saf 2011;20:416–23. doi:10.1136/bmjqs.2010.041152
Hogan H, Healey F, Neale G, et al. Preventable deaths due to problems in care in English acute hospitals: a retrospective case record review study. BMJ Qual Saf 2012;21:737–45. doi:10.1136/bmjqs-2011-001159
Soop M, Fryksmark U, Köster M, et al. The incidence of adverse events in Swedish hospitals: a retrospective medical record review study. Int J Qual Health Care 2009;21:285–91. doi:10.1093/intqhc/mzp025
Forster AJ, Asmis TR, Clark HD, et al. Ottawa Hospital Patient Safety Study: incidence and timing of adverse events in patients admitted to a Canadian teaching hospital. CMAJ 2004;170:1235–40. doi:10.1503/cmaj.1030683
Baker GR, Norton PG, Flintoft V, et al. The Canadian Adverse Events Study: the incidence of adverse events among hospital patients in Canada. CMAJ 2004;170:1678–86. doi:10.1503/cmaj.1040498
Davis P, Lay-Yee R, Briant R, et al. Adverse events in New Zealand public hospitals: principal findings from a national survey. Occasional paper no 3. New Zealand: Ministry of Health, 2001.
Thomas EJ, Lipsitz SR, Studdert DM, et al. The reliability of medical record review for estimating adverse event rates. Ann Intern Med 2002;136:812–16. doi:10.7326/0003-4819-136-11-200206040-00009
Wilson RM, Runciman WB, Gibberd RW, et al. The quality in Australian health care study. Med J Aust 1995;163:458–71.
Langelaan M, Baines R, Broekens M, et al. Monitor zorggerelateerde schade 2008. Dossieronderzoek in Nederlandse ziekenhuizen (Patient files study in Dutch hospitals). Report of EMGO Institute & VUmc/NIVEL, Amsterdam/Utrecht. 2010.
Zegers M, de Bruijne MC, Wagner C, et al. Adverse events and potentially preventable deaths in Dutch hospitals: results of a retrospective patient record review study. Qual Saf Health Care 2009;18:297–302. doi:10.1136/qshc.2007.025924
Davis P, Lay-Yee R, Briant R, et al. Adverse events in New Zealand public hospitals II: preventability and clinical context. NZ Med J 2003;116:U624.
Leape LL, Brennan TA, Laird N, et al. The nature of adverse events in hospitalized patients: results of the Harvard Medical Practice Study II. N Engl J Med 1991;324:377–84. doi:10.1056/NEJM199102073240605
Zegers M, de Bruijne MC, Spreeuwenberg P, et al. Quality of patient record keeping: an indicator of the quality of care? BMJ Qual Saf 2011;20:314–18. doi:10.1136/bmjqs.2009.038976
American Heart Association, American Stroke Association. Facts Clinical Registries. http://www.heart.org/idc/groups/heart-public/@wcm/@adv/documents/downloadable/ucm_432451.pdf (accessed 4 Dec 2015).
Shojania KG, Burton EC, McDonald KM, et al. Changes in rates of autopsy-detected diagnostic errors over time: a systematic review. JAMA 2003;289:2849–56. doi:10.1001/jama.289.21.2849
Michel P. Strengths and weaknesses of available methods for assessing the nature and scale of harm caused by the health system: literature review. World Health Organization, 2003.
WHO Working Group. Patient safety: rapid assessment methods for estimating hazards. Report of the WHO Working Group meeting. Geneva, 17–19 December 2002.
Lilford RJ, Mohammed MA, Braunholtz D, et al. The measurement of active errors: methodological issues. Qual Saf Health Care 2003;12(Suppl 2):ii8–12. doi:10.1136/qhc.12.suppl_2.ii8
von Plessen C, Kodal AM, Anhøj J. Experiences with Global Trigger Tool reviews in five Danish hospitals: an implementation study. BMJ Open 2012;2:e001324. doi:10.1136/bmjopen-2012-001324
Schildmeijer K, Nilsson L, Perk J, et al. Strengths and weaknesses of working with the Global Trigger Tool method for retrospective record review: focus group interviews with team members. BMJ Open 2013;3:e003131. doi:10.1136/bmjopen-2013-003131
de Vet HC, Mokkink LB, Terwee CB, et al. Clinicians are right not to like Cohen's κ. BMJ 2013;346:f2125. doi:10.1136/bmj.f2125

Footnotes

Contributors MZ and HW conceived the idea for the study. MH-S and MZ led the writing of the paper as well as analysed and interpreted the data. CV advised on study design and approach. HCWdV supervised the data analysis. HCWdV and CV contributed to the writing of the paper. PJvG and HW participated in revising this manuscript. All authors contributed substantially to the writing of the paper, and all reviewed and approved the final draft.
Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.

