Automated detection of harm in healthcare with information technology: a systematic review
- 1The Dartmouth Institute for Health Policy and Clinical Practice, Center for Leadership and Improvement, Dartmouth Medical School, Lebanon, New Hampshire, USA
- 2Department of Community and Family Medicine, Dartmouth Medical School, Lebanon, New Hampshire, USA
- 3Society for Hospital Medicine, Philadelphia, Pennsylvania, USA
- 4Department of Pediatrics, Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire, USA
- Correspondence to Dr Gautham Suresh, Dartmouth-Hitchcock Medical Center, One Medical Center Drive, Lebanon, NH 03756, USA
- Accepted 14 July 2009
- Published Online First 29 July 2010
Context To improve patient safety, healthcare facilities are focussing on reducing patient harm. Automated harm-detection methods using information technology show promise for efficiently measuring harm. However, there have been few systematic reviews of their effectiveness.
Objective To perform a systematic literature review to identify, describe and evaluate effectiveness of automated inpatient harm-detection methods.
Methods Data sources included MEDLINE and CINAHL databases indexed through August 2008, extended by bibliographic review and search of citing articles. The authors included articles reporting effectiveness of automated inpatient harm-detection methods, as compared with other detection methods. Two independent reviewers used a standardised abstraction sheet to extract data about automated and comparison harm-detection methods, patient samples and events identified. Differences were resolved by discussion.
Results From 176 articles, 43 articles met inclusion criteria: 39 describing field-defined methods, two using natural language processing and two using both methods. Twenty-one studies used automated methods to detect adverse drug events, 10 detected general adverse events, eight detected nosocomial infections, and four detected other specific adverse events. Compared with gold standard chart review, sensitivity and specificity of automated harm-detection methods ranged from 0.10 to 0.94 and 0.23 to 0.98, respectively. Studies used heterogeneous methods that often were flawed.
Conclusion Automated methods of harm detection are feasible and some can potentially detect patient harm efficiently. However, effectiveness varied widely, and most studies had methodological weaknesses. More work is needed to develop and assess these tools before they can yield accurate estimates of harm that can be reliably interpreted and compared.
It is widely recognised that harm caused by the healthcare system is a major source of morbidity and mortality in hospitalised patients.1 An estimated 15 million instances of medical harm occur in the USA every year.2 However, the lack of simple, practical and accurate methods to identify adverse events in hospitals has hampered efforts to develop routine monitoring systems, assess the impact of interventions to prevent harm and compare interhospital performance.
Detecting the incidence and types of patient harm is a prerequisite for implementing strategies to prevent harm. Manual, comprehensive chart review by trained professionals has been used in key studies and can be considered the gold-standard harm-detection method.3–6 However, this approach requires time and trained abstractors, thereby decreasing its feasibility as a pragmatic method for routine measurement of adverse events.
Several organisations are currently using the Institute for Healthcare Improvement's Global Trigger Tool, which is based on manual chart review, and allows targeted chart review to identify harm more efficiently than comprehensive chart review and more extensively than voluntary reporting of harm.
Automated strategies of harm detection that use computerised methods to scan patient records may require fewer time and personnel resources than traditional methods, and can potentially provide real-time surveillance alerts. We performed this review to: (1) identify types of automated methods of inpatient harm detection described in published literature, (2) describe types of events identified by these methods and (3) evaluate accuracy of these methods in identifying harm. We also independently evaluated the quality and validity of key studies.
In this review, we used the terms harm, automated harm detection and gold standard chart review as defined in Box 1.
Harm
Poor patient outcome resulting from medical care rather than the natural history of the disease, whether or not it was preventable. This term includes adverse medical events (eg, falls, nosocomial infections), adverse drug events and adverse surgical events (eg, postoperative infections, surgical complications). It excludes medical errors that did not result in injury to patients.
Automated harm-detection method
A method of rapidly searching a large number of patient medical records with a computerised tool to identify actual harm, or indicators (associations) of harm. Records and events identified through computerised screening may then be subjected to further scrutiny by electronic or manual means to verify harm. We defined two degrees of automation: (1) fully automated methods, in which identification of harm was not followed by further chart review, and (2) partially automated methods, in which identified patient records were manually reviewed to verify harm.
Gold standard chart review
Manual review of the medical record initially by trained personnel, with subsequent review by either a physician or clinical pharmacist to confirm the presence or absence of harm and characteristics of such harm.
Data sources/study selection
We (MG and AVC) identified articles for this review through a literature search of MEDLINE (start date 1950) and CINAHL (start date 1982) using the following search terms: (harm OR adverse event OR adverse drug event OR nosocomial infection) AND (automated OR computerised OR electronic) AND (identify OR detect OR detection OR recognise OR recognition). We identified additional articles using bibliographic review of key articles, the ‘related articles’ feature of MEDLINE, and the ‘find similar’ and ‘find citing articles’ features of CINAHL. We reviewed the title and abstract of each article, and obtained the full text of relevant articles. We limited our search to English language articles indexed through 31 August 2008.
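The structure of the search string above (synonyms joined by OR within a concept, concepts joined by AND) can be reproduced programmatically, which may help when re-running or extending the search. The following is a minimal Python sketch; the names `term_groups` and `build_query` are ours for illustration, and the sketch only assembles the string and does not submit it to any database.

```python
# Concept groups copied from the search strategy described in the text.
term_groups = [
    ["harm", "adverse event", "adverse drug event", "nosocomial infection"],
    ["automated", "computerised", "electronic"],
    ["identify", "detect", "detection", "recognise", "recognition"],
]


def build_query(groups):
    """Join synonyms with OR inside parentheses, then join the groups with AND."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


query = build_query(term_groups)
print(query)
```

Keeping each concept as its own list makes it straightforward to add a synonym (for example, an alternative spelling) without disturbing the rest of the query.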
We included studies that: (a) occurred in an inpatient setting, (b) described an automated harm-detection method, (c) measured actual harm and (d) compared the automated method to an alternative method of harm detection.
Data extraction and analysis
We developed and tested a standardised data form and extracted the following variables from included articles: details of patient sample, methodology used for automated harm detection, nature of events identified, description of alternative method of harm detection and comparisons of events detected by automated and alternative methods. Data were extracted by MG and AVC, with uncertainties resolved by discussion and consensus.
We critically appraised each study that compared the automated method of harm detection to a gold standard chart review using published criteria for validity of diagnostic test studies.7 We assessed each study for: (a) independent, blind comparison of the automated method with a gold standard method, (b) performance of the gold standard assessment regardless of the automated method's results and (c) validation of the assessment in a second, independent set of patients.
If studies provided adequate data, we independently calculated the sensitivity, specificity and positive and negative predictive values of the automated harm-detection method.
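These four measures follow from a 2×2 table that cross-classifies each patient record by the automated method's result and the gold standard chart review result. The following Python sketch shows the calculation under that assumption; the class and function names are illustrative, and the counts are hypothetical and not drawn from any included study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing an automated method with gold standard chart review."""
    tp: int  # flagged by the automated method, harm confirmed by chart review
    fp: int  # flagged by the automated method, no harm found on chart review
    fn: int  # not flagged, but harm found on chart review
    tn: int  # not flagged, and no harm found on chart review


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def accuracy_measures(t: TwoByTwo) -> dict:
    """Sensitivity, specificity and predictive values from a 2x2 table."""
    return {
        "sensitivity": _ratio(t.tp, t.tp + t.fn),
        "specificity": _ratio(t.tn, t.tn + t.fp),
        "ppv": _ratio(t.tp, t.tp + t.fp),
        "npv": _ratio(t.tn, t.tn + t.fn),
    }


# Hypothetical counts for 1000 reviewed records (illustration only).
example = TwoByTwo(tp=40, fp=60, fn=10, tn=890)
measures = accuracy_measures(example)
for name, value in measures.items():
    print(name, "n/a" if value is None else f"{value:.2f}")
```

The `fn` and `tn` cells can be filled only when records not flagged by the automated method are also chart reviewed, which is why criterion (b) above matters: without it, sensitivity, specificity and negative predictive value cannot be calculated.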
Selection of articles
One hundred and seventy-six articles were reviewed for potential inclusion, of which 43 provided information on validity of automated methods of harm detection.8–50 The remaining articles were excluded because they: were review articles on harm-detection methodologies (n=9)51–59; did not focus on detection of harm (n=26) or automated methods (n=22); did not include a comparison group (n=17); were not limited to inpatients (n=13); were descriptive papers of a program, incident reporting system, algorithm or computer simulation (n=33); were commentaries or editorials (n=11); or were repeat publications (n=2).
The methodologies and results from the 43 included studies are described in online appendix 1. Of these, 14 studies compared the automated harm-detection methodology to a gold standard chart review, and their methods and results are summarised in tables 1 and 2.
As shown in online appendix 1, 20 studies were conducted among adult populations, three in paediatric patients, two among all age groups, one in geriatric patients, one among Medicare beneficiaries and one among patients 14 years and older. The most common hospital settings were general medical units (n=14), followed by general surgical units (n=8), medical, surgical or general intensive care units (n=8), medical subspecialties (n=3), neonatal and paediatric intensive care units (n=3) and obstetric units (n=2). The target population and setting were unstated in 15 studies.
Data sources for automated harm-detection methods
Automated harm-detection methods were classified into field-defined and natural language-processing systems. Field-defined systems relied on computerised detection using pre-existing numeric or coded data stored in medical records. Natural language processing relied on computerised analysis of free text within a medical record to detect language indicative of harm. Field-defined and natural language-processing systems are described in table 3.
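The distinction between the two categories can be illustrated with a toy example. In the Python sketch below, the record layout, the potassium threshold, the antidote list and the harm terms are all hypothetical and are not taken from any reviewed system. The free-text function is a crude keyword match standing in for natural language processing, which in practice involves considerably more than pattern matching.

```python
import re

# Hypothetical patient record; every field name and value is illustrative only.
record = {
    "labs": [{"test": "potassium", "value": 6.4}],  # coded numeric field (mmol/L)
    "orders": ["naloxone", "paracetamol"],           # coded pharmacy orders
    "discharge_summary": "Patient developed a rash after penicillin; the drug was stopped.",
}

# Assumed trigger rules (hypothetical, chosen only to make the example run).
LAB_THRESHOLDS = {"potassium": 6.0}
ANTIDOTE_ORDERS = {"naloxone", "flumazenil"}
HARM_TERMS = re.compile(r"\b(rash|haematoma|overdose)\b", re.IGNORECASE)


def field_defined_flags(rec):
    """Scan pre-existing numeric or coded fields for possible indicators of harm."""
    flags = []
    for lab in rec["labs"]:
        limit = LAB_THRESHOLDS.get(lab["test"])
        if limit is not None and lab["value"] > limit:
            flags.append("lab:" + lab["test"])
    for drug in rec["orders"]:
        if drug in ANTIDOTE_ORDERS:
            flags.append("order:" + drug)
    return flags


def free_text_flags(rec):
    """Scan free text for language suggestive of harm (keyword stand-in for NLP)."""
    matches = HARM_TERMS.finditer(rec["discharge_summary"])
    return sorted({m.group(1).lower() for m in matches})


print(field_defined_flags(record))
print(free_text_flags(record))
```

In a partially automated method, records flagged by either function would then be passed to a reviewer to verify harm; in a fully automated method, the flags themselves would be counted as events.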
Forty-one of 43 studies used field-defined systems for automated harm detection. The nature of the programs, databases used, data fields used and types of harm detected within this category were source-specific. Typical sources of data for field-defined programs included laboratory, radiology, microbiology, pharmacy, and administrative and billing databases. Four of 43 studies used natural language-processing systems. For these systems, the most common source of data was discharge summaries. Radiology reports, chart text, daily progress notes, consultation notes, nursing records, and procedure or operative reports also were used.
Degree of automation
Twenty-five studies (58%) reported on detection tools that were partially automated,8–14 ,21–25 ,31 ,32 ,34–38 ,40 ,45–48 ,50; 14 studies (33%) described fully automated tools,15–17 ,19 ,26–30 ,33 ,41 ,42 ,44 ,49; and one study (2%) reported both fully and partially automated systems.20 The degree of automation was unclear in three reports (7%).18 ,39 ,43
Types of events identified
Automated methods for detecting harm predominantly focused on identification of adverse drug events (ADEs) (n=21, 49%).11 ,12 ,18 ,21–26 ,29–32 ,35–38 ,43 ,45 ,50 Ten automated methods (23%) focused on general adverse events,8–10 ,19 ,33 ,34 ,40 ,46–48 eight (19%) focused on nosocomial infection,14 ,20 ,28 ,39 ,41 ,42 ,44 ,49 and four (9%) focused on other specific adverse events (eg, decubitus ulcers, surgical complications).13 ,15 ,17 ,27
Accuracy of automated harm-detection methods
Only 14 studies15 ,17 ,18 ,20 ,22 ,23 ,26 ,30 ,32–34 ,44 ,47 ,48 compared an automated harm-detection method with ‘gold-standard’ adverse event detection and were eligible for critical appraisal of validity (table 2). Methodologies used to evaluate these automated systems were heterogeneous. Seven studies (50%) applied the gold standard using independent, blind evaluators. Eight studies (57%) applied the gold standard independently of the outcome from the automated method. One study (7%) validated the results of the automated method in an independent, second set of patients.
Table 4 shows the sensitivity, specificity, and positive and negative predictive values of the automated methods that were compared against a gold standard chart review. Sensitivities of different methods ranged from 0.10 to 0.94, and specificities ranged from 0.23 to 0.98. Positive predictive values ranged from 0.03 to 0.84, and negative predictive values ranged from 0.70 to 0.96. Our independent assessment of validity allowed us to verify all published values for nine of the 14 studies that reported validity data.15 ,17 ,22 ,23 ,30 ,33 ,34 ,48 Figure 1 displays the sensitivity and 1-specificity intersection points of methods used in these studies in a format similar to that of a receiver-operating characteristic curve.
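Each method is placed in receiver-operating characteristic space by plotting sensitivity (true-positive rate) against 1-specificity (false-positive rate). The Python sketch below shows that conversion, together with a simple check of whether a point lies above the chance diagonal. The method names and the sensitivity and specificity pairs are hypothetical; they are not the values reported in table 4.

```python
def roc_point(sensitivity, specificity):
    """Return the (1 - specificity, sensitivity) coordinates of one method."""
    return (1.0 - specificity, sensitivity)


def better_than_chance(sensitivity, specificity):
    """True when the point lies above the diagonal where sensitivity = 1 - specificity."""
    false_positive_rate, true_positive_rate = roc_point(sensitivity, specificity)
    return true_positive_rate > false_positive_rate


# Hypothetical (sensitivity, specificity) pairs, for illustration only.
hypothetical_methods = {
    "method_a": (0.94, 0.60),
    "method_b": (0.10, 0.98),
    "method_c": (0.30, 0.23),
}

points = {name: roc_point(*pair) for name, pair in hypothetical_methods.items()}
for name, (fpr, tpr) in points.items():
    status = "above" if better_than_chance(*hypothetical_methods[name]) else "on or below"
    print(f"{name}: 1-specificity={fpr:.2f}, sensitivity={tpr:.2f} ({status} the chance line)")
```

Points near the upper left corner combine high sensitivity with high specificity. Because each study contributes a single point from a different method and population, the plot resembles a receiver-operating characteristic curve in format only and does not trace one test across thresholds.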
Strategies to improve patient safety require efficient and accurate detection of patient harm. Automated methods of harm detection have been used for this purpose because they offer the potential to rapidly scan patient records with minimal human effort. This systematic review describes types of automated methods of harm detection used in inpatient settings, events identified by these methods and their accuracy.
We found two categories of automated harm detection described in the literature: field-defined systems (used in most studies) and natural language-processing systems. Laboratory, pharmacy and administrative databases were the data sources most frequently used to identify adverse drug events, general adverse events and nosocomial infections.
We found that the validity of studies describing automated harm-detection methods was variable. Of these studies, those attempting to identify ADEs18 ,30 and nosocomial infections20 ,44 using field-defined methods, and one attempting to identify multiple types of adverse events33 using natural language processing, satisfied more validity criteria than others. We believe that automated harm-detection methods will have greater validity if they attempt to identify events that are discrete, easily and reliably detected, and consistently documented in the chart, such as adverse drug events, nosocomial infections, pressure ulcers and postoperative complications.
Automated harm detection has the potential to positively impact clinical practice. While most automated methods retrospectively identified harm, eight were paired with real-time surveillance alerts that informed physicians or pharmacists of an adverse event. Such prospective surveillance systems can alert the clinical team of impending or ongoing harm, thus allowing early intervention to limit harm. Real-time alerts were present within methods for detecting adverse drug events,11 ,21 ,23 ,26 ,35 ,45 general adverse events40 and nosocomial infection.14 Automated alerts were a component of the Health Evaluation through Logical Processing system11 ,14 and were incorporated within methods using automated lab signal detection,23 ,26 ,45 computer algorithms21 and other automated triggers.35 ,40
Another potential benefit of automated detection is the reduction of person-hours required for harm surveillance. Few studies14 ,21 ,22 ,32 ,34 ,38 ,40 ,44 provided information on financial or human resource requirements for implementing and maintaining automated detection tools. In general, the automated methods reviewed here require fewer person-hours than manual chart review. Field-defined strategies appear to be less technologically demanding than natural language-processing strategies. Sophisticated computer algorithms and natural language-processing programs require specialised subject knowledge, skill and time to develop, and require installation and instruction by experts.18 ,48 Whether costs to implement such programs are offset by savings from eliminating manual chart review and decreased patient harm is unknown and should be studied. Future studies also should quantify differences in time and personnel resources needed for the automated detection method, relative to other detection strategies.
To our knowledge, four of the 43 unique articles report on commercially available automated harm-detection systems (MedLEE,48 dtsearch desktop,34 Nosocomial Infection Marker (NIM)44 and Dynamic Pharmaco-Monitoring System45). Other articles report on systems that employ data elements common across medical institutions (ie, ICD-9 codes used in the Complications Screening Program8–10), use software available to the VA or specific states (ie, RADARx, NY Antimicrobial Resistance Project21 ,42) or are available through the Agency for Healthcare Research and Quality (ie, Patient Safety Indicators15–17). The availability of the remaining detection systems is either institution-specific or not made clear by their developers.
While automated tools offer promise for efficient and accurate harm detection, there are important limitations that currently make them unsuitable for widespread application, particularly for interhospital comparisons. The reported sensitivity and specificity are variable and often low, suggesting that many episodes of harm may go undetected, and that many events identified will be false positives. Low accuracy may result from limited capability of the tool to detect events, or from flawed sources of data used for automated harm detection. For example, the reliability of field-defined systems can be affected by data entry errors or limited availability and accuracy of administrative codes, while natural language processing is sensitive to spelling and grammatical errors in free text. Both systems may include irrelevant or erroneous information, or exclude necessary information. For example, perhaps driven by medical-legal concerns, health professionals often do not include information about medical errors and resulting adverse events in their progress notes, problem lists and discharge summaries. Thus, an electronic medical record containing accurate, complete and easily accessible information can enhance the performance of an automated detection tool. Understanding these factors is important when evaluating the technological requirements, feasibility and inherent limitations of automated detection methods.
The variety of distinct automated methodologies makes comparisons between studies and between automated tools difficult and unreliable. Differences in the quality and content of data sources, as well as other unknowns such as accuracy of hospital documentation and coding practices, also complicate comparisons. The performance and methods of automated tools also may be institution-specific, making it difficult to generalise to other organisations or patient populations. For example, the Health Evaluation through Logical Processing system used by LDS Hospital in Salt Lake City, Utah relies on an advanced, highly integrated and dynamic information system that is not widely available.11–14
We speculate that field-defined methods of automated harm detection will prove superior to natural language-processing methods, particularly if information about harm is accurately documented in electronic medical record systems in prespecified fields, thus allowing rapid and reliable detection of harm events.
The methodological rigour of studies was variable. Only two-thirds of the 14 studies that compared an automated method with a gold standard chart review had verifiable validity results. Moreover, most studies compared automated harm-detection methods with other sources of data on patient harm (eg, voluntary reporting,11–13 ,24 ,25 ,29 ,31 ,37 ,38 ,50 unstandardised chart reviews,8 ,10 ,14 ,28 ,36 ,41 ,43 ,45 and prospective surveillance records42 ,49). The validity of data from studies without chart review comparison is questionable given the absence of a defined denominator of events against which to measure the performance of the automated tool. The use of different methods, statistical analyses, denominator values and outcomes precludes a comparison of one automated method with another, as well as any attempt to statistically pool their results in a meta-analysis.
Other authors have summarised the literature on automated harm-detection methods, but most have focused on automated methods specific to a type of harm (ie, adverse drug events51 ,54 or nosocomial infections),59 patient population (ie, paediatrics),52 source of data (ie, administrative data)57 or automated technology (ie, natural language processing).58 Our systematic review included all types of automated methods, harm events and sources of data evaluated in an inpatient setting. Furthermore, we provide an additional level of critical appraisal compared with other systematic reviews.55 ,56 For example, while Bates et al55 address differences between study methodologies by noting the presence or absence of gold standard comparison, they do not assess validity of studies or independently verify reported data. To our knowledge, this is the first systematic review to critically assess methodological rigour and study validity.
While our review has several strengths, it also has limitations. First, the search strategy was limited to published English language articles. Second, we did not evaluate scientific meeting abstracts, nor did we contact investigators to identify unpublished studies. Third, publication bias must be considered, because studies with negative findings may not have been published. Fourth, most of the articles evaluated automated methods of harm detection among adults in general medical or surgical units, which may limit application to other populations and settings. Finally, our independent appraisal of the methodology and validity of key studies relied on information available within published articles. Our inability to verify the rigour and validity of all studies highlights the variation among even the most rigorous evaluations.
In conclusion, our review identified numerous automated methods of harm detection in two broad categories (field-defined methods and natural language processing) that identified a broad range of harm events, particularly adverse drug events and nosocomial infections. Although many of these studies described the accuracy (sensitivity and specificity) of automated harm detection when compared with chart review, these results may not be valid because of methodological flaws in the conduct of many of these studies. Future studies assessing the performance of automated harm-detection methods should ensure that the gold-standard assessment (usually chart review) is performed by a blinded assessor, the gold standard is applied independently of the results of the automated method (ie, charts not flagged by the automated method are reviewed for false negatives), and the automated method is tested in a set of patients that is independent of the set used to develop the automated method. Finally, efforts should be made to improve documentation of harm episodes in the patient record, in problem lists and when generating diagnosis codes, in order to improve automated harm detection. Future research should also focus on developing methods for real-time harm detection. In this way, automated harm-detection tools will realise their potential to describe accurately the incidence of harm in hospitalised patients, monitor changes from preventive interventions, and compare institutions and individual health professionals. Establishing universal standards and guidelines for the development, testing and utilisation of automated harm-detection methods, perhaps through a centralised agency, would allow data to be collected and compared in a rigorous, systematic fashion.
Automated methods of harm detection are feasible, allow rapid scanning of a large number of patient records with minimal effort and have the potential to identify events as they occur or soon thereafter. However, the heterogeneity of automated methodologies, the spectrum of study rigour and the widely varying accuracy data suggest that currently available automated methods poorly measure the true incidence of harm. These methods cannot replace chart review as the gold standard but can provide estimates of the frequency of harm that can allow hospitals to identify priorities for action, make decisions about safety interventions and potentially monitor change over time. As automated harm-detection tools and scientific methods to test them evolve, there exists a great potential to positively impact patient safety.
We are grateful for the administrative support provided by the Institute for Healthcare Improvement.
Funding Funding for the literature review was provided by the Institute for Healthcare Improvement (IHI) to MG and ADVC. Subsequent data analysis and interpretation, as well as conceptualisation, preparation, and review of the manuscript were not financially supported.
Competing interests JK-C was employed by Premier Inc. from 31 March 2007 to 2 July 2008. Premier has developed an automated event detection product, SafetySurveillor. This study does not reference or endorse this product. No other authors disclosed any potential conflicts of interest.
Provenance and peer review Not commissioned; externally peer reviewed.