Adverse diagnostic events in hospitalised patients: a single-centre, retrospective cohort study
  1. Anuj K Dalal1,2,3,
  2. Savanna Plombon1,3,
  3. Kaitlyn Konieczny1,
  4. Daniel Motta-Calderon1,4,
  5. Maria Malik1,5,
  6. Alison Garber1,6,
  7. Alyssa Lam1,
  8. Nicholas Piniella1,
  9. Marie Leeson1,
  10. Pamela Garabedian1,3,
  11. Abhishek Goyal1,3,
  12. Stephanie Roulier1,3,
  13. Cathy Yoon1,
  14. Julie M Fiskio3,
  15. Kumiko O Schnock1,2,
  16. Ronen Rozenblum1,2,
  17. Jacqueline Griffin7,
  18. Jeffrey L Schnipper1,2,3,
  19. Stuart Lipsitz1,2,
  20. David W Bates1,2,3
  21. Patient Safety Learning Laboratory Adjudicator Group
      1. 1 Department of Medicine, Division of General Internal Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
      2. 2 Harvard Medical School, Boston, Massachusetts, USA
      3. 3 Mass General Brigham, Boston, Massachusetts, USA
      4. 4 Vanderbilt University Medical Center, Nashville, Tennessee, USA
      5. 5 Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire, USA
      6. 6 Columbia University Vagelos College of Physicians and Surgeons, New York, New York, USA
      7. 7 Department of Industrial Engineering, Northeastern University - Boston Campus, Boston, Massachusetts, USA
      1. Correspondence to Dr Anuj K Dalal; adalal1{at}bwh.harvard.edu

      Abstract

      Background Adverse event surveillance approaches underestimate the prevalence of harmful diagnostic errors (DEs) related to hospital care.

      Methods We conducted a single-centre, retrospective cohort study of a stratified sample of patients hospitalised on general medicine using four criteria: transfer to intensive care unit (ICU), death within 90 days, complex clinical events, and none of the aforementioned high-risk criteria. Cases in higher-risk subgroups were over-sampled in predefined percentages. Each case was reviewed by two adjudicators trained to judge the likelihood of DE using the Safer Dx instrument; characterise harm, preventability and severity; and identify associated process failures using the Diagnostic Error Evaluation and Research Taxonomy modified for acute care. Cases with discrepancies or uncertainty about DE or impact were reviewed by an expert panel. We used descriptive statistics to report population estimates of harmful, preventable and severely harmful DEs by demographic variables based on the weighted sample, and characteristics of harmful DEs. Multivariable models were used to adjust association of process failures with harmful DEs.

      Results Of 9147 eligible cases, 675 were randomly sampled within each subgroup: 100% of ICU transfers, 38.5% of deaths within 90 days, 7% of cases with complex clinical events and 2.4% of cases without high-risk criteria. Based on the weighted sample, the population estimates of harmful, preventable and severely harmful DEs were 7.2% (95% CI 4.66 to 9.80), 6.1% (95% CI 3.79 to 8.50) and 1.1% (95% CI 0.55 to 1.68), respectively. Harmful DEs were frequently characterised as delays (61.9%). Severely harmful DEs were frequent in high-risk cases (55.1%). In multivariable models, process failures in assessment, diagnostic testing, subspecialty consultation, patient experience, and history were significantly associated with harmful DEs.

      Conclusions We estimate that a harmful DE occurred in 1 of every 14 patients hospitalised on general medicine, the majority of which were preventable. Our findings underscore the need for novel approaches for adverse DE surveillance.

      • Diagnostic errors
      • Adverse events, epidemiology and detection
      • Patient safety
      • Hospital medicine
      • Information technology

      Data availability statement

      Data are available upon reasonable request.

      WHAT IS ALREADY KNOWN ON THIS TOPIC

      • Adverse diagnostic events (DEs) are underrecognised in hospitalised patients using current surveillance approaches.

      WHAT THIS STUDY ADDS

      • Based on a weighted random sample and a structured electronic health record-based case review process using validated instruments for assessing the likelihood of harmful DE, it was estimated that about 7% of hospitalised patients who received general medical care experienced an adverse DE within 90 days of admission.

      HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

      • Patient safety research and institutional quality and safety programmes should consider structured case reviews and novel approaches for improving detection of adverse DEs for hospitalised patients.

      Introduction

      Diagnostic errors (DEs), defined by the National Academy of Medicine (NAM) as “the failure to (a) establish an accurate and timely explanation of the patient’s health problem(s) or (b) communicate that explanation to the patient”,1 are difficult to detect and characterise. Consequently, their spectrum of harm is variable and underrecognised in patient safety research.1 In hospitalised patients, failures in diagnostic processes such as history-taking, testing or assessment often lead to adverse events (AEs) with severe and immediate impact, such as care escalation or death.2 They can also lead to less severe or more delayed impact.

      Systematic reviews of retrospective studies estimate that adverse diagnostic events occur in 0.7% of inpatients, but these are largely based on cohorts with severe outcomes only3; thus, these are likely underestimates.4 In a recent study which relied on the Institute for Healthcare Improvement (IHI) global trigger tool,5 Bates et al estimated that one in four hospitalisations were associated with an AE. Of nearly 1000 AEs detected across 11 hospitals, just 10 DEs (1%) were identified as culprits.6 These results suggest that current trigger tools alone are likely insensitive for detecting harmful DEs, including cases with less severe outcomes.5–7 Indeed, studies using the validated Safer Dx instrument have observed higher percentages of harmful DEs for hospitalised patients who were critically ill or readmitted.8–10

      Using the Safer Dx framework,11 we developed and validated a structured case review process to train clinicians to use the electronic health record (EHR) to evaluate the diagnostic process during the hospital encounter, assess the likelihood of DE, and characterise the impact and severity of harm.2 12 13 This process was validated in two cohorts of patients who expired in the hospital and detected harmful DEs in cases of death judged to be preventable and non-preventable by our institutional mortality review process.12 This process was further validated in a retrospective, multicentre study, which found that harmful DEs occurred in 18% of hospitalised patients who died or transferred to the intensive care unit (ICU).13 14

      In this study, we evaluated a weighted sample of patients hospitalised on general medicine identified by querying the EHR using clinical screening criteria for inpatient DEs to estimate the prevalence of harmful DEs in this population.11 15 16 Secondarily, we sought to characterise the types of process failures associated with harmful DEs to enhance surveillance approaches and develop preventative interventions.17

      Methods

      Study design, setting and eligibility

      We conducted a retrospective cohort study, approved by the Mass General Brigham Human Subjects Committee. Adult patients (>18 years old) hospitalised on the general medicine service for at least 24 hours at a large academic medical centre in Boston, MA, USA were identified by querying our enterprise data warehouse (EDW), which received nightly updates from our EHR (Epic Systems, Inc.) between July 2019 and September 2021. Patients who were admitted directly to hospice care (comfort measures only) were not identified by our queries.

      Cases were excluded if the patient had a length of stay (LOS) greater than 21 days as described in our validation study.12 Because of our focus on the diagnostic process related to general medical care, we excluded cases in which the patient was admitted to a general medicine team but received subspecialty care under supervision of a subspecialty attending. For example, we excluded patients typically admitted to a dedicated oncology team for chemotherapy or management of an oncologic complication but overflowed to a general medicine team and received care under the direction of an oncology service attending.

      Cases were also excluded if the patient was admitted between April 2020 and December 2020, the time period when our institution experienced major disruptions in hospital operations due to the COVID-19 pandemic. These disruptions included major changes to admitting teams, services and units requiring changes to our EHR starting in early April 2020, precluding our ability to accurately query our EDW and stratify eligible cases using the criteria defined below. For example, because ICU care was moved to a different location in our hospital, cases of patients who transferred to the ICU were misclassified by our EDW queries as many ICU units were repurposed as general medical-surgical units in the EHR. Other factors considered included changes in patient population (eg, demographics changes, declines in admissions18) and abrupt changes in team structures and composition (ie, clinicians caring for patients on a dedicated COVID-19 team or outside their expertise19) during the initial pandemic waves prior to the availability of vaccines.20 21 By December 2020, the infrastructure for caring for COVID-19 patients was established, hospital operations had started to normalise (COVID-19 patients were admitted to dedicated teams; elective surgeries resumed; declining need to repurpose clinical staff), vaccines were aggressively being distributed, and our EDW queries reflected correct team, service and unit assignments.

      Stratified case sampling approach

      Given the limited availability of data regarding harmful DE rates in hospitalised patients, we used a weighted case sampling approach to examine specific subgroups, drawing from the literature, emerging research and expert consensus regarding the predictive value of clinical screening criteria reported by Shenvi and El-Kareh.10 12 22 23 Cases that met eligibility criteria (figure 1) during each month of the study period were categorised into one of four subgroups. Cases were considered high-risk if the patient (1) transferred to ICU 24 hours or more after admission independent of death, (2) expired within 90 days of admission (either during the hospital encounter or after discharge identified using our EHR’s death date) or (3) had complex clinical events but did not transfer to the ICU or expire within 90 days of admission. Cases with complex clinical events had one of a group of triggers including clinical deterioration signs (eg, new or worsening oxygen requirement, acute kidney injury, etc.), multiple providers or consultants, unexpected events (eg, rapid response, surgery) or uncertain or discrepant diagnostic documentation, all of which were more frequently observed in cases with DE in our validation study.12 23 Cases without the aforementioned high-risk criteria were considered low-risk (though these could still have an unexpected post-discharge event such as an emergency department visit or readmission). All classifications were made 90 days after admission to ensure appropriate categorisation.

      Figure 1

      Eligibility and sampling of cases to detect adverse diagnostic events. DEER, Diagnostic Error Evaluation and Research; EHR, electronic health record; ICU, intensive care unit; LOS, length of stay.

      A priori we hypothesised that each subgroup would have a different percentage of harmful DEs. Informed by subanalyses of local data from our validation study and autopsy studies, we expected these percentages to be 30% in the ICU transfer and 20% in the death within 90 days subgroups, respectively.12 22 For subgroups in which patients did not transfer to the ICU or expire, we assumed the expected percentage would be approximately 5%, largely based on a study of early readmissions by Raffel et al.10 We assumed a higher range (5–10%) in patients with complex clinical events,23 and a lower range (2–5%) in patients with no events.

      Our primary goal was to oversample higher-risk subgroups to gain more information about the probability of harmful DEs while creating a high-yield process for our adjudicators. Our secondary goal was to have sufficient sample size within each subgroup while oversampling the complex clinical events subgroup (on an absolute basis) relative to the others as it comprised a large subpopulation for which there was a dearth of information. Thus, eligible cases within each subgroup were randomly selected during each month of the study period in the following predefined percentages: all (100%) cases of patients who transferred to the ICU; one-third (33.3%) of cases of patients who expired; one-tenth (10%) of cases of patients with complex clinical events; and one-thirtieth (3.3%) of cases of patients without any of the aforementioned events. The number of cases sampled in any given month varied slightly because the number of eligible cases differed from month to month (ie, different monthly denominators for each subgroup). Over the full study period, our goal was that the percentages of cases sampled within each subgroup would approximate the target sampling percentage for that subgroup (see 'Sample size estimate').
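
      The monthly selection described above can be sketched as follows. This is only an illustration of stratified random sampling with fixed fractions: the subgroup labels, the case identifiers, the rounding rule and the function name `sample_month` are assumptions made for the example and are not taken from the study's EDW queries.

```python
import random

# Target sampling fractions per subgroup, as stated in the text.
TARGET_FRACTIONS = {
    "icu_transfer": 1.0,       # all cases
    "death_90d": 1 / 3,        # one-third
    "complex_events": 1 / 10,  # one-tenth
    "no_events": 1 / 30,       # one-thirtieth
}


def sample_month(eligible_by_subgroup, rng):
    """Randomly select cases from each subgroup for a single month.

    Rounding to the nearest whole case is an assumption; it is one reason
    the realised monthly fraction can drift from the target.
    """
    sampled = {}
    for subgroup, cases in eligible_by_subgroup.items():
        k = round(len(cases) * TARGET_FRACTIONS[subgroup])
        sampled[subgroup] = rng.sample(cases, k)
    return sampled


# Hypothetical month: the case identifiers are made-up integers.
rng = random.Random(0)
month = {
    "icu_transfer": list(range(5)),
    "death_90d": list(range(100, 115)),
    "complex_events": list(range(200, 360)),
    "no_events": list(range(400, 580)),
}
picked = sample_month(month, rng)
counts = {name: len(cases) for name, cases in picked.items()}
```

      Because each month has a different denominator, the realised fraction over the whole study period only approximates the target, which is consistent with the text's note on monthly variation.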

      Structured case record reviews

      We employed NAM’s definition of DEs as missed diagnostic opportunities during the hospital encounter. As previously reported, adjudicators (hospitalists, advanced practice providers) were trained to use the revised Safer Dx instrument and modified Diagnostic Error Evaluation and Research (DEER) Taxonomy adapted for acute care to judge the presence of DE and identify process failures, respectively.2 12 24 25 Each adjudicator was required to review five training cases, complete a review form (online supplemental appendix A) and discuss each case with an expert hospitalist reviewer (AKD, JLS). Both expert reviewers were senior hospitalists with 20+ years of clinical experience and led development of the structured case review process used in this and concurrent studies.12–14 20

      All sampled cases were sent for independent review by two trained adjudicators. Case information, including the primary diagnoses at admission and discharge, and secondary diagnoses were abstracted. Admission notes, discharge summaries, consultant notes, escalation events (rapid response, code), nursing documentation preceding an escalation event, and objective data such as vital signs, medications orders and laboratory results were reviewed. Reviewers assessed the likelihood of DE independently of harm assessment.24 Using Likert ratings, harm severity was categorised as minor (mild symptoms, short-term loss of function, minimal intervention), moderate (symptomatic, requiring intervention, increased length of stay, long-term loss of function), major (symptomatic, requiring life-saving, surgical or medical intervention, shortening life expectancy, permanent loss of function) or fatal; and harm preventability was classified as definitely not, probably not, probably or definitely.2 12 13 24 26 27

      Using the modified DEER Taxonomy which we previously adapted for acute care,2 adjudicators identified any of 44 process failures in nine diagnostic domains: access and presentation, history, physical exam, assessment, diagnostic testing, diagnostic information and follow-up, subspecialty consultation, team communication and collaboration, and patient experience (online supplemental appendix A).12 For example, to attribute a diagnostic process failure related to the patient experience, the adjudicator would use available clinical documentation to judge whether the patient had received an accurate and timely explanation of their health problem; whether there was a delay in communicating test results, assessments or consultant findings; or whether the care team did not address patient concerns, preferences or non-adherence.22

      Reviewers met virtually to resolve discrepancies and complete the consensus review form while re-reviewing the case in the EHR. Major deterioration events (eg, rapid responses, increasing oxygen requirement) and relevant laboratory tests and orders (eg, blood cultures, antibiotics) were reviewed to identify delays in care (eg, ordering lactate levels within accepted timeframes for sepsis). When available, autopsy findings were reviewed. For cases in which the diagnosis was uncertain at discharge, a DE designation was borderline or multiple process failures were selected, reviewers “looked forward” at emergency department visits, readmissions, unexpected surgeries or deaths up to 90 days after admission to assess whether a missed opportunity during index hospitalisation was responsible.28 All DE descriptions were characterised as missed, incorrect or delayed.

      In the validation study, we determined that while independent reviews yielded moderate agreement (typical for AE studies),29 the consensus reviews and expert panel tertiary reviews yielded strong agreement (Cohen's kappa >0.7).12 Thus, to ensure consistency in consensus reviews, all cases that had discrepancies in DE designation between reviewers; inconsistencies in DE description (did not meet the operational definition); multiple Safer Dx items that were borderline (slightly agree or disagree); multiple process failures selected; or an uncertain diagnosis (no clear aetiology) were flagged for tertiary review during the consensus review between independent reviewers. An expert panel composed of five clinician adjudicators (AKD, JLS, ABG, SR, DM) met bimonthly to review individual and consensus forms, pertinent EHR data, and final DE designation and descriptions for all cases flagged for tertiary review. To participate on the expert panel, clinician adjudicators had to independently review and discuss at least 25 cases with an expert reviewer (AKD, JLS).

      The primary outcome was defined as the presence of a DE judged to have caused harm of any severity up to 90 days after admission. Secondary outcomes included preventable DEs and severely harmful DEs (having major or fatal harm). Demographic variables included age, sex, race, ethnicity, insurance, risk cohort and International Classification of Diseases, 10th Revision (ICD-10) problem group for admission diagnoses. Independent variables were defined as the presence of process failures overall and within each diagnostic domain. Other measures included the type of diagnosis (primary or secondary) and ICD-10 diagnosis code associated with each harmful DE, and characterisation of DE as missed, incorrect or delayed.

      Statistical analysis

      Demographic characteristics, harmful DE rates within each of the four subgroups, and characteristics of harmful DEs were reported as numbers and percentages for categorical variables and means (standard deviations) for continuous variables as appropriate. Cohen’s kappa was calculated to assess inter-rater reliability between individual reviews, and between final consensus and expert panel tertiary reviews.12
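
      For readers unfamiliar with Cohen's kappa, the statistic corrects observed agreement between two raters for the agreement expected by chance. A minimal sketch follows; the function name and the four example ratings are hypothetical and are not study data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who judged the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must judge the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both raters used a single, identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical DE judgements (1 = DE present, 0 = absent) for four cases.
reviewer_1 = [1, 1, 0, 0]
reviewer_2 = [1, 0, 0, 0]
kappa = cohens_kappa(reviewer_1, reviewer_2)  # 0.5
```

      In this toy example the raters agree on three of four cases (0.75), chance agreement is 0.5, and kappa is therefore 0.5.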

      For the primary analysis, we oversampled cases with high-risk criteria (figure 1) to gain more information on the probability of DE to obtain weighted estimates of outcomes in the population. To calculate unbiased estimates of harmful DEs, each case was weighted by the inverse probability of being sampled from its subgroup.30 Thus, the study can be considered a stratified complex survey (one stratum for each of the four subgroups) with weighting defined as the inverse probability of being sampled. Complex survey weighting for stratified samples was applied to obtain unbiased estimates of population characteristics including standard errors and 95% confidence intervals (95% CIs). Complex survey weighting was similarly applied to estimate DEs, harmful DE and preventable DE rates within subgroups (eg, age, sex, race, ethnic group, insurance, risk cohort).31 Thus, the weighted stratified sampling design offered flexibility in oversampling certain subgroups, while allowing for complex survey methods to reweight the observations in the sample to obtain unbiased estimates of any population characteristic.
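
      The inverse probability weighting described above can be illustrated with the subgroup counts and sampling percentages reported in the Results. The sketch below is not the SAS complex survey procedure used in the study: stratum sizes are back-calculated from the rounded sampling percentages (an assumption), no finite population correction is applied, and the function name `weighted_prevalence` is introduced only for this example.

```python
from math import sqrt

# (subgroup, cases sampled, sampling fraction, harmful DEs), from the Results.
STRATA = [
    ("ICU transfer", 130, 1.000, 37),
    ("Death within 90 days", 141, 0.385, 18),
    ("Complex clinical events", 298, 0.070, 23),
    ("No events", 106, 0.024, 6),
]


def weighted_prevalence(strata, z=1.96):
    """Stratified (inverse probability weighted) prevalence with a Wald CI."""
    # Weighting each sampled case by 1 / fraction is equivalent to scaling
    # each stratum back up to its (approximate) population size.
    sizes = [n / fraction for _, n, fraction, _ in strata]
    total = sum(sizes)
    estimate = 0.0
    variance = 0.0
    for (_, n, _, events), size in zip(strata, sizes):
        share = size / total   # stratum share of the population
        p = events / n         # within-stratum proportion
        estimate += share * p
        variance += share ** 2 * p * (1 - p) / (n - 1)
    half_width = z * sqrt(variance)
    return estimate, estimate - half_width, estimate + half_width


estimate, ci_low, ci_high = weighted_prevalence(STRATA)
```

      Under these assumptions the point estimate and interval should land close to the reported population estimate of harmful DEs; small differences are expected because the published percentages are rounded and the survey software may compute the variance differently.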

      For secondary analyses, we used multivariable logistic regression to model the primary outcome using the presence of process failures within each diagnostic domain as independent variables to understand the extent to which each domain was associated with harmful DEs. Cases were weighted by their inverse probability of being sampled, and standard errors and 95% CIs from the logistic regression were calculated using complex survey weighting for stratified samples.32 33 Because assessment failures frequently contribute to DE, we developed two models, one with and one without assessment failures.14 All analyses were performed using complex survey procedures in SAS version 9.4 (SAS Institute).

      Sample size estimate

      To obtain an estimate and CI for the percentage of hospitalised patients with a harmful DE, we based our required sample size on having a 95% weighted binomial complex survey CI that was at most 4% wide. Thus, we required at least 700 patients overall to ensure that the resulting 95% CI would be ±2% (4% wide) of the estimated percentage. To ensure a sample size of at least 100 cases within each subgroup while oversampling cases from the complex clinical event subgroup relative to others, we aimed for a 95% binomial CI that would be ±8% (16% wide) of the estimated percentage for each subgroup.
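
      The relation between sample size and CI width can be checked with the usual normal approximation for a binomial proportion. In the sketch below the planning proportions (7% overall, 20% within a subgroup) are illustrative assumptions rather than values stated by the authors, and the calculation ignores any design effect from weighting.

```python
from math import sqrt


def wald_half_width(p, n, z=1.96):
    """Half-width of a normal-approximation CI for a proportion."""
    return z * sqrt(p * (1 - p) / n)


# Assumed planning values, chosen only for illustration.
overall = wald_half_width(0.07, 700)   # roughly 0.019, ie, within ±2%
subgroup = wald_half_width(0.20, 100)  # roughly 0.078, ie, within ±8%
```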

      Results

      Of the 9147 eligible cases (figure 1), 675 were randomly sampled from each subgroup during each month of the study period in the following percentages: 130 (100%) ICU transfers after 24 hours, 141 (38.5%) cases of patients who expired within 90 days of admission, 298 (7%) cases with complex clinical events and 106 (2.4%) cases without the aforementioned high-risk criteria. Demographics of eligible and sampled cases are reported in table 1. The values for Cohen’s kappa for DE determination were 0.52 between individual reviewers and 0.87 between final consensus and expert panel tertiary reviews. By individual subgroup, harmful DEs were identified in 37 cases of ICU transfers after 24 hours (28.5%, 95% CI 20.70 to 36.22), 18 cases of deaths within 90 days of admission (12.8%, 95% CI 7.26 to 18.27), 23 cases with complex clinical events (7.7%, 95% CI 4.69 to 10.75) and 6 cases with no events (5.7%, 95% CI 1.26 to 10.06).
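
      The subgroup percentages and intervals above are consistent with an unweighted normal-approximation CI applied within each subgroup. The sketch below recomputes them from the reported counts; the choice of the Wald formula is an inference from the numbers, not a method stated by the authors, and the function name is hypothetical.

```python
from math import sqrt

# (subgroup, harmful DEs, cases sampled), as reported in the Results.
SUBGROUPS = [
    ("ICU transfer after 24 hours", 37, 130),
    ("Death within 90 days", 18, 141),
    ("Complex clinical events", 23, 298),
    ("No events", 6, 106),
]


def rate_with_ci(events, n, z=1.96):
    """Proportion with a normal-approximation 95% CI."""
    p = events / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


results = {name: rate_with_ci(events, n) for name, events, n in SUBGROUPS}
total_harmful = sum(events for _, events, _ in SUBGROUPS)  # 84 cases
```

      The four subgroup counts sum to the 84 harmful DEs characterised in table 4.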

      Table 1

      Demographic characteristics of eligible cases, random sample and population based on weighted study sample

      The population estimates of harmful, preventable and severely harmful DEs (table 2) based on the weighted sample were 7.2% (95% CI 4.66 to 9.80), 6.1% (95% CI 3.79 to 8.50) and 1.1% (95% CI 0.55 to 1.68), respectively. Based on these estimates, 84.7% of harmful DEs were preventable. Harmful DE estimates were higher for older, White, non-Hispanic, non-privately insured and high-risk patients.

      Table 2

      Population estimates (n=9147) of harmful, preventable and severely harmful diagnostic errors for general medicine patients based on weighted sample (n=675)

      The presence of process failures was significantly associated with harmful DEs when assessment failures were included (odds ratio (OR) 1.74, 95% CI 1.45 to 2.09, p<0.01) and excluded (OR 1.94, 95% CI 1.62 to 2.33, p<0.01). In multivariable model 1 (table 3), assessment failures had the largest association with harmful DEs (OR 7.34, 95% CI 3.86 to 13.95, p<0.01). In multivariable model 2 without assessment failures (table 3), harmful DEs were significantly associated with failures in diagnostic testing (OR 4.24), subspecialty consultation (OR 3.11), patient experience (OR 2.93) and history (OR 2.50).

      Table 3

      Diagnostic process failures associated with harmful diagnostic errors: multivariable logistic regression models

      The severity of harm attributed to DEs experienced by 84 patients (table 4) was characterised as minor in 5 (6.0%), moderate in 36 (42.9%), major in 25 (29.8%) and fatal in 18 (21.4%). Forty (47.6%) were related to the primary diagnosis at admission or discharge and 44 (52.4%) were related to a secondary diagnosis. Fifty-two (61.9%) were characterised as delays. Errors associated with major or fatal harm were frequent in the high-risk cohort (55.1%, 43/78) and infrequent in the low-risk cohort (0%, 0/6). The most frequent diagnoses (ICD-10) associated with these events included heart failure (I50.X), acute kidney failure (N17.9), sepsis (A41.X), pneumonia (J18.9), respiratory failure (J96.X), altered mental status (R41.82), abdominal pain (R10.9) and hypoxaemia (R09.02). Examples of harmful DEs, primary and secondary diagnoses, harm severity and preventability, and process failures are provided in online supplemental appendix B for selected cases in each of the four subgroups.

      Table 4

      Characteristics of harmful diagnostic errors, harm severity and International Classification of Diseases,10th Revision (ICD-10) codes by risk cohort (n=84)

      Discussion

      We evaluated a weighted random sample of patients hospitalised on general medicine and estimate that about 1 in 14 patients (~7%) in this population experienced a harmful DE. Roughly half of the harmful DEs identified were related to the primary diagnosis at either admission or discharge, and roughly half to a secondary diagnosis. The majority of these harmful DEs were judged to be preventable. In multivariable analysis excluding assessment failures, failures in diagnostic testing, subspecialty consultation, patient experience, and history were associated with harmful DEs. These data suggest that DEs are frequent on general medicine, associated with certain process failures, and cause substantial harm.

      While our observed severely harmful DE estimate (cases with major or fatal outcomes) of 1% is consistent with prior studies,3 6 our overall estimate (including cases with less severe impact) was higher. Gunderson et al estimated the incidence of harmful DEs in inpatients to be at least 0.7%.3 This systematic review generated an estimate based on harmful DEs pooled from retrospective studies of enriched cohorts such as autopsy studies, many of which used an AE screening process similar to the Harvard Medical Malpractice Study and the recent study by Bates et al.6 34 Experts have suggested that such methods are not well-suited to detect DEs.6 7 Conversely, studies that screened for DEs by rigorously evaluating the diagnostic process have yielded higher event rates (5–18%) in cohorts of patients who transferred to the ICU or expired in the hospital, were under investigation for COVID-19 or were readmitted.8 10 20

      Unlike prior studies that screened for DEs, our estimate reflects harmful DE rates related to exposure to hospital care received on the general medicine service, not limited to specific or enriched cohorts.4 13 20 35 36 For example, the recent multicentre study by Auerbach et al (which was based on our approach) observed that harmful DEs occurred in 26% of patients who transferred to the ICU 24 hours or more after admission.13 14 While our weighted sample included patients who transferred to the ICU and observed a similar rate (28.5%), we also sampled cases without these high-risk events to obtain a population estimate for hospitalised patients who received care on the general medicine service. By querying the EHR using clinical screening criteria and ensuring adequate sampling of each subgroup,12 23 our sample broadly represented clinical trajectories typically encountered for hospitalised patients who received general medical care and was not limited to a specific disease process (eg, epidural abscess, myocardial infarction).20 35 As might be expected, in cases with a major deterioration event, the harm was frequently characterised as major or fatal. In contrast, in cases without such events, the harm was frequently characterised as minor or moderate. Yet, the impact of the harmful DE did not necessarily correlate with the event. For example, for cases in which the patient expired after hospitalisation, the harmful DE was not always associated with the outcome as illustrated in Case 2 (online supplemental appendix B).

      Additionally, many of the harmful DEs identified were related to a secondary diagnosis or a diagnostic delay (ie, a missed diagnostic opportunity early during hospital encounter). In Case 2, the harmful DE was associated with an undetected pleural effusion, a secondary diagnosis that was missed during the index hospitalisation but identified during subsequent readmission for hepatic hydrothorax. In Case 3, the harmful DE was related to both the primary diagnosis (sepsis) and secondary diagnosis (methicillin-susceptible Staphylococcus aureus bacteremia); however, the pelvic abscess was identified as the source of bacteremia only late in the hospital course, after dedicated imaging was obtained.

      Our ability to detect harmful DEs beyond those captured by traditional methods such as the IHI Global Trigger tool can be explained by our structured case review process that empowered adjudicators to use the EHR to rigorously assess the diagnostic process and consider the impact of identified DEs both during the course of hospitalisation and afterwards for cases with uncertainty or multiple process failures.12 27 This approach enabled detection of events with both severe and less severe outcomes, such as a delayed diagnosis of pelvic abscess in Case 3 (online supplemental appendix B) in which the patient did not expire or transfer to the ICU. Unlike studies focused on sampling highest-risk events,14 our inclusion of subgroups with complex clinical events and no events (large and understudied subpopulations) and use of targeted reviews of post-hospitalisation documentation using a “look forward” approach28 generated new insights about the spectrum of harms associated with faulty diagnostic processes during the hospital encounter. For example, DEs were judged to be present in cases in which the diagnosis was uncertain (unclear aetiology of altered mental status) and specific process failures (misinterpretation of electroencephalogram results documented at discharge in relation to final report, a caregiver-reported concern about being discharged too soon) were identified from review of an unanticipated event after index hospitalisation (readmission for seizure for initiation of anticonvulsants) in Case 4 (online supplemental appendix B). Interestingly, despite the heterogeneity of patient experiences, certain patient- or caregiver-reported concerns when documented could serve as important clues about a faulty diagnostic process.2 15

      Our multivariable analyses suggest that certain process failures are frequently associated with harmful DEs. These include uncertainty in initial assessments, complex diagnostic testing and interpretation, suboptimal subspecialty consultation, patient-reported concerns and history-taking. A thorough analysis of such events should yield insights for optimising triggers for surveillance3 37–39 and developing preventative interventions.2 17 For example, triggers or interventions could be developed by analysing the timing, sequence and pattern of consultation or test orders in EHR audit logs,40 41 capturing diagnostic concerns from patients who review online notes using a patient diagnostic questionnaire15 42 and applying machine learning and natural language processing to model uncertainty expressed in documentation.43–46

      Additionally, the rapid adoption of artificial intelligence (AI) and large language models (eg, GPT-4) has much potential to facilitate prospective case surveillance by detecting complex patterns of risk factors and clinical events that represent markers of risk or suboptimal diagnostic processes. For example, once trained on large cohorts and inclusive of data retrieved from various sources (EHR, institutional safety reporting systems, patients), AI-based tools could facilitate detection of diagnostic uncertainty in initial assessments; complex sequences of diagnostic tests; incorrect study interpretations; discrepancies in consultation recommendations; patient–clinician diagnostic discordance; patient-reported diagnostic concerns; or lack of improvement based on expected clinical course. Furthermore, when embedded in the EHR and integrated into workflow for clinicians, real-time AI-generated insights and diagnostic suggestions could prompt more timely intervention, such as pausing to take a diagnostic time-out and reconsidering the working diagnosis as we recently described.17 47–49

      While our study has several strengths, it has limitations. First, it was conducted using a non-traditional stratified sampling approach at a single institution for patients who received general medical care and had a length of stay <21 days. While our sampling approach was grounded in emerging research, expert consensus and local data, event rates may differ for patients who receive more specialised care delivered on other services and at other institutions, and are likely higher for patients with longer exposure to hospital care. Regarding top disease categories implicated in serious harms from DEs, while we frequently observed harmful DEs related to infection (sepsis, pneumonia), we infrequently identified harmful DEs related to vascular or cancer diagnoses.4 Such cases may be detected in patients who receive specialised care delivered on cardiovascular and oncology services, not a general medicine service. Second, for reasons indicated earlier, we excluded cases during the initial waves of the pandemic including patients hospitalised on COVID-19 teams. While unintended bias is possible, recent data suggest that harmful DE rates are similar in this population.20

      Third, we used the EHR (known to contain inaccurate information about the status of death) to identify patients who expired within 90 days of admission.50 Furthermore, while we did not consider post-discharge events such as readmissions in our queries to stratify lower-risk subgroups, in about one-third of our study sample we relied on a “look forward” approach to identify these events when individual reviewers did not agree on DE determination, when there was uncertainty about the diagnosis, or when multiple process failures were present. Future studies should consider such factors when defining criteria for subgroups.

      Fourth, limiting the period of harm to 90 days from admission may have precluded detection of certain serious DEs with more delayed impact (such as lack of follow-up of incidental pulmonary nodules); however, other safety net systems likely mitigated such faulty diagnostic processes at our institution.51 Lastly, we observed moderate inter-rater reliability between individual reviews. As in our validation study,12 we observed substantial agreement between final consensus reviews and expert panel tertiary reviews, suggesting that the most important step is “talking through” determination of DEs and characterisation of associated harms for independently reviewed cases.
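
      For readers less familiar with agreement statistics, the sketch below shows how chance-corrected agreement between two reviewers is commonly computed with Cohen's kappa. That kappa is the statistic behind the inter-rater reliability reported here is an assumption for illustration, and the ratings in the example are invented, not study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same cases.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's marginals.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of cases.")

    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label for every case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Invented ratings: 1 = DE judged present, 0 = DE judged absent.
    reviewer_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    reviewer_2 = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
    print(round(cohens_kappa(reviewer_1, reviewer_2), 2))
```

      In this invented example the reviewers agree on 8 of 10 cases, but half of that agreement would be expected by chance, which is why kappa is lower than the raw agreement rate.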

      In summary, we performed a single-centre evaluation to estimate the prevalence of harmful DEs in hospitalised patients who received general medical care. While our results and sampling approach should be validated in larger samples, for different clinical services and at other sites, these data offer direction for improving surveillance approaches and developing preventative interventions. Novel approaches, including the use of AI and machine learning, have potential for facilitating more granular surveillance in large subpopulations without highest-risk events (such as the complex clinical events subgroup) than can be achieved by human review of the EHR alone; assessing uncertainty or risk in diagnostic processes43; and prompting preventative intervention to promote a culture of diagnostic safety.6 7 47

      Data availability statement

      Data are available upon reasonable request.

      Footnotes

      • X @tweet_akdMD, @drjschnip, @dbatessafety

      • Collaborators Patient Safety Learning Laboratory Adjudicator Group: David Lee MD; Daniel Palazuelos MD, MPH; Myrna Katalina Serna MD, MPH; Anne Kozak MS, PA-C; Khelsea O’Brien MS, PA-C; Shela Shah MD; Mohammed Wazir MD; Chadi Cortas MD, MD, MBA; Caroline Yang MD.

      • Contributors All authors contributed sufficiently to the conceptualisation (AKD, PG, RR, JG, JLS, SL, DWB); methodology (AKD, SP, DM-C, PG, RR, JG, JLS, SL, DWB); data curation (SP, KK, DM-C, AG, SR, MM, AL, NP, JMF); analysis (AKD, SP, KK, DM-C, MM, NP, CY, JMF, JLS, SL, DWB); project administration (AKD, SP, KK, DM-C, MM, AG, SR, AL, NP, ML); writing, editing and review of the manuscript (AKD, SP, KK, DM-C, AG, SR, MM, AG, AL, NP, ML, CY, JMF, PG, KOS, RR, JG, JLS, SL, DWB); and/or supervision and acquisition of funding (AKD, DWB). The guarantor (AKD) accepts full responsibility for the conduct of the study, analysis and access to data, decision to publish, and the finished work product.

      • Funding This study was funded by the Agency for Healthcare Research and Quality (AHRQ) (R18 HS026613).

      • Competing interests None declared.

      • Provenance and peer review Not commissioned; externally peer reviewed.

      • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.