It looks like diagnosis triggers may be gaining traction. Building on their earlier efforts,1,2 a team of investigators based in Houston reports on their latest effort to apply electronic screens (so-called ‘triggers’) to large clinical databases to identify cases of potential diagnostic error.3 They searched the records of nearly 300 000 patients over a 12-month period at two large health systems with comprehensive electronic health records. They sought patients who had one of four ‘red flag’ findings for prostate or colon cancer: elevated prostate specific antigen (PSA), positive faecal occult blood test (FOBT), rectal bleeding (haematochezia), or iron deficiency anaemia. They then used a refined electronic algorithm to cull patients who (1) were already known to have prostate or colorectal cancer, or (2) had evidence of appropriate follow-up testing or referral. This process left roughly 1500 patients with one of the four red flags potentially unaddressed. Thus, searching an enormous haystack of 300 000 patients, they found roughly 1500 possible ‘needles’: patients who may have had their diagnosis of colon or prostate cancer delayed or overlooked entirely.
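The screening logic described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the investigators' actual algorithm: the `RedFlag` record, its field names and the `is_trigger_positive` function are hypothetical, and the 60-day and 90-day windows are taken from the cut-offs this commentary mentions for a positive FOBT and an elevated PSA. No windows are given here for haematochezia or iron deficiency anaemia, so they are left out. Treating follow-up on exactly the last day of the window as timely is also an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Follow-up windows in days. Only the two cut-offs mentioned in this
# commentary are listed; windows for the other red flags are not assumed.
FOLLOW_UP_WINDOW_DAYS = {
    "positive_fobt": 60,
    "elevated_psa": 90,
}


@dataclass
class RedFlag:
    """Hypothetical record of one red-flag finding for one patient."""
    patient_id: str
    kind: str                           # key into FOLLOW_UP_WINDOW_DAYS
    flagged_on: date                    # date the red flag was documented
    known_cancer: bool = False          # already diagnosed with the cancer
    follow_up_on: Optional[date] = None  # first follow-up test or referral


def is_trigger_positive(flag: RedFlag, as_of: date) -> bool:
    """Return True if the red flag looks unaddressed as of the given date."""
    # Exclusion 1: patients already known to have the cancer.
    if flag.known_cancer:
        return False
    deadline = flag.flagged_on + timedelta(days=FOLLOW_UP_WINDOW_DAYS[flag.kind])
    # Exclusion 2: follow-up happened within the window (assumed inclusive).
    if flag.follow_up_on is not None and flag.follow_up_on <= deadline:
        return False
    # Only flag the record once the whole window has elapsed.
    return as_of > deadline


# Usage: keep only the records that remain potentially unaddressed.
start = date(2013, 1, 1)
flags = [
    RedFlag("p1", "positive_fobt", start, follow_up_on=start + timedelta(days=59)),
    RedFlag("p2", "positive_fobt", start, follow_up_on=start + timedelta(days=61)),
    RedFlag("p3", "elevated_psa", start, known_cancer=True),
]
needles = [f.patient_id for f in flags if is_trigger_positive(f, date(2014, 1, 1))]
```

Records that pass this filter are only candidates; in the study, each still went to manual chart review.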
Their next step was manual chart review. They had hoped that the yield of their electronic screen for diagnostic failures (‘positive predictive value’) might approach 35%, meaning that at least one out of every three ‘screen positive’ charts would show evidence of care improvement opportunities. Instead, they were pleasantly surprised to find that fully two-thirds of the charts (positive predictive value of 60–70%, depending on which screen for which cancer) showed such opportunities, suggesting they could find an estimated 1000+ instances of delayed or missed follow-up, representing an estimated 50 actual cancers each year.
The first thing that must be said is that, although the screen ‘worked well’ (to find care improvement opportunities), the outpatient systems of care obviously did not. There is no reason to believe these findings are not broadly representative of ambulatory care in general; indeed, the fact that both institutions had advanced electronic systems should, in theory, put them in a better position for reliable follow-up than those lacking such capability. The findings therefore mean that healthcare diagnosis, as measured by this one metric at least, is a long way from six-sigma quality (conventionally defined as 3.4 defects per million opportunities). This study's rate translates into roughly 13 600 defects per 3.4 million patients, or about 4000 per million. While one could quibble with some of the arbitrary cut-off intervals chosen for this study (a colonoscopy 61 days after a positive FOBT was failed care, whereas one after 59 days was not; similarly with 91 vs 89 days for follow-up of an elevated PSA), the study unquestionably highlights undesirable delays that more efficient and more reliable care should be able to avoid.
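The six-sigma comparison above is simple arithmetic, and a short sketch makes the gap explicit. It assumes the conventional six-sigma benchmark of 3.4 defects per million opportunities and treats each screened patient as one ‘opportunity’; both are assumptions of this illustration rather than figures from the study. The function name `defects_per_million` is hypothetical.

```python
# Conventional six-sigma benchmark (assumption of this sketch):
# 3.4 defects per million opportunities.
SIX_SIGMA_DPMO = 3.4


def defects_per_million(defects: int, opportunities: int) -> float:
    """Scale a defect count to defects per million opportunities."""
    return defects * 1_000_000 / opportunities


# Rate quoted in the text: roughly 13 600 defects per 3.4 million patients.
study_dpmo = defects_per_million(13_600, 3_400_000)

# How many times worse than the six-sigma benchmark the quoted rate is.
gap = study_dpmo / SIX_SIGMA_DPMO

print(f"Study rate: {study_dpmo:.0f} defects per million patients")
print(f"Gap versus six sigma: about {gap:.0f}-fold")
```

On these assumptions, the quoted rate works out at 4000 defects per million patients, more than a thousand times the benchmark.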
The next important consideration is whether and how such retrospective ‘triggers’ can be used to minimise diagnostic errors prospectively. As we have noted previously, prospectively applying such triggers as safeguards to ‘find and fix’ actual or potential diagnostic errors and delays should be their ultimate application.4 Thus, as impressive as the results of the current application of these cancer electronic trigger screens are, we are still working in what quality improvement practitioners call the ‘inspection’ rather than the ‘re-engineering’ or improvement mode.5 In an earlier effort to pilot electronic screens, our diagnostic error research team screened records for potentially missed elevated thyroid stimulating hormone (TSH) levels and was able to intervene and treat multiple patients with overlooked hypothyroidism.6 The prospect of intervening prospectively on the 1000 patients identified as being at risk for prostate or colorectal cancer in this retrospective study is a tantalising one, but one that awaits a different application and study design (the authors did feed back to the providers any patients with outstanding failed follow-up, but the 2-year lag in the study period precluded more ‘real time’ feedback). Beyond the logistical challenges of such massive chart reviews, prospective application of the electronic screen would face a question of timing: when should the screens/triggers be run? If run too early (eg, 2 weeks after documentation of a positive FOBT), firing reminders or instituting interventions risks needlessly harassing physicians and patients just embarking on a work-up; if run too late (eg, after 6 or 12 months), the protocol misses an opportunity for more timely diagnosis of a growing colon cancer.
Ready, aim, improve: new paradigms to trigger better diagnosis
Thus, we see from Murphy et al that we have widespread diagnostic errors and delays, at least for these two diagnoses, confirming a growing body of literature demonstrating suboptimal diagnosis.7,8 We also see a glimpse of ways new tools might aid in overcoming limitations of care systems and human memory and performance reliability.9–11 Over the past decade, a small but growing cadre of researchers, educators and practitioners has begun to grapple with the millennia-old problem of medical diagnosis in new ways, informed by a larger error-prevention movement outside and within medicine.12–16 Much of this work has coalesced in a series of international conferences on Diagnostic Error in Medicine (now in their 6th year). These conferences (selected proceedings from which appeared in a recent supplement to BMJ Quality & Safety) have planted the seeds for new approaches to diagnostic error.
What will it take to jump-start new thinking, approaches and practices to help fulfil the promise of better diagnosis? Historically, efforts to improve diagnosis have been directed toward improving diagnostic technology: more and better lab and imaging tests. A parallel, potentially offsetting and challenging recent trend has been the change in traditional physician–patient relationships. Patients and physicians were previously more likely to know each other intimately over time, and (according to physicians and patients at least) physicians had more time to talk to, examine and think about their patients.17 Without delving into a host of important related controversies (such as whether and how technologies are being overused, how to ensure they are used more cost-effectively, and whether medical homes will make things better or worse), we need to begin rethinking how we approach diagnosis and diagnostic errors.
From our work with the earlier AHRQ-funded Diagnostic Error Evaluation and Research (DEER) Project,12 and more recent opportunities to study malpractice and diagnostic errors with Harvard's malpractice insurer,4,18 I offer a series of possibly provocative and certainly oversimplified bullets to contrast where we have come from and where we need to go (table 1). While these artificially dichotomised contrasting paradigms each warrant much more evidence and discussion, they can stimulate discussion about what and how we are thinking, teaching and practising in relation to medical diagnosis. We welcome the ‘needles’ Murphy et al have uncovered, and hope some of the provocative jabs offered here can serve to puncture our complacency and force us to rethink our collective approach to better diagnosis.
Competing interests None.
Provenance and peer review Commissioned; internally peer reviewed.