Imagine conferring with your clinician colleagues and being handed a plateful of all of your missed and delayed diagnoses. But, imagine further that, rather than a nightmare of ghosts returning to haunt you in the form of malpractice claims, sanctions by regulatory boards, insurers pouncing on needless expenditures or hordes (yes, there would be large numbers) of angry finger-pointing patients and families, the experience would instead bring a dream of supportive feedback and learning. Imagine the ways such an idealised non-threatening consultation and conference might be designed to minimise defensiveness, maximise introspection, draw out lessons and prompt the rethinking of habits and standard practices. Rather than prompting incredulous exclamations of “you missed that?!” or “what were you thinking?!”, the process would generate engagement and scrutiny of office and hospital workflow and diagnostic testing practices, realistically grappling with the time and cost trade-offs, pressures, uncertainties and diagnostic challenges that practising clinicians face every day. In short, your plateful of missed diagnoses would initiate a process that combines the best elements of a fun and informative morbidity and mortality conference, an expert second opinion from a generous colleague and a productive quality improvement consultation.
I suspect the average clinician could not care less about diagnostic ‘triggers’ or a new study to increase their positive predictive value. However, no professional could fail to see the appeal of the ultimate form of continuing medical education imagined above: learning practical lessons from one's own cases and discussing with trusted colleagues the ways care could be improved. How to get there from here poses a fundamental challenge, one that the study by Singh et al on diagnostic error ‘triggers’ in this issue of BMJ Quality & Safety attempts to address.1
The authors, a research team based at the DeBakey Veterans Affairs Medical Center and Baylor College of Medicine in Houston, have continued to refine their approaches to reliably and efficiently screen for diagnostic error cases.1 2 Using the comprehensive longitudinal electronic medical record in the US Veterans Health Administration hospitals (and now another electronic medical record in a second health system), they took advantage of two powerful features of electronic records that will increasingly change the paradigm of quality improvement and health services research: (a) the ability to perform simple or even complex queries to select samples of patients meeting specified criteria and (b) the ability to efficiently retrieve and review electronic patient records identified by the screening criteria. Thus, the next decade of ‘chart review’ (which, as the authors point out, is essential for evaluating potential diagnosis error cases), with this quantitatively eased burden of case selection and review, holds the promise of facilitating qualitative leaps forward in what we can learn from the details of clinical care processes and outcomes.
As the authors and the literature repeatedly point out, diagnostic errors are important but understudied, largely because of difficulties in defining and detecting such errors.3 4
Can generic electronic screens cast an effective and efficient net, picking out charts to review for diagnostic errors with a sufficiently high likelihood of error (ie, positive predictive value) to make the manual reviews worth the effort? In attempting to ease the burden of detecting diagnostic errors, the authors (and others) have sought methods to sift more efficiently through the tens of thousands of encounters they wanted to screen. In the present study,1 they developed two relatively simple screens: admission to an acute care hospital in the 14 days following an index primary care visit, or a second, broader screen that identified patients who presented to an emergency department, sought urgent care or had an unscheduled subsequent primary care visit, also within 14 days of an index visit.
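To make the screening logic concrete, the sketch below shows, in Python/pandas, how such 14-day trigger queries might be run against a flat extract of encounters. The table layout, column names and visit-type codes are assumptions made for illustration only; they are not the authors’ actual implementation.

```python
import pandas as pd

# Hypothetical encounter extract: one row per encounter, with columns
# patient_id, visit_date (datetime) and visit_type
# (eg, 'primary_care', 'hospital_admission', 'emergency',
#  'urgent_care', 'unscheduled_primary_care').
encounters = pd.read_csv("encounters.csv", parse_dates=["visit_date"])

WINDOW = pd.Timedelta(days=14)

# Index visits are the primary care encounters being screened.
index_visits = encounters[encounters["visit_type"] == "primary_care"]

def trigger_positive(index_row, follow_up_types):
    """True if the patient had any qualifying encounter within
    14 days after this index primary care visit."""
    later = encounters[
        (encounters["patient_id"] == index_row["patient_id"])
        & (encounters["visit_date"] > index_row["visit_date"])
        & (encounters["visit_date"] <= index_row["visit_date"] + WINDOW)
        & (encounters["visit_type"].isin(follow_up_types))
    ]
    return not later.empty

# Trigger 1: hospitalisation within 14 days of the index visit.
trigger1 = index_visits.apply(
    trigger_positive, axis=1, follow_up_types={"hospital_admission"}
)

# Trigger 2 (broader): emergency department, urgent care or
# unscheduled return primary care visit within 14 days.
trigger2 = index_visits.apply(
    trigger_positive,
    axis=1,
    follow_up_types={"emergency", "urgent_care", "unscheduled_primary_care"},
)

# Charts flagged by either trigger go forward for manual review.
flagged = index_visits[trigger1 | trigger2]
```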
Their rationale, that an unexpected hospitalisation, emergency department visit or urgent/unscheduled clinic visit could result from a missed diagnosis during the initial visit, seems sound, particularly for acute problems. Unfortunately, it overlooks considerable numbers of other situations where diagnosis fails. In fact, it seems unlikely to pick up many of the cases that represent the leading causes of malpractice claims in primary care—missed and delayed diagnoses of cancer. Here the diagnostic failures unfold over a period of weeks and months, rather than hours or days, and, except in relatively rare circumstances (eg, progression of an undiagnosed colon cancer to the point of causing bowel obstruction), would not typically lead to a positive trigger such as hospitalisation within the subsequent 2 weeks. The authors acknowledge this limitation and have, in other studies,5 6 sought complementary screens to detect missed cancer diagnoses. But this shortcoming illustrates the fact that no one screen fits all when it comes to detection of diagnostic errors.
More subtle, but sobering, is the evidence from this study's data that the screens would fail to detect more than 90% of the missed diagnoses in the cohort. Using the trigger increased the yield from roughly 2% of unselected (‘control’) cases to 20% in the trigger-positive sample (those meeting the 14-day hospitalisation screening criterion). However, applying that 2% rate to the cohort of screen-negative patients means that, while the authors picked up a total of 177 error cases among the ‘screen-positive’ patients (using both sets of trigger screens), there would have been roughly 1710 additional missed-diagnosis cases, nearly 10 times as many, overlooked in the larger screen-negative cohort. Depending on the purpose of the error screening, this may or may not be acceptable. Certainly, if I wanted to hear about all of my cases of diagnostic error, I would find my plate 90% empty. On the other hand, if an institution seeks a quick sample of cases to illustrate problems with diagnosis, using the selective triggers to target a manageable number of charts to review might prove quite helpful.
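The arithmetic behind that ‘90% missed’ estimate can be laid out as a back-of-envelope calculation (the figures are those quoted above; the 1710 screen-negative cases are themselves an approximation derived from the roughly 2% background rate):

```python
# Approximate sensitivity of the triggers, using the editorial's figures.
detected = 177          # error cases found among trigger-positive charts
background_rate = 0.02  # ~2% error rate in unselected ('control') charts
missed_estimate = 1710  # ~2% applied to the screen-negative cohort

total_estimate = detected + missed_estimate          # ~1887 error cases overall
sensitivity = detected / total_estimate              # ~0.094, ie under 10%
fraction_missed = missed_estimate / total_estimate   # ~0.906, ie more than 90% missed

print(f"Estimated sensitivity: {sensitivity:.1%}")
print(f"Estimated fraction missed: {fraction_missed:.1%}")
```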
Another important question this sampling strategy raises is the representativeness of the cases found via the screen. In other words, are the 177 cases identified by the triggers and subsequent chart reviews fairly similar to, or systematically different from, the 1710 overlooked cases? As mentioned above, to the extent that the triggers overlook missed diagnoses related to chronic or subacute diseases, they may fail to identify important system problems in the diagnostic process. As long as an institution recognises the characteristics of different possible triggers, it could probably develop an efficient, trigger-based strategy for identifying substantially delayed diagnoses of cancers and other subacute illnesses, as opposed to more acute problems.
Are triggers the answer?
‘Trigger’ has become an increasingly used buzzword in adverse event detection and measurement.7–9 The Institute for Healthcare Improvement has developed and promoted a ‘global trigger tool’ as an instrument ‘to surface harm’ and to measure rates of injury from medical care over time.10 11 In 2008, the Agency for Healthcare Research and Quality convened an expert panel meeting on triggers and targeted injury detection systems to review the trigger literature and discuss challenges in implementing such systems.7 And the Singh study uses the word ‘trigger’ 67 times (excluding the abstract and references) to refer to its screening criteria.1
However, more than a century ago, Sir William Osler, the great teacher and diagnostician, promulgated a different use of the word ‘trigger’.12 He urged his students to use ‘triggers’ from the history and physical examination to create a list of possible diagnoses and then narrow the differential diagnosis based on their knowledge of anatomy and physiology. How effectively do today's clinicians, still largely relying on unaided human memory, trigger the correct diagnosis? A study that my colleagues and I recently published13 suggests that such ‘triggering’ represents a major weak spot and contributor to diagnostic error. In fact, failure or delay in considering the diagnosis was the leading factor, found in 110 of 583 cases in a series of self-reported diagnostic errors, far exceeding the second most frequent failure mode (itself related to failure to consider): failure or delay in ordering needed tests, seen in 63 cases.
Dr Lucian Leape and I have recently proposed six components, ideally aided by computerised decision support in some form, as a model for effectively triggering the thought processes and actions essential for reliable diagnosis.14 These elements of a fail-safe diagnostic assessment of a given symptom or problem include: (1) key data elements to be collected (ideally automatically, via computer-assisted questionnaires); (2) ‘don't miss’ diagnoses: critical diagnoses that must be considered given their seriousness and need for urgent treatment (eg, aortic dissection in patients presenting with acute chest pain); (3) ‘red flag’ symptoms and signs suggesting a potential ‘don't miss’ diagnosis (eg, back pain that wakens a patient at night); (4) potential drug causes (because clinicians frequently overlook medication-related causes of patients' symptoms); (5) required referral(s) (ie, when to refer for more expert evaluation or diagnostic procedures); and (6) patient follow-up instructions and plans (so the patient knows what to watch for and when to follow up if they do not feel better).
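Purely as an illustration of how these six elements might be organised for computerised decision support, the sketch below encodes them as a simple data structure, using the acute chest pain example mentioned above. The schema is hypothetical, and the entries beyond aortic dissection are placeholders for illustration, not clinical guidance or any existing product's content.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DiagnosticAssessmentTemplate:
    """Hypothetical schema for the six elements of a fail-safe
    diagnostic assessment of a presenting symptom."""
    symptom: str
    key_data_elements: List[str] = field(default_factory=list)       # (1) data to collect
    dont_miss_diagnoses: List[str] = field(default_factory=list)     # (2) must-consider diagnoses
    red_flags: List[str] = field(default_factory=list)               # (3) warning symptoms/signs
    drug_causes: List[str] = field(default_factory=list)             # (4) medication-related causes
    referral_criteria: List[str] = field(default_factory=list)       # (5) when to refer
    follow_up_instructions: List[str] = field(default_factory=list)  # (6) what to watch for, when to return

# Example instance; aortic dissection comes from the text above,
# the remaining entries are illustrative placeholders only.
acute_chest_pain = DiagnosticAssessmentTemplate(
    symptom="acute chest pain",
    key_data_elements=["onset", "character", "radiation", "associated symptoms"],
    dont_miss_diagnoses=["aortic dissection"],
    red_flags=["tearing pain radiating to the back"],
    drug_causes=["stimulant use"],
    referral_criteria=["haemodynamic instability: emergency evaluation"],
    follow_up_instructions=["return immediately if pain recurs or worsens"],
)
```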
Designing systems (including information technologies, but other structures and processes of care as well) to hardwire these elements into clinical workflow and diagnostic decision support holds the promise of substantially reducing delayed or missed diagnoses that result from overlooking relevant diagnostic considerations. While there are many other challenges in the diagnostic process (box 1), such as test results lost to follow-up (which many consider a lower-hanging fruit for improving diagnosis15), supporting clinicians in triggering relevant diagnostic considerations is an avenue that requires more thorough exploration.
Box 1 Reliable diagnosis: pitfalls and challenges
Challenging disease presentation
- Atypical presentation
- Non-specific symptoms and signs
- Unfamiliar/outside specialty
- Findings masking/mimicking another diagnosis
- Misleading ‘red herring’ findings
- Rapidly progressive course
- Slowly evolving course that blunts perception of onset
- Deceptively benign course
Patient factors
- Language/communication barriers
- Signal-to-noise problems: patients with multiple other symptoms or diagnoses
- Failure to share data (to be forthcoming with symptoms or their severity)
- Failure to follow up
Testing challenges
- Test not available owing to geography, access or cost
- Logistical issues in scheduling and performing tests
- False positive/negative test limitations
- Performance/interpretation failures
- Equivocal results/interpretation
- Test follow-up issues (eg, tracking pending results)
Stressors
- Time constraints for clinicians and patients
- Discontinuities of care
- Fragmentation of care
- Memory reliance/challenges
Broader challenges
- Recognition of acuity/severity
- Diagnosis of complications
- Recognition of failure to respond to therapy
- Diagnosis of underlying aetiological cause
- Recognising that a misdiagnosis has occurred
The online supplementary appendix 1 presents the above contributing factors in a grid format along with commonly missed diagnoses.
However, providing clinicians (and patients) with an even longer differential diagnosis list risks simply overwhelming or annoying busy practitioners. As with computerised reminders of other kinds, overalerting poses the very real risk that clinicians will end up ignoring the suggestions, particularly if they often represent ‘false alarms’, needlessly slow workflow, offer impractical recommendations or raise liability concerns.16 Even if we avoid the cognitive problems of excessively long diagnostic lists, clinicians will require thoughtful and parsimonious ways of sorting through the probabilities of these different considerations to avoid harming patients through false positives and overtesting, not to mention the time and anxiety involved in chasing false leads.
There are no easy answers here. While reassuring patients and allaying needless anxiety remains an important function and responsibility of all practising clinicians, studies of high reliability organisations demonstrate that a constant state of awareness of, and worry about, ‘what could go wrong’ is a fundamental safety requirement.17 For high reliability diagnosis, such worrying requires that clinicians have a heightened ‘situational awareness’ of where the Swiss cheese holes18 lie in diagnosis. We have drafted a matrix (online supplementary appendix 1) to illustrate some of the more frequent, critical and problem-prone diagnoses we observed during a series of diagnosis-focused morbidity and mortality discussions, as well as from several hundred self-reported errors.13 There is a need to recognise the varying mix of factors at play in different diagnoses and situations. The matrix illustrates how the vulnerabilities listed in box 1, which populate the columns of the grid, can conspire to make diagnosis difficult. Such a failure-mode vulnerability checklist can be used both retrospectively, to analyse potential error cases, and prospectively, to heighten awareness of pitfalls in approaching a particular diagnosis. Each diagnosis, and each diagnostic situation, has its own fingerprint of the factors that most frequently and most forcefully contribute to delayed diagnosis or misdiagnosis. Prospectively, designing safeguards against these pitfalls and challenges will be needed to dramatically reduce diagnostic errors, but an important first step is awareness of the profile of where and how diagnosis fails for these selected diagnoses.
‘Pull’ systems in lean quality improvement are engineering designs that emphasise paving smooth, reliable paths that standardise processes and reduce friction, waste and error. Can we ‘pull the trigger’ to fire up our diagnosis improvement imaginations, as well as improve our daily clinical cognition by more systematically triggering key diagnoses, as Osler advised a century ago? Can ‘triggers pull’ an enriched sample of cases to facilitate finding, serving up and learning from diagnostic error cases? Given the requisite culture and focus, and clinicians' natural hunger to learn and improve, we think the answer to these questions is ‘yes’.
Acknowledgments
Dr Schiff acknowledges the support of the Risk Management Foundation of the Harvard Medical Institutions in his work on diagnostic errors in medicine.
References
Supplementary materials
Footnotes
Linked article 000304.
Competing interests None.
Provenance and peer review Commissioned; internally peer reviewed.