Background Diagnostic error incurs enormous human and economic costs. The dual-process model of reasoning provides a framework for understanding the diagnostic process and attributes certain errors to faulty cognitive shortcuts (heuristics). The literature contains many suggestions for counteracting these and for enhancing analytical and non-analytical modes of reasoning.
Aims To identify, describe and appraise studies that have empirically investigated interventions to enhance analytical and non-analytical reasoning among medical trainees and doctors, and to assess their effectiveness.
Methods Systematic searches of five databases were carried out (Medline, PsycInfo, Embase, Education Resource Information Centre (ERIC) and Cochrane Database of Controlled Trials), supplemented with searches of bibliographies and relevant journals. Included studies evaluated an intervention to enhance analytical and/or non-analytical reasoning among medical trainees or doctors.
Findings Twenty-eight studies were included, grouped into five categories: educational interventions, checklists, cognitive forcing strategies, guided reflection and instructions at test, with a small number of other interventions considered separately. While many of the studies found some effect of interventions, guided reflection interventions emerged as the most consistently successful across five studies, and cognitive forcing strategies improved accuracy and confidence judgements. Significant heterogeneity of measurement approaches was observed, and existing studies are largely limited to early-career doctors.
Conclusions Results to date are promising: this relatively young field is approaching a point where cognitive interventions of this kind can be recommended to educators. Further research with refined methodology and more diverse samples is required before firm recommendations can be made for medical education and policy.
- Decision making
- Medical education
- Cognitive biases
- Diagnostic errors
In 1999, the Institute of Medicine's seminal report To Err is Human1 claimed that up to 98 000 patients may die annually in American hospitals due to preventable medical errors. Several large-scale international studies have supported this claim; a major 2007 UK review reported that 8%–10% of hospital admissions in the National Health Service result in an adverse event, between 30% and 55% of which are to some extent preventable.2 Studies between 1995 and 2009 in Australia,3 the UK,4 New Zealand,5 Sweden6 and Canada7,8 report adverse event rates between 7% and 17% and preventability rates between 30% and 70%.
Estimates from diverse methodologies, such as autopsy audits, standardised patients, second reviews and malpractice claims data, indicate a diagnostic error rate of around 10%–15% where patient outcome is adversely impacted, which appears stable across hospitals, countries and decades.9,10 Diagnostic error is among the leading contributors to adverse incidents and malpractice suits against hospitals,11 with significant human and economic costs.
Cognitive factors in diagnostic error
While multiple factors are typically implicated in diagnostic errors, cognitive factors are thought to be at play in around three-quarters of cases; data from internal medicine suggest that these are largely not cases of inadequate knowledge, but of faulty information synthesis12 (though knowledge deficits may present more or less frequently in other disciplines).
A number of experts have pointed to the dual-process model of decision-making and the concepts of heuristics and biases as approaches by which to better understand these cognitive failings.13,14 This model posits that two systems, or modes, of thinking contribute to reasoning. One system (analytical reasoning) is conscious, deliberate, explicit, rational and controlled, contrasting with the other (non-analytical reasoning), which is unconscious, associative, implicit, intuitive and automatic.15–17 A vast body of work17,18 demonstrates that individuals reasoning in the non-analytical mode routinely use heuristics, or mental rules of thumb, to reach fast decisions with approximate accuracy.
Traditionally, non-analytical reasoning has been regarded as a trade-off between accuracy and expediency, and there are several taxonomies of heuristics leading to biases that have been implicated in faulty diagnostic reasoning.19 ,20 For example, search satisficing (calling off a search for a second diagnosis as soon as one appropriate diagnosis has been reached) and confirmation bias (the failure to seek disconfirming evidence to refute a diagnostic hypothesis) are thought to be key factors in premature closure, where a doctor accepts a diagnosis without consciously considering other possibilities.19
While the attribution of diagnostic errors to such mental shortcuts has intuitive appeal, the truth is probably more complex; both analytical and non-analytical modes of reasoning are prone to error, and it is presently unclear which is truly more reliable and under what circumstances.21 Identifying the contributing factors to a faulty diagnosis using retrospective techniques, used by most such studies to date, is difficult due to measuring, sampling and hindsight biases.21 The vilification of non-analytical reasoning has been criticised as too simplistic, as the successful use of pattern recognition and heuristics is a hallmark of mature decision-making, distinguishing experts from experienced non-experts.22,23 From this perspective, encouraging metacognitive skills and judicious switching between the two modes of reasoning (ie, ‘slowing down when you should’24) may be more realistic.25,26
Cognitive skills training
Researchers and theorists have recommended a range of options for medical students and doctors to enhance analytical and non-analytical reasoning. In Croskerry’s excellent 2013 review27 of theoretical and empirical work on debiasing strategies, two broad approaches are described: educational strategies (techniques aimed at enhancing future decision-making through increased knowledge and awareness of reasoning styles, eg, educational curricula,28,29 simulation training,30 instruction in Bayesian reasoning) and workplace strategies (techniques aimed at enhancing decision-making in the moment, eg, checklists,31 cognitive forcing strategies,32 slowing down,22 diagnostic time-outs33).
Interventions based on the dual-process model were included in McDonald's broad 2013 review34 of patient safety strategies for diagnostic errors and in Graber's comprehensive 2012 review of cognitive interventions to reduce diagnostic error.35 Our paper adds to the literature by presenting a fresh, more up-to-date synthesis (including some studies that have not appeared in previous reviews) and, most importantly, by focusing in greater depth than any previous review on analytical and non-analytical thinking as an organising principle, offering a more granular analysis of this specific subset of studies.
This review aims to identify, describe and appraise studies that have empirically investigated interventions to enhance analytical and non-analytical reasoning among medical students, residents and doctors, and to assess their effectiveness.
Search parameters and inclusion criteria
The search process was guided by the PRISMA36 guidelines for systematic reviews and documented in advance in a protocol.
The search focused on efforts to improve analytical and non-analytical decision-making among medical trainees and doctors. The search included only interventions aimed at the individual clinician and excluded system-level strategies, new or improved diagnostic tests or technologies, shared decision-making, patient decision-making or clinical decision-making by laypersons or non-medical healthcare professionals.
The search targeted interventions to reduce cognitive-related diagnostic error, rather than studies of the origins or prevalence of such error. Only interventions to enhance analytical and non-analytical reasoning were included; other styles of cognitive interventions (eg, to increase content knowledge or experience, to provide general feedback on diagnostic accuracy or to provide external assistance from other professionals or non-cognitive decision support tools or diagnosis generators) were excluded. All intervention formats were considered, including educational strategies and workplace strategies, as described above.27
A control group, control condition or baseline measure for comparison was required for inclusion.
Educational intervention studies examining outcomes at any level of Kirkpatrick’s adapted hierarchy37 were included. This model categorises outcomes of educational interventions into four levels: Level 1 outcomes describe the learner's reaction to the intervention (eg, satisfaction, acceptability); Level 2a describes changes in attitudes or perceptions; Level 2b describes the acquisition of knowledge or skills; Level 3 describes transfer of behavioural changes from the learning environment to the workplace; Level 4a describes wider changes in the delivery of care; and Level 4b describes direct benefits to the patient. Outcomes around diagnostic accuracy, resource usage and testing behaviour were considered.
We considered randomised controlled trials, quasi-randomised studies, within-subject studies (at least one control condition data point and one intervention condition data point, using a single participant group), between-subject studies (using treatment and control groups), and pretest–post-test (at least one data point before and one data point after the intervention phase, using intervention and control groups) studies, with no minimum length of follow-up. Commentaries, reviews, surveys and audits were not included.
The search included only studies presented, published or indexed after 1990, a cut-off chosen to capture the work that led to the publication of the Institute of Medicine’s To Err Is Human1 and the work that emerged in its wake.
Five databases were searched: Medline (Ovid, 1990–present), PsycInfo (Ovid, 1990–present), Embase (Elsevier, 1990–present), ERIC (Education Resource Information Centre) (ProQuest, 1990–present) and the Cochrane Database of Controlled Trials (Wiley, 1990–present). A sample set of six relevant papers,38–43 two of which were included in the final analysis,39,42 was compiled to facilitate the database search. The search combined the most relevant thesaurus terms (provided by the indexed papers in the sample set) and natural language text words (extracted from the abstracts and titles in the sample set). Search terms were selected for four domains: cognitive science (eg, ‘intuition’, ‘metacognition’), medical decision-making (eg, ‘diagnostic reasoning’, ‘medical error’), the populations of interest (eg, ‘physicians’, ‘medical students’, ‘medical education’) and empirical studies (eg, ‘interventions’, ‘experiments’). The main searches were conducted between April and September 2014, with a supplementary follow-up search in February 2015. The search strategy for Medline is included in the online supplementary material; searches for the other databases were modelled on this strategy.
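The logical structure of this strategy can be sketched as follows: terms within each domain are OR-ed together, and the four domain blocks are AND-ed. The sketch below is illustrative only, using the example terms given above rather than actual Ovid or ProQuest syntax; the full term lists appear in the online supplementary material.

```python
# Hypothetical sketch of combining search terms across the four domains.
# Terms within a domain are OR-ed; the domain blocks are AND-ed.
domains = {
    "cognitive_science": ["intuition", "metacognition"],
    "medical_decision_making": ["diagnostic reasoning", "medical error"],
    "population": ["physicians", "medical students", "medical education"],
    "empirical": ["interventions", "experiments"],
}

def build_query(domains):
    """Combine per-domain term lists into one Boolean query string."""
    blocks = []
    for terms in domains.values():
        blocks.append("(" + " OR ".join(f'"{t}"' for t in terms) + ")")
    return " AND ".join(blocks)

print(build_query(domains))
```

In practice, each database requires its own syntax and controlled vocabulary (eg, MeSH in Medline), which is why the other database searches were modelled on, rather than copied from, the Medline strategy.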
This database search was supplemented in four ways. First, the bibliographies of review articles, book chapters, the sample set of articles and articles identified for inclusion were manually reviewed for additional papers. Second, the contents of two relevant journals (Medical Education and Medical Decision Making), chosen for their scope and frequency in our preliminary searches, were manually reviewed. Third, key researchers in the area were contacted to request their recommendations and any relevant unpublished work. We also received suggested citations from two peer reviewers. Fourth, the ‘Similar Articles’ feature of PubMed was used to identify papers similar to those included.
Selection of studies and data synthesis
Titles were reviewed independently by two authors, with manual searches of journals and reference lists performed by one author. The abstracts of articles flagged as potentially relevant were reviewed, and full texts of articles appearing to meet eligibility criteria were obtained for full review. The inclusion criteria were applied by three authors; disagreements about relevance were resolved by discussion. Studies excluded due to the nature of the intervention were reviewed a second time, following feedback from peer review. The flow diagram (figure 1) shows the results of the literature search. Twenty-eight studies meeting inclusion criteria were identified.
Information on the following variables was extracted from each included study using a form: setting, country, year of data collection/publication, study design, participant characteristics, sample size, intervention components, duration of follow-up, outcome measures, findings and authors’ recommendations.
The Cochrane Collaboration's Tool for Assessing Risk of Bias44 was used to assess the quality of the studies included. Five of the tool's six criteria were used: sequence generation, allocation concealment, blinding of outcome assessment, incomplete outcome data and other sources of bias. The remaining criterion, blinding of participants and personnel, was deemed inapplicable, as participants and personnel will by default be aware of whether they are receiving or delivering an intervention, respectively. The sequence generation and allocation concealment criteria were not relevant in every case, as some studies were within-subject ones. Two authors independently rated each criterion for each study as ‘not relevant’, ‘high risk’, ‘low risk’ or ‘unclear’, with 100% agreement.
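The agreement figure reported above is simple percent agreement across rating cells. A minimal sketch of how such agreement could be computed is shown below; the rating values are hypothetical examples, not data from the review.

```python
# Hedged sketch: percent agreement between two independent raters.
# Ratings below are hypothetical, one (study x criterion) cell per entry.
def percent_agreement(rater_a, rater_b):
    """Proportion of cells on which two raters gave the same judgement."""
    assert len(rater_a) == len(rater_b)
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical ratings for the five criteria applied to one study:
a = ["low risk", "unclear", "low risk", "not relevant", "low risk"]
b = ["low risk", "unclear", "low risk", "not relevant", "low risk"]
print(percent_agreement(a, b))  # identical ratings give 1.0
```

Chance-corrected statistics such as Cohen's kappa are often preferred when agreement is below 100%, since raw percent agreement does not account for agreement expected by chance.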
We attempted to minimise publication bias by searching multiple databases and by contacting 13 authors to seek unpublished work; five authors responded and three supplied references for review. No language restriction was imposed.
An overview of the 28 included studies appears in table 1. A total of 2732 participants took part across the 28 studies.
Participants and setting
Nine studies29,45–52 included medical students only, 10 included residents only,28,39,42,53–59 3 included practicing doctors only,60–62 4 included both medical students and residents63–66 and 2 included medical students, residents and doctors.67,68 Seven studies were based in the USA,28,29,53,60,64,66,67 10 in Canada,47,50,51,54–57,59,61,68 4 in the Netherlands,42,58,63,65 2 in Brazil39,52 and 1 each in the UK,46 Japan,48 Korea,49 Israel62 and Switzerland.45
Risk-of-bias assessments for all included studies are shown in table 2. Details of randomised sequence generation, concealment of allocation and blinding of outcome assessment were scarce. Outcome data were complete or adequate in all cases. The heterogeneity of measurement techniques used is a significant issue; while 18 of the studies used clinical vignettes in some form, these were not standardised. The reliability and validity of such measures are therefore difficult to ascertain.
While many of the included studies found some impact of interventions, not all achieved statistical significance; this suggests that publication bias favouring statistically significant results was not a major issue. However, no formal analyses were conducted to estimate the risk of publication bias.
Synthesis of results
Studies were grouped by intervention style. There were two major categories of intervention, educational and workplace. Workplace interventions were further divided into four categories: checklists, cognitive forcing strategies, guided reflection and instructions at test to use analytical or non-analytical reasoning. Three studies did not fit these categories; each is considered separately below.
Educational interventions
Six studies examined the impact of educational interventions.28,29,45–47,53 All examined outcomes at Level 2b of Kirkpatrick's adapted hierarchy (acquisition of knowledge or skills).37 One study28 employed a longitudinal intervention, consisting of a 1-year curriculum delivered to residents, while the remainder were one-off seminars. One study included a 3-month follow-up timepoint.53
The outcomes measured varied considerably, with one study each examining diagnostic accuracy,45 evidence of heuristic use47 and error identification on test cases.53 No significant impact of the interventions was found in any of these studies. Three studies employed measures of thinking styles: the Diagnostic Thinking Inventory,46 the Diagnostic Error Knowledge Assessment Test34 (a 13-item multiple-choice measure of knowledge of cognitive biases and strategies) and the Inventory of Cognitive Biases in Medicine;35 the latter two measures were composed and validated by the authors of the respective papers. Significant postintervention gains were observed in each of these three studies.
Checklists
Four studies examined the impact of checklist interventions.48,54,60,61 Two studies used a general diagnostic checklist,54,61 one used a general checklist and a symptom-specific checklist60 and one used a debiasing checklist and a differential diagnosis checklist.48 Three studies examined the impact of checklist use on diagnostic accuracy for clinical scenarios, while the fourth examined its impact on extensiveness of differential diagnosis and resource usage during physicians’ hospital shifts.60
Findings were mixed. One study found that checklist use led to fewer errors overall and more correction of errors on verification;54 one reported trends towards more extensive differential diagnosis, though this effect was not statistically significant;60 one reported that checklists were beneficial only where participants could review the content of the case;61 and the last reported that a differential diagnosis checklist improved diagnostic accuracy over intuitive reasoning, though the effect was not observed for a debiasing checklist.48
Cognitive forcing strategies
Two studies55,56 instructed participants to consider alternative diagnoses, and both found improvements in diagnostic accuracy compared with instructions to diagnose based on first impressions or without specific instruction. One of these studies also found that considering alternatives improved the accuracy of participants’ confidence judgements.55
One study62 instructed participants to reconsider their diagnosis after removing a misleading detail from the case outline, resulting in a significant improvement in diagnostic accuracy; however, merely warning participants to be aware of misleading details had no impact on accuracy.
Guided reflection
Five studies39,42,49,58,63 examined the impact of interventions instructing participants to diagnose cases through a guided, structured reflective process, compared with instructions to diagnose cases quickly, based on their first impressions, or in the absence of instructions. The outcome measure in all five studies was diagnostic accuracy on test cases: four used clinical vignettes39,42,58,63 and one used OSCE cases.49
All studies revealed some impact of guided reflection on diagnostic accuracy. Two highlighted its utility in overcoming experimental manipulations to induce cognitive biases.49,65 One study reported an effect for complex cases only.39 Findings for medical student samples were mixed, with one study finding an effect49 and another study finding none.63
It is worth noting that of the five studies, four39,42,58,63 were conducted by significantly overlapping research teams, with two authors contributing to all four studies and two other authors contributing to three.
Instructions at test
Seven studies50,51,57,59,64,67,68 examined the impact of interventions instructing participants to use a particular reasoning approach. The outcome measure was diagnostic accuracy on clinical scenarios in five cases57,59,64,67,68 and on dermatology slides in two.50,51 Experimental manipulations varied significantly. Two studies compared the impact of instructions to diagnose through a ‘directed search’ protocol with instructions to diagnose based on first impressions;64,67 one study compared instructions to use dual-process reasoning with no instructions;68 one study compared instructions to use lists of clinical features with first impressions;50 one study compared instructions to use dual processing, clinical feature-listing, first impressions and no instructions;51 one study compared instructions to diagnose thoughtfully with instructions to diagnose quickly;57 and the final study compared free-choice reasoning with instructions to approach the case in a style that either matched or differed from the participant's initial chosen approach.59
Findings were mixed. Five studies found no statistically significant difference between instruction groups. One study reported that the ‘directed search’ protocol improved diagnostic accuracy,64 and another reported that instructions to use an analytical approach improved accuracy, irrespective of whether this matched participants’ initial approach.59
Other interventions
One study66 examined the impact of a metacognitive feedback intervention on diagnostic accuracy and use of heuristics when diagnosing clinical scenarios; no effect was found.
One study52 examined the effects of free, cued and modelled reflection on diagnostic accuracy when diagnosing clinical scenarios. Diagnostic accuracy was higher under both the cued and modelled reflection conditions compared with free reflection, with no differences between these two conditions.
One study65 compared the impact of conscious versus unconscious deliberation on diagnostic accuracy. Experts who interpreted a clinical case following an elaborate analysis outperformed those who diagnosed the case following a distractor task or based on first impressions; however, this effect held only for complex cases. By contrast, novices benefited from unconscious deliberation for simple cases only.
Discussion
The present review reveals a varied picture of the literature on dual-process interventions for diagnostic reasoning. The guided reflection approach emerged as the most promising style of intervention, while interventions that provided softer instructions or education on diagnostic reasoning appeared less consistently successful. However, further research is required to separate the effect of this structured approach from the simple effect of spending more time on a diagnosis. A number of authors contributed two or more studies to this review and, as mentioned, the research teams on the majority of the guided reflection studies overlapped significantly. Confidence in the effects observed will be greatly increased by replication by new research teams in other settings as the field expands.
The findings from the studies of instructions at test suggest that instructions to use some style of reasoning at the point of diagnosis may amount to little more than exhortations to ‘think harder’, insufficient to alter cognitive behaviour. However, the sample of studies is small, and the patterns should therefore be interpreted with caution.
Studies of cognitive forcing strategies reveal an interesting split; an educational intervention on these strategies did not produce an effect, while studies examining instructions at test to use a specific cognitive forcing strategy offered some evidence for their efficacy. Again, however, the small sample of studies limits the strength of any conclusions we may draw.
It is possible that the reliance on medical students and residents in these samples plays a role in the results. The successful use of non-analytical reasoning is generally achieved with experience, and trainees earlier in their careers are more likely to use slower, more deliberate analytical methods.69 In this way, students and residents, though a convenient sample for academic researchers, may not be the most appropriate targets for interventions aimed at modifying dual-process modes of reasoning; doctors later in their careers may respond differently to such efforts. (Indeed, students and residents are not a homogeneous group in this regard and demonstrate different reasoning strategies in diagnosis, suggesting that the field would benefit from a more nuanced understanding of novice vs intermediate reasoning skills.64) Doctors with many years of experience are in a minority in these samples. Developmental differences in reasoning constitute an important area for future research.
The ultimate goal of researchers’ efforts to enhance reasoning is arguably the improvement of patient outcomes. However, on Kirkpatrick's adapted hierarchy,37 none of the educational interventions measured outcomes at a level higher than Level 2b (improvement in knowledge or skills); that is, outcomes were overwhelmingly measured using clinical scenarios and vignettes, not real-world decision-making in the workplace. While clinical vignettes can accurately and reliably detect differences in physicians’ performance in real settings,70 the transfer of any effects to the real-world workplace, and ultimately to the level of patient outcomes, remains unexplored. Notably, only one study60 examined the impact of an intervention in a hospital setting.
The risk-of-bias assessment revealed a mixed picture of the methodological robustness of studies, and many important details were missing from study reports. Outcome measurement approaches varied widely and the field lacks high-quality and broadly accepted tools to measure reasoning styles and error rates; scales composed for this purpose have not been widely validated and the clinical vignette approach favoured by most researchers has its limitations, as mentioned above. The definitions of error and the hallmarks of effective reasoning are themselves still somewhat fluid, and consensus on some basic questions may need to be reached before such tools can be used effectively: does a good or thorough decision-making process count as a successful strategy, whether or not it results in an accurate diagnosis and a positive patient outcome? And does a good outcome create the appearance of a good process, rather than a good process leading directly to a good outcome?25
The literature contains many recommendations for various interventions,35 but relatively few empirical trials examining their effectiveness, and only one study in this review examined the lasting impact of interventions over a period longer than 4 weeks.53 The vast majority of the studies we found were presented or published in the last 5 years, suggesting that this is still a relatively immature field, but one that is currently accelerating. Future research by this team will attempt to address some of the methodological challenges in this area.
The search was restricted to studies of medical professionals or trainees. As medical decision-making constitutes a special case of decision-making under uncertainty, it is likely that many valuable conceptual contributions and techniques may have been found in other disciplines, including economics, human factors engineering and military scholarship. Much evidence exists to support the notion that deliberate thinking improves intuitive decision-making, and it is entirely probable that the same holds for medical decision-making.71
The quality of methodology and reporting varied considerably across studies, somewhat hampering confident interpretation and synthesis. Medium- and long-term follow-ups in studies with educational components, evidence of adequate randomisation (see table 2), reliable and validated measurement tools, and studies of experienced doctors were lacking. Effect sizes were reported in only four papers.39,42,57,66 It is worth noting that a number of these studies, such as the papers by Hershberger et al,29 Norman et al57 and Ilgen et al,67 have since had the validity of their conclusions queried or critiqued, demonstrating that researchers are continuously engaging with the outstanding challenges and issues facing the field at this time.72,73
Due to the relatively small number of studies in each category, we elected not to perform a quantitative synthesis of the data. However, it is arguably too early in the field's development in any case for such an analysis to be truly useful; we believe that the general overview of trends provided by this review and the identification of promising directions for future research are of more benefit to researchers at this time.
This review reveals a burgeoning field of study; there is general enthusiasm for training in analytical and non-analytical reasoning, and trials of interventions have emerged steadily in recent years, along with a wealth of theoretical work. The empirical work is therefore somewhat preliminary, and challenges include a lack of longitudinal work, standardised measurement tools and research on experienced clinicians. One-off interventions using guided reflection and cognitive forcing strategies appear to have some effectiveness in improving diagnostic performance, and we have found modest evidence to recommend them to educators. Given our findings, there appear to be four important directions for future research to move the field forward: (1) additional studies to confirm the effects of the most promising intervention styles detailed above, particularly by alternative research teams; (2) further methodological refinement to ensure reliable and valid assessment of key outcomes; (3) studies of key outcomes in non-artificial settings; and (4) studies of more diverse populations of doctors, particularly very experienced doctors.
The kind assistance of Angela Rice is gratefully acknowledged.
Contributors KAL designed and composed the protocol, designed the search strategy, carried out database searches and data screening, applied inclusion criteria and risk-of-bias assessments and drafted and revised the paper. GO'R codesigned the protocol, applied inclusion criteria and critically revised the paper. BDK codesigned the protocol, applied inclusion criteria, carried out the risk-of-bias assessments and critically revised the paper. SC carried out data screening and revised the paper.
Funding This review was conducted as part of a PhD thesis funded by the Irish Research Council (grant number GOIPG/2014/346). The funders played no role in the review design, the collection, analysis or interpretation of data, the writing of this paper or the decision to submit it for publication.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Full electronic search strategies, the database of titles reviewed and the review protocol are available to researchers on request from the corresponding author.