Background Few studies have investigated whether clinicians can use checklists to verify their diagnostic decisions. Checklists may improve accuracy by prompting clinicians to reconsider or re-collect information but might impair decision making by adding to clinicians’ cognitive load. This study assessed whether checklists improve cardiac exam diagnostic accuracy, and whether this benefit is dependent on collecting additional information.
Methods 191 internal medicine residents examined a cardiopulmonary simulator. They provided a diagnosis, subjective rating of certainty, and key findings before and after using a checklist. Residents were randomised; half were allowed access to the simulator and half were prohibited access to the simulator while using the checklist. Residents rated their cognitive load in each step: prechecklist diagnosis, checklist use and postchecklist diagnosis.
Results Verifying with a checklist resulted in improved diagnostic accuracy; 88 residents (46%) made the correct diagnosis before using the checklist compared with 97 (51%) afterwards, p=0.04. The benefit of checklist use was restricted to residents allowed to re-examine the simulator (10 changed to a correct diagnosis and one to an incorrect diagnosis) whereas no net benefit was seen among residents unable to re-examine the simulator (two changed to a correct diagnosis and two to an incorrect diagnosis, p=0.03). Those able to re-examine the simulator were slightly more confident after checklist use, whereas those unable to re-examine were slightly less confident after checklist use (p=0.01). The opportunity to re-examine the simulator had no effect on the accuracy of key findings reported. Of the three steps, checklist use was associated with the lowest cognitive load (F(1,189)=68, p<0.001).
Conclusions Verifying diagnostic decisions with a checklist improved diagnostic accuracy. This benefit was only seen when more information could be collected. Checklist use was not associated with increased cognitive load.
- Decision Making
- Decision Support, Clinical
- Diagnostic Errors
- Medical Education
Making a diagnosis can be a difficult and error-prone task because of the volume of information that clinicians must integrate. Checklists can help manage this information load, thereby allowing clinicians to detect and correct errors. While checklists are widely endorsed to reduce medical error,1–3 studies have focused on their use around medical procedures,4,5 not diagnostic decisions. Whether checklists can be applied to diagnostic decisions in clinical tasks, such as taking a history or performing a clinical exam (ie, physical diagnosis), is not well studied.
Cognitive load theory provides a rationale for checklist use in physical diagnosis. Cognitive load refers to the amount of mental effort required to perform a task.6 The cognitive load involved in a task relates to how much information must be simultaneously juggled. Tasks which require the integration of more than seven pieces of information can tax the finite resources of working memory and will typically be associated with high cognitive load.7 Medical diagnosis is associated with high cognitive load as it requires collection and integration of vast amounts of information. Therefore, it is not surprising that a majority of diagnostic errors are attributed to information collection and integration.8 Checklists might help overcome this information overload.3 Checklists encourage clinicians to systematically consider all relevant material, potentially facilitating information collection and integration.1 However, checklists may inadvertently add to the cognitive load. One approach to avoid adding to the cognitive load is to use checklists to verify a decision after it has been made,9 rather than before or during the decision itself.10
This approach is problematic because many clinical tasks require clinicians to actively seek out information before making a decision. For these tasks, it is unclear how to implement a checklist to verify decisions. Do clinicians need to re-collect information when verifying their decision? If so, will the checklist impair decision making by adding to the cognitive load?
Cardiac physical diagnosis is a clinical task involving complex diagnostic decisions that lends itself to controlled study. It is estimated that there are several hundred potential cardiac physical diagnostic findings.11 Yet clinicians report using only three to five findings for most cardiac diagnoses.12,13 Do clinicians collect enough information to pick the most useful three to five findings? Prior research suggests that clinicians frequently make mistakes by not collecting enough information.8 Checklists may help clinicians avoid these errors by prompting them to be more thorough and identify relevant findings they have overlooked. However, this would require clinicians to re-collect information while using the checklist.
Checklists might also assist clinicians in integrating the available information into a diagnostic decision. Within the cognitive psychology literature, diagnostic decisions are viewed as summative decisions made by two parallel and interacting cognitive systems: systems 1 and 2.14,15 System 1 processing is subconscious, requiring little mental effort, whereas system 2 processing is conscious and effortful. Errors in information integration may occur in both systems. While system 1 can integrate large amounts of information, this integration can be significantly influenced by subconscious biases.15 Encouraging system 2 processing can reduce the effect of these biases in decision making. However, the conscious decision making of system 2 is limited by working memory and can be easily overloaded by large amounts of information. Checklists can facilitate information integration in two ways. First, checklists can combat system 1 biases by encouraging oversight of the diagnostic decision by system 2 processing. Second, checklists can combat the information overload involved in system 2 processing by limiting conscious attention to a small number of relevant variables. If checklists work by helping clinicians integrate information, clinicians would not have to re-collect information when using the checklist to verify a decision.
Therefore, the purpose of this study was twofold: (1) To determine if using a checklist to verify cardiac physical diagnosis improves diagnostic accuracy and (2) To determine whether the mechanism of benefit involves information collection and/or information integration. Delineating the underlying mechanism is of theoretical and practical importance. If the benefit to checklists is contingent on information collection, verification checklists might only be effective when clinicians go back to the patient to identify findings they overlooked.
A total of 193 internal medicine residents with 5–8 years of physical exam experience were approached during their yearly formative objective structured examination, and all but two provided written consent to participate. The exam was administered over five different days. No attempt was made to sequester residents as the purpose of the exam was entirely formative. Based on a prior study with a comparable cohort of residents,12 we calculated a minimum sample size of 156 to detect a 20% difference in diagnostic accuracy assuming a power of 80% and α of 0.05.
Model of cardiac exam
A high-fidelity cardiac exam simulator, Harvey (Miami, Florida, USA), was used. The simulator provides a reproducible model for the assessment of cardiac physical examination skills.16,17 It replicates all aspects of the cardiac exam, thereby averting the criticisms of assessing heart sounds in isolation,18,19 while preserving reproducibility. The simulator was randomly set to one of six different diagnoses. All diagnoses had a single murmur, and multiple related findings including normal and abnormal heart sounds, lung sounds, carotid pulsations, jugular venous waveforms and precordial pulsations. Diagnoses included mitral stenosis, mitral regurgitation, atrial septal defect, mitral valve prolapse, aortic sclerosis and aortic stenosis.
A checklist was developed using templates from two textbooks11,20 and vetted by two clinical experts. Checklist items included the major aspects of a cardiac physical exam: carotid waveform and pulse, jugular venous waveform and pressure, first heart sound, second heart sound, extra sounds, murmur timing, murmur location, murmur radiation, murmur shape and precordial impulses. Residents had 5–8 years of experience with all of these physical exam components. Therefore, no special training was given prior to checklist use. The checklist was presented on an iPad with drop-down menus for each physical exam component (figure 1).
Residents completed the study as part of a formative objective structured clinical exam. Residents were randomised into two groups using a computer-generated random number. All residents completed three steps; only step 2 differed between the groups (figure 2). All residents completed the study in 15 min. To ensure residents completed all the steps, they were told to move to the second step after 7 min and to the third step after an additional 4 min. All data were entered directly by the resident on an iPad.
Step 1 (7 min): the simulator was set to one of six possible diagnoses based on a random number generated by the iPad. Residents were instructed to examine the simulator as they would ordinarily examine a patient. Residents provided a diagnosis, an estimate of their certainty on a subjective scale from 1 to 7 and a list of key findings used to make their diagnosis. Residents could record as many or as few key findings as they thought were important in arriving at their diagnosis.
Step 2 (4 min): residents were instructed to complete a checklist. Half were allowed access to the simulator and half were prohibited access to the simulator while using the checklist based on their random group assignment.
Step 3 (4 min): residents were asked a second time to generate a diagnosis, an estimate of their certainty and list of key findings. During this last step, residents were not allowed access to the simulator.
After completing the simulator station, residents were asked to subjectively rate the cognitive load involved in each of the three steps: (1) deciding on a diagnosis prechecklist use, (2) using the checklist and (3) deciding on a diagnosis postchecklist use. Cognitive load was measured using a previously validated 9-point scale where 1 represented minimal effort and 9 maximal effort.21
Diagnoses and findings were categorised as correct or incorrect. Accuracy of checklist completion was calculated by assigning one point to each item and dividing by the total number of items. Because of their non-normal distribution, data are described in medians, interquartile ranges (IQR) and means (µ).
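The scoring rule for checklist completion can be illustrated with a minimal sketch. The item names follow the checklist components listed earlier, but the answer key, the responses and the function name `checklist_accuracy` are hypothetical and are not taken from the study materials. The score is expressed as a percentage here, which is an assumption based on how checklist accuracy is reported in the results.

```python
def checklist_accuracy(responses, answer_key):
    """Return the percentage of checklist items answered correctly.

    One point is assigned per correct item, and the total is divided by
    the number of items, as described in the text.
    """
    if not answer_key:
        raise ValueError("answer key must contain at least one item")
    points = sum(
        1 for item, correct in answer_key.items()
        if responses.get(item) == correct
    )
    return 100.0 * points / len(answer_key)


# Hypothetical answer key for one simulated diagnosis (values invented).
answer_key = {
    "carotid": "delayed upstroke",
    "jugular_venous": "normal",
    "first_heart_sound": "normal",
    "second_heart_sound": "soft",
    "extra_sounds": "none",
    "murmur_timing": "systolic",
    "murmur_location": "right upper sternal border",
    "murmur_radiation": "carotids",
    "murmur_shape": "crescendo-decrescendo",
    "precordial_impulse": "sustained",
}

# Hypothetical resident responses: three of the ten items are wrong.
responses = dict(answer_key)
responses["second_heart_sound"] = "normal"
responses["murmur_radiation"] = "axilla"
responses["precordial_impulse"] = "normal"

score = checklist_accuracy(responses, answer_key)
print(score)
```

An unanswered item scores zero in this sketch because `responses.get` returns `None` for a missing key; whether the study treated missing items this way is not stated.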
The primary outcome was diagnostic accuracy. Accuracy prechecklist and postchecklist use was compared using a McNemar exact test, a non-parametric test for paired binomial data. Change in diagnostic accuracy was compared between the two groups of residents (those able and those not able to re-examine the simulator) using a Fisher exact test, a non-parametric test for unpaired binomial data.
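As an illustration of the paired comparison, the sketch below computes a two-sided exact McNemar p value from the discordant pairs only. It is not the authors' SPSS procedure; the function name `mcnemar_exact` is hypothetical, and the counts are taken from the diagnosis changes reported in the abstract (12 residents changed from an incorrect to a correct diagnosis and 3 from a correct to an incorrect one).

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value.

    b and c are the two discordant-pair counts. Under the null hypothesis,
    each discordant pair is equally likely to fall in either direction, so
    the smaller count follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Probability of a count at least as extreme as k in the lower tail.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * lower_tail)


# Discordant pairs summed over both randomised groups.
improved = 10 + 2   # incorrect before checklist, correct after
worsened = 1 + 2    # correct before checklist, incorrect after

p_value = mcnemar_exact(improved, worsened)
print(round(p_value, 3))  # 0.035
```

Under these assumptions the value rounds to 0.04 at two decimal places. Residents whose diagnosis stayed the same do not enter the calculation, which is why only 15 of the 191 pairs are used.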
Certainty in diagnosis, correct and incorrect findings prechecklist and postchecklist were compared with Wilcoxon signed-rank testing. Differences between the two groups in certainty, cognitive load, findings and checklist accuracy were compared using Wilcoxon rank-sum tests. Differences in cognitive load among the three conditions were compared using a repeated measures model with the ability to re-examine entered as a covariate.
All statistical analyses were performed using SPSS V.20 (IBM Corp., Armonk, New York, USA).
A total of 191 residents completed the study. Verifying decisions with a checklist resulted in improved diagnostic accuracy; 88 residents (46%) made the correct diagnosis before using the checklist compared with 97 (51%) afterwards, McNemar exact test p=0.04. No differences in certainty (5 IQR 4–5 µ 4.6 vs 5 IQR 4–5 µ 4.6, Wilcoxon rank z=−0.43 p=0.7), correct findings (3 IQR 1–4 µ 2.6 vs 2 IQR 1–3 µ 2.4, Wilcoxon rank z=−1.4 p=0.2) or incorrect findings (1 IQR 0–2 µ 1.2 vs 1 IQR 0–2 µ 1.1, Wilcoxon rank z=−1.5 p=0.1) were noted prechecklist and postchecklist use.
The benefit of checklist use on diagnostic accuracy was restricted to residents allowed to re-examine the simulator (10 of 95 changed from incorrect to correct diagnosis vs 2 of 96 in those unable to re-examine the simulator; table 1, Fisher exact test p=0.03). Those able to re-examine were slightly more confident after checklist use, whereas those unable to re-examine were slightly less confident after checklist use (0 IQR 0–1 µ +0.1 vs 0 IQR 0 µ −0.1, Wilcoxon rank z=−2.8, p=0.01). The ability to re-examine on the simulator had no effect on correct or incorrect findings reported (table 2). However, the ability to re-examine on the simulator was associated with better accuracy on checklist items (70 IQR 60–80 µ 70 vs 70 IQR 50–80 µ 64, Wilcoxon rank z=−2.2, p=0.02).
Reported cognitive load varied across each of the three experimental steps (5 IQR 4–7 µ 5.4, 4 IQR 3–5 µ 4.1, 5 IQR 3–6 µ 4.7 for generating a diagnosis prechecklist, using a checklist and generating a diagnosis postchecklist, respectively; F(1,189)=10.3, p=0.006). Reported cognitive load using the checklist was lower than either of the other two steps (F(1,189)=68, p<0.0001). Generating a diagnosis before the use of a checklist was associated with a higher cognitive load than generating a diagnosis after using a checklist (F(1,189)=35, p<0.0001). Ability to re-examine the simulator did not impact reported cognitive load (F(1,189)=0.1, p=0.7).
Residents who changed their diagnosis after checklist use were compared with residents who did not (table 3). The group of residents who changed their diagnosis from correct to incorrect was too small to be included in the comparison (n=3). Residents who corrected their diagnosis reported significantly fewer incorrect findings after checklist use (0 IQR 0–1 µ 0.3 vs 1 IQR 0–2 µ 1.1, Wilcoxon rank z=−2.3 p=0.02).
Verification with a checklist substantially improved diagnostic accuracy in this study. One diagnostic error was corrected for every 11 times the checklist was used. Furthermore, checklist use was not associated with increased cognitive load. The task of generating a diagnosis after checklist use was associated with lower cognitive load than generating a diagnosis prechecklist use.
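The text does not state how the figure of one correction per 11 uses was derived. The sketch below works through the arithmetic from the reported counts under one plausible reading, namely net corrections among the 95 residents who could re-examine the simulator; this reading is an assumption, and the variable names are illustrative.

```python
# Diagnosis changes reported in the results, by randomised group.
reexamine_n, reexamine_to_correct, reexamine_to_incorrect = 95, 10, 1
no_reexamine_n, no_reexamine_to_correct, no_reexamine_to_incorrect = 96, 2, 2

# Net corrections = changes to correct minus changes to incorrect.
net_reexamine = reexamine_to_correct - reexamine_to_incorrect
net_no_reexamine = no_reexamine_to_correct - no_reexamine_to_incorrect
net_overall = net_reexamine + net_no_reexamine

# Sanity check against the overall accuracy figures (88 correct before, 97 after).
assert net_overall == 97 - 88

# Checklist uses per net corrected diagnosis.
uses_per_correction_reexamine = reexamine_n / net_reexamine
uses_per_correction_overall = (reexamine_n + no_reexamine_n) / net_overall

print(round(uses_per_correction_reexamine, 1))
print(round(uses_per_correction_overall, 1))
```

On this reading, the re-examination group gives roughly one net correction per 11 checklist uses, whereas pooling all 191 residents gives roughly one per 21, because the group unable to re-examine contributed no net corrections.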
Three aspects of our design require closer scrutiny. First, checklist use was sequential and not integrated into the original diagnostic process, which was left uninterrupted. Upfront checklist use can increase cognitive load by forcing clinicians to simultaneously juggle checklist items alongside their usual diagnostic process. We circumvented this problem through sequential checklist use, which did not appear to increase cognitive load. Second, the content of the checklist mirrored the routine diagnostic process. The checklist items followed the standard paradigm clinicians are taught to assess and report cardiac physical exam. Given the familiar nature of the content, and its relation to everyday expert reasoning, it may be more easily integrated into the diagnostic process. Third, the design of the checklist required the user to report on each checklist item rather than just acknowledging it. This more engaging style of checklist might prompt more diagnostic reconsideration.
The utility of checklists in improving diagnostic accuracy was entirely restricted to residents able to re-examine the cardiopulmonary simulator. Therefore, the benefit of the checklist was linked to re-collecting information rather than simply reconsidering or integrating information. The precise mechanism of this benefit may be quite complex. While it is possible that a checklist simply allows clinicians to re-collect relevant information, it might also allow for a re-interpretation of findings after the clinician has committed to a diagnosis. This may allow a clinician to better recognise any inconsistencies in the physical findings and his or her reasoning.
Interestingly, checklists did not increase the number of findings clinicians reported using in making their diagnosis. This remained at 3–5. The accuracy of these reported findings was likewise similar. However, the subgroup of clinicians who benefited from checklist use reported fewer incorrect findings. The simplest explanation for these results is that checklist use enabled these clinicians to correct an erroneous finding which had derailed their diagnostic decision making. However, an alternative explanation is that clinicians reasoned backward, removing incorrect findings that were inconsistent with their diagnosis when asked to justify it.
In fact, checklists had a much greater effect on diagnostic accuracy compared with reported findings. This suggests that while physical findings are the building blocks of diagnostic decisions, measuring how they are integrated is not straightforward. Clinicians are likely unaware of the system 1 processing involved in their decision making. Therefore, retrospective reporting of findings may be insensitive to the subconscious components of their diagnostic decision making. Clinicians may report findings to support their diagnoses, even when these findings are not involved in the diagnostic process. Such explanatory behaviour of intuitive system 1 decisions has been well documented in the psychology literature.22
Four limitations to the generalisation of these results deserve mentioning. First, these findings represent the exploration of a single clinical skill, namely, cardiac physical diagnosis. Replication in other settings would be important prior to widespread adoption. Second, only trainees were involved. How these findings apply to practicing clinicians is unclear and requires further study. Third, cognitive load was measured at the end of the station rather than directly after each step. While this was felt necessary to reduce the complexity involved in each step, it introduces the potential for recall bias. Fourth, cognitive load has several dimensions which can be measured separately (eg, see http://humansystems.arc.nasa.gov/groups/TLX/index.html). Measuring all dimensions of cognitive load was not practical in the current study. However, checklists may have differential effects on different dimensions of cognitive load which are not apparent when measured on aggregate. Of note, time on task was fixed in the current study and therefore unlikely to influence cognitive load greatly.
In summary, these findings are helpful for both practicing clinicians and researchers designing checklists. Verifying decisions with checklists can substantially improve diagnostic accuracy without increasing cognitive load. However, checklists cannot be applied indiscriminately. For a checklist to be helpful, clinicians must be empowered to re-collect data and perhaps be focused on detecting incorrect information in their initial dataset. Future research clarifying how checklists help will be useful in designing checklists for other settings. In addition, exploring how to better integrate checklists into routine practice, while keeping these limitations in mind, is worthwhile.
Contributors Study design was done by MS, AB, RC and JM. Data collection was done by MS and RC. Data were analysed by MS, AB, RC and JM. The manuscript was drafted by MS and revised by AB, RC and JM. No additional data are available.
Funding Dr Sibbald received funding from the Peter Munk Cardiac Center and the Ho Ping Kong Center for Excellence in Education in Practice, University Health Network.
Competing interests None.
Ethics approval Obtained from the University of Toronto Ethics Review Board.
Provenance and peer review Not commissioned; externally peer reviewed.