Application of human factors to improve usability of clinical decision support for diagnostic decision-making: a scenario-based simulation study
1. Pascale Carayon1,
2. Peter Hoonakker2,
3. Ann Schoofs Hundt2,
4. Megan Salwei1,
5. Douglas Wiegmann1,
6. Roger L Brown3,
7. Peter Kleinschmidt4,
8. Clair Novak5,
9. Michael Pulia6,
10. Yudi Wang4,
11. Emily Wirkus7,
12. Brian Patterson6
1. 1 Department of Industrial and Systems Engineering, Wisconsin Institute for Healthcare Systems Engineering, University of Wisconsin-Madison, Madison, Wisconsin, USA
2. 2 Center for Quality and Productivity Improvement, University of Wisconsin-Madison, Madison, Wisconsin, USA
5. 5 UW Health, Madison, Wisconsin, USA
6. 6 Department of Emergency Medicine, University of Wisconsin-Madison, Madison, Wisconsin, USA
7. 7 Department of Population Health Sciences, University of Wisconsin-Madison, Madison, Wisconsin, USA
1. Correspondence to Dr Pascale Carayon, Department of Industrial and Systems Engineering, Wisconsin Institute for Healthcare Systems Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA; pcarayon{at}wisc.edu

## Abstract

Objective In this study, we used human factors (HF) methods and principles to design a clinical decision support (CDS) that provides cognitive support to the pulmonary embolism (PE) diagnostic decision-making process in the emergency department. We hypothesised that the application of HF methods and principles will produce a more usable CDS that improves PE diagnostic decision-making, in particular decision about appropriate clinical pathway.

Materials and methods We conducted a scenario-based simulation study to compare a HF-based CDS (the so-called CDS for PE diagnosis (PE-Dx CDS)) with a web-based CDS (MDCalc); 32 emergency physicians performed various tasks using both CDS. PE-Dx integrated HF design principles such as automating information acquisition and analysis, and minimising workload. We assessed all three dimensions of usability using both objective and subjective measures: effectiveness (eg, appropriate decision regarding the PE diagnostic pathway), efficiency (eg, time spent, perceived workload) and satisfaction (perceived usability of CDS).

Results Emergency physicians made more appropriate diagnostic decisions (94% with PE-Dx; 84% with web-based CDS; p<0.01) and performed experimental tasks faster with the PE-Dx CDS (on average 96 s per scenario with PE-Dx; 117 s with web-based CDS; p<0.001). They also reported lower workload (p<0.001) and higher satisfaction (p<0.001) with PE-Dx.

Conclusions This simulation study shows that HF methods and principles can improve usability of CDS and diagnostic decision-making. Aspects of the HF-based CDS that provided cognitive support to emergency physicians and improved diagnostic performance included automation of information acquisition (eg, auto-populating risk scoring algorithms), minimisation of workload and support of decision selection (eg, recommending a clinical pathway). These HF design principles can be applied to the design of other CDS technologies to improve diagnostic safety.

• diagnostic errors
• human factors
• decision support, clinical
• emergency department
• decision making

## Background and significance

Diagnostic errors have emerged as a critical patient safety issue.1–4 Technological solutions such as clinical decision support (CDS) have the potential to improve diagnosis and reduce diagnostic errors.1 5 However, in general, CDS adoption, use and impact have been limited,6 partly because of poor usability and workflow integration.7–10 Human factors (HF) methods and principles can enhance CDS usability, improve workflow integration and positively impact patient safety.8 11 This is especially important in settings such as the emergency department (ED) where diagnostic decisions, for example, making a pulmonary embolism (PE) diagnosis, are often made under time pressure and in chaotic environments,12 therefore leading to diagnostic errors.13 In this study, we applied a HF-based approach to design a CDS to support PE diagnostic decision-making in the ED. The study objective was to assess whether a HF-based approach improves CDS usability, including cognitive support to PE diagnostic decision-making, efficiency, workload and satisfaction.

### CDS technologies for improving diagnosis

Health information technologies (IT), including CDS, can facilitate access to patient-related information and support diagnostic decision-making.1 5 14 15 Several systematic reviews16–18 indicate the potential for CDS technologies to improve ED diagnostic processes. However, the evidence on the benefits of CDS for ED patient outcomes is rather weak,16 18 19 often because the technologies are neither accepted nor used.20 For instance, studies have demonstrated the positive (although small) impact of CDS on appropriate ordering of CT angiography (CTA) for PE diagnosis,18 21 22 but many ED physicians (nearly 27% in one study21) may not use the CDS because of the time and data entry requirements. Achieving improvements in diagnostic safety with CDS implementation partly depends on the usability of CDS technologies and their integration into clinical workflow.1 5 23 For PE diagnosis, CDS should support the diagnostic pathway and associated cognitive activities to avoid ordering wrong or unnecessary tests.22 Indeed, suspected PE is a challenging diagnostic scenario in the ED, with the morbidity and mortality of delayed or missed diagnoses24 25 needing to be weighed against potential harms of overtesting.26 In a study of 328 patients receiving CTA for PE diagnosis,27 emergency physicians indicated reasons for ordering a CTA: to confirm or rule out PE (93%), elevated D-dimers (66%) and fear of missing PE (55%). 
This latter reason (fear of missing PE) may represent a defensive behaviour, which can contribute to unnecessary radiation exposure.28 Systematic risk assessment of patients using validated rules such as the Wells’ score and the pulmonary embolism rule-out criteria (PERC) rule29 is recommended so that PE is appropriately diagnosed in a timely manner30 while avoiding unnecessary CTA scanning and radiation.28 CDS can support the diagnostic process of emergency physicians when suspecting PE by suggesting appropriate clinical pathways, which are in concordance with these validated rules (eg, ordering a D-dimer test for patient with moderate PE risk as opposed to proceeding directly to CTA).

### HF-based design of health IT

Health IT, such as CDS, need to incorporate HF methods and principles to improve usability.31–34 Clinical decision-making such as a diagnostic process involves cognitive activities of acquiring and analysing information and using this information to select a decision.1 HF experts35 recommend different levels of automation for cognitive activities at different information processing stages. High automation is useful for efficient information acquisition and analysis; the stage of decision selection should be supported by the CDS but not fully automated to preserve clinician judgement. Those HF design principles and usability heuristics, for example, consistency and visibility,36 37 should be used in designing CDS. See table 1 for the HF design principles applied to the CDS tested in this study.

Table 1

HF design of PE-Dx CDS

Few studies have demonstrated the value of HF-based design in actually improving the usability and safety of CDS technologies. For instance, a simulation study38 showed how the redesign of medication alerts based on HF design principles (eg, proximity compatibility) improved usability, for example, enhanced efficiency and reduced prescription errors. Studies that incorporate HF methods and principles have focused on medication safety,38 39 preventive care (eg, colorectal cancer screening40) and clinical reminders.41 The value of HF-based design of CDS to support the diagnostic decision-making process has been suggested,1 42–44 but has not been demonstrated yet.

### Study objectives

Using HF methods and principles, we created a CDS (known as PE-Dx CDS) to support the PE diagnostic pathway for ED physicians. The study objective was to assess whether the HF-based approach improved CDS usability and diagnostic decision-making as compared with a widely used web-based CDS. We assessed the new CDS (PE-Dx) and the web-based CDS on the three usability dimensions defined by the International Organisation for Standardisation45 (effectiveness, efficiency and satisfaction) and tested the following hypotheses:

• Hypothesis 1 (effectiveness). Because PE-Dx provided cognitive support for the diagnostic clinical pathway, it would be more likely to lead to appropriate decision regarding diagnostic pathway, and greater confidence in decision as compared with the web-based CDS.

• Hypothesis 2 (efficiency). Because the PE-Dx was designed according to HF principles (eg, automation and workload minimisation), it would be more efficient and would lead to faster performance, fewer clicks and scrolls, and lower perception of workload as compared with the web-based CDS.

• Hypothesis 3 (satisfaction). Because the design of the PE-Dx CDS incorporated HF methods and principles, it would be more satisfying and produce a more positive score of perceived usability as compared with the web-based CDS.

## Methods

### Organisational setting

We conducted an experimental study in collaboration with the ED of a large US academic teaching hospital, which has been using an EHR (Epic Systems) since 2008. The study was approved by the hospital’s institutional review board. We used the STARE-HI guidelines to write this article.46 47

### Description of the CDS for PE diagnosis (PE-Dx CDS)

While there are several strategies for evaluating patients presenting with symptoms of PE, validated clinical guidelines both nationally and specifically in use in the study ED suggest a tiered approach using two risk scores: the Wells’ score48 and PERC rule.49 50 These scores are available on websites and apps, for example, MDCalc, a widely used source for medical scores and algorithms (www.mdcalc.com). Patients are initially screened with the Wells’ score. If the Wells’ score is higher than 6, PE risk is high and cross-sectional imaging such as CTA should be ordered. If the Wells’ score is between 2 and 6, PE risk is moderate and guidelines recommend ordering a D-dimer (a blood test which, when normal, effectively excludes PE in intermediate to low-risk scenarios, and if positive prompts cross-sectional imaging). If the Wells’ score is lower than 2, PE risk is low. In this case, it is appropriate to apply the PERC. If any PERC criteria are positive, a D-dimer is recommended; if all PERC criteria are negative, a PE diagnosis can be excluded without any further work-up.29 49 51 An objective of this staged diagnostic process recommended by the American College of Physicians49 is to reduce patients’ exposure to radiation.
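The tiered pathway described above can be sketched as a small function. This is a minimal illustration of the thresholds stated in the text, not the logic implemented in any actual CDS; the names `Pathway` and `recommend_pathway` are hypothetical, and the handling of a missing PERC result is an assumption.

```python
from enum import Enum
from typing import Optional


class Pathway(Enum):
    """The three possible next steps in the staged PE diagnostic process."""
    NO_FURTHER_ACTION = "no further action"
    D_DIMER = "order D-dimer"
    CTA = "order CTA"


def recommend_pathway(wells_score: float,
                      perc_positive: Optional[bool] = None) -> Pathway:
    """Map a Wells' score (and PERC result, if needed) to a pathway.

    Thresholds follow the text: >6 is high risk, 2 to 6 is moderate,
    <2 is low. PERC is only consulted for low-risk patients.
    """
    if wells_score > 6:
        # High risk: cross-sectional imaging.
        return Pathway.CTA
    if wells_score >= 2:
        # Moderate risk: D-dimer first.
        return Pathway.D_DIMER
    # Low risk: PERC decides between D-dimer and no further work-up.
    if perc_positive is None:
        # Assumption: refuse to recommend without a completed PERC.
        raise ValueError("PERC must be completed when the Wells' score is below 2")
    return Pathway.D_DIMER if perc_positive else Pathway.NO_FURTHER_ACTION


if __name__ == "__main__":
    print(recommend_pathway(7.5).value)
    print(recommend_pathway(3).value)
    print(recommend_pathway(1.5, perc_positive=False).value)
```

Keeping the PERC result optional mirrors the staged process: it is only required once the Wells' score places the patient in the low-risk group.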

Using a HF-based approach, we created PE-Dx, a computerised CDS integrated in the EHR to support the PE diagnostic pathway. The human-centred design process relied on multiple, iterative steps that combined work system analysis, design sessions, focus groups and a heuristic evaluation,52 and incorporated HF design principles (table 1).

Using the flowsheet functionality in Epic, a clinical systems analyst of the participating hospital programmed the PE-Dx CDS based on a PowerPoint mock-up developed in the HF design process (table 1). PE-Dx incorporated Wells’ criteria for PE and the PERC rule in accord with hospital policy and established guidelines.49 The clinician first considered the results of the Wells’ criteria and, when a patient was ‘Wells’ low’, then s/he completed PERC. Data available in the EHR were used to auto-populate the PE-Dx CDS. Some auto-populated fields could be overridden by physicians (ie, heart rate and O2 saturation). A recommendation regarding the diagnostic pathway appeared when the final criterion was completed. If the Wells’ score was ‘low’ (ie, <2), the auto-populated PERC criteria appeared. An affirmative response to any of the PERC criteria concluded the process and a recommendation appeared to order a D-dimer. See figure 1 for screenshots of the CDS, and table 2 for a comparison of the usual diagnostic process (MDCalc) and the PE-Dx CDS process.

Table 2

Comparison of diagnostic decision pathway with MDCalc and PE-Dx CDS

Figure 1

PE-Dx CDS: Wells’ criteria, Wells’ score and recommendation (A); PERC criteria (B). PE-Dx CDS, clinical decision support for pulmonary embolism diagnosis; PERC, pulmonary embolism rule-out criteria.

### Study design

We conducted a scenario-based simulation study with a repeated measures design. Participants were first exposed to the web-based CDS used in clinical practice (MDCalc; www.mdcalc.com) and then to the PE-Dx CDS in two separate sessions, each lasting approximately 30 min and occurring about 1 week apart. Five randomly ordered patient scenarios were presented during each session (see online supplementary appendix for scenario details). The scenarios represented the range of decisions (no further action, order D-dimer, order CTA scan) that physicians make when suspecting PE, and all possible combinations of the Wells’ score and the PERC rule.49 A physician on the study team drafted the scenarios, which were pilot tested with two other physicians. The scenarios were designed such that objectively presented data placed patients in one of five possible pathways, each with an appropriate decision:

1. Wells’ low (<2) / PERC negative; appropriate decision=no further action required

2. Wells’ low (<2) / PERC positive; appropriate decision=order D-dimer

3. Wells’ low (<2) / PERC positive due solely to age >49; appropriate decision=order D-dimer

4. Wells’ moderate (≥2 and ≤6), PERC inappropriate; appropriate decision=order D-dimer

5. Wells’ high (>6), PERC inappropriate; appropriate decision=order CTA scan.

Differences existed between the scenarios of session 1 and those of session 2 to minimise recall; differences were minor and not relevant for PE diagnosis. The creation of the scenarios integrated best practices outlined by Russ and Saleem.53

### Participants

In all, 24 emergency medicine (EM) residents (eight from each of three residency classes) and eight EM faculty participated in the study. The study took place over a 10-week period from April through June 2018. Results of a power analysis showed that, for a power of 0.80 with five different scenarios, we needed a sample size of at least 21 participants to detect an effect size of 0.70 with an alpha of 0.05; this was equivalent to recruiting six physicians in each of the four targeted physician groups (three residency years and attendings). To ensure sufficient power and given the uncertainty around completion of both sessions by participants, we recruited eight participants in each group, for a total of 32 physicians. Physicians were recruited via email, and all presented for both sessions. Total compensation of $100 was provided: $20 after session 1 and $80 after session 2.

### Study procedures

For study purposes, the PE-Dx CDS ran in the Epic ‘Playground’, an electronic environment that mimics the actual electronic health record (EHR) with fictitious patients and is used for testing and training. Following a precise script (available on request), trained researchers explained to the study participants the goal of the experiment, that is, ‘to evaluate a new clinical decision support tool—CDS—to facilitate making decisions about PE’. They then directed the study participants to complete the following tasks for each patient scenario: (1) review prepopulated ED notes containing relevant clinical information on the patient, (2) complete MDCalc (Wells’ and PERC when necessary) in session 1 and the PE-Dx CDS in session 2, (3) indicate on the paper survey what clinical pathway they would follow (order nothing, D-dimer or CTA) and their level of confidence in the decision, and (4) complete an electronic survey. After the final scenario of each session, the electronic survey included additional questions. On completion of session 2, we conducted a short debrief interview to ask about participants’ preference for either MDCalc or PE-Dx and about perceived barriers and facilitators for implementing PE-Dx. Comments by participants during debriefing are used as quotes when discussing study results.

### Data collection and measurement

We used multiple data collection methods to assess usability: screen capture software to video record task performance (Camtasia), stopwatch, paper survey and electronic survey.

#### Screen capture software

The screen capture software (Camtasia) started at the beginning of the session and recorded continuously every activity performed by participants until the end of the session. Researchers reviewed the video recordings and extracted data to assess:

• Number of clicks for each scenario. Counting of clicks used to navigate or choose/select an item began once the participant typed the patient's name and stopped when the participant documented his/her decision.

• Number of scrolls for each scenario. Counting of uninterrupted scrolls on a page or side bar started and ended similar to the counting of clicks.

• Number of navigation elements used for each scenario. Navigation elements included windows in the EHR (eg, ED provider note, medication list, medical history) and browser windows (eg, MDCalc for Wells’) that physicians interacted with during navigation.

#### Stopwatch

Time to complete each scenario was measured with a stopwatch on a smart phone. Start time was defined as the moment that the participant started typing the patient’s name in ED dashboard. End time was defined as the moment that the participant made his/her choice for the diagnostic pathway and recorded it on the paper survey.

#### Paper survey after each scenario

After completing each scenario, participants filled out a two-question paper survey. The first question asked for their decision regarding the diagnostic pathway: order nothing, a D-dimer test or a CTA. Their decision was compared with the guideline-supported decision for that particular scenario; this produced the measure of decision appropriateness. In the second question, participants indicated their confidence in their decision about the diagnostic pathway, measured on a 100-mm visual analogue scale, ranging from 0 (no confidence at all) to 100 (very high confidence).

#### Electronic survey at the end of each session

At the end of each session, participants answered 17 questions from the Computer System Usability Questionnaire (CSUQ).55 The original CSUQ has 19 questions, but two questions about error messages and online help were removed because MDCalc and PE-Dx did not have those features. The Cronbach’s alpha for the 17-item CSUQ was 0.95 in our study, which is similar to the reliability score in the original CSUQ study.55 The electronic survey used at the end of each scenario and at the end of each session can be found at https://cqpi.wiscweb.wisc.edu/wp-content/uploads/sites/599/2018/12/PE_DX_Questionnaire_Session_2_Final1.pdf.
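For readers unfamiliar with the reliability statistic reported above, Cronbach’s alpha can be computed from a respondents-by-items matrix with the standard formula α = k/(k−1) × (1 − Σ item variances / variance of total scores). The sketch below uses only the Python standard library; the function name `cronbach_alpha` and the example ratings are hypothetical and are not the study data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for responses[r][i] = rating of respondent r on item i.

    Uses sample variances (n - 1 denominator) for both the items
    and the per-respondent total scores.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item (column) across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical ratings: three respondents, two items.
    example = [[1, 1], [2, 3], [3, 2]]
    print(round(cronbach_alpha(example), 3))
```

Values close to 1, such as the 0.95 reported here, indicate that the items vary together and can reasonably be summarised by a single score.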

### Data analysis

All outcome measures were transformed to approximate normality in the models used for data analysis; for the marginal graphics, they were back-transformed to the original metrics for easier interpretation. Data analysis was conducted using a three-level (physician-session-scenario) empirical Bayesian (EB) regression model, allowing us to model the pre–post and scenario dependencies. We used a Bayesian estimation framework to analyse our models. Bayesian methods are a useful alternative to iterative generalised least squares estimation in multilevel regression models, particularly with smaller samples.56 The benefits of Bayesian methods depend on an informative prior distribution.57 We estimated our effects with EB Markov Chain Monte Carlo estimates,58 59 using data-derived estimates from our dataset as our EB informative priors.56 58 60 The Bayesian estimates and the analysis of our multilevel models were constructed using MLwiN V.3.02. We used the three-level EB model to obtain the estimates of marginal means and report the comparison of outcome measures for session 1 (MDCalc) and session 2 (PE-Dx).
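One way to picture the three-level structure is the random-intercept model below. It is only a sketch under stated assumptions: the exact specification (random effects, covariates, priors) is not given in the text, and the symbols, including the session indicator PEDx, are introduced here purely for illustration.

```latex
% Assumed illustration, not the authors' stated specification.
% y_{ijk}: transformed outcome for scenario i in session j of physician k
% PEDx_{jk}: 1 for session 2 (PE-Dx), 0 for session 1 (MDCalc)
\[
  y_{ijk} = \beta_0 + \beta_1\,\mathrm{PEDx}_{jk} + v_k + u_{jk} + e_{ijk},
\]
\[
  v_k \sim \mathcal{N}(0, \sigma_v^2), \qquad
  u_{jk} \sim \mathcal{N}(0, \sigma_u^2), \qquad
  e_{ijk} \sim \mathcal{N}(0, \sigma_e^2).
\]
```

Under this sketch, β₁ captures the session 1 versus session 2 difference on the transformed scale, while the physician-level and session-level intercepts absorb the dependence among repeated observations.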

## Results

### Participant characteristics

Eight of the 32 (25%) study participants were women and most (88%) were between 25 and 34 years old. Study participants used MDCalc primarily on the computer (75%) or their phone (25%). Most study participants used Wells’ score (81%) or PERC rule (91%) on MDCalc. All participants completed both sessions of the study.

### Effectiveness

The percentage of appropriate decisions increased from 84% using MDCalc to 94% with the PE-Dx CDS (p<0.01) (table 3). Physician confidence in their decision about the diagnostic pathway did not change with PE-Dx; confidence was high for both CDS (mean around 80 on a 0–100 scale).

Table 3

Impact of PE-Dx CDS on usability (mean score (standard deviation), effect size (95% CI) and p value) (n=32)

### Efficiency

Efficiency was measured with five variables; four of them were favourably impacted by the HF-based CDS and one variable was worse with PE-Dx (table 3). Physicians spent significantly less time per scenario with PE-Dx (117 s with MDCalc vs 96 s with PE-Dx; p<0.001). The difference of means was 20 s, which represented about 20% of time saved per scenario. Physicians used fewer scrolls per scenario (seven with MDCalc vs six with PE-Dx; p<0.001) and interacted with fewer interface elements with the HF-based CDS (16 elements with MDCalc vs 10 elements with PE-Dx; p<0.001). There were significant differences in the NASA TLX global and individual scores (except for performance) between the two CDS (figure 2). PE-Dx was associated with lower overall perceived workload, as well as lower scores on mental workload, physical workload, temporal workload, effort and frustration. The number of clicks per scenario increased slightly with PE-Dx (16 with MDCalc vs 18 with PE-Dx; p<0.01).

Figure 2

Impact of PE-Dx CDS on perceived workload. Results are shown as mean±SEM. Higher scores indicate higher perceived workload. The differences on the NASA-TLX subscales were statistically significant, except for performance. PE-Dx CDS, clinical decision support for pulmonary embolism diagnosis.

### Satisfaction

The CSUQ produced one overall satisfaction score, and three separate scores of (1) system usefulness, (2) information quality and (3) interface quality. The HF-based CDS performed better on all four measures (table 3). Physicians were more satisfied with the usability of PE-Dx compared with MDCalc, including overall perceived usability, usefulness (eg, ‘I can effectively complete my work using this system’), information quality (eg, ‘It is easy to find the information I needed’) and interface quality (eg, ‘The interface of this system is pleasant’).

In all, 28 of the 32 participating physicians (88%) indicated their preference for PE-Dx. One physician would continue using MDCalc as s/he may not agree with the recommendation of PE-Dx. Three other physicians were unsure which CDS they preferred.

## Discussion

In this study, we used HF methods and principles to design a CDS that supports the PE diagnostic process and compared this CDS (PE-Dx) to an existing web-based CDS (MDCalc). Results demonstrated that the HF-based CDS produced better usability and improved the PE diagnostic process. The majority of our hypotheses were supported: 9 of the 11 variables of usability improved with PE-Dx (table 3). Improvements in both objective (eg, time) and subjective (eg, satisfaction) measures of usability were achieved with PE-Dx. Physicians reported greater satisfaction with PE-Dx, including usefulness, information quality and interface quality (table 3). After being exposed to it, a majority of physicians (88%) indicated their preference for PE-Dx.

### Cognitive support of PE-Dx CDS to PE diagnostic decision-making

PE-Dx was more effective than MDCalc as it provided cognitive support for the PE diagnostic pathway: physicians chose the appropriate clinical pathway in 94% of cases when using PE-Dx (vs 84% with MDCalc). With MDCalc, physicians could decide not to follow the sequence of computing the Wells’ score first and then looking at PERC if appropriate. In a follow-up analysis,61 we found that study participants followed the recommended workflow almost always with PE-Dx (98%), and only half of the time with MDCalc (51%). PE-Dx was built to guide physicians in a clinical workflow that started with the Wells’ score and continued with the PERC rule; this led to a clear recommendation to either do nothing, order a D-dimer or order a CTA.

PE-Dx helped physicians make a decision about which clinical pathway to take (HF design principle of support for decision selection).35 43 However, in nine cases (out of 160 cases or 6%), physicians did not make the appropriate decision. In one case, the study participant made a data entry error, which led to a recommendation different from the one intended for this specific scenario. In seven cases, study participants chose no further testing, whereas PE-Dx recommended a D-dimer as the Wells’ score was below two with PERC positive; in one case, the physician ordered a CTA, whereas PE-Dx suggested a D-dimer. It is possible that in those eight cases (out of 160 cases) physicians disagreed with the recommendation from the CDS or did not trust the CDS.62 However, we cannot be sure why study participants made those decisions: we did not ask them to think aloud, so that we could obtain accurate estimates of the time needed to complete tasks.

Using clinical decision and prediction rules has been identified as a health IT functionality to support making a diagnosis, ordering tests and determining a diagnostic plan5; this is what PE-Dx does. PE-Dx does not fully automate the decision selection, but provides a recommendation to the physician regarding next steps, that is, doing nothing, ordering a D-dimer or ordering a CTA. This partial automation of decision selection is a HF design principle.35 High automation at the stages of information acquisition and analysis is another HF design principle applied to PE-Dx: some criteria were auto-populated (information acquisition) and scores were automatically calculated (information analysis) (table 1). Physicians liked this PE-Dx CDS functionality, as indicated in a debrief interview: ‘I like that it pulls the vital signs in particular and also is smart enough to look at the entire vital signs from the entire encounter, another area in which I do feel more confident; that a computer would be able to do that more reliably than me with less error long-term.’

### Saving time and reducing workload with PE-Dx CDS

Physicians completed the scenarios about 20% faster with PE-Dx, and used fewer scrolls and navigation elements (eg, browser windows, EHR elements). During a debrief interview, a resident confirmed the importance of saving time with PE-Dx: ‘It took the criteria from the Wells’ score that matched the PERC score and just auto-populated that as well, which is extremely, you know, efficient. That would help the workflow a lot. Instead of having to go back and forth in MDCalc and trying to type in the numbers between MDCalc and Epic, it would be much quicker.’ However, we found that PE-Dx increased the number of clicks per scenario. On the one hand, when using PE-Dx a physician needs fewer clicks because s/he does not have to leave the EHR, open a browser, and go back and forth between the EHR and MDCalc website (table 2). On the other hand, PE-Dx requires more clicks as the physician needs to provide information for all seven Wells’ criteria, whereas MDCalc does not require the user to answer all questions before calculating the Wells’ score (table 2). We made the decision to ask physicians to address all Wells’ criteria to avoid wrong decisions (principle of ‘error prevention’ in table 1). Overall, the PE-Dx CDS had a slightly higher number of clicks per scenario than MDCalc (table 3). Even though PE-Dx slightly increased clicks, physicians reported lower workload, except for the performance dimension (figure 2).

Several HF design principles contributed to the impact of PE-Dx on efficiency, in particular auto-population and minimising workload. For the Wells’ score, only one criterion was auto-populated from the EHR, and for PERC, six criteria were auto-populated. Future improvement in the PE-Dx CDS should include additional auto-population, for example, using natural language processing. A benefit of PE-Dx was to use data entered in the Wells’ score to populate the PERC rule; this reduced data entry. As indicated above, automation of information acquisition is a HF design principle,35 which was applied to PE-Dx. This was confirmed as a good approach by a participating physician: ‘It is great that it is all within (the EHR)… and easily accessible with one click. That it auto-populates is a huge function. To be able to take out that step of having to manually look up information and input information,… I felt (it) sped things up entirely and also made me more confident in the sense that it took out my own doubts about my ability to consistently transcribe information.’

The performance subscale of the NASA TLX produced relatively high scores (means around 7 on a 1–10 scale) for both CDS. Physicians perceived high satisfaction with their performance on the experimental tasks; this matched the data on confidence in decision that was also high and similar for both CDS (table 3).

### Evidence for value of HF-based design

Our study provides empirical evidence for the value of HF-based design in increasing usability of health IT, such as CDS, and improving diagnostic decision-making. We showed that a human-centred design process integrating multiple HF methods and design principles (table 1) produced a more usable CDS that provided cognitive support for the diagnostic process. The HF design principles implemented in PE-Dx (table 1), in combination with other HF criteria such as those in heuristic usability evaluations,36 37 could be applied to other CDS, in particular CDS targeting diagnostic processes that involve cognitive activities of information gathering, information integration and interpretation, and working diagnosis.1

The HF design principles can address the information fragmentation and overload of current EHR technologies, which contribute to physicians’ frustration, stress and burnout.63 Our HF-based CDS improved efficiency (eg, faster performance, lower perceived workload), although the number of clicks per scenario was slightly higher with PE-Dx (table 3). These positive results were likely related to auto-population, minimisation of data entry and chunking/grouping, that is, HF design principles aligned with suggestions made by ED physicians early in the design process (table 1). Therefore, a HF-based design that involves analysing the actual work of clinicians, participation from clinicians in the design process and implementation of HF design principles can enhance the usability of health IT such as CDS. As data continue to emerge about poor usability and workflow integration of health IT and adverse patient safety consequences,64 it is essential to integrate HF methods and principles in technology design processes.11 65–68 This is particularly important with the emergence of artificial intelligence, which can be used to design CDS technologies that complement and enhance clinical decision-making.69–71

### Strengths and weaknesses of the study

Findings of this scenario-based simulation study may not extend to the actual implementation of PE-Dx. When a technology is implemented in the real-work environment, challenges may occur that affect technology acceptance and use. These challenges are often related to problems with technology usability and workflow integration, and misfit of the technology with the rest of the work system.72 Because our CDS technology was designed according to HF methods and principles, its usability was rated positively in our experimental study; however, implementation issues may still arise. Therefore, future research should evaluate the actual implementation of PE-Dx, its use in the clinical environment, and its impact on key process and outcome measures, for example, appropriate use of lab or imaging tests. This research should use qualitative and quantitative research methods,73 as well as objective and subjective measures of usability.15 This follow-up research can assess two functionalities of PE-Dx that were not tested in the simulation study: automated creation of a lab test order (D-dimer) or imaging study (CTA) in computerised provider order entry (CPOE), and automated documentation of the decision in the ED provider note. Because we focused on cognitive support provided in the diagnostic pathway, the experimental setup and data collection stopped right after the physician made a diagnostic decision regarding next steps (order nothing, D-dimer or CTA).

Because the study took place in a single ED of a US academic hospital, it is difficult to generalise the results to other settings and emergency physicians. A strength of the study was its experimental approach; therefore, differences between the sessions are likely attributable to differences in the two CDS technologies (MDCalc and PE-Dx). Whereas scenarios were presented in a random order during each session, the order of CDS presentation was not counterbalanced: participants were first exposed to their current CDS (MDCalc) in session 1 and to PE-Dx in session 2 to assess the incremental impact of using the new HF-based CDS compared with usual practice. The lack of counterbalancing of conditions could affect our results if the participants’ performance improved over time with their involvement in the study, which is unlikely. Further bias may have occurred because neither participants nor researchers were blinded; however, this is unlikely as we used a repeated measures design and implemented study procedures with detailed scripting of instructions given to participants by trained researchers. Another strength is that improvements were shown on all usability dimensions: with the newly designed CDS, the diagnostic pathway of PE was more effective and efficient, and physicians perceived lower workload and were more satisfied with the technology.
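For readers planning a follow-up study, the following is a minimal sketch of what a counterbalanced assignment could look like. It is not the procedure used in this study, where all participants used MDCalc first and PE-Dx second. The function name `assign_orders`, the scenario labels and the seed are hypothetical and serve only as illustration.

```python
import random

# The two CDS conditions compared in the study
CDS_CONDITIONS = ("MDCalc", "PE-Dx")


def assign_orders(participant_ids, scenarios, seed=0):
    """Hypothetical helper: counterbalance CDS order across participants
    and randomise scenario order within each session."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # randomise which participants receive which CDS order
    plan = {}
    for i, pid in enumerate(ids):
        # Alternate the two possible CDS orders so each is used equally often
        cds_order = CDS_CONDITIONS if i % 2 == 0 else CDS_CONDITIONS[::-1]
        sessions = []
        for cds in cds_order:
            order = list(scenarios)
            rng.shuffle(order)  # random scenario order within the session
            sessions.append((cds, order))
        plan[pid] = sessions
    return plan


# Example: 32 participants and four placeholder scenario labels (assumed)
plan = assign_orders(range(1, 33), ["S1", "S2", "S3", "S4"])
```

With an even number of participants, half start with each CDS, so a general improvement over time would be spread across both conditions and not confounded with the CDS being evaluated.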

## Conclusion

Ensuring that HF principles are incorporated into CDS design is critical to improve CDS usability, which includes supporting the appropriate diagnostic pathway, saving time, reducing perceived workload and improving physician satisfaction with the technology. This research suggests that HF principles such as automating information acquisition (eg, auto-population) and information analysis (eg, computing risk scores), minimising workload (eg, minimising data entry) and supporting decision selection are key to improving CDS usability and consequently diagnostic safety. Clinicians report greater satisfaction with a CDS designed according to HF principles. Future research should examine the effectiveness of these HF principles for other health IT and assess the implementation of PE-Dx and its clinical impact.

## Acknowledgments

We would like to thank the residents and attending physicians who participated in the study.

## Supplementary Data

This web-only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

## Footnotes

• Correction notice The article has been corrected since it was published online first. Some minor changes have been made to table 3.

• Contributors PC, ASH, PH, BP and DW designed the study. All authors were involved in the human-centred design process, data collection and data analysis. All authors reviewed the manuscript before submission.

• Funding This research was made possible by funding from the Agency for Healthcare Research and Quality (AHRQ), Grant Numbers: R01HS022086-Principal Investigator: Pascale Carayon, and K08HS024558-Principal Investigator: Brian Patterson; and was supported by the Clinical and Translational Science Award (CTSA) program, through the NIH National Center for Advancing Translational Sciences (NCATS), Grant Number: 1UL1TR002373. The content is solely the responsibility of the authors and does not necessarily represent the official views of the AHRQ or NIH.

• Competing interests None declared.

• Patient consent for publication Not required.

• Provenance and peer review Not commissioned; externally peer reviewed.

• Data availability statement No data are available.