Exposure to incivility hinders clinical performance in a simulated operative crisis
  1. Daniel Katz1,
  2. Kimberly Blasius2,
  3. Robert Isaak2,
  4. Jonathan Lipps3,
  5. Michael Kushelev3,
  6. Andrew Goldberg1,
  7. Jarrett Fastman1,
  8. Benjamin Marsh1,
  9. Samuel DeMaria1
  1. 1 Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
  2. 2 Anesthesiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  3. 3 Anesthesiology, Ohio State University, Columbus, Ohio, USA
  1. Correspondence to Dr Daniel Katz, Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; daniel.katz{at}


Background Effective communication is critical for patient safety. One potential threat to communication in the operating room is incivility. Although examined in other industries, little has been done to examine how incivility impacts the ability to deliver safe care in a crisis. We therefore sought to determine how incivility influenced anaesthesiology resident performance during a standardised simulation scenario of occult haemorrhage.

Methods This is a multicentre, prospective, randomised controlled trial from three academic centres. Anaesthesiology residents were randomly assigned to either a normal or ‘rude’ environment and subjected to a validated simulated operating room crisis. Technical and non-technical performance domains including vigilance, diagnosis, communication and patient management were graded on a survey with Likert scales by blinded raters and compared between groups.

Results 76 participants underwent randomisation with 67 encounters included for analysis (34 control, 33 intervention). Those exposed to incivility scored lower on every performance metric, including a binary measurement of overall performance with 91.2% (control) versus 63.6% (rude) obtaining a passing score (p=0.009). Binary logistic regression to predict this outcome was performed to assess impact of confounders. Only the presence of incivility reached statistical significance (OR 0.110, 95% CI 0.022 to 0.544, p=0.007). 65% of the rude group believed the surgical environment negatively impacted performance; however, self-reported performance assessment on a Likert scale was similar between groups (p=0.112).

Conclusion Although self-assessment scores were similar, incivility had a negative impact on performance. Multiple areas were impacted including vigilance, diagnosis, communication and patient management even though participants were not aware of these effects. It is imperative that these behaviours be eliminated from operating room culture and that interpersonal communication in high-stress environments be incorporated into medical training.

  • anaesthesia
  • crisis management
  • medical education
  • patient safety
  • simulation



Effective communication is a cornerstone of safe patient care, along with clinical excellence.1 This is particularly true in the perioperative environment, where a lapse in communication between team members can permit rapid precipitation of adverse events.2 3 One potential threat to communication and medical/technical skills in the operating room (OR) is incivility, defined here as rude, dismissive or aggressive (RDA) behaviour(s) that impede professional relatedness.

Incivility creates interpersonal conflict and can impair diagnostic and technical performance,4 5 thereby increasing patient safety risks.2 6 The negative consequences of incivility have been well described in non-medical industries as well,7 where researchers have shown that even witnessing workplace incivility impairs performance and attention.8 9 Incivility is a pervasive issue for anaesthesiologists; 98% of anaesthesiologists in one survey reported being exposed to disruptive behaviours,10 and trainees have reported being subjected to RDA behaviours several times per week.6 The hierarchical structure of surgical teams may engender an atmosphere of intimidation and impede residents’ likelihood to challenge superiors (in their own specialty or otherwise), even when something unsafe or medically deleterious is occurring.11 12

The vast majority of attending anaesthesiologists believe that residents are comfortable voicing concern and communicating with surgeons on their own. However, only half of residents surveyed report that this is the case.13 14 Efforts to provide residents with tools for challenging OR hierarchy and dealing with difficult communications have varied in their effectiveness, but the presence of these efforts is a clear display of need.15–17 Indeed, struggles among team members in the OR have been a known issue in anaesthesiology for decades. Gaba et al 18 reported survey data 25 years ago that showed anaesthesia providers experience internal pressures to ‘get along with surgeons’ and external pressures to proceed with cases or hasten their work, leading to lapses in patient care. In the interim, little has been done to examine exactly how these pressures, and the incivility that often drives them, affect medical/technical and non-technical performance required for the safe provision of anaesthesia.

In this study, we sought to determine how incivility on the part of a simulated surgeon influenced anaesthesiology resident performance during a standardised simulation scenario of occult haemorrhage during laparoscopic surgery. We hypothesised that the overall performance of anaesthesiology residents experiencing and witnessing incivility would be compromised as compared with residents who encountered a courteous surgeon.


We conducted a prospective, randomised, observational study at three academic medical institutions. Study participants were recruited from within the anaesthesiology residency programmes at the Mt. Sinai Health System (New York, New York), the Ohio State University (Columbus, Ohio) and the University of North Carolina (Chapel Hill, North Carolina). Approval from the programme for protection of human subjects was obtained at each institution and written informed consent was obtained from all subjects.

All participants were Clinical Anaesthesia (CA) Year 1–3 anaesthesiology residents in good standing at their respective departments. Residents with any administrative disciplinary actions, hiatuses from clinical practice or other potentially confounding, performance-affecting variables (eg, failing in-training examination scores, below-average faculty performance appraisals, loss of training credit) were excluded. Given the nature of the data collection points described below, and our intention to recruit a balanced group of each CA year per study group, we did not expect this to confound our results. Residents were told they were being asked to partake in a simulation experience and had the opportunity to opt out of participation; however, no subjects opted out. At no time were they made aware of the purpose of the study. We used a previously validated simulation scenario for our study, with performance standards dictating the essential actions that must be performed (or not performed) to avoid major morbidity or mortality, and a behaviourally anchored rating scale (BARS) to measure technical and non-technical performance in four main domains: vigilance, medical decision-making, communication and teamwork (see below in Data collection and measures section).

Throughout, we designated items or domains as medical/technical if they involved typical diagnostic and/or therapeutic actions a clinician would perform to handle a medical problem or simply approach a patient encounter. Non-technical items involved decision-making, leadership and team interaction processes such as discussions with team members, situational awareness and calling for help.19


Our simulated encounters were adapted from one of the Agency for Healthcare Research and Quality–funded study scenarios designed to gauge performance of board-certified anaesthesiologists during a simulated intraoperative haemorrhage (previously described by Weinger et al 20). Our experimental and control study scenarios were identical in all respects except for the dialogue and demeanour of the simulated surgeon (online supplementary appendix 1).

In brief, the scenarios differed between groups only in respect to the surgeon’s scripted dialogue with the participant and an actor portraying the circulating nurse. The same cohort of actors served as the surgeon and nurse within each departmentally administered scenario in order to minimise variability in scenario delivery at each site. The experimental group’s surgeon was portrayed as impatient, but not overtly intimidating (ie, actors were instructed not to use inappropriate language, become physically intimidating or scream). The control group’s surgeon was courteous and the interactions straightforward. The dialogue used was reviewed by an independent panel of five board-certified anaesthesiologists who were based at the primary site but were not involved in the study design or execution. This panel was tasked with determining whether the scripted dialogue and behaviours were unlikely, likely or very likely to be encountered by a typical anaesthesiology resident over the course of their training. After reviewing a sample video of the planned scenario in its ‘rude’ form, and reviewing the scenario script, the panel unanimously agreed that the planned scenario portrayed a situation very likely to be encountered in actual practice.

Each scenario was standardised as described by Weinger et al,20 using a guide that delineated details of the two scenario versions, scripts and ‘rules’ for scenario delivery (eg, contents of the simulated clinical environment, evolution of the patient’s medical presentation and their responses to interventions, standardised answers to anticipated participant questions and criteria that defined successful completion of each expected action). Each script outlined the timing and content of key phrases or comments to be made by the actors portraying the surgeon or nurse.

Data collection and measures

Four exemplar videos were created showing gradations of good to poor performance on the control scenario. Prospective raters (three board-certified anaesthesiologist volunteers) participated in a 1-day, in-person training session. They were instructed on the use of the rating software and practised viewing and rating the exemplar videos. Rater calibration was assessed during training until the raters’ checklist ratings matched the consensus ratings exactly, and their non-technical scores were no more than one point from the consensus ratings in each domain. The raters were blinded as to the purpose of the research study. In each batch of videos to be reviewed, reviewers got a mixture of rude and normal videos and received their batched videos several days apart (ie, after finishing three ratings, they received their next batch of videos at least five working days later).

For each study case, high-quality digital and audio recordings were collected. The videos were made anonymous (ie, face and voice were altered) by the study team and then assessed by two raters who graded performance independently and were blinded to the source institution. Videos that were deemed unusable according to previously validated standards21 were removed. Three raters completed training and each ultimately rated a mixture of both rude and control scenarios. Raters received batches of videos in a predetermined order that ensured equal distribution and sequence of rude and control videos. The same rater was not assigned multiple encounters conducted at a single site on the same day and so received three videos at a time over the course of 6 weeks. The raters were instructed not to score a performance if they recognised a participant due to inadequate anonymisation; this did not occur during the study.

Performance was assessed using the scoring rubric employed previously for the laparoscopic haemorrhage scenario (online supplementary appendix 2) as described by Weinger et al.20 In brief, participants were scored on (1) the completion of checklist items, so-called critical performance elements (CPEs), which included a mixture of both technical and non-technical actions expected to be performed to manage the case successfully; (2) ratings in four domains using the BARS for vigilance, decision-making, teamwork and communication; and (3) a yes/no answer to the question, “Did the participant perform at a level expected of an anaesthesiology resident?”. If there was a divergence in scoring of greater than one point or in a yes/no field, a third anaesthesiologist graded the video. Dichotomous items were then scored by majority vote, while performance scores were averaged across the three ratings.
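The adjudication rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as stated in the text, not the study's rating software: the dictionary layout, the function names `diverges` and `aggregate`, and the example scores are all hypothetical.

```python
from statistics import mean

def diverges(r1, r2, max_gap=1):
    """Return True when two ratings need a third rater.

    Hypothetical layout: each rating is a dict with
      "scores": {domain: numeric BARS score}
      "binary": {item: True/False} for yes/no fields.
    """
    scale_gap = any(abs(r1["scores"][d] - r2["scores"][d]) > max_gap
                    for d in r1["scores"])
    binary_gap = any(r1["binary"][k] != r2["binary"][k] for k in r1["binary"])
    return scale_gap or binary_gap

def aggregate(ratings):
    """Average scale scores and take a strict majority on yes/no items."""
    domains = ratings[0]["scores"]
    items = ratings[0]["binary"]
    scores = {d: mean(r["scores"][d] for r in ratings) for d in domains}
    binary = {k: 2 * sum(r["binary"][k] for r in ratings) > len(ratings)
              for k in items}
    return {"scores": scores, "binary": binary}

# Illustrative (made-up) ratings for one video.
r1 = {"scores": {"vigilance": 7, "communication": 5}, "binary": {"met_standard": True}}
r2 = {"scores": {"vigilance": 4, "communication": 6}, "binary": {"met_standard": False}}
r3 = {"scores": {"vigilance": 5, "communication": 6}, "binary": {"met_standard": True}}

ratings = [r1, r2] + ([r3] if diverges(r1, r2) else [])
result = aggregate(ratings)
```

When the first two raters agree within one point and on every yes/no field, the same `aggregate` call simply averages the two ratings.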

Personality surveys aimed at eliciting perceptions of incivility and/or criticism were collected from participants at least 4 weeks prior to the experimental encounter. These measures were included in order to minimise confounding personality variables that might affect performance when coupled with the planned simulated intervention. Reviewers did not have access to these personality data during the grading process. The surveys used were the Brief Fear of Negative Evaluation Scale and the Sensitivity to Criticism Scale.

The Brief Fear of Negative Evaluation Scale is a validated, widely cited questionnaire designed to measure fear of negative evaluation, found in psychological literature as an assessment of social anxiety.22 23 The scale is composed of 12 statements of fearful or worrisome situations, and respondents indicate the extent to which each item describes oneself on a 5-point Likert scale. The numerical responses to the questionnaire are summated for a cumulative score.

The Sensitivity to Criticism Scale is a survey tool designed to measure perceptual and emotional responses to criticism.24 Subjects are asked to imagine themselves in situations that span a range of domains in which criticism would take place, for a total of 30 items. Responses for all items are made on a 7-point Likert scale and are summed and averaged, resulting in an aggregate index of sensitivity to criticism. Two scales were chosen to ensure capture of minor behavioural differences. There is a moderate level of correlation between the two measures.24 Participants were also asked to rank the realism of the scenario, surgeon and circulating nurse based on a 1–5 Likert scale (with 1 being not at all realistic and 5 being most realistic). Self-assessment was also performed on a 1–10 Likert scale (with 10 being superior performance). All recorded data points were stored on Research Electronic Data Capture (REDCap), a secure web server for analysis.
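As a rough illustration of how the two questionnaire totals described above are formed, the sketch below sums 12 five-point items and averages 30 seven-point items. It assumes responses are coded 1–5 and 1–7 respectively and does not handle any reverse-keyed items; the function names are hypothetical and the published scoring instructions for each instrument should take precedence.

```python
from statistics import mean

def score_fear_of_negative_evaluation(responses):
    """Cumulative score: sum of 12 items, each assumed coded 1-5."""
    if len(responses) != 12 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("expected 12 responses coded 1-5")
    return sum(responses)

def score_sensitivity_to_criticism(responses):
    """Aggregate index: mean of 30 items, each assumed coded 1-7."""
    if len(responses) != 30 or any(not 1 <= r <= 7 for r in responses):
        raise ValueError("expected 30 responses coded 1-7")
    return mean(responses)

# Made-up respondent who answers at the midpoint of every item.
bfne_total = score_fear_of_negative_evaluation([3] * 12)
scs_index = score_sensitivity_to_criticism([4] * 30)
```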

Statistical and power analysis

Statistical analysis was performed using the IBM Statistical Package for Social Sciences (V.23) in consultation with a biomedical statistician. An a priori power analysis based on the primary outcome, percentage meeting standard for adequate performance as measured by a global binary score, was performed based on the data on the haemorrhage scenario by Weinger et al.20 Given that that study’s population was more experienced than ours, we assumed a lower baseline pass rate than the 85% described in their study. Assuming a 75% pass rate with a meaningful difference of 25% (ie, 50% pass rate), 55 subjects would be needed to reach a significance level of 0.05 and a power of 0.8. Given the multisite nature of the study and the need to account for AV difficulties and video transportation and quality at each site, we aimed to recruit 75 participants. Results are reported as mean (SD) or median (IQR) depending on their normality as determined by Shapiro-Wilk test and visual inspection of distributions. Statistical tests performed are reported with their corresponding results and include t-tests for normally distributed continuous variables, Mann-Whitney U tests for non-normally distributed continuous variables and χ² tests for categorical variables. In addition to grouping by intervention, separate statistical analysis was performed between sites to assess for similarity (see online supplementary table 1). Likewise, Cohen kappa statistics for inter-rater reliability were calculated to assess agreement between reviewers. Univariate binary logistic regression was performed to assess the impact of incivility on whether or not the participant met the standard for adequate performance. Multivariable binary logistic regression was then performed to assess the same question controlling for confounders including gender, age, site location, postgraduate year, gender of the surgeon, prior simulation experience and personality test scores. All statistical tests were two-sided using 0.05 as the threshold for statistical significance.
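To make the power calculation inputs concrete, the sketch below applies one common textbook approximation for comparing two independent proportions (unpooled normal approximation, no continuity correction). The source does not state which formula or software routine was used, nor whether the figure of 55 is per group or in total, so this illustrates the arithmetic under stated assumptions rather than reconstructing the authors' calculation. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided test of two proportions.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1-p2)^2,
    an unpooled normal approximation (an assumption, not the source's stated method).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Inputs taken from the text: 75% vs 50% pass rate, alpha 0.05, power 0.8.
n = n_per_group(0.75, 0.50)  # 55 under this approximation
```

Other formulas (for example, pooled-variance or continuity-corrected versions) give somewhat larger numbers for the same inputs, which is one reason the method used matters when reporting a power analysis.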


Seventy-six participants performed in 76 individual study encounters (figure 1). No residents refused to participate in the study. All participants completed the intake psychological assessments. After elimination of nine videos due to poor quality (eg, incomplete videos, inadequate video feeds to perform assessment, audio too faint to grade actions), 67 encounters (88%) were included for final analysis. Differences between sites were limited to a higher proportion of female subjects at one site as well as use of a female surgeon at that same site (online supplementary table 1).

Self-reported data

Table 1 shows demographic information by study group including psychological testing scores and participant feedback regarding scenario realism. There were no significant differences between study groups on any measured value.

Table 1. Demographic information and subject feedback on scenario

Table 2 shows participant-reported survey values. Scenario, circulating nurse and surgeon realism were rated highly and did not differ between the groups. Participants perceived the impact of surgeon behaviour on their performance differently following participation in the study: over 65% of the experimental group felt the surgeon negatively affected their performance (compared with a control group rating of less than 25%, p=0.009). Self-reported ratings of individual performance, however, did not differ between groups (p=0.112).

Table 2. Participant-reported values

Performance data

Cohen’s Kappa scores for inter-rater reliability for scored portions of the scenario ranged from 0.350 to 0.613, indicating adequate agreement. Several CPEs (5 of 9) on the patient care checklist differed between groups (table 3). Figures 2 and 3 display the total score distributions between groups on the BARS scales, demonstrating lower scores in the experimental group in each domain (raw statistical scores available in online supplementary table 2). A higher percentage of participants in the control group were rated as a ‘yes’ on the question of whether they performed at the level expected of an anaesthesiology resident (91.2% of control group vs the 63.6% of experimental group, p=0.009).
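The pass-rate comparison above can be illustrated with a small 2×2 calculation. The counts below (31 of 34 and 21 of 33) are inferred from the reported percentages and group sizes and are therefore an assumption. The source does not say which test variant (uncorrected, continuity-corrected or exact) produced the reported p value, so the p value from this uncorrected Pearson χ² sketch need not match it exactly.

```python
from math import erfc, sqrt

# Inferred counts (assumption): 31/34 = 91.2% and 21/33 = 63.6%.
passed_control, n_control = 31, 34
passed_rude, n_rude = 21, 33

def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]] (1 df)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

failed_control = n_control - passed_control
failed_rude = n_rude - passed_rude

chi2, p_value = pearson_chi2_2x2(passed_control, failed_control,
                                 passed_rude, failed_rude)

# Unadjusted odds of passing in the rude group relative to control.
crude_odds_ratio = (passed_rude / failed_rude) / (passed_control / failed_control)
```

The unadjusted odds ratio from these inferred counts is a crude quantity and is not expected to equal the adjusted estimate from the multivariable model.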

Table 3. Patient management scoring on haemorrhage scenario critical performance elements (CPEs)

Binary logistic regression was then performed to determine which variables were predictive of adequate performance rated by this question. The multivariable model included psychological testing scores, CA year, gender, age, prior simulation experience, gender of the surgeon and the presence of incivility (online supplementary table 3). The presence of incivility was the only item associated with this performance measure (OR 0.110, 95% CI 0.022 to 0.544, p=0.007), with a higher likelihood of an answer of ‘no’ in participants exposed to incivility.
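As a reading aid for the odds ratio above, the following sketch shows how a reported OR and its 95% CI relate to a Wald z statistic and p value. It assumes the interval is a Wald interval, symmetric on the log-odds scale, which is a common default for logistic regression output but is not stated in the text. The function name `wald_from_ci` is hypothetical.

```python
from math import erfc, log, sqrt
from statistics import NormalDist

def wald_from_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Back out the log-odds standard error, z statistic and two-sided p value.

    Assumes the CI was built as exp(log(OR) +/- z_crit * SE).
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = log(odds_ratio) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return se, z, p_value

# Values as reported in the text.
se, z, p_value = wald_from_ci(0.110, 0.022, 0.544)
```

Under that assumption, the recovered p value rounds to the reported 0.007, so the OR, interval and p value are mutually consistent.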


Incivility is a potential source of interpersonal conflict and a latent threat to effective communication, particularly in situations where surgical team hierarchy is at play.2 6 12 14 While decrements in physician performance in the setting of a ‘rude’ interaction have previously been demonstrated in the simulated setting, little work has been done to see how such an interaction affects anaesthesiology trainee performance in the operating room.4 5 Since analysing the impact of incivility in real time in an actual clinical setting would likely prove unethical, the simulated environment provided an ideal venue for such an investigation. Here, we demonstrate that exposure to incivility in the (simulated) operating room has a negative impact on anaesthesia trainee performance in several domains including technical skills, non-technical skills and a binary global performance metric. Overall, 91.2% of control group participants were rated as performing at their expected level, compared with only 63.6% of those exposed to incivility. This performance difference was maintained following multivariable adjustment and was observed across both medical/technical and non-technical domains. The magnitude of this difference should give pause to anaesthesiology and surgery faculty, assuming there is any translation to actual clinical performance.

We observed that medical decision-making was quite vulnerable to incivility. Indeed, decision-making was rated lower not only as one of the four holistically rated performance domains but also on most of the relevant medical/technical CPEs. This suggests participants felt reluctant to communicate with the simulated surgeon and were also sufficiently ‘rattled’ so as to miss crucial elements like bolusing intravenous fluids, lessening the anaesthetic agent and calling for blood in a timely fashion. This occurred despite the fact that the dangers of hypotension and resultant end-organ damage in the anaesthetised patient have been extensively described, learnt early in training and consistently reinforced by attending anaesthesiologists.25–28

In addition to failing to treat hypotension in general, the experimental group, presumably hampered by RDA behaviours, was less likely to make the diagnosis of acute haemorrhage as the cause for the change in vital signs. Diagnostic errors represent a significant, and often underappreciated, source of medical error.29–31 A delayed or incorrect diagnosis in the case of acute haemorrhage can be especially costly.32 It should be highlighted that the self-rated performance did not differ between groups, but those in the experimental group did feel the surgeon’s demeanour affected their performance. This supports the idea that self-report is generally not a good determinant of actual performance and should lead other groups to examine similar performance data and forgo simple self-report.

While the witnessed decrement in medical/technical performance is concerning, equally alarming are the differences between groups in each of the non-technical domains. Participants exposed to RDA behaviours were inferior with respect to vigilance (p<0.001), communication (p<0.001) and teamwork (p<0.001). It is well documented that medical errors in the operating room are widespread and largely preventable, with breakdowns in teamwork, communication and situational awareness often cited as root causes.33 34 Dutton et al showed in an analysis of malpractice claims related to perioperative haemorrhage that 60% of events had at least one communication breakdown, half of which occurred between the surgeon/obstetrician and anaesthesia provider.32 Our findings reproduce and illustrate the potentially dire consequence of this dynamic and raise the point that perhaps exposure to RDA behaviours may be partially responsible for these breakdowns. Furthermore, it has been shown that the complex team dynamics required to take care of the surgical patient frequently lead to tense interactions with disproportionate effects on trainees.35

It is important to acknowledge that surgeons tend to have a different perception of communication in the OR as compared with the rest of the OR team. While 60% of OR team members feel that the attending surgeon’s tone and mood can drastically affect communication in the OR, only 36% of surgeons share this view.36 Our study shows that in addition to its effects on medical decision-making and communicative breakdown, the presence of RDA behaviours diminished the participant’s likelihood of seeking help. Improving help-seeking behaviours by both surgeons and anaesthesiologists during complex OR scenarios has long been a point of emphasis in crisis resource management and improving patient safety.1 18 37 Furthermore, the traditional surgical culture, which values autonomy and decisive action, does not align with calling for help and can be detrimental to patient safety.37

Our use of a rigorously developed, standardised scenario allowed for objective assessment of trainees’ clinical performance in the presence of incivility. The results have relevance for efforts to improve quality of care and patient safety, particularly through the use of simulation-based training. Though effective interpersonal interaction in high-stress scenarios is not traditionally incorporated into medical training, it may be beneficial to provide such training in the perioperative specialties given the threat that poor communication poses to patient safety. Healthcare providers should recognise that incivility is not a victimless act: it has a direct line to the patient. Incivility may also be professionally counterproductive. In a recent study of a biotechnology firm, those who were considered civil were twice as likely to be viewed as leaders and to enhance the performance of those around them.38 While our intervention was neither designed to be overly intimidating nor directed exclusively at the participant, it is possible that more extreme RDA behaviours directed specifically at the anaesthesia trainee may have resulted in an even more pronounced divergence between groups.

The findings of our study are limited by the fact that the interactions occurred in a simulated environment, which despite our best attempts at realism can never perfectly replicate a real-life crisis. Unlike our scenario, it is rather likely that most anaesthesia providers will have had some prior interaction with their surgeon. This lack of familiarity in the present study may have, in turn, affected participant performance, particularly as it relates to communication. However, a recent study on familiarity and communication in the OR by Frasier et al showed that being familiar with the team does not always improve communication.39 While our scenario specifically addressed the effects of a surgeon’s demeanour on anaesthesia resident performance, this represents an oversimplified team dynamic which in real life is nuanced and complex. Effective team dynamics require civility and cooperation on the part of the anaesthesiologist, the surgeon and the nurse. Future studies might randomise the embedded ‘rude’ actor to be either a circulating nurse or anaesthesiologist to see how each subgroup is affected by the interaction. A more robust study might even assess the effect of an entire subgroup’s civility on the team. This, however, would require a study design with a higher degree of complexity and massive scope.

Finally, our study looked at trainee behaviour during one particular non-routine event—a haemorrhagic crisis. We acknowledge that ‘rude’ interactions may have a different effect on the anaesthesia trainee during other critical events or even in the context of more routine patient care. Future studies are needed to determine the effect of incivility and medical hierarchy on healthcare provider performance in a variety of clinical environments and levels of urgency. Likewise, we can only speculate on the effects of RDA behaviours on other anaesthesia providers—certified nurse anaesthetists, anaesthesia assistants or board-certified anaesthesiologists—whose training and backgrounds differ.

Despite the aforementioned limitations, our results support the existing body of evidence that the content and character of one’s behaviour in the workplace can affect the performance of those around them. As such, they may serve as a call to action for innovative efforts to improve intraoperative communication through simulation-based assertiveness training and/or civility training to guard against the deleterious effects of RDA behaviours.

Figure 2. Summary of BARS (behaviourally anchored rating scale).

Figure 3. Global binary score data.



  • Contributors All of the authors listed on this manuscript have met the requirements for authorship as set forth by the journal.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Obtained.

  • Ethics approval Icahn School of Medicine PPHS Office (IRB). We were given an IRB exemption for our study (#16-00623).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No data are available.