
Errors in after-hours phone consultations: a simulation study
  1. Erel Joffe1,2,
  2. James P Turley1,
  3. Kevin O Hwang3,
  4. Todd R Johnson1,
  5. Craig W Johnson1,
  6. Elmer V Bernstam1,3
  1. 1School of Biomedical Informatics, The University of Texas Health Science Center, Houston, Texas, USA
  2. 2Department of Hematology and Bone Marrow Transplantation, Tel Aviv Medical Center, Tel Aviv, Israel
  3. 3Department of Internal Medicine, Medical School, The University of Texas Health Science Center, Houston, Texas, USA
  1. Correspondence to Dr Elmer Bernstam, School of Biomedical Informatics, The University of Texas Health Science Center, 7000 Fannin, Suite 600 Houston, TX 77030, USA; Elmer.V.Bernstam{at}


Abstract

Background After-hours out-of-hospital phone consultations require physicians to make decisions based on information provided by a nurse over the phone.

Methods We conducted a simulation study to evaluate physicians’ actions following communication of key information. 22 nurses were asked to call physicians with six cases based on the six most common reasons for after-hours phone calls. We evaluated physicians’ actions following the communication of key clinical information: A situation cue described a patient's problem (eg, confusion). A background cue described a specific clinical finding regarding the cause of the problem (eg, patient's sodium is low). For each cue we defined a list of indicators, based on the medical literature, to ascertain whether physicians acted upon the provided information (which was defined as addressing at least one of the indicators).

Results A total of 108 phone consultations (containing 88 situation and 93 background cues) were analysed. Situation cues were communicated in 90% (79/88) of the calls and background cues in 33% (31/93). Physicians acted upon the provided information in 57% (45/79) and 48% (15/31) of the communicated situation and background cues, respectively. When the background cues were not communicated, physicians asked questions expected to elicit the cue in 12% of the cases. Responding to the situation cue was associated with longer conversations and active inquiry by the physician.

Conclusions After-hours phone calls are error prone. Both nurse communication and physician decision-making are problematic. Efforts to improve patient safety in this setting must address both communication and decision-making.

  • Communication
  • Diagnostic Errors
  • Hand-Off
  • Hospital Medicine
  • Human Error



Introduction

Discontinuities in care due to communication failures have been associated with preventable adverse events.1 These failures may occur at any time, but the risk is higher at night or during weekends (after-hours).2 Inpatient after-hours phone communications are a clinical scenario in which a nurse telephones a physician (who is on-call but may be outside of the hospital) regarding an acute patient problem. These calls are usually limited to verbal communication, take place in a setting where scarce resources and fatigue are the rule, and are usually characterised by a paucity of information.2 For example, the consulted physician is often not the primary physician responsible for the patient, and may have received only a very brief ‘sign out’ (eg, ‘40-year-old with pneumonia, doing well’) or may be entirely unfamiliar with the patient. Furthermore, nurse–physician communication is error prone due to different communication cultures specific to each profession (regardless of whether it takes place over the phone).3,4 Considering the frequency of on-call phone communications, there is surprisingly little research regarding potential risks and adverse outcomes resulting from these calls.

In previous work, we found that a limited set of problems accounts for the majority of after-hours phone calls.2 We attempted to improve nurse–physician communication by providing a problem-specific communication tool that lists the necessary data to be communicated under these common clinical scenarios.5 We based the tool on the Situation-Background-Assessment-Recommendation (SBAR) framework, which is the most commonly implemented communication framework in healthcare.6 Notably, previous evaluations of SBAR tools in healthcare have yielded mixed results.7 Several studies report great success in institution-wide implementations of the SBAR framework,8,9 while other studies found no effect or even worse performance.10,11 We found that a problem-specific SBAR tool did not improve communication of key information between nurses and physicians.5 Specifically, in the majority of cases nurses stated the reason for the call (eg, patient is confused) but failed to communicate pertinent information about the cause (eg, patient's sodium is 119). This was observed regardless of whether they used the SBAR forms (which were designed to guide data extraction and communication). In this study, we addressed physician decision-making. Specifically, we evaluated physicians’ actions following the communication of key information by nurses and whether physicians were able to elicit the information when it was not provided by the nurse.


Methods

A simulation study was conducted at the Texas Medical Center (Houston, Texas, USA) from May 2010 to May 2011. In all, 22 pairs, each consisting of a registered nurse and an internal medicine physician (attending or senior resident responsible for fielding after-hours calls from nurses on a regular basis), were enrolled in the study. Both nurses and physicians had to be practicing on general internal medicine wards at the time of the study.

We presented each nurse with six clinical scenarios (table 1). We based the scenarios on actual patient records from a local tertiary-care hospital. Cases were selected by two experts (internists) (EVB, KOH) for representing a moderate diagnostic challenge in the context of the six most common reasons for after-hours calls (fever, glucose management, behaviour problems, medication prescription, blood pressure and pain).2 The rationale was that in order to evaluate whether physicians considered the communicated information in diagnosis and management of patients we must present some challenge. For example, a common reason for an after-hours call is blood glucose management. The reflexive response of a physician in such a case might be to ask about the glucose level and decide about insulin dosage. In our experiment, we presented a case where there was an order for insulin in the presence of normal blood glucose and a concomitant order for glucose (ie, a treatment for elevated blood potassium level). Consideration of the communicated information would direct the physician away from simple management of blood glucose to the evaluation and management of elevated blood potassium level.

Table 1

Clinical scenarios, cues and measures of evaluating appropriateness of physicians’ actions

Each case had at least one cue that was required in order to resolve the clinical scenario. There were two types of cues. A situation cue answered the question ‘what is wrong with the patient that is prompting the call?’ For example, the patient is disoriented and pulled out his intravenous line. These cues were used to evaluate the physician's understanding of the general situation and ability to generate a differential diagnosis. A background cue was a specific clinical finding that answered the question ‘why does this particular patient suffer from this problem?’ For example, very low sodium level in a patient with acute confusion. These were used to determine whether the physician understood the aetiology of the patient's problem and was able to act on this understanding. We asked nurses to review the patient records, extract information (with or without the aid of an SBAR form) and then call the physician requesting instructions on how to manage the clinical scenario. For a detailed description of the SBAR forms and the methodology for evaluating their utility in communicating information, see reference 5.

For each call we noted whether the situation/background cues were communicated. Then, for cases where the information was communicated, we evaluated the appropriateness of physicians’ responses. We considered physician responses to be ‘appropriate’ responses if there was any indication that the physician had considered and acted upon the provided information.

We defined the set of appropriate responses using a guide-book for the on-call physician and the UpToDate knowledge base (see online supplementary appendix A).12,13 For the situation cues we defined the elements of an appropriate differential diagnosis or what further questions/diagnostic workup was indicated. For example, in a young patient with acute confusion (situation cue) a physician is expected to consider infection, metabolic or electrolyte abnormalities, stroke or other organic brain damage and medication/drug/alcohol related confusion.14,15 Indication that the physician has indeed considered the situation cue would therefore include questions regarding a possible infection, medications and substance abuse, laboratory tests and head imaging.14,15 For the background cues we established what the appropriate orders should be. For example, given confusion that is associated with low sodium (background cue), a physician could either repeat blood tests to validate the diagnosis; give intravenous fluids with sodium; order a limit on free water intake; or admit to the intensive care unit.16

We defined physicians’ responses as appropriate if, following a communicated situation cue, they had asked about any of the possible diagnoses on the differential diagnosis, and if following a communicated background cue, they had ordered any of the indicated orders (table 1, see online supplementary appendix A). In real clinical practice, a physician is expected to consider all major diagnoses on the differential, and address all active problems associated with these diagnoses. However, since there are often multiple reasonable strategies for a given clinical scenario, and since it is difficult to precisely define ‘appropriate response’ we used a much more lenient measure, focusing on whether physicians acted upon the communicated information rather than the adequacy of their medical decision. Figure 1 presents the study design.
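The lenient scoring rule described above can be illustrated with a short sketch. This is not the authors' instrument: the function names and the indicator labels are hypothetical, with the labels paraphrased from the low-sodium example in the text. The only logic it encodes is that a cue is scored only when it was communicated, and that a response counts as appropriate when at least one listed indicator was addressed.

```python
# Hypothetical sketch of the lenient scoring rule; all names are illustrative.

# Indicators for one background cue (low sodium in a confused patient),
# paraphrased from the example orders given in the text.
LOW_SODIUM_INDICATORS = {
    "repeat_blood_tests",
    "iv_fluids_with_sodium",
    "restrict_free_water",
    "admit_to_icu",
}


def is_appropriate(physician_actions, indicators):
    """Return True if at least one listed indicator was addressed."""
    return not indicators.isdisjoint(physician_actions)


def score_call(cue_communicated, physician_actions, indicators):
    """Score one cue in one call.

    Returns None when the cue was never communicated (the call is not
    scored for that cue), otherwise True or False for appropriateness.
    """
    if not cue_communicated:
        return None
    return is_appropriate(physician_actions, indicators)
```

Under this rule a physician who orders free-water restriction alongside an unrelated sedative is still scored as appropriate, which is what makes the measure lenient.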

As we were using cases based on actual patient records, there were no relevant ‘appropriate responses’ for the situation cue of the High Blood Pressure case (the blood pressure in itself was not high enough to warrant a comprehensive evaluation) or for the background cue of the Chest Pain case (the patient had multiple comorbidities that could have been responsible for chest pain). Thus, for the Chest Pain case, we presented the background cue of leg swelling to the nurse subjects but excluded it from the analysis.

The Fever case included a misleading detail, a description of 2–3 loose stools suggesting the possibility of a Clostridium difficile infection. However, there were no other data to support the diagnosis (ie, overt diarrhoea, leucocytosis, etc). In this case, physicians’ actions were deemed adequate only if they had entertained an alternative diagnosis to that suggested by the misleading detail.

Nurses were asked to come to the laboratory (ie, simulated internal medicine ward), while physicians were contacted by phone (ie, simulated ‘out of hospital’ call). Physicians did not receive any sign-out information about the patients prior to the experiment. Subjects were told that we were conducting a study to evaluate communication between nurses and physicians, without any additional details. Hence, both nurses and physicians were blinded to the objectives of the study and to the evaluation measures. Experiments were scheduled when it was convenient for the subjects. We provided nurses with a medical record (including an admission note, progress notes, medical orders, medication, nursing notes, laboratory and imaging results) and a bedside (nursing) chart (with vital signs, intake/output, etc). We used actual hospital forms to ensure that nurses worked with records that were as close as possible to records used in routine clinical practice. The expert panel made sure that the records covered all the pertinent data for the evaluation of each clinical scenario. Records were minimally modified to fit the clinical scenario, and to comply with the requirements for de-identification. The nurse subject could ask the nurse conducting the experiment (JPT) about physical exam findings for which there were scripted answers. If the question fell outside of the scripted answers, no information was given. The cumulative time to review the six records was limited to 2 h. Calls were not time-limited; they were recorded by video and audio (MP3) and analysed by a single non-blinded reviewer (EJ). To avoid possible biases we refrained from using subjective measures in the evaluation (table 1).

We noted what information was communicated by the nurse, including erroneous information, and what additional information the physician requested. Evaluation was based on the data elements listed on the SBAR forms (see online supplementary appendix B). The rationale was that the SBAR forms listed a set of data elements that an expert panel deemed to be required for an appropriate evaluation of the type of case.5 We recorded the time that elapsed between the beginning of the phone conversation and when the reason for the call was communicated; the total length of the call; number and types of data items communicated regarding the patient's situation other than the situation cue (eg, patient identification and location, whether the problem was urgent); and the number and type of data items communicated regarding the patient's background other than the cue (eg, reason for hospitalisation, prior medical history, vital signs, medication). Physicians were also evaluated for their ability to elicit required information regarding the situation and background cues when these were not provided by the nurses (eg, asking for the most recent laboratory results in the confused patient). Online supplementary appendix C presents a transcription of one session including comments to indicate the steps of the study, the positions where the situation and background cues were provided as well as communication of situation data elements, background data elements, assessment and plan.

Statistical analysis

Randomisation and allocation of cases to subjects were based on a Latin Square randomisation table.17 Statistical analyses were performed using SPSS (V.20, IBM Inc., Chicago, Illinois, USA). We used generalised estimating equations (GEEs) to evaluate the association between adequacy of physicians’ actions and properties of the communication. We chose GEEs due to repeated measures within subjects and cases, missing observations and non-normal distribution of our data.18 We conducted the analysis sequentially. First, we assessed models based on different distributions (normal, Poisson and negative-binomial distributions for numerical variables, and binomial distribution for binary variables). Then, we found the best fitting working correlation matrix (unstructured, independent or compound symmetry). We chose the model with the lowest quasi-likelihood under the independence model criterion (QIC).
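The text does not describe how the Latin Square table was constructed, so the following is only a sketch of one standard approach: start from a cyclic square and shuffle its rows, columns and symbols. The function names, the seed and the rule of reusing rows once all six have been assigned are assumptions made for illustration, not details taken from the study.

```python
import random


def latin_square(n, rng):
    """Build an n x n Latin square by shuffling a cyclic square.

    In the cyclic square, row i and column j hold (i + j) mod n.
    Permuting rows, columns and symbols keeps every symbol exactly
    once in each row and once in each column.
    """
    rows = list(range(n))
    cols = list(range(n))
    symbols = list(range(n))
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(symbols)
    return [[symbols[(r + c) % n] for c in cols] for r in rows]


def allocate(cases, n_pairs, rng):
    """Give each nurse-physician pair one row of the square as its case order.

    Rows are reused cyclically when there are more pairs than rows
    (an assumption made for this sketch).
    """
    square = latin_square(len(cases), rng)
    return [[cases[k] for k in square[p % len(cases)]] for p in range(n_pairs)]


# The six call reasons named in the Methods; the seed is arbitrary.
CASES = ["fever", "glucose", "behaviour", "medication", "blood pressure", "pain"]
orders = allocate(CASES, n_pairs=22, rng=random.Random(0))
```

Within any block of six consecutive pairs, each case appears once in each presentation position, which is the balancing property a Latin square is meant to provide.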

Ethical considerations

This study was approved by the Committee for the Protection of Human Subjects (the UTH IRB). All the participants gave written informed consent and received a US$50 gift card. Any potentially identifying information in the patients’ records was erased.


Results

Of the 132 (22 nurse–physician pairs×6) phone calls, 12 were cancelled (two pairs) due to a no-show of the nurse or inability to contact the physician, nine were cancelled by the nurse conducting the experiment due to time constraints, and three were excluded from analysis due to errors in the case presentation (eg, background cue inadvertently presented with the case). A total of 108 phone consultations by 20 nurse–physician pairs were analysed. Of these, 88 cases contained a situation cue and 93 cases contained a background cue. In all, 57 cases were delivered without the SBAR form and 51 cases with the SBAR form. (In previous work we demonstrated that there was no difference in the communication between the groups.5)
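As a quick consistency check, the flow of calls from enrolment to analysis can be reproduced from the counts reported above. The variable names are illustrative; every number comes from the text.

```python
# Counts as reported in the text.
pairs_enrolled = 22
cases_per_pair = 6
planned_calls = pairs_enrolled * cases_per_pair  # 132 planned calls

cancelled_unavailable = 12       # two pairs: nurse no-show or physician unreachable
cancelled_time_constraints = 9   # cancelled by the nurse conducting the experiment
excluded_presentation_error = 3  # errors in the case presentation

analysed_calls = (
    planned_calls
    - cancelled_unavailable
    - cancelled_time_constraints
    - excluded_presentation_error
)

# Split of analysed calls by use of the SBAR form.
without_sbar, with_sbar = 57, 51

assert analysed_calls == 108
assert without_sbar + with_sbar == analysed_calls
```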

In 14% of the cases (12/88), nurses failed to communicate the situation cue (table 2). Of these, in 42% (5/12) the nurses actually reported a misleading finding (eg, new onset fever rather than persistent fever). In 58% (7/12), the physicians asked questions aimed at eliciting the situation cue, but received the correct answer in only three cases. In summary, the situation cue was communicated in a total of 79 cases (independently by the nurse in 76 and elicited by the physician in three).

Table 2

Rates of provided cues and appropriate actions

In 72% (67/93) of the cases, nurses failed to provide the background cue. Of these, in 7% (5/67) the nurses actually reported incorrect information (eg, a normal sodium level when the sodium was actually low). The physicians asked questions meant to elicit the cue in 12% (8/67) of the unreported background cues, and received the correct answer in five of the cases. In summary, the background cue was communicated in a total of 31 cases (independently by the nurse in 26 and elicited by the physician in five).
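The tallies for both cue types follow the same accounting: cues reported by the nurse plus cues correctly elicited by the physician give the total communicated. The sketch below recomputes the reported figures from the raw counts; the helper names are hypothetical.

```python
def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number, as reported in the text."""
    return round(100 * numerator / denominator)


def cue_summary(total, not_reported, elicited_correctly):
    """Tally how many cues reached the physician by either route."""
    reported_by_nurse = total - not_reported
    communicated = reported_by_nurse + elicited_correctly
    return {
        "reported_by_nurse": reported_by_nurse,
        "communicated": communicated,
        "pct_not_reported": pct(not_reported, total),
        "pct_communicated": pct(communicated, total),
    }


# Counts taken from the two paragraphs above.
situation = cue_summary(total=88, not_reported=12, elicited_correctly=3)
background = cue_summary(total=93, not_reported=67, elicited_correctly=5)
```

The contrast is visible directly in the two summaries: 79 of 88 situation cues reached the physician, against 31 of 93 background cues.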

Physicians acted upon the communicated information (ie, appropriately) following 57% (45/79) of the communicated situation cues and 48% (15/31) of the communicated background cues. Providing an appropriate response was case dependent (30%–88%, p=0.001). In 25 cases, both the situation and the background cues were communicated. Of these, physicians acted appropriately regarding the situation cue in 72% (18/25). There was no association between nurses’ use of a problem-specific SBAR form and the appropriateness of physicians’ actions (p=0.5 for the situation cues, p=0.14 for the background cues). Online supplementary appendix D presents examples of various errors encountered in physicians’ responses.
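The response rates above, and the failure rates that are their complements, can be recomputed from the reported counts. This is a minimal sketch with illustrative names; it adds no data beyond the counts in the text.

```python
# (appropriate responses, communicated cues) as reported in the text.
REPORTED = {
    "situation": (45, 79),
    "background": (15, 31),
    "situation, both cues communicated": (18, 25),
}


def rates(appropriate, communicated):
    """Return appropriate and failed response rates as whole percentages."""
    failed = communicated - appropriate
    return {
        "appropriate_pct": round(100 * appropriate / communicated),
        "failed": failed,
        "failed_pct": round(100 * failed / communicated),
    }


summary = {name: rates(*counts) for name, counts in REPORTED.items()}
```

The failed counts (34/79, 16/31 and 7/25) are the complements of the appropriate-response counts reported here.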

After controlling for the differences between subjects and cases, better performance regarding the situation cue was associated with a more active inquiry by the doctor regarding background information (p<0.001, table 3). Communication of the background cue was not associated with a significant improvement of physician performance regarding the situation cue (p=0.41); however, the sample size was small.

Table 3

Call properties: comparison of appropriate to inappropriate actions


Discussion

In nearly half of the cases (34/79, 43%), physicians failed to recognise and respond to the presentation of common serious clinical situations (eg, change in mental status, high potassium level). In a significant minority of cases (7/25, 28%), physicians failed to address the reason for the call even when presented with both the situation and background cues (eg, failing to address a case of an acute confusion despite the nurse's description of a patient with a behavioural change and a low sodium level). In over half of the cases (16/31, 52%), physicians failed to identify and treat the cause for the clinical condition. This was observed when the background cue involved non-trivial knowledge (eg, tacrolimus associated with high potassium levels) as well as when the knowledge was straightforward (eg, persistent fever in a patient who underwent surgery recently). Failure to communicate the necessary information accounted for a minority of the missed situation cues. On the other hand, inadequate reporting of information accounted for the majority of missed background cues. When nurses did not report the cues, physicians often failed to elicit the relevant information. Appropriate action regarding the situation cue was associated with active inquiry by the physician (p<0.001).

Our study has several limitations. First, we evaluated only a limited number of non-trivial clinical scenarios. All cases were based on real patients and concerned the most common after-hours clinical problems, and our experts considered the diagnostic challenge in each of these to be typical of those encountered in routine care. Nonetheless, it is possible that some of the cases were challenging, resulting in a pessimistic estimation of physician performance. On the other hand, to reduce the possible bias from case selection, and since it is difficult to precisely define ‘appropriate response,’ we used unrealistically lenient evaluation measures based on an objective list of published indicators. These measures may have resulted in an overoptimistic estimation of performance. Consequently, we could not quantify the risk to patient safety posed by after-hours calls. Notably, our objective was not to quantify the extent of errors, but rather to evaluate the critical thinking of physicians following the communication of key information in this setting.

A second limitation is that when designing the study, we did not anticipate that nurses would often not communicate important cues, in particular, background cues. Thus, we did not have adequate statistical power to compare physician performance when given all required information (ie, both the situation and the background cues) with cases with missing information. However, a practicing physician is expected to elicit the necessary information from the reporting nurse. This was clearly not observed in our study.

A final limitation stems from conducting this study in a laboratory environment. Nurses and physicians had no prior knowledge of the patients. Further, nurses could not see the patients and were therefore deprived of important information. We cannot exclude the possibility that nurses would have acted differently when an actual physical patient was present for their evaluation and when actual patients depended on their actions. On the other hand, unlike real life settings, nurses were afforded ample time to review the patients’ records in a distraction-free environment, and physicians were not sleep deprived when responding to the call.

Why were physicians erring?

The observation that in almost half of the cases physicians failed to act upon the information provided to them is very worrisome. These were common cases (in particular the situation cues), presented to highly trained physicians and evaluated against unrealistically lenient criteria.

In the following section, we discuss possible reasons for failure in the different cases. As was found in previous studies, the majority of errors we witnessed could be attributed to problems with cognitive processes.19–21 These include lack of knowledge, failure to recognise the significance of data or failure to synthesise all available data supporting the correct diagnosis (see online supplementary appendix D). In the case of persistent fever and several loose stools for example, 78% of physicians considered the diagnosis of C difficile colitis, yet only 31% considered the possibility of other hospital acquired infections. This demonstrates an anchoring effect and how a single detail (eg, loose stools) can focus even the most experienced physicians on a wrong diagnosis.22 Interestingly, there were no other data to support the diagnosis of C difficile colitis (overt diarrhoea, leucocytosis, etc), but it seems that once physicians formulated a diagnosis they stopped searching for additional information (premature closure).23

In the Chest Pain case, despite a very suggestive description, only 54% of physicians entertained the possibility of a pulmonary embolus (PE). While similar rates have been cited in the literature,24 we believe that in this case the reason for the error was diagnosis momentum. The patient had a previous history of congestive heart failure and was evaluated as such, despite a lack of other findings to support the diagnosis.23 Of the eight physicians who did consider the possibility of PE, only four requested an imaging study of the chest, and of these only two noticed that the patient suffered from kidney disease and ordered the indicated ventilation/perfusion scan. These results are consistent with known problems with the management of PE.24

Problems with clinical judgment (ie, considering all the relevant information but coming to the wrong conclusion)20 may have been responsible for lack of treatment in 37% of severe low sodium cases (confused patient), 80% of cases in which a hospital acquired infection (fever) was suspected and for prescribing benzodiazepines in 26% of acute liver injury cases (medication); however, this is unlikely as a sole explanation, considering that these were relatively straightforward cases. Further, it is not clear why in 70% of high potassium cases (glucose) and in 41% of acute confusion (behaviour) cases (both situation cues that were communicated by most nurses), physicians proceeded directly to symptomatic treatment without attempting to identify a cause for the problem.

We suspect that our findings are attributable to the nature of after-hours phone calls rather than any characteristic of the participating professionals. In clinical practice, these calls are made at times when resources are limited and the physicians are often unfamiliar with the patient. It is also possible that there is a cultural component whereby nurses are looking for ‘quick fixes’ that would decrease their workload, and physicians favour symptomatic treatment that would suffice until morning when the primary physician resumes responsibility.25

Mitigating the risk of phone consultations

After-hours phone calls are potentially dangerous due to communication failures, cognitive limitations, and possibly the limited resources and limited responsibility of the on-call physician. Simple interventions, such as problem-specific templates for communicating patient data, may reduce communication failures.2 Our results show, however, that such interventions are not effective in isolation.5,26 It seems that improving communication is necessary, but not sufficient to improve outcomes.

Eliminating phone communications would require significant changes in healthcare processes. Alternative solutions might be to provide access to a shared electronic health record (ie, a comprehensive patient record accessible from outside the hospital)27 or to develop computerised systems designed to support both communication and decision-making. Establishing the necessary knowledge base and developing practical systems will be challenging for multiple reasons. However, identifying the common reasons for after-hours calls as well as common communication and cognitive errors may guide such efforts. More importantly, recognising and documenting the risk to patient safety associated with after-hours phone consultations is a necessary step toward changing the organisational culture to reduce or eliminate the potential of harm associated with these communications.25 Further research is needed to identify the extent and impact of adverse outcomes associated with care provided over the phone.


Conclusions

After-hours phone calls are error prone. Both nurse communication and physician decision-making are problematic. Efforts to improve patient safety in this setting must address both communication and decision-making.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.



  • Contributors EJ: analysed data, drafted manuscript, revised manuscript. JPT: helped plan the study, was primarily responsible for conducting the experiments and participated in data analysis; he is currently convalescing from critical illness and was therefore not available to approve the final manuscript. KOH: helped plan the study, participated in data analysis, helped draft and revise the manuscript. TRJ: helped plan the study, participated in data collection and analysis, helped revise the manuscript. CWJ: helped plan the study, led the data analysis, helped revise the manuscript. EVB: conceived of the study, obtained funding, supervised data collection and analysis, helped draft and revise the manuscript.

  • Funding This study was funded in part by a grant from the University of Texas System Patient Safety Committee (to EVB, JPT, KOH and TRJ) and by a training fellowship from the Keck Center Computational Cancer Biology Training Program of the Gulf Coast Consortia (CPRIT Grant No. RP101489 to EJ).

  • Competing interests None.

  • Ethics approval Committee for the Protection of Human Subjects at the University of Texas Health Science Center at Houston.

  • Provenance and peer review Not commissioned; externally peer reviewed.