

Objective measures of situation awareness in a simulated medical environment
M C Wright, J M Taekman, M R Endsley*
Department of Anesthesiology and the Duke University Human Simulation and Patient Safety Center, Duke University Medical Center, Durham, North Carolina, USA
Correspondence to: Melanie C Wright PhD, Department of Anesthesia, Box 3094, Duke University Medical Center, Durham, NC 27710 USA


One major limitation in the use of human patient simulators is a lack of objective, validated measures of human performance. Objective measures are necessary if simulators are to be used to evaluate the skills and training of medical practitioners and teams or to evaluate the impact of new processes or equipment design on overall system performance. Situation awareness (SA) refers to a person’s perception and understanding of their dynamic environment. This awareness and comprehension are critical to making correct decisions that ultimately lead to correct actions in medical care settings. An objective measure of SA may be more sensitive and diagnostic than traditional performance measures. This paper reviews a theory of SA and discusses the methods required for developing an objective measure of SA within the context of a simulated medical environment. Analysis and interpretation of SA data for both individual and team performance in health care are also presented.


Human patient simulators can be a valuable resource for studying issues related to medical error. Using simulators, researchers are able to study the effects of various adverse events on the performance of surgeons, anaesthesiologists, nurses, and other medical practitioners. This is a considerable improvement over the “hit or miss” method of training and post hoc study of adverse events in the treatment of real patients, where adverse events are likely to occur only rarely and may never be encountered during the training period. In addition, human patient simulators can be used to show how new equipment design and new processes or procedures may affect the performance of clinicians and, thus, potential outcomes for patients.

Using simulators to evaluate the effects of training, medical equipment design, or changes in process or procedures ultimately requires valid measures that can provide data to support conclusions regarding the effectiveness of these interventions. Unfortunately, few measures have been developed for quantifying the performance of medical professionals using simulators. In addition, the validation process for these measures can be arduous and is currently an important topic of discussion within the medical simulation community.

This paper reviews existing methods of measuring the performance of human machine systems that may be applied in a medical simulation environment. Advantages and disadvantages of various measurement methods are described. We present a theory of situation awareness (SA) and, as an example, discuss the implications of SA within anaesthesia. We propose that direct measures of SA through the situation awareness global assessment technique (SAGAT) may provide an effective objective measure of individual and team performance in a patient simulation environment. We describe the methods required for the use and analysis of SAGAT for patient simulation applications.


In evaluating performance, we are typically concerned with understanding the ability of the individual medical care provider to perform various tasks while utilising the tools and medical care systems at his or her disposal. Measures of this combined human system performance evaluate not the individual per se, but the degree to which the tools or devices aid or undermine that performance. This type of evaluation is extremely important in assessing the design of new technology for the medical care setting. In addition, we may be interested in evaluating individual or team performance in order to assess the quality of new training regimens.

A number of different types of measures have been used to assess human machine systems in a variety of work environments. These include direct and indirect measures of performance, mental workload measures, and a range of analytic measures of specific aspects of performance such as movement or communication.1 Examples include:

  • Direct performance measures—measures of outcome or human machine “score”, time on task, error rate, degree of error (such as deviation from planned path);

  • Indirect performance measures—subjective ratings of performance including both self ratings and outside observer ratings;

  • Mental workload measures—subjective ratings of workload, secondary task measures (better performance on a secondary task implies greater spare capacity suggesting lower workload), physiological measures;

  • Task analytic measures—analysis of eye tracking data, time and motion studies, time spent on various components of a task, communications analysis.

Key messages

  • Objective measures of human system performance are needed for use in evaluations conducted using human patient simulators.

  • Situation awareness (SA) is defined as a person’s perception of elements in the environment, comprehension of that information, and the ability to project future events based on this understanding.

  • SA is a critical component in decision making for medical practitioners.

  • Objective measures of SA can be more sensitive and more diagnostic than traditional measures of human performance.

  • Objective measures of SA such as situation awareness global assessment technique (SAGAT) may provide an effective means of assessing both individuals and teams in a human simulation environment.

  • The use of SAGAT requires detailed analysis of the task to be studied to identify SA requirements in order to develop appropriate SA queries.

  • Results of evaluations using objective measures of SA with human patient simulators may be used to improve medical practitioner training and medical equipment design.

Direct performance measures are difficult to define for assessing the performance of clinicians in a simulated patient environment as there may be many possible solutions to a particular problem. Possible measures might include success or failure in a given task or scenario or time to identify a specific problem or adverse event that has been preprogrammed into a simulation. Researchers have used written tests,2 observed errors,3–5 problem detection and diagnosis time,6 task completion time,3 and measurement of simulator variables such as effect site concentration of drugs.7 Unfortunately, these measures provide little evidence as to why poor performance may have occurred. In addition, overall performance measures often reflect only the outcome of the task or event. Therefore, they may not identify errors or misconceptions that are resolved before the task is complete or that are not reflected in the outcome of the event (for example, certain drug errors).8

Subjective7,9 and secondary task10 measures of workload have also been used in patient simulations. While workload measures are useful in identifying situations where the clinician may suffer from overload, this is only one aspect of a task that is likely to influence performance. In some cases measures of workload and performance dissociate. For example, poor performance is sometimes associated with passive monitoring tasks where the individual is not actively involved in performing a task, often termed the out of the loop problem.11 While workload may be low, performance can suffer because the individual is not aware of changes that may be occurring (for example, the patient has taken a turn for the worse).

Indirect performance measures have been used successfully in human simulation environments. Several researchers have used observer ratings to assess the performance of clinicians in a simulated environment.12–14 Gaba et al found that this method resulted in slightly better interrater reliability when used to evaluate technical performance (for example, motor performance associated with tasks such as chest compression) than when used to evaluate behavioural performance (such as decision making and team communication skills).13 Gaba describes several problems associated with this type of measure, including the high cost of using multiple experienced raters and the high variability that may occur between raters.15
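Interrater reliability is usually reported as a chance-corrected agreement statistic. The sketch below computes Cohen's kappa for two observers who assign categorical ratings to the same simulated performances. The source does not say which statistic Gaba et al used, so the choice of kappa, the `cohens_kappa` helper, and the example ratings are illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same items."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same rating.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical category throughout; kappa is formally
        # undefined, and this sketch reports perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical pass/fail ratings of four simulated chest compression sequences.
rater_1 = ["pass", "pass", "fail", "pass"]
rater_2 = ["pass", "fail", "fail", "pass"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.5
```

Here the raters agree on three of four items (0.75), but half of that agreement would be expected by chance (0.5), so kappa is 0.5.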


Cognitive constructs such as attention and mental workload are useful in the study of human performance when formally defined and integrated into testable theories. Situation awareness is another such construct. Situation awareness can be thought of as an internal mental model of the current state of an individual’s environment. Endsley has formally defined SA as “the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future”.16 This definition divides SA into three distinct levels: (1) Level 1, perception of the environment; (2) Level 2, comprehension of the meaning of this information; and (3) Level 3, projection of future events or actions based on this perception and comprehension.

In Endsley’s model of SA (figure 1), SA is shown as a construct that is distinct from decision making and the performance of actions.17,18 However, SA precedes decision making and is one important component of dynamic decision making. Other factors shown in the model, including task or system factors and individual factors, also influence this process. For example, two practitioners may have the same SA but choose different courses of action based on their prior clinical experience. Personality characteristics such as risk aversion, as well as system constraints, may also affect their decisions. SA as defined in this model includes only that portion of a person’s knowledge that pertains to the state of a dynamic environment. Background knowledge, experiences, and established rules are static knowledge sources that fall outside the definition of SA, although they may influence the development of SA. For example, preconceptions based on previous experiences can direct an individual’s attention, affecting the formation of SA. Another important feature of this model is that SA changes continuously as the environment changes, whether due to the decisions and actions of the individual or to other outside influences.

Figure 1

 Model of situation awareness.17 Reprinted with permission from Human Factors, vol 37, No 1, 1995. Copyright 1995 by the Human Factors and Ergonomics Society. All rights reserved.

Klein specifies four reasons why the phenomenon of SA is important in the study of human work.19 These include:

  1. SA appears to be linked to performance. This assertion has obvious face validity, since it is expected that the more relevant information a worker has about a situation, the more adaptive their responses will be.

  2. Limitations in SA may result in errors. If needed information is not available or is not correctly interpreted, whether due to failures of memory or attention or due to system failures, then clearly there is an increased likelihood of errors.

  3. SA may be related to expertise. Klein raises examples of classic research where experienced physics researchers have been shown to classify physics problems differently than novices.19

  4. SA is the basis for decision making in most cases. This characteristic of SA is an integral part of both Klein’s recognition primed decision model19 and Endsley’s model of SA.17

Research in aviation and other environments where SA has been measured supports Klein’s four reasons for studying SA. Measures of SA have correlated with performance in aviation.20,21 Bell and Lyon found that fighter pilots with lower observer ratings of SA during a combat scenario had a greater number of decision errors than pilots with more highly rated SA.22 In addition, SA measures have been shown to be sensitive to both task difficulty and experience in aviation and in power plant operations.23,24

More interesting, however, is research showing that measures of SA can be sensitive to differences that were not reflected in performance measures. For example, Endsley conducted a study comparing the SA of pilots using a new avionics system with that of pilots using the old system.25 Pilots subjectively believed the new system to be better, but mission performance measures showed no differences. To evaluate the new system, Endsley used SAGAT, a direct SA measurement technique in which pilots are asked questions regarding their perception, comprehension, and projection of the current situation during a simulation freeze. She found that the new system provided pilots with better SA regarding the location of enemy aircraft and other critical factors than the old system.

In another example, concern about the effects of a new form of air traffic control known as “free flight” on the ability of air traffic controllers to track and monitor aircraft led to a comparison of performance and SA between the new and old systems.26 Performance tests with the new system showed trends toward differences in separation errors, although the results were not statistically significant. SAGAT measures in this experiment provided more diagnostic detail, showing that under free flight conditions controllers were aware of significantly fewer aircraft (level 1 SA), had a significantly reduced understanding of what was happening in the traffic situation (level 2 SA), and had reduced knowledge of where aircraft were going (level 3 SA). These studies suggest that measures of SA can have diagnostic power beyond that of performance measures, and may predict performance problems or errors that are not seen within the limited sensitivity, scope, or duration of a laboratory study.


Human patient simulators show great promise for improving both training methods and equipment design in a wide range of medical applications. The practice of anaesthesia is a good example.27 The environment of the anaesthetist is highly dynamic, complex, uncertain, risky, and subject to intense time pressure.28 The study of human cognitive processes has shown that this type of environment is very challenging for humans as decision makers.27 Humans perform poorly under low stress, low workload conditions (for example, monitoring tasks) and also under high stress, high workload conditions. These working conditions affect the individual’s ability to focus attention on the appropriate areas. In addition, task complexity, dynamism, and uncertainty are likely to interact to reduce a person’s ability to make sound decisions.

For an anaesthesiologist, developing and maintaining SA can be a difficult task because of the large amount of information present in a complex and dynamic environment. The following examples apply Endsley’s three levels of SA29 to the tasks of an anaesthesiologist:

  • Level 1 SA—Perception of the elements in the environment. The first step in achieving SA is to perceive the status, attributes, and dynamics of relevant elements in the environment. For an anaesthesiologist, these may include awareness of patient vital signs such as heart rate, blood pressure, oxygen saturation, breathing rate, laboratory values, and hospital studies and may also include awareness of drugs and the patient’s level of consciousness. Awareness of the actions of other team members such as the surgeon and nurses and awareness of equipment function (including potential problems) also are critical elements of level 1 SA for the anaesthesiologist.

  • Level 2 SA—Comprehension of the current situation. Comprehension of the situation is based on a synthesis of the separate level 1 elements. Level 2 SA involves understanding the significance of objects and events in the environment and combining these data to form a holistic picture of the environment in light of one’s goals. For example, anaesthesiologists will synthesise and integrate information regarding patient physical signs and patient information to identify the most probable cause in a complex differential diagnosis. They will interpret a sudden drop in heart rate in light of the surgeon’s recent procedures and other vital signs to judge whether it represents an expected and temporary event or a serious problem.

  • Level 3 SA—Projection of future status. The highest level of situation awareness is the ability to project the future actions of elements in the environment. This is achieved through knowledge of the status and dynamics of elements in the environment and comprehension of the situation (level 1 and level 2 SA). Anaesthesiologists with a high degree of level 3 SA will be able to project the patient’s response to anticipated drug administration and physician actions, including changes in vital signs such as breathing rate and oxygen saturation. This type of projection is very important in allowing them to be proactive rather than merely reactive.

In the medical environment, the level of SA possessed by a practitioner may be critical to the outcome of the patient. The measurement of SA allows researchers to determine whether clinicians have good SA in relation to the task environment. This could help in the identification of performance problems and error mechanisms (possibly induced by equipment with a poor user interface, poor arrangements of equipment in the surgical setting, or poor communication and teamwork among the surgical staff). Measurement of SA may be used as a method of evaluating training by identifying areas of deficiency (that is, areas where individuals fail to attain the needed levels of SA). The results can be used to improve training and education of medical practitioners. A measure of SA may also lead to improvements in system design (such as improved equipment design or process changes) that will support medical practitioners in attaining a high level of SA which will allow them to make the best decisions, resulting in better patient outcomes.


Measures of SA can be either indirect, such as subjective ratings, or direct, usually through probing workers with questions about the task. Subjective measures can be either self rating techniques, such as the situation awareness rating technique (SART) by Taylor,30 or ratings made by outside observers. Endsley has suggested that self ratings most likely reflect subjects’ confidence in their SA rather than providing a true measure of their SA.8 Outside observer ratings can only be based on the behaviours and verbalisations of the subjects; therefore, they cannot determine whether specific information has been stored and processed internally.

Gaba et al suggest the use of real time probes to measure the SA of anaesthesiologists using patient simulations.31 They suggest that queries be raised by actors within the scenario (such as doctors or nurses) and that participants’ responses be used to assess their level of SA. Endsley, however, states that such queries may bias the results of an experiment by artificially directing the subject’s attention to certain parameters.8

Endsley presents SAGAT as a method of directly measuring SA.8 In this method, a simulation is frozen at various points in time and workers are asked questions designed to assess their level 1, level 2, and level 3 SA. The answers are then compared to the real situation according to the simulated computer database and experts’ interpretations of the meaning of that data to provide an objective measure of SA. One important aspect of SAGAT is the development of queries for the experiment. This requires a fairly in-depth study of an individual’s role to identify the SA requirements and appropriate phrasing of questions. Methods such as Goal Directed Task Analysis (GDTA) may be used to identify task goals, related decisions, and finally the SA requirements that are needed to make the decisions that allow operators to meet their goals.32
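The comparison step described above can be sketched as follows, assuming each freeze produces a record of the simulator state together with any expert interpretation of that state. The class `SagatQuery`, the function `score_freeze`, and the record field names are hypothetical illustrations, not part of any published SAGAT software. Answers are checked by exact match here; numeric queries would normally be scored within expert tolerance bands instead.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


@dataclass(frozen=True)
class SagatQuery:
    query_id: str
    level: int  # 1 = perception, 2 = comprehension, 3 = projection
    text: str
    # Reads the correct answer from the frozen scenario record (simulator values
    # plus any expert interpretation recorded for that freeze).
    ground_truth: Callable[[Dict[str, Any]], Any]


def score_freeze(queries: Iterable[SagatQuery], snapshot: Dict[str, Any],
                 responses: Dict[str, Any]) -> Dict[str, bool]:
    """Compare each answered query with the ground truth captured at the freeze."""
    return {q.query_id: responses[q.query_id] == q.ground_truth(snapshot)
            for q in queries if q.query_id in responses}


# Hypothetical record captured when the scenario was frozen.
snapshot = {"hr_trend": "falling", "expert": {"cause": "vagal response"}}
queries = [
    SagatQuery("hr_trend", 1, "Is heart rate rising, stable, or falling?",
               lambda s: s["hr_trend"]),
    SagatQuery("cause", 2, "What is the most likely cause of the current heart rate?",
               lambda s: s["expert"]["cause"]),
]
print(score_freeze(queries, snapshot, {"hr_trend": "falling", "cause": "hypovolaemia"}))
# {'hr_trend': True, 'cause': False}
```

Keeping the ground truth as a function of the frozen record means level 2 and level 3 answers, which depend on expert interpretation, are scored the same way as level 1 values read straight from the simulator.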

SAGAT has been criticised on two grounds: the perceived intrusiveness of freezing a simulation to collect SAGAT data, and the degree to which it relies on memory and may be limited as a result.33 Research to date does not support these concerns. Several studies have shown that a temporary freeze in a simulation to collect SAGAT data does not affect performance.8,34 Other studies have used SAGAT in some trials but not others in order to examine whether SAGAT interferes with participant performance,32,35–37 and found no differences indicating either better or worse performance when SAGAT was used. Studies have also found that participants appeared subjectively to adjust to the technique quite well and were able to return to the action fairly readily after a short freeze to collect SAGAT data.32,38

On the question of over-reliance on memory, Endsley32 examines the issue in detail, concluding that memory is central to SA and that SAGAT taps into the working memory stores where information is integrated and processed to form the ongoing dynamic representation of the situation, as well as the long term memory stores that feed SA. SAGAT in fact overcomes many of the problems of retrospective reports of past mental events, because it uses freezes to collect these mental impressions of the situation as immediately as possible and without other intervening events to disrupt memory. Endsley8,34 found that pilots’ ability to report their SA via SAGAT was unaffected by how long after the freeze the question was asked, over intervals from around 20 s to 6 min, showing a lack of memory decay for this information due to the integration of working and long term memory stores. The SAGAT methodology is specifically designed to tap the SA resident in human memory as effectively as possible, using cued recall queries administered as quickly and immediately as possible.

The use of SAGAT in medical applications has been limited. Zhang and colleagues used a variation on SAGAT to compare two anaesthesia displays in a patient simulation. They found significant differences in SA due to type of display for level 1 and level 2 SA in some of the scenarios tested.6 One problem with their method, however, was that they used a very small set of queries (four level 1 queries, two level 2 queries, and two level 3 queries), so that during the course of the experiment participants may have been able to predict the questions that would be asked at any given simulation freeze.
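The source does not prescribe a remedy for predictable queries. One option, assumed here purely for illustration, is to keep a larger query bank for each SA level and draw a fresh random subset at each freeze. The bank sizes, per-level counts, and the `select_queries` helper are hypothetical.

```python
import random


def select_queries(query_bank, per_level, rng):
    """Draw the queries to ask at one freeze, stratified by SA level.

    query_bank: maps SA level (1, 2, 3) to a list of query ids.
    per_level:  maps SA level to how many of its queries to ask at this freeze.
    """
    chosen = []
    for level, count in per_level.items():
        # Sampling without replacement from a larger pool keeps any single freeze
        # from revealing which questions will come next.
        chosen.extend(rng.sample(query_bank[level], count))
    rng.shuffle(chosen)  # mix levels so presentation order is also unpredictable
    return chosen


bank = {1: [f"L1-{i}" for i in range(12)],
        2: [f"L2-{i}" for i in range(8)],
        3: [f"L3-{i}" for i in range(6)]}
print(select_queries(bank, {1: 4, 2: 2, 3: 2}, random.Random(7)))
```

Passing in a seeded `random.Random` makes a session's query sequence reproducible for later analysis while still unpredictable to the participant.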

The validity of SAGAT has been established in other environments including piloting, driving, and air traffic control.8,32 In general, these results should generalise to the human simulation environment. However, with the exception of the work by Zhang et al, we are aware of no published applications of direct measures of SA in patient simulation scenarios.


In order to measure SA, appropriate queries must be developed for assessing level 1, level 2, and level 3 SA. A systematic approach is taken to identify first the goals of medical professionals in the work environment and then the information requirements needed to meet those goals. One such approach is a GDTA. For a GDTA, experts are interviewed to identify first the high level goals associated with the task and then the sub-goals. A tree structure of goals and sub-goals is created. Further interviews are conducted to identify key decisions for each sub-goal and the SA requirements (at all three levels) that are needed to make those decisions (figure 2). These requirements are then used to develop SA queries for SAGAT. Figure 3 provides an example of a section of a GDTA and some related SA queries for an anaesthesia task, based on a preliminary GDTA of certified registered nurse anaesthetists. Because the analysis is based on goals rather than specific tasks, the information requirements identified are independent of current technology; thus, they can be generalised more easily for future efforts.

Figure 2

 Goal directed task analysis structure including decisions and SA requirements.

Figure 3

 Sample section of a preliminary goal directed task analysis and related SAGAT queries for the role of a certified registered nurse anaesthetist.
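The goal, decision, and requirement hierarchy of a GDTA maps naturally onto a tree. The sketch below, with hypothetical class names and a made-up anaesthesia fragment (it does not reproduce figure 3), walks such a tree and lists each SA requirement once per level, giving the raw material from which queries would be phrased.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Decision:
    question: str
    # SA requirements keyed by level: 1 = perception, 2 = comprehension, 3 = projection.
    requirements: Dict[int, List[str]] = field(default_factory=dict)


@dataclass
class Goal:
    name: str
    subgoals: List["Goal"] = field(default_factory=list)
    decisions: List[Decision] = field(default_factory=list)


def collect_requirements(root: Goal) -> Dict[int, List[str]]:
    """Walk the goal tree depth first and list each SA requirement once per level."""
    by_level: Dict[int, List[str]] = {1: [], 2: [], 3: []}
    stack = [root]
    while stack:
        goal = stack.pop()
        for decision in goal.decisions:
            for level, items in decision.requirements.items():
                for item in items:
                    if item not in by_level[level]:
                        by_level[level].append(item)
        stack.extend(reversed(goal.subgoals))  # preserve left-to-right order
    return by_level


# Hypothetical fragment, not the content of figure 3.
tree = Goal("Maintain patient stability", subgoals=[
    Goal("Maintain oxygenation", decisions=[Decision(
        "Is an airway intervention needed?",
        {1: ["SpO2", "end tidal CO2"], 2: ["SpO2 relative to expected"],
         3: ["projected SpO2 over next minutes"]})]),
    Goal("Maintain haemodynamic stability", decisions=[Decision(
        "Is a vasopressor needed?",
        {1: ["blood pressure", "heart rate"], 2: ["cause of hypotension"],
         3: ["projected response to vasopressor"]})]),
])
for level, items in collect_requirements(tree).items():
    print(level, items)
```

Deduplicating across decisions matters because the same element (a vital sign, for instance) often supports several decisions but needs only one query.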

For the measurement of SA using SAGAT in a simulated patient environment, the patient scenario is frozen at random points during the trial and participants answer SAGAT queries without being able to see any important perceptual details of the simulation. Endsley32 recommends that the number and timing of simulation freezes and queries meet the following four criteria:

  1. The timing of SAGAT stops will be randomly determined.

  2. A SAGAT stop will not occur within the first 3–5 min of an experimental trial.

  3. SAGAT stops will not occur within 1 min of each other.

  4. Over the course of the experiment, at least 30 samplings will be collected per SA query (across subjects and trials) for each experimental condition.
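The first three criteria can be met by construction, and the fourth can be checked with simple arithmetic. In the minimal sketch below, `schedule_freezes` and `expected_samples_per_query` are hypothetical helpers; the 300 s start and 60 s gap mirror the criteria above, and the sampling estimate assumes each stop asks a random subset of a query bank (if every query is asked at every stop, set `queries_per_stop` equal to `bank_size`).

```python
import random


def schedule_freezes(trial_length_s, n_stops, rng, earliest_s=300.0, min_gap_s=60.0):
    """Return randomly timed SAGAT stops (seconds from trial start).

    earliest_s: no stop before this time (300 s satisfies the 3 to 5 min rule).
    min_gap_s:  minimum spacing between consecutive stops (the 1 min rule).
    """
    slack = trial_length_s - earliest_s - (n_stops - 1) * min_gap_s
    if n_stops < 1 or slack < 0:
        raise ValueError("trial too short for the requested number of stops")
    # Draw points in an interval shortened by the mandatory gaps, sort them, then
    # add the gaps back: every schedule is random and always respects the spacing.
    raw = sorted(rng.uniform(0.0, slack) for _ in range(n_stops))
    return [earliest_s + t + i * min_gap_s for i, t in enumerate(raw)]


def expected_samples_per_query(participants, trials_per_condition, stops_per_trial,
                               queries_per_stop, bank_size):
    """Expected responses per query per condition when each stop asks a random subset."""
    return participants * trials_per_condition * stops_per_trial * queries_per_stop / bank_size


stops = schedule_freezes(trial_length_s=20 * 60, n_stops=4, rng=random.Random(3))
print([round(t) for t in stops])
print(expected_samples_per_query(10, 2, 4, 10, 20) >= 30)  # True (40 samples per query)
```

Compressing the interval and re-inserting the gaps avoids the open-ended retry loop that rejection sampling would need when stops are densely packed.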

The SAGAT responses are scored as correct or incorrect within tolerance bands determined by experts in the area being evaluated, rather than based on absolute error. The frequency of correct responses is tabulated across each query within each experimental condition before further data analyses are completed. The SAGAT data can be analysed both as a composite measure (total SA score based on the number of correct responses to all SA queries) and on an individual query basis. An analysis of individual queries helps provide diagnostic information regarding what types of information or what levels of SA may have been more or less affected by any experimental manipulations such as training conditions or equipment setup.
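A minimal sketch of tolerance band scoring and the two levels of tabulation described above follows. The tolerance values are placeholders standing in for expert judgement, the record layout is assumed, and the composite is expressed as a proportion correct rather than a raw count so that conditions with different numbers of responses can be compared.

```python
from collections import defaultdict

# Hypothetical expert-set tolerance bands (absolute units) for numeric queries.
TOLERANCE = {"heart_rate": 10, "spo2": 2, "systolic_bp": 10}


def is_correct(query_id, response, truth):
    """Correct if within the query's tolerance band; exact match for categorical queries."""
    tolerance = TOLERANCE.get(query_id)
    if tolerance is None:
        return response == truth
    return abs(response - truth) <= tolerance


def tabulate(records):
    """records: iterable of (condition, query_id, response, truth) tuples.

    Returns (per_query, composite): proportion correct for each (condition, query)
    pair, and proportion correct over all queries within each condition.
    """
    hits, counts = defaultdict(int), defaultdict(int)
    for condition, query_id, response, truth in records:
        counts[(condition, query_id)] += 1
        hits[(condition, query_id)] += int(is_correct(query_id, response, truth))
    per_query = {key: hits[key] / counts[key] for key in counts}
    composite = {}
    for condition in {c for c, _ in counts}:
        n = sum(v for (c, _), v in counts.items() if c == condition)
        correct = sum(v for (c, _), v in hits.items() if c == condition)
        composite[condition] = correct / n
    return per_query, composite


records = [
    ("new_display", "heart_rate", 58, 52),              # within 10 bpm: correct
    ("new_display", "spo2", 93, 97),                    # off by 4 points: incorrect
    ("new_display", "heart_rate", 70, 52),              # off by 18 bpm: incorrect
    ("old_display", "heart_rate", 50, 52),              # correct
    ("old_display", "hr_trend", "falling", "falling"),  # categorical: correct
]
per_query, composite = tabulate(records)
print(per_query[("new_display", "heart_rate")], composite["new_display"])  # 0.5 0.3333333333333333
```

The per-query table is the diagnostic view (which information, at which SA level, was affected); the composite gives a single score per condition.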


Measures of SA such as SAGAT have been used in other domains both to diagnose problems and to identify potential solutions, including the design of supporting equipment or displays and the development of training programmes. For example, based on the SAGAT results of the air traffic controller evaluation mentioned previously,26 researchers designed a display that provided enhanced information on flight paths for aircraft in transition states as a way of compensating for the lower SA observed.39 With the new display, controllers were found to be over three times more likely to be correct in understanding whether aircraft were conforming to their advisories, showing improved SA and ability to perform the task.39

Endsley et al40 used SAGAT to identify critical differences between experienced and inexperienced general aviation pilots, and found numerous problems with the novice pilots’ ability to take in key information, deal with distractions and high workload, monitor effectively, understand perceived information, and project future events. Based on this research, a set of computer based training modules was designed to build some basic skills underlying SA for new general aviation pilots.41 These modules included training in time sharing or distributed attention, checklist completion, ATC communications, intensive preflight planning and contingency planning, and SA feedback, all of which had been found to be problem areas for new pilots. In tests with low time general aviation pilots, the training modules were generally successful in imparting the desired skills. Some improvements in SA were found in follow on simulated flight trials, but the simulator was not sensitive enough to detect flight performance differences.

Similar research and training development has been conducted for army platoon leaders.42,43 The resulting computer based training programme sought to help build up the mental models and schema that are needed for pattern recognition to produce situation understanding and projection. The training programme also taught skills related to building SA through team communications and contingency planning. In initial testing with cadets performing exercises at the Royal Norwegian Naval Academy, trained cadets were more likely to correctly refuse to attack a refugee camp than untrained cadets, indicating better SA.43 In addition, trained cadets indicated that they spent more mental effort developing level 3 SA and determining how to best meet their goals.


The theory of SA can be extended to include team environments, such as would be encountered in a surgical setting. Endsley describes team SA as “…the degree to which each team member possesses the SA required for his or her responsibilities”.17 Team members have individual SA requirements and in some cases these requirements overlap, resulting in shared SA requirements. Cooke refers to shared knowledge between team members in two ways: (1) complementary shared knowledge in which the team members have knowledge that does not overlap but is complementary, resulting in the needed team knowledge; and (2) common shared knowledge in which team members share the same knowledge.44,45 A team can be considered to have high team SA when all of the individuals on the team possess the SA (whether complementary or shared) required for their respective roles.
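Cooke's distinction can be made concrete with set operations: requirements held by only one role are complementary, and overlaps between roles are common. In the sketch below, the helper names, role names, and SA elements are hypothetical, and each `awareness` set stands in for what a member's scored SAGAT responses showed they knew. Team SA is judged high only when no member has a gap for their own role, following the definition above.

```python
from itertools import combinations


def pairwise_overlap(requirements):
    """Requirements shared by each pair of roles (candidate common SA requirements)."""
    return {(a, b): requirements[a] & requirements[b]
            for a, b in combinations(sorted(requirements), 2)}


def sa_gaps(requirements, awareness):
    """Per role, the required elements that team member does not currently hold."""
    return {role: needed - awareness.get(role, set())
            for role, needed in requirements.items()}


def team_sa_is_high(requirements, awareness):
    """True only when every member holds all the SA their own role requires."""
    return all(not gap for gap in sa_gaps(requirements, awareness).values())


# Hypothetical roles and SA elements for a surgical team.
requirements = {
    "anaesthetist": {"SpO2", "estimated blood loss", "drugs given"},
    "surgeon": {"estimated blood loss", "current surgical step"},
    "circulating nurse": {"drugs given", "instrument count"},
}
awareness = {
    "anaesthetist": {"SpO2", "estimated blood loss", "drugs given"},
    "surgeon": {"current surgical step"},
    "circulating nurse": {"drugs given", "instrument count"},
}
print(pairwise_overlap(requirements)[("anaesthetist", "surgeon")])  # {'estimated blood loss'}
print(sa_gaps(requirements, awareness)["surgeon"])                   # {'estimated blood loss'}
print(team_sa_is_high(requirements, awareness))                      # False
```

A single member's gap is enough to lower team SA, even though another member (here the anaesthetist) holds the missing element.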

An objective measure of SA such as SAGAT can provide unique insight into team as well as individual performance within simulated medical environments. Queries can be designed to assess specific SA requirements for each team member role. More importantly, however, responses to queries related to common SA requirements can be compared across team members, identifying SA differences between team member roles. In addition, specific responses can be compared to determine whether the same responses (correct or incorrect) are made across team member roles. This type of analysis can provide diagnostic information regarding the source of breakdowns in team SA. For example, common incorrect responses may indicate problems that affect the entire team in a similar way (such as a poorly designed information display). Alternatively, a mix of correct and incorrect responses, or different incorrect responses, across team member roles may indicate breakdowns in team coordination.
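The cross-role comparison above can be expressed as a small classifier over the answers that team members give to the same common query at one freeze. This is a sketch under assumed inputs (categorical answers, one shared ground truth, hypothetical role labels); the category names are descriptive labels for this illustration, not terms defined in the source.

```python
from collections import Counter


def classify_team_response(responses, truth):
    """Classify role answers to one common SA query at one freeze.

    responses: maps role to that member's (categorical, hashable) answer.
    """
    answers = list(responses.values())
    if all(a == truth for a in answers):
        return "all correct"
    if all(a != truth for a in answers) and len(set(answers)) == 1:
        # Everyone gave the same wrong answer: look for a shared cause such as a display.
        return "common incorrect"
    # Mixed correct and incorrect, or different wrong answers: look at team coordination.
    return "divergent"


# Hypothetical answers to "Is heart rate rising, stable, or falling?" at three freezes.
freezes = [
    ({"anaesthetist": "falling", "nurse": "falling"}, "falling"),
    ({"anaesthetist": "stable", "nurse": "stable"}, "falling"),
    ({"anaesthetist": "falling", "nurse": "rising"}, "falling"),
]
print(Counter(classify_team_response(r, t) for r, t in freezes))
```

Tallying the categories across freezes and queries shows whether a team's errors cluster as shared misperceptions or as divergence between roles.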


The potential benefit of patient simulators for training and evaluating medical practitioners and medical equipment is widely recognised.15,46 However, methods for measuring the performance of practitioners and the effectiveness of equipment are lacking. The direct measurement of level 1, level 2, and level 3 SA can provide important information regarding the perceptual and cognitive processes of medical practitioners. Future research is needed to determine the SA requirements for a variety of medical tasks or roles (such as anaesthesiology) and to validate measures of SA such as SAGAT within the patient simulation environment. A valid measure of SA may ultimately provide a training tool, in which feedback is given to trainees regarding their performance so that they develop better skills in attaining SA. Such a measure may help designers make important design decisions, such as choosing between competing display designs for monitoring patient vital signs. Measurement of team SA may ultimately lead to training and equipment design that supports better coordination between medical teams, which will also lead to higher quality health care.




  • * SA Technologies, Inc, Marietta, Georgia, USA

  • Work attributed to, and funded by, the Department of Anesthesiology, Duke University Medical Center, Durham, NC 27710 and SA Technologies, Inc., Marietta, GA.

  • Competing interests: MRE is the President of SA Technologies. SA Technologies markets software and user guides for the SAGAT.
