
Development and reliability of the explicit professional oral communication observation tool to quantify the use of non-technical skills in healthcare
Peter F Kemper (1), Inge van Noord (1), Martine de Bruijne (1), Dirk L Knol (2), Cordula Wagner (1,3), Cathy van Dyck (4)

  1. Department of Public and Occupational Health, EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands
  2. Department of Epidemiology and Biostatistics, EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands
  3. The Netherlands Institute of Health Services Research (NIVEL), Utrecht, The Netherlands
  4. Faculty of Social Sciences, Department of Organizational Science, VU University, Amsterdam, The Netherlands

Correspondence to Peter F Kemper, Department of Public and Occupational Health, EMGO+ Institute for Health and Care Research, VU University Medical Center, Van der Boechorststraat 7, Amsterdam 1081 BT, The Netherlands; p.kemper{at}


Background A lack of non-technical skills is increasingly recognised as an important underlying cause of adverse events in healthcare. The nature and number of things professionals communicate to each other can be perceived as a product of their use of non-technical skills. This paper describes the development and reliability of an instrument to measure and quantify the use of non-technical skills by direct observations of explicit professional oral communication (EPOC) in the clinical situation.

Methods In an iterative process we translated, tested and refined an existing checklist from the aviation industry, called self, human interaction, aircraft, procedures and environment (SHAPE), in the context of healthcare, notably emergency departments (ED) and intensive care units (ICU). The EPOC comprises six dimensions: assertiveness; working with others; task-oriented leadership; people-oriented leadership; situation awareness; and planning and anticipation. Each dimension is specified into several concrete items reflecting verbal behaviours. The EPOC was evaluated in four ED and six ICU.

Results In the ED and ICU, respectively, 378 and 1144 individual and 51 and 68 contemporaneous observations of individual staff members were conducted. All EPOC dimensions occurred frequently, apart from assertiveness, which was hardly observed. Intraclass correlations for the overall EPOC score ranged between 0.85 and 0.91 and for underlying EPOC dimensions between 0.53 and 0.95.

Conclusions The EPOC is a new instrument for evaluating the use of non-technical skills in healthcare, which is reliable in two highly different settings. By quantifying professional behaviour the instrument facilitates measurement of behavioural change over time. The results suggest that EPOC can also be translated to other settings.

  • Communication
  • Patient safety
  • Team training
  • Teamwork
  • Quality measurement

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: and



A lack of non-technical skills is increasingly recognised as an important underlying cause of adverse events in healthcare.1,2 Non-technical skills are ‘the cognitive, social and personal resource skills that complement technical skills and contribute to safe and efficient task performance’.3 Examples of non-technical skills are task management, teamwork, situation awareness and leadership.4,5

It can be reasoned that the nature and number of things that professionals communicate to each other can be perceived as a product of their use of non-technical skills. Application of non-technical skills implies that tasks, situations, decisions and team roles are made more explicit. Task management, for instance, becomes more explicit when a physician discusses with a colleague who is responsible for a patient, rather than assuming this is implicitly clear. Similarly, when the transport of a patient is standardised, abnormalities can be expected to be managed proactively rather than by troubleshooting along the way.

In order to improve patient safety through the better use of non-technical skills, dedicated training is required,6,7 such as crew resource management (CRM),8 which is increasingly being applied in healthcare.9 The number of evaluations of this training and the corresponding use of non-technical skills is also increasing rapidly, with results being promising but still limited.10 Classroom-based training has shown mixed results with regard to behavioural change.11

A possible explanation for these mixed results might be that non-technical skills are difficult to measure. Non-technical skills are broad concepts that capture a wide range of aspects that can be relevant, depending on the situation. Furthermore, most non-technical skills are automatic and consist of routine behaviour, so people have no realistic perception of the extent to which they use them. Most studies rely on self-reported questionnaires to measure non-technical skills,11 or use proxy measures such as incident reporting12 and adherence to guidelines.13 Although these outcomes are relevant, they are not a measure of the actual demonstration of a non-technical skill.

Probably the best way to measure non-technical skills is by systematic direct behavioural observation, because observations have the advantage of measuring behaviour as it actually occurs. There are several existing structured observation methods that assess the use of non-technical skills in healthcare.3,14–16 Most of these methods are setting-specific (eg, the operating theatre or anaesthetics) and demand clinical knowledge to assess the use of non-technical skills. Moreover, all these instruments appraise the use of non-technical skills, which, although highly structured, is a subjective assessment. Up to now, most studies have used the assessment of non-technical skills by direct observations in descriptive studies, for example as an educational feedback tool during a training session. Only a few studies have applied observations in evaluation studies.17,18 There is a need for an observation method that can be used independent of context by observers with limited or no clinical expertise, and that systematically quantifies non-technical skills rather than appraises them. Therefore, building on the work of our predecessors, we set out to develop this method.

Our starting point has its origin in aviation, in which training of non-technical skills by means of CRM training was widely established by the mid-1990s in airlines across Europe and North America.8 To give aircraft personnel structured feedback regarding their non-technical skills during CRM training, Antersijn and Verhoef19 developed a checklist of important non-technical skills for the staff of the Royal Dutch Airlines. This checklist, called SHAPE, structured non-technical skills into five domains, notably self, human interaction, aircraft, procedures and environment. The domains aircraft (A) and procedures (P) are bound to the aviation context. In healthcare, the aircraft-specific non-technical skills should be replaced by department-specific clinical skills, although we did not use the specific clinical skills in this study.

Within the other domains (S, H and E), we can distinguish specific and general non-technical skills. The general non-technical skills in SHAPE are context independent and are applicable to different settings and situations without specific knowledge of the situation. For instance, a person can be assertive in the cockpit as well as on the hospital ward, which means that these general non-technical skills of the SHE domains can be translated to the healthcare context.

Like most of the existing observational instruments in healthcare, SHAPE uses behavioural markers, defined as ‘observable non-technical behaviours that contribute to superior or substandard performance within a work environment’.20 Normally, these behavioural markers are used to appraise the use of a non-technical skill. Within a well-structured and complete framework, non-technical skills can also be quantified by counting the number of times a behavioural marker is explicitly expressed. We used the general non-technical skills defined in SHAPE to quantify non-technical skills by systematically observing professional communication on the work floor. The present paper describes the development of this new observation instrument, called the explicit professional oral communication (EPOC) measurement. We also present the interobserver variability in two different settings—the emergency department (ED) and the intensive care unit (ICU).


Development of the EPOC measurement

The development of the EPOC comprised five steps (see figure 1). We started with the SHE domains of the original SHAPE. In an iterative process we translated, tested and refined the instrument in the healthcare context, first to ED and later to ICU. Decisions within the developmental steps and the progression through these steps were made by the development team. This team consisted of four researchers (PFK, IvN, MdB, CvD) with a background in medicine, epidemiology and psychology. In addition, various international experts in the field of non-technical skills were consulted on invitation and during international conferences. Furthermore, experienced EPOC observers were included in the team after the first studies ended.

Figure 1

Chronological display of the development of the explicit professional oral communication. CRM, crew resource management; ED, emergency department; EPOC, explicit professional oral communication; ICU, intensive care unit.

Within the three categories (self, human interaction and environment), the final version of EPOC consists of six dimensions to classify the explicit professional oral communication of an observed person. The self category of EPOC measures assertiveness. The human interaction category is divided into working with others, task-oriented leadership, and people-oriented leadership. The environment category consists of situation awareness, and planning and anticipation. Each dimension is subdivided into several concrete verbal behaviours that together represent the dimension. Table 1 displays the categorisation of EPOC and provides definitions for the categories and dimensions, and examples of verbal behaviours. The instrument is described in a handbook with instructions, definitions and examples.

Table 1

Overview of categories, dimensions and items, with definitions and examples and the descriptive results

Measurement: observation

Assessing EPOC meant that only work-related interactions between professionals were counted. Every time the observed person expressed one of the verbal behaviours of the EPOC, the observer tallied this on the observation form. One exception was made for ‘listens’: when someone nodded his or her head, this was also tallied. Social talk or conversations with the patients or family were not included.

An observation lasted 30 min. During an observation one person at work was observed and his or her work-related verbal expressions were tallied. An observation was carried out by one observer (an individual observation) or, in order to calculate the inter-observer reliability, by two independent observers simultaneously (a contemporaneous observation). The observations were carried out directly and were not recorded on video. All observations were conducted during daily practice between 07:00 and 19:00 hours.
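The tallying described above can be sketched in a few lines of Python. This is only an illustration, not the authors' observation form: the category and dimension names come from the text, while the function name `summarise_observation` and the example counts are hypothetical.

```python
# Category -> dimensions, as described in the text.
EPOC_STRUCTURE = {
    "self": ["assertiveness"],
    "human interaction": [
        "working with others",
        "task-oriented leadership",
        "people-oriented leadership",
    ],
    "environment": ["situation awareness", "planning and anticipation"],
}


def summarise_observation(tallies):
    """Roll dimension-level tallies up to category totals and an overall score."""
    known = {dim for dims in EPOC_STRUCTURE.values() for dim in dims}
    unknown = set(tallies) - known
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    by_category = {
        category: sum(tallies.get(dim, 0) for dim in dims)
        for category, dims in EPOC_STRUCTURE.items()
    }
    return {"by_category": by_category, "overall": sum(by_category.values())}


# Hypothetical tallies for one 30-min observation (illustrative numbers only).
example = {
    "working with others": 5,
    "task-oriented leadership": 1,
    "situation awareness": 2,
}
summary = summarise_observation(example)
```

In practice each dimension total would itself be the sum of item-level tallies (the concrete verbal behaviours listed in table 1); that level is left out here because the item-to-dimension mapping is not reproduced in this text.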

In addition to the verbal behaviours, contextual information was gathered before, during and after the observation. Contextual information consisted of the starting time and the occupation of the observed person, the type and number of patients the observed person saw, how many times and with whom the observed person interacted. Directly after observation, both the observer and observed person filled out the National Aeronautics and Space Administration (NASA) task load index (NASA TLX)21 to measure the perceived workload during the observation period. Next, the observer indicated which tasks the observed person had performed, by means of a short description of the observation period and ticking a number of preselected tasks (eg, handover or multidisciplinary meeting).

All observers received a 1-day theoretical training course to enable them to learn the definitions of the verbal behaviours, practise with written examples and become familiar with the common sources of rating biases (eg, halo effect). Specific attention was paid to maintaining their sensitivity to verbal behaviours that occur less frequently. This was followed by a 1-day practical training course in the clinic with an experienced observer who supervised the observations and discussed the outcomes afterwards. Contemporaneous observations were carried out regularly and discussed afterwards. To make sure that all observers rated behaviour in the same way and to check whether they were consistent during the whole data collection period, regular meetings were organised to discuss contemporaneous or doubtful examples. Furthermore, these meetings were used to receive feedback about the EPOC with regard to the further development of the instrument.

Evaluation of the EPOC: ED and ICU

The evaluation of the EPOC consisted of two parts. First, we examined the occurrence of EPOC items by assessing how many times each item of the EPOC was observed. Second, we determined the reliability of EPOC by assessing the interobserver reliability.

Data from two distinct studies that applied the EPOC were used for this evaluation, one conducted in four ED and the other in six ICU. Both studies assessed the effect of a medical team training in a controlled trial, comprising a baseline measurement and a follow-up measurement.22 In the ED the first version of the EPOC was applied, whereas in the ICU the second, revised, version was used. The data sets of the ED and ICU were therefore examined separately. The measurements within each site were also studied separately, as it was not possible to recruit the same observers during the post-measurement as in the pre-measurement in the ICU departments.

Statistical analysis

The descriptive results of all individual observations were analysed in order to determine the occurrence of the EPOC items. Three parameters were used to assess the interobserver reliability: the intraclass correlation coefficient (ICC), the standard error of measurement (SEM) and the limits of agreement (LOA). Because the analysis of the LOA produces a large number of graphs, its results and discussion are described in online supplementary appendix A (available online only).
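As an illustration of the third parameter, the sketch below computes limits of agreement for paired scores from two observers. It assumes the conventional Bland-Altman definition (mean difference plus or minus 1.96 standard deviations of the differences), which the text does not spell out; the function name and the example scores are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(scores_a, scores_b, z=1.96):
    """Return (lower limit, mean difference, upper limit) for paired scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both observers must score the same observations")
    if len(scores_a) < 2:
        raise ValueError("at least two paired observations are needed")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    spread = z * stdev(diffs)  # sample standard deviation of the differences
    return bias - spread, bias, bias + spread


# Hypothetical overall EPOC scores from two observers on four contemporaneous observations.
observer_a = [10, 12, 8, 15]
observer_b = [9, 13, 8, 14]
lower, bias, upper = limits_of_agreement(observer_a, observer_b)
```

Narrow limits centred near zero indicate that the two observers rarely differ by much when scoring the same person.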

The ICC examines the proportion of the total variance that can be attributed to ‘true’ differences between observed persons. The ICC for a single measurement based on absolute agreement23 was determined for each category and dimension. The ICC was derived from both the contemporaneous and individual observations, a method following from the work of Euser et al.24 The restricted maximum likelihood method was used to estimate the variance components and the delta method was used to calculate the corresponding CI.24 To minimise the influence of the observed person, we ensured that only one observation per unique person was used in the analysis. As the observations were carried out in the context of a controlled trial, the observed persons (subjects) were nested in either the intervention or control unit, which was incorporated in the model as a fixed factor. This resulted in four variance components: (1) the subjects nested within the intervention or control unit; (2) the observer; (3) an interaction between the observer and the intervention or control unit; (4) the residual variance (error). The ICC was estimated by dividing the variance of the subject by the total variance, as described by Molenberghs et al.25

The SEM was estimated by taking the square root of the sum of three components of variance: the observers, the interaction between intervention or control unit and observer, and the residual. The SEM can be considered an estimate of the ‘noise’ of the EPOC.26,27
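Given the four variance components, the ICC and SEM follow directly from the definitions above. The sketch below shows only that final arithmetic step; it does not perform the restricted maximum likelihood estimation or the delta-method CI, and the function name and input values are hypothetical.

```python
import math


def icc_and_sem(var_subject, var_observer, var_observer_by_unit, var_residual):
    """Compute the ICC and SEM from already-estimated variance components."""
    components = (var_subject, var_observer, var_observer_by_unit, var_residual)
    if any(v < 0 for v in components):
        raise ValueError("variance components must be non-negative")
    # Everything except true between-subject variance counts as measurement error.
    error_variance = var_observer + var_observer_by_unit + var_residual
    total_variance = var_subject + error_variance
    if total_variance == 0:
        raise ValueError("total variance is zero")
    icc = var_subject / total_variance
    sem = math.sqrt(error_variance)
    return icc, sem


# Hypothetical variance components (illustrative numbers, not study results).
icc, sem = icc_and_sem(90.0, 2.0, 1.0, 7.0)
```

With these made-up inputs the subject variance is 90 out of a total of 100, so the ICC is 0.9 and the SEM is the square root of 10. The SEM is expressed in the same unit as the EPOC score itself (number of tallied verbal behaviours).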

Table 2

Overview of the ICC with its CI, the mean score per item and the SEM


In the ED, 378 individual and 51 contemporaneous observations of individual staff members were conducted in two measurement periods during 240 h of observation. In the baseline measurement, on average 8.3 (range 1–30) verbal behaviours per 30 min observation were counted, compared to 9.5 (range 2–28) in the follow-up measurement. In both measurements the most frequent item was ‘shows that one is listening’ (respectively n=394 and n=599), representing approximately 29% of the observed behaviours. Items belonging to the category ‘self’ were infrequently observed (less than 1% of all observed behaviour in both measurements). Some of the EPOC items were never observed at all (eg, ‘uses authority’).

In the ICU, 1144 individual and 68 contemporaneous observations of individual staff members were conducted in two measurement periods during 640 h of observation. In the baseline measurement, on average 41 verbal behaviours per 30 min observation were counted (range 2–129), compared to 35.5 in the follow-up measurement (range 1–95). This is approximately five times higher than the average of the ED. The most frequent item in the baseline measurement was ‘answers a question’ (n=3094), representing 13% of all observed verbal behaviours. The most frequent item in the follow-up measurement was ‘reacts to suggestions from others’ (n=2393), representing 11.5% of all observed verbal behaviours. There were no items that were never observed.
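The reported observation hours and the ED-to-ICU comparison can be checked with simple arithmetic. The sketch assumes that each contemporaneous observation counts as two 30-min observer sessions; this is an inference that happens to reproduce the reported totals, not something the text states.

```python
OBSERVATION_HOURS = 0.5  # each observation lasted 30 min


def observer_hours(individual, contemporaneous):
    """Total observer time, counting each contemporaneous observation twice (assumption)."""
    return (individual + 2 * contemporaneous) * OBSERVATION_HOURS


ed_hours = observer_hours(378, 51)     # reported: 240 h
icu_hours = observer_hours(1144, 68)   # reported: 640 h

# Ratio of mean verbal behaviours per observation, ICU versus ED.
baseline_ratio = 41 / 8.3
follow_up_ratio = 35.5 / 9.5
```

Under this assumption the totals come out at 240 h and 640 h. The ratio of mean counts is about 4.9 at baseline and about 3.7 at follow-up.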

The ICC in the ED ranged from 0.70 to 0.91, and in the ICU from 0.53 to 0.95, with the self category as an exception in both settings (table 2). The graphs of the LOA (see supplementary appendix A, available online only) show that all measurements stay well within the LOA, although the LOA could not be computed for the ED at baseline because of insufficient numbers. The LOA are narrow, reflecting low variation in differences between the observers.


The EPOC is a new observational method for assessing non-technical skills by quantifying the explicit professional oral communication of healthcare professionals. We assessed the amount of explicit professional communication in two settings as well as the interobserver reliability. Our results show that some of the verbal behaviours and dimensions occur less often than others. It is plausible that some of these behaviours, such as ‘uses authority’, do indeed not arise very often. Some behaviours may take place more frequently after dedicated training, for instance, ‘explicitly coordinating tasks with each other’. In addition, some concepts may occur but may be difficult to classify correctly due to close overlap with other concepts, such as ‘expresses concerns’ and ‘gives suggestion’.

The results show good interobserver reliability for the EPOC. Although there is no consensus concerning what constitutes a good ICC,28 the general convention is that ICCs below 0.40 are poor, between 0.41 and 0.60 are moderate, and above 0.60 are good or even very good (>0.80).29 Most categories and dimensions exceed 0.60. Interobserver reliability of the overall EPOC score, the human interaction category and two of its underlying dimensions, ‘working with others’ and ‘task-oriented leadership’, is very good in both studies. These findings indicate that the observers have been well trained and that the framework is comprehensive and clear. Furthermore, it means that the EPOC is solid for use in scientific research.
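The convention cited above can be written as a small lookup. The function name is hypothetical, and treating a value of exactly 0.40 as poor is an assumption, since the cited ranges leave that boundary open.

```python
def icc_label(icc):
    """Map an ICC value to the conventional qualitative label used in the text."""
    if icc > 0.80:
        return "very good"
    if icc > 0.60:
        return "good"
    if icc > 0.40:  # assumption: exactly 0.40 is treated as poor
        return "moderate"
    return "poor"


# The lowest and highest dimension-level ICCs reported in this study.
labels = [icc_label(0.53), icc_label(0.95)]
```

Applied to the reported range for the underlying dimensions, 0.53 falls in the moderate band and 0.95 in the very good band.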

The self category, and its dimension ‘assertiveness’, has the lowest agreement. It can be argued that this category was observed too infrequently in the ED to calculate a valid ICC. During the ICU study, the self category was observed more often. However, the ICC for this dimension was also low in ICU, especially in the follow-up measurement. This suggests it is hard to assess this category reliably.

The follow-up measurement of the ICU study has somewhat lower ICC than the baseline measurement of this study. The environment category even has only a moderate ICC in the follow-up measurement. This difference is probably due to more formal and informal discussions about the definitions between the baseline observers, resulting in better mutual calibration. This underlines the need for intensive and engaging training of the observers, and for continued discussion among observers about the application of EPOC.

EPOC was applicable in both ED and ICU. Although the transfer of the EPOC from the ED to the ICU went very smoothly (see step 4 of figure 1), the two settings differed considerably in outcomes. Overall, the ED has narrower CIs for the ICC scores than both ICU measurements. A possible explanation for this finding is that the ED observations were conducted by two observers, whereas in the ICU measurements a total of eight observers carried out observations. Another reason could be that the average number of verbal behaviours per 30-min observation is almost five times higher in the ICU than in the ED. This difference is probably due to the nature and organisation of work in both departments. In the ED, work processes are mainly organised along a chain of care. This chain starts with the triage and ends with the patient being referred to other providers or being sent home. Healthcare professionals in ED work sequentially rather than simultaneously. Providing care in the ICU is more of a team effort, with regular meetings to discuss the status of a patient. For adequate transfer of the EPOC across medical settings it is important to recognise such differences and details.

A major benefit of EPOC is that the explicit communication as a whole can be quantified. Experience with the EPOC showed that all professional communication during an observation can be classified along the verbal behaviours of the instrument. Moreover, it enables tracking differences in the sorts of professional communication. This is highly relevant when studying the effects of, for instance, a medical team training directed at improving communication, leadership and decision making.

Compared to existing instruments for observing non-technical skills,3,15,30 the EPOC is distinctive as it quantifies general verbal behaviours rather than appraising context-specific behavioural markers that require clinical expertise. As general verbal behaviours are context independent and occur in every professional interaction, observing these skills does not require context-specific knowledge of the situation, such as clinical expertise. In addition, because only minimal interpretation of what is being said is required, even complex situations can still be reliably observed. Quantifying the verbal behaviours makes the instrument especially useful for evaluating changes in occurrence and patterns of non-technical skills.

When using EPOC as an instrument for evaluation, it should be noted that the expected effects of improving non-technical skills may vary depending on the setting, previous training, motivation and so on. In the current context, for example, improving non-technical skills in the ED will probably result in more explicit communication, as there is very little verbal communication to begin with. Yet, for the ICU, in which a lot of communication between team members occurs round the clock, improving non-technical skills may change the content rather than the amount of verbal communication; for instance, to proactive planning instead of troubleshooting. This could even result in a decrease of verbal behaviours in the ICU, as communication becomes more efficient.

The difference in expected effects also emphasises that improving non-technical skills will not always result in more explicit communication. It has been proposed that there is an optimum after which the number of things being said damages the efficiency. For instance, Stachowski et al31 showed that during a simulated crisis, fewer verbal statements were associated with high-performing teams in the control room of a nuclear power plant. In other words, the situation determines what effect can be expected and should be taken into account. Therefore the EPOC should first be adequately tested before the evaluation starts.


A limitation of the EPOC is that it only assesses verbal communication, whereas non-verbal behaviour or things that should have been said are equally relevant. For instance, ignoring a question or purposefully turning your back on someone expresses more than can be said in words. During the development phase of the EPOC, several non-verbal behaviours were tested as part of the observation. However, as observing non-verbal communication is hard to standardise, these non-verbal items did not pass the testing phase.

The EPOC also has limitations related to observation schemes in general. Flin et al32 summarise the boundaries of observational methods in three points. First, a classification of behaviour can never capture every aspect of performance. Second, important but infrequent behaviours are hard to measure because they occur so rarely. Third, ‘to err is human’ also applies to observers. Observers can be distracted, fatigued or faced with situations that are too complex. An additional fourth caveat, in line with the previous one, is observer bias, which means that observers are more likely to find those things that they are looking for.

It may occur that the observed person is influenced by the observer, the so-called Hawthorne effect. In our experience this influence was marginal. Observed persons stated that they very quickly became used to the presence of the observers or even forgot that they were being observed at all.

Further research should explore other psychometric properties of this measurement, as described by Mokkink et al.26 The evidence for reliability could be extended by studying the test–retest reliability and internal consistency. The validity of the EPOC should also be explored. Studying the criterion validity of the EPOC should answer the question of what the optimum explicit communication is in a particular situation. Furthermore, attention could be paid to cross-validating the EPOC with a measure of non-technical performance. This should reveal to what degree the scores of the EPOC are consistent with changes in the use of non-technical skills (construct validity). In addition, the ability of the EPOC to measure changes in non-technical skills over time (responsiveness) should be assessed.


We developed a new instrument for evaluating the use of non-technical skills in healthcare, which is reliable in two highly different settings. By quantifying professional behaviour, our instrument facilitates the measurement of behavioural change over time. Our results suggest the EPOC can also be applied to other settings.


The authors would like to thank the Royal Dutch Airlines for letting them use the SHAPE checklist and Patricia Antersijn for explaining the development of the SHAPE. Furthermore, they wish to express gratitude to the observers, who helped develop the EPOC by critically using and discussing it. The preliminary results of this study were presented in September 2010 during the Behavioral Science Applied to Acute Care Teams (BSAACT) meeting in Amsterdam. The authors would like to thank the organisation for this opportunity.



  • Contributors PFK drafted the final manuscript; all other authors (IvN, MdB, DLK, CW, CvD) read, revised and approved the manuscript. PFK, IvN, MdB and CvD participated in the development team of the EPOC, as described in the manuscript, which conceived, designed and executed this study. CW participated in the design of the study and was regularly consulted by the development team. DLK helped with the methodological/statistical part of the present study. All authors read and approved the final manuscript.

  • Funding This study was partly funded by Zon-Mw, the Dutch Organisation for Health Research and Development.

  • Competing interests None.

  • Ethics approval This study received ethics approval from the ethical committee of the VU University Medical Centre.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Additional results have been published in a web-only appendix. Other results or data are available on request from the corresponding author.

  • Open Access This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: