Development of the Huddle Observation Tool for structured case management discussions to improve situation awareness on inpatient clinical wards
Julian Edbrooke-Childs (1), Jacqueline Hayes (2), Evelyn Sharples (1), Dawid Gondek (1), Emily Stapley (1), Nick Sevdalis (3), Peter Lachman (4,5), Jessica Deighton (1)

  1. Evidence Based Practice Unit, University College London and the Anna Freud Centre, London, UK
  2. Department of Psychology, University of Roehampton, London, UK
  3. Department of Health Service & Population Research, King’s College London, London, UK
  4. International Society for Quality in Healthcare (ISQua), Dublin, Ireland
  5. National Clinical Lead SAFE, Royal College of Paediatrics and Child Health, London, UK

Correspondence to Dr Julian Edbrooke-Childs, Evidence Based Practice Unit, University College London and the Anna Freud Centre, London NW3 5SU, UK; ebpu{at}


Background ‘Situation Awareness For Everyone’ (SAFE) was a 3-year project which aimed to improve situation awareness in clinical teams in order to detect potential deterioration and other potential risks to children on hospital wards. The key intervention was the ‘huddle’, a structured case management discussion which is central to facilitating situation awareness. This study aimed to develop an observational assessment tool to assess the team processes occurring during huddles, including the effectiveness of the huddle.

Methods A cross-sectional observational design was used to psychometrically develop the ‘Huddle Observation Tool’ (HOT) over three phases using standardised psychometric methodology. Five researchers observed huddles across four NHS paediatric wards participating in SAFE: two wards within specialist children’s hospitals and two within district general hospitals, with location, number of beds and length of stay considered to make the sample as heterogeneous as possible. Inter-rater reliability was calculated using the weighted kappa and intraclass correlation coefficient.

Results Inter-rater reliability was acceptable for the collaborative culture (weighted kappa=0.32, 95% CI 0.17 to 0.42), environment items (weighted kappa=0.78, 95% CI 0.52 to 1) and total score (intraclass correlation coefficient=0.87, 95% CI 0.68 to 0.95). It was lower for the structure and risk management items, suggesting that these were more variable in how observers rated them. However, agreement on the global score for huddles was acceptable.

Conclusion We developed an observational assessment tool to assess the team processes occurring during huddles, including the effectiveness of the huddle. Future research should examine whether observational evaluations of huddles are associated with other indicators of safety on clinical wards (eg, safety climate and incidents of patient harm), and whether scores on the HOT are associated with improved situation awareness and reductions in deterioration and adverse events in clinical settings, such as inpatient wards.

  • healthcare quality improvement
  • patient safety
  • safety culture

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:

An area of concern for clinicians and researchers is evidence suggesting that England may have some of the highest levels of childhood mortality in Europe.1 Some of the deaths could be due to failure to recognise the seriousness of the medical condition or to recognise deterioration in that condition.2 Early detection of risk factors for patients, including deterioration on hospital wards, is key to improving patient outcomes.3 There are multiple and complex causes of preventable morbidity and mortality in hospitalised patients, including unidentifiable and identifiable safety risks. Identifiable risks include delayed diagnosis of medical conditions, delayed recognition of deterioration, a lack of recognition of patient concerns and a lack of appropriate resources and staff.4 Proposed solutions to address these identifiable risks, such as early warning system scores, are often restricted by fragmented approaches that fail to build capacity across hospitals, and focus predominantly on technical solutions as opposed to learning and cultural ones.5 6 The integration of information is essential to achieving high levels of safety.

‘Situation Awareness For Everyone’ (SAFE) was a 3-year project aiming to redirect the clinical team’s view of the patient and their disease or ‘clinical gaze’.7 In this process, a range of prospective indicators of risk or deterioration, including clinical indicators and staff concerns, are considered. The main intervention of the SAFE programme was the routine use of ‘huddle’ meetings on the wards. A huddle is an ‘ad hoc meeting to re-establish situational awareness, reinforce plans already in place, and assess the need to adjust the plan’.8 Situation awareness can be defined as ‘the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future’.8 Increased situation awareness of potential risks in real time on inpatient wards may help reduce identifiable safety risks. For example, one study found that increasing situation awareness led to approximately 50% fewer unplanned transfers to higher levels of care.9

Huddles involve a suite of interventions to support and facilitate a ward culture of proactive rather than reactive care. The core tasks of a huddle are the identification of risks to patients, the development of a shared team understanding of patients who are at risk of deterioration and the making of plans to mitigate such risks. Teams may incorporate other tools, such as early warning system scores and structured communication methods, to aid these tasks. However, the way in which these tasks are achieved is also crucial in huddle theory. Huddles must be brief, structured case management discussions of approximately 10–15 min duration in total, must optimise staff engagement and must focus on essential information only.9 10

As outlined in the literature,9 a theory of change proposes that huddles enable collaborative and efficient information exchanges within ward teams, which fosters a shared clinical view of the current health state of patients. This promotes increased situation awareness, which leads to opportunities to identify actions that may be taken to mitigate risks and prevent deterioration of patients.9 The introduction of huddles in the USA has demonstrated that huddles can lead to increased efficiency among staff members, improvements in the quality of information sharing, increased accountability, feelings of empowerment and a culture of collaboration and community. These work together to increase staff members’ quality of awareness about patients and to facilitate staff members’ capacity to enhance patient safety.10


The above evidence suggests that huddles may be effective at increasing levels of situation awareness in clinical settings, such as inpatient paediatric wards. However, how huddles are implemented in practice, and in different contexts, remains unclear. A review of the literature indicates that, to the best of our knowledge, there is no valid and reliable observational assessment available to apply to huddles. This is crucial when attempting to understand and improve how huddles take place and how they increase situation awareness, which is the primary desired outcome. Such a tool would help researchers and frontline staff to identify the essential ingredients of huddles and, in particular, it would help clinical staff to reflect on their practice to uncover areas for improvement. An observational instrument, like the Huddle Observation Tool (HOT), could help provide evidence for researchers and clinicians to determine whether huddles are implemented in line with best practice. Effectively, this could afford a reliable assessment of the fidelity with which the huddle intervention is applied (ie, whether huddles are delivered within wards as originally intended11). It would allow subsequent linkages between fidelity of implementation and effectiveness of the huddles in improving response to deterioration and enhancing the safety of patient care. The aim of the present research was to develop the HOT to capture important elements of huddles.


Methods

A cross-sectional observational design adhering to relevant reporting guidelines12 was used to develop the HOT through three standard psychometric instrument development phases (see figure 1). A favourable ethical opinion for the research was received from Dulwich Research Ethics Committee prior to data collection (reference: 14/LO/0875).

Figure 1

Summary of phases of development.

Phase 1: review of evidence and initial tool development

HOT version 01 (V01) was developed by using the following inductive process. A researcher (JH) reviewed the literature, with a particular focus on research generated by the original developers of huddle methodology9 10 and the design and use of observation tools in hospital settings.13–16 An expert steering group was consulted, which included clinicians routinely working on paediatric wards (PL) and a senior researcher with over 10 years of expertise in developing clinical observation tools (NS).

Three researchers then conducted initial observations of huddles on two paediatric wards to identify the essential observable elements of huddles, and recorded field notes. These information sources were used together to generate an initial item pool for testing in the field for construct validity and reliability. Assessors were researchers with MSc level training in research methods. They were trained in non-participant observation of huddles in a 2-hour workshop before the observations, with in-depth follow-up sessions after these initial huddles to review their experience and any challenges. A training guide was also developed for HOT assessors (see online supplementary appendix A for instructions on how to conduct the observations and online supplementary appendix B for guidance on how to use the HOT), and observations were discussed and reviewed in team meetings. Any discrepancies in ratings of these initial huddles were discussed by the team until consensus was reached on how huddles should be rated, to ensure that the assessors were ready to be deployed.

Supplementary Appendix 1

Supplementary Appendix 2

Phase 2: revision and consultation

HOT V01 was used to observe a total of n=16 huddles (9 morning huddles, 3 afternoon huddles, 3 evening huddles and 1 night huddle) across four paediatric wards participating in SAFE by four researchers over 2 months (January to February 2015).

Huddles lasted between 2 and 12 min and comprised between 2 and 16 members of staff, including doctors, nurses and allied health professionals. In addition to completing the tool, field notes were taken and huddles were audio-recorded. Sampling aimed for maximum heterogeneity of sites adopting the new initiative. Therefore, four paediatric units across England were selected, including two specialist children’s hospitals and two district general hospitals, with the number of beds and length of stay considered to make the sample as varied as possible.

Substantive revisions were made to HOT V01 based on feedback from the observations. The key areas of feedback were that: (A) some words were ambiguous, (B) some items contained more than one concept (‘all attendees have the opportunity to speak’ vs ‘some attendees are ignored and few speak’), (C) there was overlap between some of the domains assessed (eg, ‘Culture’ contained items on collaboration which were also contained in ‘Coordination and cooperation’), and (D) the tool was too long, as huddles were generally not more than 5 min in duration. Through discussion with the expert steering group, HOT V02 was produced to more expediently capture the observable dimensions of situation awareness demonstrated through huddles. HOT V02 was viewed as more functional, with all quantitative data captured on one page.

The instrument was submitted for final refinement by the research team. Minor changes were made to HOT V02, predominantly consisting of simplifying the items and visual layout. HOT V03 was then used to rerate the 16 phase 2 audio-recorded huddles. The first author (JEC) and the last author (JD) rated the huddles blinded to each other, then discussed any rating discrepancies, and agreed the final ratings. No further changes to the instrument occurred as a result of this process.

Phase 3: tool refinement and reliability analysis

Finally, for the main reliability analysis element of the study, huddles were observed across four paediatric wards participating in SAFE (as in phase 2) by three researchers over 16 months (February 2015 to May 2016, which was the duration of the overall evaluation17). This included non-participant and participant observations. In non-participant observations, the observers were not members of the ward teams and did not interact at all with the ward teams during meetings; in participant observations, the observers were active members of these teams. Huddles took place up to three times per day. The non-participant HOT V03 was used to observe a total of n=27 huddles (22 morning huddles, 5 afternoon huddles), of which n=16 (5 in children’s hospitals, 11 in paediatric wards in district general hospitals) had two independent ratings for analysis. Huddles lasted between 2 and 11 min (median=4.5, IQR=3–7.25) and comprised between 3 and 20 members of staff (median=5, IQR=4–7), including doctors, nurses and allied health professionals (eg, physiotherapists).

Participant observation ratings were returned using the participant HOT V03 on n=30 huddles, of which 4 had missing data, resulting in a final sample of n=26 huddles. The overall data set for this phase was therefore 42 assessed team huddles with aggregated scores used when more than one observer rated a huddle. Huddles lasted between 2 and 15 min (median=10, IQR=5–10) and comprised between 3 and 20 members of staff (median=6.5, IQR=5.25–15), including doctors, nurses and allied health professionals (eg, physiotherapists).

As shown in figure 2, HOT V03 comprised four items, each rated on a 5-point scale from strongly disagree (0) to strongly agree (4) and each with a free text response section for notes, and was used with a global score (see footnote i) ranging from 0 to 16. The four items were as follows. (1) ‘Risk management’ (‘Were there opportunities to identify risks and come up with concrete plans for these risks?’), which was considered to clearly capture the key component of situation awareness. (2) ‘Structure’ (‘Did the huddle have a clear structure?’), which was included to separate the format of a huddle from other concepts, such as collaborative culture and leadership. The role of a leader or coordinator was considered to be less salient than the organisation of the huddle (eg, a huddle could have a clear structure without a clear leader if all participants are aware of their role and turn in the discussion). (3) ‘Collaborative culture’ (‘Did everyone have the opportunity to contribute and were all points of view respected?’), which was considered to be superordinate to the other items included in V01 (eg, a collaborative culture would necessitate all members, irrespective of their level of seniority, to be respected). (4) ‘Environment’ (‘Was the huddle free from distractions?’), which was identified in the phase 1 observations as a salient component of huddles. The role of the leader was asked as a separate dichotomous question. The majority of huddles were rated as having a clear leader identified (non-participant observation=93%, n=9 missing; participant observation=23%, n=2 missing). Finally, the use of visual tools was asked as a separate dichotomous question. HOT V03 demonstrated face validity and was considered functional by all participant observers and by non-participant observers, based on their feedback during site visits.
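The scoring described above can be illustrated with a minimal sketch. It assumes that the global score is the simple sum of the four item ratings, which is consistent with the stated 0 to 16 range but is not spelt out in the text; the class and field names are hypothetical and are not part of the HOT itself.

```python
from dataclasses import dataclass

# Item names follow the four HOT V03 items described in the text.
# The identifiers themselves are illustrative, not part of the instrument.
ITEMS = ("risk_management", "structure", "collaborative_culture", "environment")


@dataclass(frozen=True)
class HotRating:
    """One observer's ratings of one huddle, each item scored 0 to 4."""

    risk_management: int
    structure: int
    collaborative_culture: int
    environment: int

    def global_score(self) -> int:
        """Sum the four item ratings (assumed scoring rule), giving 0 to 16."""
        scores = [getattr(self, name) for name in ITEMS]
        for name, score in zip(ITEMS, scores):
            if score not in range(5):
                raise ValueError(f"{name} must be between 0 and 4, got {score}")
        return sum(scores)


# Hypothetical example: a well structured huddle held in a noisy environment.
example = HotRating(risk_management=3, structure=4, collaborative_culture=3, environment=1)
print(example.global_score())
```

The dichotomous questions on leadership and visual tools are left out of the sketch because the text reports them separately from the global score.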

Figure 2

Huddle Observation Tool.

Analytic strategy

To assess the reliability of HOT V03, data were entered into and analysed using SPSS V.21.18 For phase 2, inter-item consistency was used, which is suitable when assessing single-construct scales (ie, the quality and consistency of huddles as reflected by the global score). For phase 3, our aim was to assess the structural element of HOT V03. To this end, we examined inter-rater reliability to assess whether different raters responded in a consistent manner, using the intraclass correlation coefficient (ICC) for the global score; an ICC ≥0.61 is considered acceptable for clinical feedback and ≥0.71 for research purposes.15 For the inter-rater reliability of each item, we calculated the weighted kappa coefficient, which takes into account the ordinal nature of items rated on a Likert scale. Paired ratings were available for 16 huddles; given the small number of data points and the original 5-point response scale, responses were recoded to three response options to enable analysis (ie, ‘strongly agree’ or ‘agree’ was recoded as 2, ‘neither’ as 1, and ‘disagree’ or ‘strongly disagree’ as 0). We also examined the relationships between individual items, and between the individual items and the global score.
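The recoding and item-level agreement analysis can be sketched as follows. The analysis reported here was run in SPSS; this standalone illustration is not the authors' procedure. It assumes linear disagreement weights (the text does not state whether linear or quadratic weights were used), and the function names and example ratings are hypothetical.

```python
from collections import Counter


def recode(score: int) -> int:
    """Collapse the 0-4 scale to 0-2: disagree (0, 1) -> 0, neither (2) -> 1, agree (3, 4) -> 2."""
    if score not in range(5):
        raise ValueError(f"score must be between 0 and 4, got {score}")
    if score <= 1:
        return 0
    return 1 if score == 2 else 2


def weighted_kappa(rater_a, rater_b, n_categories=3):
    """Cohen's weighted kappa with linear weights (an assumption, see lead-in)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of ratings")
    n = len(rater_a)
    max_distance = n_categories - 1
    pairs = Counter(zip(rater_a, rater_b))
    margin_a, margin_b = Counter(rater_a), Counter(rater_b)
    # Observed weighted disagreement across the rated huddles.
    observed = sum(abs(i - j) / max_distance * count / n for (i, j), count in pairs.items())
    # Weighted disagreement expected by chance from the two raters' marginals.
    expected = sum(
        abs(i - j) / max_distance * margin_a[i] * margin_b[j] / n**2
        for i in range(n_categories)
        for j in range(n_categories)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1 - observed / expected


# Hypothetical ratings of one item by two observers on the original 0-4 scale.
observer_1 = [4, 3, 1, 0, 2, 3]
observer_2 = [3, 3, 0, 1, 2, 1]
kappa = weighted_kappa([recode(s) for s in observer_1], [recode(s) for s in observer_2])
print(round(kappa, 2))
```

Collapsing to three categories makes each cell of the agreement table less sparse, which is the rationale given in the text for recoding with only 16 paired ratings.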


Results

Phase 2: revision and consultation

Overall, the main changes were to simplify the tool (ie, clarifying domains, reducing the number and content of items, adding dichotomous (Yes vs No) questions) and changes to the layout. A participant observation version with training guide was also developed to mirror the non-participant version, which was used and discussed by staff at one of the hospital wards. A conference on SAFE open to staff from the 12 hospitals taking part in the project took place from 28 to 30 April 2015 during phase 2 (further data on conference attendance are not available). HOT V02 was presented to attendees and feedback recorded.

The descriptive statistics and results of the phase 2 reliability analysis are shown in table 1. There were two significant positive inter-item correlations: between the structure and environment items, and between the collaborative culture and environment items. This suggests that more structured and collaborative huddles were conducted in environments with fewer interruptions. The correlation between the structure and environment items was 0.70, which is recommended when measures are designed to tap into the same underlying construct.19 The smaller remaining inter-item correlations were not necessarily surprising, as the items were designed to tap into four different domains that may in fact be orthogonal. There were three significant positive item-total correlations, between each of the structure, collaborative culture and environment items and the total score, and these were all above recommended values.19 Huddles with higher levels of structure, collaborative culture and uninterrupted environments had higher overall global scores, suggesting that these elements were key to the quality and consistency of huddles.

Table 1

Phase 2 inter-item correlations and item-total correlations of the HOT V03
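The inter-item and item-total correlations summarised in table 1 can be illustrated with a short sketch. It assumes Pearson correlations and an uncorrected item-total correlation (each item correlated with a total that includes that item); the text does not state which coefficient or which variant was used, and the function names and example data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_x * var_y)


def item_total_correlations(ratings):
    """Correlate each item with the summed total across all items.

    `ratings` maps an item name to its list of scores, one per huddle.
    """
    n_huddles = len(next(iter(ratings.values())))
    totals = [sum(scores[i] for scores in ratings.values()) for i in range(n_huddles)]
    return {item: pearson(scores, totals) for item, scores in ratings.items()}


# Hypothetical item ratings for five huddles (0-4 scale).
ratings = {
    "risk_management": [3, 4, 2, 1, 3],
    "structure": [4, 4, 1, 0, 3],
    "collaborative_culture": [3, 4, 2, 1, 2],
    "environment": [2, 4, 1, 0, 3],
}
for item, r in item_total_correlations(ratings).items():
    print(f"{item}: r = {r:.2f}")
```

Because the uncorrected total includes the item itself, these values tend to be somewhat inflated relative to a corrected item-total correlation, particularly with only four items.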

Phase 3: prospective reliability analysis

The descriptive statistics and results of the phase 3 inter-rater reliability analysis are shown in table 2. The weighted kappa was above the recommended value of 0.70 only for the environment subscale. Still, the CIs did not include zero for the collaborative culture and risk management subscales. Although the CI did include zero for the structure subscale, the majority of ratings were the same: 13 of the 16 huddles were rated as disagree or strongly disagree by both raters. Finally, the ICC was above the recommended value of 0.71 for the total score, suggesting that raters’ scores were similar for this element.15

Table 2

Phase 3 non-participant descriptives and inter-rater reliability of the Huddle Observation Tool V03
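The ICC for the global score can be computed from a two-way analysis of variance, as in the sketch below. The text does not state which form of the ICC was used; the sketch assumes the two-way random effects, absolute agreement, single rater form, often written ICC(2,1). The function name and example scores are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per huddle, each holding one global
    score per rater. The choice of ICC form is an assumption (see lead-in).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least two huddles and two raters, with no missing scores")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    huddle_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Partition the total sum of squares into huddle, rater and residual parts.
    ss_huddles = k * sum((m - grand_mean) ** 2 for m in huddle_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_huddles - ss_raters
    ms_huddles = ss_huddles / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_huddles + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when there is no variance in the ratings")
    return (ms_huddles - ms_error) / denominator


# Hypothetical global scores (0-16) from two observers for six huddles.
scores = [[12, 11], [6, 7], [14, 14], [9, 7], [4, 5], [10, 11]]
print(round(icc_2_1(scores), 2))
```

Under this form, a systematic difference between raters (one observer scoring consistently higher) lowers the coefficient, which suits a tool intended to give comparable scores across observers.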

The descriptive statistics and results of the phase 3 internal consistency reliability analysis are shown in table 3. There were three significant positive inter-item correlations between structure and collaborative culture (large correlation), collaborative culture and risk management (moderate correlation), and risk management and environment (moderate correlation). These suggest that more structured huddles were associated with more collaborative cultures. There were also more opportunities to discuss risk management in huddles within more collaborative cultures and more opportunities to discuss risk management in huddles with fewer interruptions. However, only the correlation between structure and collaborative culture was above the recommended value of 0.70.19 All four domains showed large significant positive item-total correlations,20 and these were all above recommended values.19

Table 3

Phase 3 participant inter-item correlations and item-total correlations of the Huddle Observation Tool V03

Discussion


The aim of the present research was to develop the HOT, which captures the essential ingredients of huddles. HOT V03 demonstrated face validity and was considered functional by participant and non-participant observers. In the phase 3 analyses, inter-rater reliability for the non-participant observations was acceptable for environment and the total score. It was low for collaborative culture, structure and risk management, although this may be partly explained by the small number of observations. Still, this suggests that collaborative culture, structure and risk management are more variable in how observers rate them but, nevertheless, agreement on the total score for huddles was acceptable. In the participant observations, the correlations suggest that more structured huddles were associated with more collaborative cultures, and that there were more opportunities to discuss risk management in huddles with more collaborative cultures, as well as in huddles with fewer interruptions. Further research should examine the direction of causality in these correlated huddle variables—for example, does a more collaborative culture lead to a better organised and structured huddle? Or does a clear huddle structure facilitate participants to achieve more collaboration? Qualitative research with participants of huddles will have a crucial role in indicating the most likely pathways through which these aspects of huddles influence each other, which is being captured in other parts of the SAFE evaluation.17

In addition, different huddles were rated by the two types of rater, and future research should examine whether these differences are explained by differences between huddles or between raters. Future research with a larger sample of observed huddles should continue to examine the reliability of the HOT; for instance, in terms of factor structure and internal consistency. Future research should also examine whether (A) data from participant and non-participant observations of huddles are associated with other indicators of safety on paediatric wards, such as safety climate, and (B) huddles improve situation awareness and reduce preventable death on paediatric wards. These are important research questions that we are addressing in ongoing analysis from the SAFE evaluation programme.

In addition to future research, the HOT enables clinicians to view, record and reflect on—in a structured and precise manner—how a team communicates and identifies risks, whether this is part of a huddle or another type of case management discussion. This could provide useful information to help understand the processes around how a team currently minimises potential patient risks and subsequent harm. In turn, this may form the basis of continuous quality improvement of team and ward-level processes and patient care. HOT could be used as an anonymous feedback tool if given to all staff to rate their own huddles in order to inform continuous quality improvement of huddles.

Limitations should be considered when interpreting the findings of the present research. A small sample of huddles was observed, particularly for the inter-rater reliability analysis, and the findings may not generalise to other huddles conducted across SAFE sites. Assessors took notes on the huddle in real time and completed the HOT ratings immediately after the huddle and before any discussions of the huddle between observers took place. Our training was very clear on the need not to calibrate or compare ratings between raters at the time of the data collection; anecdotal feedback from our raters confirms that this is what they actually did on the wards. Although huddles were observed by researchers, the final ratings in the phase 2 analysis were based solely on audio recordings, meaning that valuable visual or non-verbal data may have been lost. Equally, however, a strength of this approach is that audio recordings enable the most accurate capture of the verbal-interactional features of the huddle in situ and do not rely solely on observer memory and note taking.21 Self-selection bias may apply, in that participant observation tools may have been completed and returned only by staff with more positive experiences of huddles. Future research should therefore recruit a random sample of participant observers.

Still, to the best of our knowledge, HOT is the first participant or non-participant observation tool for huddles. From a research perspective, HOT could be used to provide an objective measure of changes to huddles, situational awareness and collaborative culture over time, which is crucial when attempting to understand huddles and their role in increasing situation awareness. For this purpose, HOT was used over the course of the SAFE programme. From a clinical perspective, HOT could be used to assess the effectiveness of huddles as clinicians test and develop the best way to implement the huddle to improve teamwork and situation awareness, in line with best practice guidance in their ward. Although HOT was tested in paediatric wards, it may be useful for researchers and clinicians in reviewing and reflecting on huddles, and determining how they can most effectively be used to improve situation awareness in any clinical setting in which patient safety is a priority.

Thus, future research could also examine the application of HOT beyond huddles and paediatric wards. The implementation of huddles which are supported using HOT may be of relevance to adult wards, including intensive care and high dependency units. Inpatient mental health settings have an array of different safeguarding concerns and huddles may be a useful means of promoting situation awareness in these settings. Future research should also examine modified versions of HOT to capture other safety improvement interventions. For example, as part of the SAFE programme, wards have been implementing ‘druggles’, which are huddles specifically about patient medication, with the aim of minimising medication errors. We hope that the present research will help clinicians and researchers to be able to systematically analyse huddles and explore how they are implemented and, in turn, how implementation can be improved. The development of routine huddles and other approaches to reviewing safety is of particular interest, and the evaluation of this type of patient safety committee is too often forgotten.



  • i A global score was calculated; however, the unidimensionality of the instrument has not yet been examined. Future research, through the collection of a larger number of HOT assessments, should enable us to conduct a factor analysis of huddle observations, such that we can evaluate and establish the dimensional structure of the instrument. Larger scale data collection was beyond the scope of the current clinically focused project.

  • Contributors JEC led the research and drafting of the paper under the supervision of JD, and JEC and JD rated the huddles and provided oversight of the research. JH conducted the literature review and developed the initial observation tool. ESh, DG and ESt conducted the observations of the huddles. NS and PL provided expert input on revising the observation tool. All authors contributed to the drafting of the paper.

  • Funding Situation Awareness For Everyone (SAFE) is a Health Foundation funded programme; both the implementation of SAFE and the evaluation were funded by the Health Foundation. This work was also supported by funding from WellChild, specifically to support evaluation work around perspectives of parents and young people and to support patient and parent involvement in the research. This programme of work and evaluation was also supported by the Royal College of Paediatrics and Child Health (RCPCH), which leads on the delivery of the programme. Sevdalis’ research is supported by the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care South London at King’s College Hospital NHS Foundation Trust. Sevdalis is a member of King’s Improvement Science, which is part of the NIHR CLAHRC South London and comprises a specialist team of improvement scientists and senior researchers based at King’s College London. Its work is funded by King’s Health Partners (Guy’s and St Thomas’ NHS Foundation Trust, King’s College Hospital NHS Foundation Trust, King’s College London and South London and Maudsley NHS Foundation Trust), Guy’s and St Thomas’ Charity, the Maudsley Charity and the Health Foundation. Deighton was supported by the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care (CLAHRC) North Thames at Bart’s Health NHS Trust. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, the Department of Health, or RCPCH.

  • Competing interests None declared.

  • Ethics approval Dulwich Research Ethics Committee.

  • Provenance and peer review Not commissioned; externally peer reviewed.