Background Standard operating procedures (SOPs) should improve safety in the operating theatre, but controlled studies evaluating the effect of staff-led implementation are needed.
Methods In a controlled interrupted time series, we evaluated three team process measures (compliance with WHO surgical safety checklist, non-technical skills and technical performance) and three clinical outcome measures (length of hospital stay, complications and readmissions) before and after a 3-month staff-led development of SOPs. Process measures were evaluated by direct observation, using Oxford Non-Technical Skills II for non-technical skills and the ‘glitch count’ for technical performance. All staff in two orthopaedic operating theatres were trained in the principles of SOPs and then assisted to develop standardised procedures. Staff in a control operating theatre underwent the same observations but received no training. The change in difference between active and control groups was compared before and after the intervention using repeated measures analysis of variance.
Results We observed 50 operations before and 55 after the intervention and analysed clinical data on 1022 and 861 operations, respectively. The staff chose to structure their efforts around revising the ‘whiteboard’ which documented and prompted tasks, rather than directly addressing specific task problems. Although staff preferred and sustained the new system, we found no significant differences in process or outcome measures before/after intervention in the active versus the control group. There was a secular trend towards worse outcomes in the postintervention period, seen in both active and control theatres.
Conclusions SOPs when developed and introduced by frontline staff do not necessarily improve operative processes or outcomes. The inherent tension in improvement work between giving staff ownership of improvement and maintaining control of direction needs to be managed, to ensure staff are engaged but invest energy in appropriate change.
- Patient safety
- Quality improvement
Attempts to reduce the risks of inadvertent patient harm in surgery have taken a variety of approaches, but the majority of these have addressed either standardisation of the work process or improvement in work team culture. High-risk industries such as aviation have employed standardisation routines to improve reliability and reduce error since the late 1930s. One of the most common methods for achieving this is formalisation of frameworks of activity, aiming to make tasks and actions explicit and to structure and standardise work using standard operating procedures (SOPs). In manufacturing, the International Quality Standard (ISO 9001, 2008) dictates the use of SOPs in manufacturing processes. There is a range of examples where the introduction of standardisation, and specifically SOPs, has been shown to have a benefit in healthcare. SOPs have been developed to standardise the management of medical conditions1; these include the management pathway for common surgical emergency procedures2; the documentation of patients’ vital signs3; and task completion.4
The best known example of an SOP for surgical safety is the WHO surgical checklist,5–7 although recent research suggests that effective implementation of such checklists may be difficult.8 ,9 Other approaches to standardising work in the operating room environment that have been advocated include systematised quality improvement approaches10 ,11 and teamwork training initiatives to improve communication and co-operation between staff.12
We have categorised safety risks and interventions in surgery as affecting work systems, workplace culture or technology.13 Identifying these three ‘dimensions’ led us to question which approach was most effective and whether approaches addressing more than one dimension were more effective than single dimension approaches. This study of the SOP approach, one of the single dimension approaches addressing system standardisation, is part of a programme of work (the Safer Surgical Services or S3 programme), in which we aim to evaluate and compare the different approaches to reducing error and risk in surgical care.
The study was designed as a controlled interrupted time series, with 6-month preintervention clinical data collection (active and control arms), 3-month intervention (active only) and 6-month postintervention clinical data collection (active and control). Observational process measures were collected in theatre in a large convenience sample of operations in the 3 months before and after the intervention, in both active and control theatres.
The study took place in a tertiary referral centre specialising in orthopaedic and reconstructive surgery, with 106 beds and six operating theatres. Both active and control surgical teams specialised in lower limb orthopaedic surgery, performing operations such as knee arthroscopic procedures and knee and hip primary and revision arthroplasty. The control team, however, concentrated more on hip surgery and the active teams on knee surgery. Every possible care was taken to reduce contamination between the two groups by ensuring that only the active group staff attended training and received support.
The intervention in this study was a facilitated introduction of the principles of SOPs, with the surgical team given the choice on where to apply standardisation and the format of any SOP processes and documentation developed.
The whole surgical team receiving the intervention (theatre nurses, anaesthetists, anaesthetic practitioners, surgeons, theatre managers, porters and sterile services) was invited to attend 2×2 h training sessions, delivered on-site by a senior Management Consultant (SN). These sessions covered the concepts of SOPs and of plan, do, check, adjust cycles to introduce change. Discussion was then encouraged within the theatre teams as to how standardisation of theatre work might help to improve safety and reliability, and where it would be most usefully applied. Suggestions were reflected back to the teams by means of a survey and they were asked individually to rank projects: the highest ranked project was chosen for further work.
We assessed the effect of the intervention with three observational measures of work processes and three measures of clinical outcome: we evaluated team non-technical skills using Oxford Non-Technical Skills (NOTECHS) II,14 technical process deviations using the glitch count15 and compliance with the WHO checklist using a simple 3-point check.16 We also recorded the duration of the operations. Operations were observed in full by two independent observers, one with a clinical background, the other with human factors (HF) training. Data collection booklets for each surgical procedure were developed17 to record observational data. Intraoperative observation began when the patient entered theatre and ended when they left the operating theatre. The clinical observers included two surgical trainees (MH and ER) and one nurse practitioner (JM); the HF specialists (SP, LM and LB) each had a higher degree in HF and/or psychology. All observers underwent a 2-month training period for familiarisation with surgical procedures and to harmonise the data collection process, prior to preintervention data collection. Results were agreed by discussion at the end of each procedure and entered into a secure deidentified database.
Oxford NOTECHS II is a behavioural rating scale,18 modified from a previous version described and validated by our group.19 Each subteam (nursing, surgical and anaesthetic) is scored on a 1–8 scale against four behavioural parameters: leadership and management, teamwork and co-operation, problem solving and decision making, and situational awareness. We compared the change in total Oxford NOTECHS II score and subteam scores between the preintervention and postintervention phases in the active and control groups.
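The scoring structure described above can be sketched in a few lines. This is an illustration only: it assumes the total score is the plain sum of the 12 ratings (three subteams by four parameters, giving a range of 12 to 96), which the text does not state explicitly, and the function names and example ratings are hypothetical.

```python
# Sketch of an Oxford NOTECHS II-style tally.
# ASSUMPTION: the total is the simple sum of all subteam/parameter ratings.
SUBTEAMS = ("nursing", "surgical", "anaesthetic")
PARAMETERS = (
    "leadership and management",
    "teamwork and co-operation",
    "problem solving and decision making",
    "situational awareness",
)


def subteam_totals(ratings):
    """Return {subteam: sum of its four parameter ratings (each 1-8)}."""
    totals = {}
    for subteam in SUBTEAMS:
        subtotal = 0
        for parameter in PARAMETERS:
            score = ratings[subteam][parameter]
            if not 1 <= score <= 8:
                raise ValueError(f"rating out of range: {score}")
            subtotal += score
        totals[subteam] = subtotal
    return totals


def notechs_total(ratings):
    """Return the overall score across all three subteams."""
    return sum(subteam_totals(ratings).values())


# Hypothetical operation in which every rating is 6.
example = {team: {p: 6 for p in PARAMETERS} for team in SUBTEAMS}
print(subteam_totals(example))
print(notechs_total(example))
```

Under this assumption, the mean totals reported in the results (in the low to mid 70s) would correspond to an average rating of roughly 6 per parameter.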
Glitches are defined as deviations from the recognised process with the potential to reduce quality or speed, including interruptions, omissions and changes, whether or not these actually affected the outcome of the procedure.15 ,20 Glitches were recorded, timed and classified independently by each observer, before being agreed by discussion and entered into a secure database following the completion of the operation. A glitch rate per hour (total number of glitches/operation length in hours) was calculated for each operation, allowing operations of differing lengths to be compared.
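As a minimal sketch of the glitch rate calculation described above, the function below divides the glitch count by the operation length expressed in hours. The function name, the choice of minutes as the input unit and the example figures are assumptions made for illustration; they are not taken from the study data.

```python
def glitch_rate_per_hour(glitch_count, duration_minutes):
    """Glitches per hour = total glitches / operation length in hours."""
    if duration_minutes <= 0:
        raise ValueError("operation length must be positive")
    return glitch_count / (duration_minutes / 60.0)


# Hypothetical operations: same glitch count, different lengths.
short_case = glitch_rate_per_hour(9, 90)    # 9 glitches in 1.5 hours
long_case = glitch_rate_per_hour(9, 180)    # 9 glitches in 3 hours
print(short_case, long_case)
```

Normalising by duration is what allows a short arthroscopy and a long revision arthroplasty to be compared on the same scale.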
WHO surgical safety checklist
The quality of WHO surgical safety checklist performance was evaluated by observing the time-out (T/O) and sign-out (S/O) sections of the checklist, using an evaluation system previously described by our team.4 The observers recorded whether these procedures were attempted, and where this happened, recorded three quality parameters: was all checklist information communicated? was all the team present? and was there active participation by team members?
Clinical outcome measures
Anonymised clinical outcome data on readmissions within 90 days, complications and length of stay (LOS) were extracted from hospital administrative records data by Trust staff with no connection to the study team. We obtained ethics clearance to extract non-identifiable individual patient-level data from all patients under the care of the consultants participating in the S3 study. For each consultant in the active or control group, clinical outcome data were obtained for all patients operated on for 6 months before and 6 months after the intervention was delivered. To ensure anonymity and to avoid linking consultants to a particular case, individual consultant data were amalgamated into either intervention or control groups.
For process measures, the difference between the control and active arms was assessed using two-way analysis of variance (group×time), with treatment (control vs active) and time (preintervention vs postintervention) as factors. Differences between groups were assessed by the group×time interaction, effectively comparing the pre–post change in the active and control groups. Preintervention and postintervention differences are reported with 95% CIs. All statistical analyses were carried out in R (V.3.0.1). For clinical outcome data, baseline demographic information was summarised using descriptive statistics. t Tests for mean age and χ2 test for gender distribution were used to compare the before and after periods. Binary clinical outcome variables in the before and after periods were compared using OR and 95% CIs from a logistic regression adjusted for age and gender. Mean LOS in the before and after periods was compared using linear regression controlling for age and gender. This statistical analysis was conducted in Stata V.12.
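For readers less familiar with this design, the group×time model can be written out as below. The notation is ours rather than the authors': i indexes group (active A or control C), j indexes time (pre or post) and k indexes the observed operation. This is the standard textbook two-way layout, offered as a sketch of the analysis described above and not as the exact model specification used in R.

```latex
% Two-way ANOVA with interaction (standard form; notation assumed)
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}

% The group x time interaction corresponds to the difference between
% the pre-post changes in the two groups:
\Delta = \left(\bar{Y}_{A,\mathrm{post}} - \bar{Y}_{A,\mathrm{pre}}\right)
       - \left(\bar{Y}_{C,\mathrm{post}} - \bar{Y}_{C,\mathrm{pre}}\right)
```

A test of the interaction term asks whether the contrast Δ differs from zero, that is, whether the active group changed by more or less than the control group over the same period.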
Patients whose operations were observed were informed of the possibility of observations taking place and given opportunity to opt out if they wished. Staff in the theatres undergoing observation were given information on the study and provided written consent before observations took place.
Engagement with the facilitated introduction of the SOP
After receiving the training and ranking the issues raised, staff chose to address how to reliably record and communicate the tasks required for the operating list. A project team was developed, consisting of a consultant surgeon, a registrar and two theatre nurses. The decision was made to trial a dynamic briefing tool, using a whiteboard where all information usually shared verbally would be recorded to allow sharing with the whole team, and information transfer would be standardised across team members. Questions and answers could also be posted on the board by team members. The board supported a written SOP to standardise recording of list-related information including surgical and anaesthetic plan, equipment required and any high-risk pieces of information. The briefing tool summarising this information was used in 29/29 of the observed postintervention operations. The written SOP is included in the online supplementary appendix.
A total of 105 operations were directly observed. This corresponds to approximately 20% of the caseload of the active theatre during the relevant study period (table 1). The proportion of joint replacements versus lesser procedures was higher in the control group than the active group; however, the case-mix and the characteristics of the patients in both teams remained stable during the whole study period (table 1).
Mean Oxford NOTECHS II score was 74.84 before and 73.79 after the intervention in the active group (difference=0.36; 95% CI −4.64 to 5.36), whereas it was 72.52 before and 72.88 after in the control group (difference=−1.05; 95% CI −5.36 to 3.26; figure 1). The difference between the change in the active and control groups was not significant (p=0.668). Subteam analysis revealed no differences between groups for surgeons (p=0.97), nurses (p=0.54) and anaesthetists (p=0.53).
The mean glitch rate per operation was 4.75 glitches per hour in the active group and 4.92 glitches per hour in the control group before the intervention. After the intervention mean glitch rate rose in both groups, to 7.80 glitches per hour in the active group (difference=3.05; 95% CI 0.96 to 5.15) and to 9.79 glitches per hour in the control group (difference=4.87; 95% CI 2.71 to 7.03). The difference in change between the two groups was not significant (p=0.24; figure 2).
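The comparison of changes between groups can be made concrete with the mean glitch rates reported above. The sketch below computes only the point estimate of the difference in change; it does not reproduce the p value, which would require the operation-level data. The function name is hypothetical.

```python
def difference_in_change(active_pre, active_post, control_pre, control_post):
    """(Active post - pre) minus (control post - pre), from group means."""
    active_change = active_post - active_pre
    control_change = control_post - control_pre
    return active_change - control_change


# Mean glitches per hour as reported in the text.
estimate = difference_in_change(
    active_pre=4.75, active_post=7.80,
    control_pre=4.92, control_post=9.79,
)
print(round(estimate, 2))
```

A negative value here means the glitch rate rose by less in the active group than in the control group, although, as stated above, this difference was not statistically significant.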
The profile of glitches encountered across the control/active and preintervention/postintervention phases did not differ markedly, except for the notable increase in distraction glitches in both control and active groups (figure 3).
WHO surgical safety checklist compliance and quality
Of the 105 observed operations, 94 attempted a WHO T/O. The attempt rate did not change significantly between the preintervention and postintervention cohorts in either the active arm (preintervention 24/25; 96%, postintervention 25/29; 86%, difference=−10%; 95% CI −28% to 9%) or the control arm (preintervention 21/25; 84%, postintervention 23/26; 88%, difference=4%; 95% CI −18% to 27%). The difference between the change in the active and control groups was not significant (p=0.25).
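The intervals for the change in attempt rate can be approximated from the counts alone. The sketch below uses a normal-approximation (Wald) interval for a difference in two proportions with a continuity correction. The paper does not say which interval method was used, so this choice is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_proportions_ci(x_pre, n_pre, x_post, n_post, level=0.95):
    """Post minus pre difference in proportions with a Wald interval.

    ASSUMPTION: continuity-corrected normal approximation; the source
    does not name the method behind its reported CIs.
    """
    p_pre, p_post = x_pre / n_pre, x_post / n_post
    diff = p_post - p_pre
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(p_pre * (1 - p_pre) / n_pre + p_post * (1 - p_post) / n_post)
    correction = 0.5 * (1 / n_pre + 1 / n_post)
    half_width = z * se + correction
    # Clip to the valid range for a difference in proportions.
    return diff, max(diff - half_width, -1.0), min(diff + half_width, 1.0)


# Active arm T/O attempts: 24/25 before, 25/29 after.
active = diff_in_proportions_ci(24, 25, 25, 29)
# Control arm T/O attempts: 21/25 before, 23/26 after.
control = diff_in_proportions_ci(21, 25, 23, 26)
print([round(100 * v) for v in active])
print([round(100 * v) for v in control])
```

With these counts the sketch gives approximately −10% (−28% to 9%) for the active arm and 4% (−18% to 27%) for the control arm, in line with the intervals reported above. This suggests, but does not confirm, that a continuity-corrected interval of this kind was used.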
All three requirements for T/O were satisfactory in 11/24 (44%) cases in the preintervention active arm, which decreased to 8/29 (28%) in the postintervention phase (difference=−18%; 95% CI −48% to 11%). All three requirements for T/O were satisfactory in 14/25 (56%) cases in the preintervention control arm, which decreased significantly to 5/26 (19%) in the postintervention phase (difference=−37%; 95% CI −65% to −8%). The difference between the change in the active and control groups was not significant (p=0.28).
The WHO S/O was attempted in only 16 of 105 operations. There was no significant difference between preintervention and postintervention S/O attempt rates in the active arm (preintervention 0/25; 0%, postintervention 2/29; 7%, difference=7%; 95% CI −6% to 20%; p=0.54) or the control arm (preintervention 0/25; 0%, postintervention 1/26; 4%, difference=4%; 95% CI −7% to 15%; p=1). The difference between the change in the active and control groups was not significant (p=0.849). The frequency and quality of T/O and S/O completion are shown in figure 4.
Complications and LOS were significantly higher for the control group than for the active group, probably because of the different case-mix, but there was no evidence of a differential time trend across the intervention period. Readmissions were very infrequent in both groups, LOS was stable in both groups and complications rose in both groups when comparing the two time periods (table 2).
This study presents an evaluation of frontline staff-led development of SOPs for the theatre process. The staff chose to standardise their briefing method and content. After this intervention study was completed, the change in practice was adopted by other surgical teams and sustained for over 2 years, but the effect of this change did not show in our primary outcome measures.
Standardisation of work using SOPs (of which the WHO checklist is an example) is accepted as one of the key tools for improving reliability and safety in healthcare. However, attempts to objectively evaluate its effects are relatively uncommon. In this study, we trained surgical teams in how to develop an SOP and helped them do so, but we achieved no benefit either in measures of team performance or patient outcome. The team involved worked enthusiastically to develop a solution that worked for them, and the result was adopted by other surgical teams in the unit. The benefits of the work were felt by the team, but not reflected in our evaluation metrics. This contrast between subjective and objective achievement in safety interventions is not a new problem.21
The controlled study design, the use of semiobjective, well-validated and described measures of process and the use of objective clinical endpoints collected by observers blind as to study group all add importantly to the validity of our findings. Our dual observation method allowed us to demonstrate reliability across observers.4 ,15 ,18 We also attempted to separate those involved in delivering the training and supporting the intervention (SN) from those evaluating the intervention's effects, though in practice, this was logistically difficult. Our data clearly demonstrate the necessity of controls in such studies, which has also been highlighted by other recent improvement studies where failure to include controls would have led to serious misinterpretation of results.22 ,23 Interestingly, the controls here helped identify secular trends in specific outcomes which may have reflected changes in local context: glitch count and complications both worsened substantially in both groups during the study period. During this time the sudden death of a key organisational figure coincided with introduction of a new IT system in theatres, combining to produce significant turmoil for the unit, which may be relevant to these findings. Globally all glitch categories increased following the intervention, but the increase appears smaller in the active group than in the control group, suggesting that the intervention may have weakened the overall trend. The teams had the freedom within the study to direct their focus; this was much appreciated and appeared to empower some more junior team members. The group working developed relationships in the teams, and the work they completed continues to be used and appreciated within the theatre complex.
Our study was necessarily small (given the methodological choice of full observation of whole operations) and was, therefore, not powered to detect clinical outcome differences: these may, therefore, have been missed through type 2 error. It is difficult to power studies such as these, as the active component of the intervention and its potential for benefit is often unknown. We could not blind observers to study group, and the observational methods included a measure of judgement, introducing the possibility of observer bias, although we detected no suggestive evidence that this had occurred in our results. There may have been contamination between the active and control teams, as the nursing and anaesthetic teams did rotate between theatres. However, the physical component of the briefing intervention (the whiteboard) was only present in the active theatres, so was not available to the control teams to use. We witnessed staff from other theatres attempting to borrow the temporary whiteboard and improvising the intervention in their own theatres. Where these theatres were not part of the control arm, this was permitted and demonstrates how the intervention can be taken on by other theatres with little teaching or demonstration. The observers were present for over a year after the implementation of the SOP intervention and were able to witness its continued use. The design has been tweaked slightly, and fewer of the theatres that were not originally active theatres have maintained its use, illustrating the impact on sustainability when individuals are closely involved in the change process.
Reflecting on the failure to affect preselected outcome measures, it is possible that these targeted the wrong aspects of the system. The active teams after intervention appeared bolstered by working together on a solution, and their new method of briefing resulted in much less wasted time at the start of the day, reduced handover repetition and was more robust in ensuring team situational awareness. This change in the volume and quality of work ‘prelist’ was not captured by the in-theatre data collection.
It is also possible that the SOP developed (dynamic briefing tool) was tackling too small a part of a complex system to make a measurable impact on important outcomes. Our approach was truly staff led: the teams chose how to implement the change and what change to make. Doubts within the research team about the effectiveness for our outcome measures of the team's choices were expressed, discussed and eventually suppressed during the study, since we had committed at the outset to the staff-led principle, because of its importance for motivating and empowering staff to implement and sustain changes. Two messages are, therefore, that effective interventions require a careful balance between staff empowerment and expert guidance and that when using staff-led interventions in complex systems, there is a real risk that preselected outcome measures are out of focus with the area of improvement the team choose to work on.
There were initial problems with the implementation of the intervention. Identifying a time when all of the theatre teams were available to meet together was difficult and often relied on staff staying after lists had finished or cancelling theatre lists, both of which required much negotiation with management and staff. Our consistent experience has been that the potential future benefits of safety interventions rarely outweigh the current costs of lost activity in the value systems of hospital middle management or senior surgical staff. Making large-scale adoption feasible may therefore require organisational changes in hospitals, which provide these key figures with different motivators. It was also difficult to attract support for the benefits of standardisation, which seems to have many negative connotations in healthcare professions, apparently rooted in a fear of loss of professional autonomy.24 Thus, the idea of larger scale standardisation, perhaps of an entire surgical process, was not welcomed in spite of widespread uptake in other industries (eg, DEFSTAN 0025, X7) and within healthcare.5 ,25 The military use the term ‘standing’ rather than ‘standard’ operating procedure, in reference to a unit's individual procedures, whereas ‘standard’ could imply one procedure across all units; this language may be useful in the healthcare context.
Some of these issues have been discussed previously by others. Proudlove et al,26 reviewing the existing literature on quality improvement, pointed out the risk that ‘staff-led’ interventions not tied into a larger strategy or organisational business strategy may not improve value, and that many hospital processes require fundamental re-design, rather than being ready for improvement: these were certainly issues we experienced within the current study. Dixon-Woods et al27 identify 10 challenges for quality improvement, many of which we would concur with. We found, for example, that convincing people of a need for change was a major challenge, but difficulties in maintaining sustainability were not.
Staff-led development of an SOP for theatre work appeared to improve work processes but this was not demonstrated by the preselected outcome measures. Staff-led interventions have weaknesses as well as strengths. Future work on SOPs in this area should use a combination of evidence-based expert development and facilitated introduction. Improved motivation of middle management and senior clinicians to engage fully with safety interventions is needed and will require changes in their incentives and professional culture.
The authors would like to thank Julia Matthews and Laura Bleakley for their assistance in intraoperative data collection.
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Files in this Data Supplement:
- Data supplement 1 - Online supplement
Contributors PM, SN and KC conceived and designed the intervention, with input from LM, ER, MH, SPP, DG, OR-A and GC. Data collection and floor work with frontline staff was carried out by LM, ER, SN, SPP and MH. GC and OR-A led the statistical analysis. All authors contributed throughout the writing process. LM wrote initial drafts of the article and PM completed the final one. All authors agreed the final version of the article.
Funding This paper presents independent research funded by the National Institute for Health Research (NIHR) under its Programme Grants for Applied Research programme (Reference Number RP-PG-0108-10020). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
Competing interests None.
Ethics approval Oxford A Ethics Committee (REC:09/H0604/39).
Provenance and peer review Not commissioned; externally peer reviewed.