The problem with root cause analysis
  1. Mohammad Farhad Peerally (1),
  2. Susan Carr (2),
  3. Justin Waring (3),
  4. Mary Dixon-Woods (1)

  1. SAPPHIRE, Department of Health Sciences, University of Leicester, Leicester, UK
  2. John Walls Renal Unit, University Hospitals of Leicester, Leicester, UK
  3. CHILL, Nottingham University Business School, University of Nottingham, Nottingham, UK

  Correspondence to Dr Mohammad Farhad Peerally, SAPPHIRE, Department of Health Sciences, University of Leicester, Centre for Medicine University Road Leicester, LE1 7RH UK; mfp6{at}le.ac.uk

Introduction

Attempts to learn from high-risk industries such as aviation and nuclear power have been a prominent feature of the patient safety movement since the late 1990s. One noteworthy practice adopted from such industries, endorsed by healthcare systems worldwide for the investigation of serious incidents,1–3 is root cause analysis (RCA). Broadly understood as a method of structured risk identification and management in the aftermath of adverse events,1 RCA is not a single technique. Rather, it describes a range of approaches and tools drawn from fields including human factors and safety science4 ,5 that are used to establish how and why an incident occurred in an attempt to identify how it, and similar problems, might be prevented from happening again.6 In this article, we propose that RCA does have potential value in healthcare, but it has been widely applied without sufficient attention paid to what makes it work in its contexts of origin, and without adequate customisation for the specifics of healthcare.7 ,8 As a result, its potential has remained under-realised7 and the phenomenon of organisational forgetting9 remains widespread (box 1). Here, we identify eight challenges facing the usage of RCA in healthcare and offer some proposals on how to improve learning from incidents.

Box 1

Lessons not learnt

This example provides a summary of a real case that occurred in a hospital and the failure to learn from the incident in spite of a root cause analysis.

In a large acute hospital, a patient underwent a routine cataract surgery—an operation with a minimal risk profile—led by an experienced ophthalmologist. The wrong lens was inserted during the operation. The error was promptly recognised postoperatively; the patient was returned to the operating room and the procedure was safely redone.

A subsequent root cause analysis identified that two lenses were in the operating room, one (the wrong one) brought in by an operating department assistant and the other by the surgeon. The investigation report identified that having more than one lens in the operating room and a failure in the double-checking process had caused the incident. The action plan included the development of a new protocol emphasising the individual responsibility of the surgeon to select the appropriate lens, a training programme, improved documentation and a poster emphasising the importance of double checks.

One year later, in the same hospital, a different patient with a different surgeon had the same procedure. Once again, the wrong lens was implanted. This time, the staff member who chose the wrong lens was the surgeon.

The unhealthy quest for ‘the’ root cause

The first problem with RCA is its name. By implying—even inadvertently—that a single root cause (or a small number of causes) can be found, the term ‘root cause analysis’ promotes a flawed reductionist view.10 Incident investigation in the aftermath of an adverse event is intended to identify the latent and active factors contributing to the genesis of a particular adverse event,4 but too often results in a simple linear narrative that displaces more complex, and potentially fruitful, accounts of multiple and interacting contributions to how events really unfold.7 ,10–12 This is a tendency exacerbated by use of some RCA techniques (such as timelines or the ‘five whys’) that tend to favour a temporal narrative rather than a wider systems view.

Questionable quality of RCA investigations

Once an adverse event is classified as meeting the definition of a serious incident, an RCA is supposed to involve the convening of a skilled multidisciplinary investigation team, preferably with representation from risk management personnel and clinical teams.13 Over a predefined timeframe, which is mandated in some countries (60 days in the UK, 45 days in the USA),3 ,14 this team collects and analyses data and formulates an action plan.13 However, challenges pertaining to the quality of this process abound. The task facing the investigation team is far from straightforward: the events underlying an incident have to be reconstructed from many different sources of varying degrees of reliability, usefulness and accessibility, ranging from hospital records, staff interviews and statements, to records of workforce rotas.15 The information obtained directly from healthcare workers is influenced by their willingness and ability to provide relevant data16 ,17 and by the nature of the relationships and conversations between investigators and other stakeholders.18 The involvement of patients and families affected by the incident is highly variable, with only limited evidence-based guidance on how it can best be done.19 Yet, despite the complexities, sensitivities and challenges of this work, RCAs in healthcare are typically conducted by local teams, not by the expert accident investigators, proficient in systems thinking and human factors, cognitive interviewing, staff engagement and data analysis, who are characteristic of other high-risk industries.20–22 Further, inconsistent use is made of the various investigative tools that are available.15 ,23 As a result, exemplary practice in the analysis of healthcare incidents is rare.24 ,25

Political hijack

Constrained by strict timelines, and skewed by hindsight bias26 and lack of independence from the organisation where the event took place, RCAs in healthcare often end up as a compromise between ‘depth of data and accuracy of the investigation’.16 The quest to complete an investigation on time and produce a report risks goal displacement, where the report is seen as the end product rather than the beginning of a learning cycle. Reports themselves, influenced by the need to preserve interpersonal relationships and by hierarchical tensions and partisan interests,8 ,16 ,27–29 may not always reflect the content of discussions during investigations or the realities of what happened.15 ,16 Investigating teams may end their analysis once they have reached a cause of mutual convenience, perhaps one that edits out causes (and thus solutions) deemed to be beyond the remit or capacities of the organisation16 and that occludes deeper organisational and sociopolitical dynamics.7 ,15 ,16

Poorly designed or implemented risk controls

The key goal of RCA is to prevent similar events from recurring.7 ,10 ,16 But few studies have investigated the nature and effectiveness of risk control strategies stemming from RCA investigations in healthcare. The available evidence points to an endemic tendency of investigators to settle for administrative and perhaps ‘weaker’ solutions (such as reminders) rather than those that address the latent causes, such as poorly designed technology or defective operational systems.8 ,16 ,30–32 Again, some of the reasons for this lie in the limited expertise of local investigation teams in selecting and designing appropriate risk controls.7 ,30 ,33 Only limited guidance is available21 ,33–37 and what is available may not be sufficiently attentive to the specifics of the healthcare context. Yet poorly designed or ineffectual corrective actions may do harm.31 ,32 Among other unintended consequences, risk migration, where attempts to mitigate a risk create new risks, may easily occur.38 ,39 Recommended actions may also, of course, result in little change,7 ,30 ,40 especially (but not only) when senior managers are not involved in the generation of action plans and do not support their implementation.41 ,42 Despite the time and effort invested in RCAs,7 ,40 ,43 few incentives exist to follow up formally on action plans:8 ,44 estimates of implementation rates vary between 45% and 70%.31 ,45

Poorly functioning feedback loops

For learning to occur, several conditions must be satisfied. Among the most important of these are the sharing of the outcomes of incident analysis with those involved, those who reported, and those likely to be affected in the future, especially in implementing recommendations. Evidence in other fields suggests that learning from events does not happen by itself:46 purposeful intent is needed both to disseminate the findings47 and to ensure that the recommended actions are made salient and actionable.46 Yet, as currently practised, feedback mechanisms in healthcare RCAs function poorly, contributing to the disenchantment of staff48 ,49 and frustrating the kind of double-loop learning50 needed to secure change.

Disaggregated analysis focused on single organisations and incidents

The current RCA approach favours analysis of individual incidents in isolation and within bounded organisations. The consequent tendency to generate localised action plans that are not shared more widely may result in failure to disseminate painfully acquired learning and to address deeper, institutionally engrained patient safety concerns.21 ,51 Single incident analysis also frustrates the organisation's ability to assess its vulnerability to recurring events.52 Organisations' inability to effectively prioritise actions may lead to an unwarranted commitment of resources to averting specific very rare events rather than addressing the conditions that allowed the event to occur. Though mechanisms for aggregating learning from incidents and creating alerts do exist in some countries, their impact to date has been limited: similar events often recur in the same or similar organisations (box 1), suggesting failure to learn both within and across organisations.24

Confusion about blame

Though healthcare is often exhorted to embrace a ‘no-blame’ culture, the extent to which this urging is based on a correct understanding of what happens in other high-risk industries is questionable.15 ,53 Investigators in other industries do not set out with a remit to assign blame,20 but that does not mean that individual or organisational culpability is forever sequestered. The vast majority of mistakes and other errors are the result of systems defects that need to be corrected, but when blatant transgressions, neglect or unacceptable behaviour are found, it is clearly wrong to write accountability out of the picture.54 Nor is no-blame the reality in practice, since disciplinary, institutional and legal (civil and criminal) processes continue to operate and are highly visible to healthcare practitioners and managers, yet may appear arbitrary and unsatisfactory both to them and to patients and families. A ‘just culture’ is increasingly promoted in many organisations to balance the disparity between individual blame and organisational accountability.55 This approach, however, comes with problems of its own when applied to incident investigation in healthcare. For instance, one of the more visible features of the just culture philosophy in incident investigation is the use of prescriptive algorithms and decision tools (such as a culpability tree) to objectify culpability. Such ‘calculus-like logic’56 may imply that actions committed by staff are binary (either acceptable or unacceptable) without appropriate appreciation of the messiness of the system in which the action occurred.56 ,57

The problem of many hands

RCA is further challenged by the problem of many hands: many actors and their actions may contribute to an outcome, yet no individual is responsible either for that outcome or for fixing the problems that caused it.58 This problem, which is endemic in healthcare, makes it difficult to address hazards that arise at the level of the system, since many of the actors that are implicated in hazards (including, for example, drug and equipment suppliers) are outside the direct control of individual care organisations. RCA investigations may fail to assign responsibility to such actors, instead reabsorbing responsibility into the organisation where the incident occurred. These organisations typically lack the legal mandate, resources and structural authority necessary to make the changes required.

Discussion

RCA is a promising approach with considerable face validity as a way of producing learning from things that have gone wrong. But it has consistently failed to deliver benefits on the scale or quality needed. The eight problems we have discussed here mean that, too often, RCA results in the tombstone effect: though its purpose is to guard against a similar incident in the future, it may instead function primarily as a procedural ritual, leaving behind a memorial that does little more than allow a claim that something has been done.59 ,60 Incident investigation clearly will continue to play an important role in making healthcare safer, but it must first get better at doing what it does.

The first step in securing improvement is likely to involve the professionalisation of incident investigation: those conducting it need specialist expertise in underlying theories, ergonomics and human factors, along with hands-on experience of analytical methods.20 For these reasons, the establishment of professional investigatory bodies, such as the one shortly to be launched in the UK, is welcome, though the scope, reach and impact of such bodies will need careful monitoring. Second, the role of patients and relatives in the investigative process needs to be recognised and valued. Such engagement has the potential to generate a unique view of the service provided from the end-user's perspective and may foster dialogue that is informative to both causal analysis and design of risk controls.61 The psychological and emotional readiness of patients and families involved in the investigative process needs to be considered, along with the maturity and ability of the organisation to facilitate such a process within the appropriate legal framework. Transparency on the agreed level of involvement is paramount from the start and the outcomes of investigations should be available to patients and relatives, though clarity on how this should best be done is not yet available.19 Third, better understanding of the role of blame is needed. The dissonance between claims of no-blame, or even just culture, and the reality is a source of confusion and distress in relation to RCAs. To address current confusions, clarity is needed on the distribution of responsibility between bodies investigating incidents (whose prime mandate would be to promote learning) and other bodies (including professional regulators and the law courts), and on the instances in which the investigative body needs to make referrals.22

Fourth, healthcare must focus increasingly on aggregated analysis of incidents.31 ,32 ,45 ,62 ,63 Such a bird's eye view of incidents may facilitate prioritisation of interventions, based on the harm associated with incidents and also on the associated risks. Aggregated analyses can be performed at numerous levels of the organisational hierarchy, for example, at the micro level (within one department) and at the meso level (organisational).41 At the national level, aggregated analyses offer a way of identifying common themes across similar and apparently more disparate incidents31 ,32 ,45 and may also serve as a means of generating actions that require collaborative efforts between healthcare organisations or indeed between industry and healthcare. One example could be product redesign, a solution that may not be identified through the analysis of a single incident within one department but may reveal itself as a recurring theme when analysing multiple incidents across many organisations. Linked to this, healthcare urgently needs to develop and evaluate much better methods for designing risk controls and other improvement actions. One possibility that could be evaluated, for example, is that of a hierarchy of risk controls.33 ,34 ,36 ,37 ,64 More broadly, the use of active surveillance of issues that have already been detected and monitoring of effectiveness of risk controls need to become a routine part of the risk management process following RCAs. Healthcare also needs to markedly improve its capacity to evaluate, curate and share these risk controls. Such an approach would help to address the problem that organisations tend to constantly reinvent risk controls, resulting in waste and the creation of new risks.58 An easily accessible database with descriptions of risk controls and contexts would enable lessons learnt from one RCA to be shared widely and support a participatory approach65 to organisational learning.

Finally, healthcare needs to do more to detect hazards and assess risks proactively. RCA is essentially retrospective, and depends crucially on an incident being recognised as such, but that may not happen for a variety of reasons: healthcare personnel may have become habituated to particular practices or outcomes, or fear and other negative emotions may discourage reporting. Though RCAs were imported from other high-risk industries, the other tools and techniques commonly used in those industries to assess systems and assure their safety before an incident has occurred (such as failure modes and effects analysis (FMEA), hierarchical task analysis and so on) have had far less attention in healthcare.66 FMEA, in particular, may be especially useful for the rigorous proactive risk assessment of a select few but high-priority hazards.67 For healthcare truly to become a learning system, action is needed on multiple levels. RCAs have dominated for too long as the principal means of generating learning. The time has come to recognise both their opportunities and their limits.

  • RCA is a promising incident investigation technique borrowed from other high-risk industries, but has failed to live up to its potential in healthcare.

  • A key problem with RCA is its name, which implies a singular, linear cause.

  • Other problems include the questionable quality of many RCAs, their susceptibility to political hijack, their tendency to produce poor risk controls, poorly functioning feedback loops, failure to aggregate learning across incidents and confusion about blame and responsibility.

  • Implementation and evaluation of risk controls to eliminate or minimise identified hazards need to become a more visible feature of the RCA process.

  • To maximise learning, lessons learnt from incidents, descriptions of implemented risk controls and their effectiveness need to be shared within and across organisations.

References

Footnotes

  • Twitter Follow Mohammad Peerally at @FP_Farhad

  • Contributors MFP wrote the first draft of the manuscript, which was subsequently revised critically and edited by SC, JW and MD-W. MD-W edited the final version of the manuscript. All authors approved the final manuscript version being submitted for publication.

  • Funding Wellcome Trust Senior Investigator Award (Mary Dixon-Woods (WT097899)) and Health Foundation (Improvement Science Doctoral Awards).

  • Competing interests MD-W is the Deputy Editor-in-Chief of BMJ Quality and Safety.

  • Provenance and peer review Not commissioned; externally peer reviewed.
