Root cause analysis was introduced to a chemical plant as a way of enhancing performance and safety, exemplified by the investigation of an explosion. The cultural legacy of the root cause learning intervention was embodied in managers' increased openness to new ideas, individuals' questioning attitude and disciplined thinking, and a root cause analysis process that provided continual opportunities to learn and improve. Lessons for health care are discussed, taking account of differences between the chemical and healthcare industries.
- root cause analysis
- safety culture
- quality improvement
Root cause analysis is a method for investigating why things went wrong. The name is intended to signal that we want to get to the “root” of a problem rather than simply covering over the symptoms. Developed decades ago, the method has spawned variations that have spread through many industries including nuclear power, aviation, and chemical plants.1 In the US healthcare industry, for example, the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) expects accredited organizations to conduct “a timely, thorough and credible root cause analysis” of all sentinel events.2 The purpose of the case description presented here is to show some of the promise of root cause analysis and, more importantly, to argue that the preparatory work for implementation of this or any tool achieves much of its benefit by changing the culture of the organization.
At a chemical refinery the newly appointed plant manager decided that root cause analysis* could address, simultaneously, a recent history of financial losses, some dangerous incidents, and repeated equipment failures. The plant was owned by a joint venture among chemical companies, and one of these partners used root cause analysis as a regular practice at some (but not all) of its plants. Root cause analysis as a concept is well known in the chemical industry, but the details and extent of the practice vary across companies and plants. The plant manager asked staff from this partner company to introduce their root cause analysis practice to his plant with a learning intervention for about 20 plant employees (and some from other plants who wanted the experience), including operators, maintenance staff, engineers, and first line supervisors.
The 3 week intervention included exposure to investigation, analysis, and reporting methods, culminating in oral and written reports to plant management. The participants were divided into four teams with diverse backgrounds, each including at least one experienced root cause facilitator (most from other plants). Each team investigated one of four significant recent incidents or repeated problems that were of high financial and/or safety significance. Each day of learning included one or more brief lectures on skills needed by the teams, open time for teams to collect data and work on their analyses, and discussions to share insights and observations across teams. One investigation is used as an example to illustrate the tools and techniques of root cause analysis (box 1).
Box 1 Major features of root cause analysis
Investigations always involve collecting information and preparing a report, but this particular intervention emphasized some rigorous and laborious elements:
- a time line of events, sometimes detailed to the minute;
- an “is/is not” process that differentiates circumstances where the event occurred from similar circumstances where it did not: for example, what differed between this charge heater and a similar one that did not explode, or between this moment in time and earlier times when the same charge heater ran uneventfully3;
- a detailed causal event diagram working backwards in small logical steps from the explosion to the causes and conditions that were collectively necessary to produce the incident (see box 2), and working back further and further to uncover deeper causes;
- a process of categorizing the quality of data used to build links in the causal event diagram (was each causal link a verifiable fact, an inference, or a guess?).
Box 2 Analysis of causes of the charge heater explosion
The causes listed below give only a flavour of the rigorous and detailed effort to identify as many causal links as possible, use facts to support or rule out possibilities, and conclude with the most plausible causes.
Operators ran the burners in the charge heater unevenly to increase heat in order to achieve a higher desired production level, while avoiding alarms that would signal an unsafe condition.
Heat was removed more slowly than usual from the tube skin because coke had adhered to the inside of the tubes and was acting as an insulator. There was more coke than usual because it was assumed that a new decoking process worked as well as the previous process and no one had checked for coke build up.
The combination of running some tubes hotter (at a higher gas pressure) and the build up of coke moved the high heat point up the tube. The thermocouple meant to detect temperature on the tube skin, set at a height specified in the heater design, was now below the hottest part of the tube. Since alarms did not sound, operators believed the tube temperature was acceptable. The tube ruptured above the thermocouple.
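For readers who find it easier to see the causal event diagram as a data structure, the following is a minimal sketch in Python, not the plant's actual tool. It assumes the diagram can be stored as a list of cause-effect links, each tagged with the quality of its supporting data, and that the diagram contains no loops. The names `Evidence`, `Link`, `LINKS`, `causes_of`, `deepest_recorded_causes`, and `links_needing_verification` are hypothetical. The cause labels paraphrase box 2, but the evidence ratings are placeholders chosen for illustration; the article does not report how the team rated each link.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    """Quality of the data behind a causal link (the three grades named in box 1)."""
    FACT = "verifiable fact"
    INFERENCE = "inference"
    GUESS = "guess"


@dataclass(frozen=True)
class Link:
    cause: str
    effect: str
    evidence: Evidence


# Cause labels paraphrase box 2. The evidence ratings are ILLUSTRATIVE
# PLACEHOLDERS, not the investigation team's actual classifications.
LINKS = [
    Link("high heat input", "tube rupture", Evidence.FACT),
    Link("low heat removal", "tube rupture", Evidence.FACT),
    Link("operators unaware of tube skin temperature", "tube rupture", Evidence.INFERENCE),
    Link("uneven burner firing to raise production", "high heat input", Evidence.FACT),
    Link("coke build up inside tubes", "low heat removal", Evidence.FACT),
    Link("new decoking process never checked", "coke build up inside tubes", Evidence.INFERENCE),
    Link("thermocouple below hottest point of tube",
         "operators unaware of tube skin temperature", Evidence.FACT),
]


def causes_of(effect, links):
    """Return the causes recorded as jointly producing this effect."""
    return [link.cause for link in links if link.effect == effect]


def deepest_recorded_causes(event, links):
    """Work backwards from the event to causes that have no recorded cause yet.

    Assumes the diagram is acyclic. The result is where the analysis has
    stopped so far, not a claim that these are the only or final causes.
    """
    direct = causes_of(event, links)
    if not direct:
        return [event]
    found = []
    for cause in direct:
        for deeper in deepest_recorded_causes(cause, links):
            if deeper not in found:
                found.append(deeper)
    return found


def links_needing_verification(links):
    """Return links that rest on an inference or a guess rather than a fact."""
    return [link for link in links if link.evidence is not Evidence.FACT]


if __name__ == "__main__":
    print(causes_of("tube rupture", LINKS))
    print(deepest_recorded_causes("tube rupture", LINKS))
    for link in links_needing_verification(LINKS):
        print(f"check: {link.cause} -> {link.effect} ({link.evidence.value})")
```

Storing the evidence grade on each link, rather than on each cause, mirrors the question posed in box 1 (was each causal link a verifiable fact, an inference, or a guess?) and makes it easy to list the links that still need data before the team draws conclusions.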
THE CHARGE HEATER FIRE INVESTIGATION
The charge heater fire investigation team examined an explosion and fire that cost $16 million in lost production and repairs. Fortunately, no one was injured. Charge heaters are large gas fuelled burners that operate under very high heat and pressure to help transform waste products from oil refining back into usable products. The residue of this process is coke (coal dust) which can accumulate on the inside of heater tubes. In addition to unearthing causes of the explosion, managers also wanted to discover and ameliorate the conditions that led to this event and might lead to future events.
Analysing the information available, the team concluded that the explosion and fire were caused by a tube rupture inside the charge heater that occurred when the tube's 0.75 inch steel skin got too hot and tore. The team found that three factors together produced the heater explosion: (1) high heat input, (2) low heat removal, and (3) unawareness on the part of operators of the actual tube skin temperature (see box 2 for details). As the team noted in a verbal report: “We are seeing that several things combine over time to create an event”.
The team noted a “key learning” that plant staff made decisions without questioning the assumptions that seemed to underlie them. The maintenance department changed decoking processes but did not know and never checked if the new process was effective. Operators increased the burner pressure in the charge heater to meet higher production goals (set by managers) and changed the pattern of firing heater tubes to avoid setting off alarms but did not know the consequences of doing so. The team speculated that their colleagues probably were unaware of the assumptions they were making and therefore “how quick we jump to conclusions about things”. The team repeatedly mentioned that, prior to learning the new investigation process, they rarely questioned their own conclusion-drawing processes and the assumptions that underlay them.
On the basis of these insights, the team recommended that the plant should identify “side effects” and be more aware of the broader “decision context” when changing production processes. Based on the insights from this and other teams, managers decided to implement a “management of change process” to address the unanticipated side effects and interactions that caused problems.* In follow up interviews with team members 6 months after their investigation, we were told: “The biggest issue that came out [of the root cause analysis training] was management of change . . . [It] is serious. It is real. If you don't do it, your job is on the line . . . [you] have to explain why not.”
Thus, what started as an investigation of a particular problem turned into a broader change initiative that touched on work activities throughout the plant. Teams and managers looked beyond human error to understand why these causes emerged and persisted until they created a serious problem.
CULTURE CHANGE BEHIND THE TOOL
The root cause analysis learning intervention led to both more disciplined thinking about problems and a shift in culture towards more trust and openness. But these benefits were not an automatic result of the tool; they emerged from the way the tool was introduced and put to use.
The process of “drilling down” precisely and narrowly into causes of this incident allowed the team to develop new understandings. Root cause analysis can be used casually and subjectively4,5—for example, by blaming the charge heater fire on “inadequate operator training”. The lack of something is not a cause, however; training is a proposed corrective action disguised as a cause. The underlying cause alluded to by lack of training is really the mental model held by operators regarding the way in which the equipment operates in response to changes in the heater firing pattern. When facilitators coached the teams to define cause-effect linkages in very specific and concrete terms (note the examples in box 2), supported by data, team members developed capabilities for recognizing how assumptions and mental models affect behavior and performance.6,7
The discipline of root cause analysis also encouraged the team to become aware of interdependencies among causes. They recognized interactions between system components—for example, operating practices and maintenance practices—because no single cause was sufficient to produce the event. They recreated a central tenet of the quality movement that working to optimize individual components does not automatically add up to an optimized system.8
Safety culture is based on an ability to entertain doubt, to develop a questioning attitude, and to respect the legitimacy of others' viewpoints.9,10 The discipline of root cause analysis and the team interaction helped team members to become aware of their own mental models and “taken for granted” assumptions. They emphasized the benefit of having a diverse team because of the surprising differences in the way different people looked at the same problem. These new attitudes spread when team members returned to their work groups. One team member summarized his new approach by saying he now questions his co-workers: “I say, are you sure? Are you sure? Did you look at the initial aspects of what happened?”
The root cause analysis intervention also affected the way people share information across disciplinary and hierarchical boundaries.11,12 In our early observations of the intervention, it was evident that at least some participants were anxious about being open with colleagues in their own department or in other departments, or with management. Would operators talk to engineers? Would an operator working on this investigation be perceived as having sold out? Would managers listen to reports that were critical of their own behavior? The investigation could have blamed the operators for “getting around” the tube temperature alarms, ignored the role of management decisions about production goals, and instituted more monitoring and rules. A punitive or controlling response would have reinforced barriers to the open flow of information, discouraged participation in any root cause analysis procedure, and masked the underlying, systemic causes of the event.13–15
However, plant management encouraged openness, fact based reasoning, and urgency to solve collective problems. The plant manager was frequently present and consistently supportive. This management support did not arise spontaneously: it was carefully nurtured by the facilitation team who were working publicly with the investigation teams but privately coaching senior managers to act less defensively and be visibly engaged with the investigations. This engendered a willingness to confront reality and to surface underlying assumptions about “how we do work around here”. Resistance to discussing problems across functional groups faded as participants recognized that team members from other groups welcomed their information and insights about common issues.*
LESSONS FOR HEALTHCARE ORGANIZATIONS
In general, healthcare organizations have been late to adopt new organizational practices such as total quality and root cause analysis.16 This reluctance stems, in part, from a culture of individual responsibility tied to heavy investment in professional expertise and high variability among patients, in-group secrecy in response to a punitive legal environment, and organizational structures that reinforce boundaries between groups. Yet health care is following the same trajectory as did other industries in moving beyond human error, gradually building capabilities to understand and act effectively on systemic structures.5,10,17 Aspects of the healthcare culture—such as the values of excellence, discovery, and caring for others—support this progress.16,18
Although managers of healthcare organizations want tools and techniques to help them achieve higher levels of safety, quality, and performance, the assumption that the value of a tool is in the tool itself leads us to overlook the importance of the context in which the tool is introduced and used. Root cause analysis, computerized prescriptions, and other tools not only represent knowledge and embody improved procedures, but also create new opportunities for people to communicate with each other, challenge each other to rethink their assumptions, and work together in different ways.
Although JCAHO requires root cause analysis for sentinel events, these analyses may fall short of their potential. The quality of analysis depends on the quality of input data, and the quality of input data depends on the quality of conversations and relationships. People share what they know and participate fully when they trust that they and their colleagues will be treated fairly and that appropriate positive actions will be taken.14,19,20
Moreover, root cause analysis can be used casually to reinforce existing beliefs or with discipline and precision to challenge existing assumptions and reveal complex underlying processes. People naturally select and interpret data to support prior opinions and to please powerful audiences.4 Managers have considerable power to influence systems towards truthful reporting and rigorous analysis or towards comforting signals and palliative answers. Pursuing facts and digging out causes is difficult, confusing, time consuming, annoying, uncertain, and politically hazardous unless coaches model appropriate behaviors and managers provide sufficient resources and psychological safety.
In the case study presented here, the goal of the facilitation team was to educate management by challenging their mental models with rich and compelling data and interpretations. Managers who accepted faulty equipment or human error as a sufficient cause of problems were squandering opportunities to understand systems and make meaningful changes. Managers began to understand their collective responsibility to provide the resources (people, time, money, equipment, plans, opportunities, legitimacy, procedures, etc), shape the structures (for example, silos versus cross-functional teams), and set the values (for example, “it's great we hit that production goal early, but how many safety corners did we cut to make it?”) by which the system operates. The cultural legacy of the intervention was embodied in managers' increased openness to new ideas, individuals' questioning attitude and disciplined thinking, and a root cause analysis process that provided continual opportunities to learn and improve.
New tools and techniques create opportunities to reshape cultural assumptions, values, and practices through the context in which the tools are introduced and used.
The quality of decision making depends on the quality of input data, and the quality of input data depends on the quality of conversations and relationships.
New understandings have to be put into action and made visible to everyone in order to build trust, motivate participation, and enhance performance.
The cultural legacy of the root cause learning intervention was embodied in managers' increased openness to new ideas, individuals' questioning attitude and disciplined thinking, and a root cause analysis process that provided continual opportunities to learn and improve.
There are strengths of the healthcare culture that can be built upon to support new understandings and new practices, rather than trying to break the existing culture.18
New understandings have to be put into action and made visible to everyone in order to complete the change process.21 The root cause analysis intervention showed that operators could tell engineers about technical problems and get the engineers to listen, that front line employees could tell managers about how managers' resource allocation decisions were creating unanticipated safety and quality issues, and that managers would listen, refrain from “shooting the messenger”, and take action to make things better. These small wins become stories that build trust, motivate further participation, shift the culture, and enhance performance.
* At this plant “root cause” was used as an adjective, never as a noun. They emphasize that there is no single root cause, only an analysis process to learn more about probable causes.
* Unlike at most companies, investigation of causes at this plant is separate from specific recommendations. Managers decide on actions, sometimes with the help of a solution development team.
* While openness was key to the reporting process, written reports were kept brief due to legal and regulatory vulnerability.