In analysing the safety strategies of organisations successfully managing hazardous systems it is apparent that safety itself is a problematic, and even risky, concept. It is less the valuation of safety per se than the disvalue surrounding mis-specification, misidentification, and misunderstanding that drives reliability in these organisations. Two contrasting models of high reliability can be identified in precluded event and resilience focused organisations. Each model is adapted to different properties in the raw material, process variances, and knowledge base of the organisation. These two models bound the reliability approaches available to medicine. The implications of each for medical reliability strategy are explored, and the possible adaptation of features from each for medical organisations is assessed.
In his satiric screenplay The hospital, Paddy Chayefsky writes:
“The hospital was to do all the killing for me. All I had to do was to make the doctors patients in their own hospital. They died as ritual victims of their own institution.”
We constantly complain about organisations, including medical organisations. Are they actually lethal?
Obviously some organisations, such as military, policing, criminal, or terrorist organisations, can be lethal to outsiders. For medical organisations the issue of safety concerns risks to those inside the organisation, the patients and practitioners themselves. Assuming that safety is part of the purpose of these organisations, the question of safety then becomes an issue of unintended and preventable outcomes, an appraisal of organisational reliability in protecting against these outcomes.
This is the perspective of the analysis that follows. The attributes of safe organisations are those that lead to reliability in the suppression and containment of failures; in short, in their management.
THE ILLUSION OF SAFETY
A major issue to confront in the analysis of safe organisations is that there are no safe organisations. This is so because safety is both an indeterminate and illusory concept; it is always a forward looking assessment, and the run of past experiences cannot determine the safety of an organisation. An organisation is only as safe as the first accident ahead of it, not the many successful experiences that lie behind. It may be possible to state probabilistically that an organisation is relatively safe, argued on past performance over a period of many cases, but this is quite different from asserting that it is safe in any particular future case. In addition, the term safety, even relatively, has no objective standard. In a variety of industries and technologies the question of “how safe is safe enough?” has only a social or political answer, reflected in regulatory, legal, or community tolerances.
Because safety can never be established ex ante, as a positive attribute of an organisation, those organisations committed to safety at the highest level—high reliability organisations—adopt a special approach to its pursuit. From research on nuclear power plants, air traffic control systems, nuclear aircraft carriers, and electrical grid management organisations1–6 one proposition concerning the promotion of reliability emerges.
It is not the valuation of safety, per se, but rather the disvaluing of misidentifying, mis-specifying, and misunderstanding things that drives the pursuit of high reliability in an organisation. All else being equal, the more people in an organisation who are concerned about the misidentification, mis-specification, and misunderstanding of things, the higher the reliability that organisation can hope to achieve.
There are several elements in this proposition. Firstly, it reflects the indeterminacy of safety mentioned above. Secondly, it places the focus on a negative possibility and not a presumed positive attribute. This is important because it avoids the arrogance of optimism, which is a major threat to reliability and safety in organisations.7
Note also that it is toward issues in understanding—mistakes in conception and representation rather than simply adverse events in performance and execution—that attention is directed. This is in recognition that representational mistakes can lead to greater harm. It is these mistakes, in rules, planning, and training, that give rise to mistakes of execution and performance. They transform what would be randomised mistakes and lapses into outputs of a causal system.8
Another element in this proposition is that concern about misunderstanding is generalised across a wide set of possible tasks, operations, and assumptions. It is not confined to a prior set of specified and bounded safety issues. To do so would invite another mistake of mis-specification. Nor do high reliability organisations confine such concern to an official safety officer who relieves others in the organisation from responsibility for the issue. Finally, both attention and responsibility regarding failures of understanding are widely distributed throughout the organisation, across levels and specialties. These properties observed in high reliability organisations are embedded in both their structure and culture. Table 1 describes them, in contrast to what is often observed in more conventional organisational settings, and indicates how they are frequently promoted.
The negative focus on failure possibilities is reinforced in the culture of the organisation—the psychological orientation, attitudes, and values modally distributed among its members. Instead of focusing retrospectively on safety records achieved, attention is directed ahead to possible failures against which the organisation may be yet unprotected. The valuation of caution and scepticism is a counter to hubris—an attitude that threatens reliability. A variety of checks and counter checks, including multiple sign offs, are employed as a precaution against potential mistakes.
The focus upon representational mistakes, or errors of rendition as they have been termed,9 is reflected in both culture and procedure. “Don’t let anyone think this technology isn’t still capable of surprises” cautioned a maintenance manager to his subordinates at a nuclear power plant after finalising a new procedure. Procedures themselves are subject to continual critical scrutiny, even as they are followed, in order to correct and improve them. In one high reliability organisation virtually every procedure has been revised a minimum of three times and as many as 21 times. This is in contrast to those organisations that concentrate on punishing failure at the performance level and may ignore the errors embedded in the knowledge base of the organisation or in the rules themselves. Beyond procedures, high reliability organisations often feature supervisors and their subordinates engaging in “what if” games, formulating strategies for responding to contingent scenarios, some of which may be unspecified in formal training or procedure manuals. In some high reliability organisations rapid improvisational action is actively discouraged. In at least one US nuclear power plant, organisational culture actively stigmatises people who act too rapidly and decisively as “cowboys”. Indeed, for a nuclear power plant acting outside of analysis is a regulatory violation.
Further, in high reliability organisations, concern about failure is applied to a great many activities and processes, not simply a set of those labelled safety. There may be extended root cause analyses applied to failures to look for precursor factors that go well beyond the proximate causes. The organisation may also be engaged in a restless push for constant improvement—across many fronts. In effect, there is a fear that if there isn’t continual improvement, what has been gained may well erode. Because reliability and safety variables are risky to bound, it is not unusual for high reliability organisations to address attention to a variety of issues that indicate a general well orderedness of the organisation. For example, there may be a stress on clean work spaces, tools replaced after use, and even trivial items, such as towel dispensers, being maintained in good working order. In effect, these factors in their combination constitute proxy variables for the more elusive safety or reliability that is sought. But they represent an extension of analysis, not a substitution for it.
Finally, attention to the possibility of failure is widely distributed throughout high reliability organisations. Many individuals feel a responsibility to be watchful for the unexpected or for conditions that may ramify into larger failure. In power plants, unusual sounds not heard before may trigger many individuals to investigate. Procedures are owned at many levels of the organisations and proposed revisions are encouraged from personnel all the way down to the shop or floor level. The culture of the organisation conveys prestige for those who are imaginative in creating contingency or “what if” scenarios. In short, a valued role is to be what might be termed a partisan of the neglected perspective.
TWO MODELS OF HIGH RELIABILITY
The general features outlined above distinguish organisations with special commitments to reliability from those we might term marginal reliability organisations. The latter treat reliability as one of many competing values and trade off some measure of reliability at the margins against cost, speed, production efficiency, and the like. But there are at least two models of high reliability that can be identified as distinct alternatives.10 These may well bracket an appropriate strategy for medical organisations.
The reliability of precluded events
Since the 1999 publication of To err is human by the Institute of Medicine there have been calls to strive for a zero error standard for medicine. This is another way of addressing a question of whether events and occurrences that simply must not happen can be successfully precluded by medical organisations. We can say something about the organisational dimensions of this challenge from the analysis of some organisations that confront it daily and have so far survived. From the general properties above we can sketch out some of the likely requirements associated with attempting to achieve reliability of this order. To preclude a set of events and occurrences from happening, an organisation must do the following.
1. Be able to specify formally those core events that must not occur
It is distinctly possible that an ultimate limiting condition for reliability is this specification. An organisation might conceivably hope to guard against more than it knows about, but it cannot reliably preclude things about which it is uncertain. An organisation within which there is fundamental doubt about what it is trying to prevent, or whose members disagree on what is to be prevented, has a very weak foundation for building any reliability strategy.
2. Establish clear priorities in support of events to be precluded
Reliability must be non-fungible with respect to priorities, not to be traded off at the margins for competing organisational values such as greater efficiency or production output. In air traffic control centres, for example, it is clearly established that it is better to close off a sector or deny take off clearances than to fall into an overcrowded or congested condition that threatens separations. In nuclear power plants, it is clearly established (by regulation) that it is better to shut down a reactor than risk falling into operating conditions outside of prior analysis. In other words, it is an accepted priority that safety concerns trump those of lost power production.
3. Identify a set of precursor events or conditions that can funnel through specified chains of causation into precluded events
These precursor events must also be specified clearly and their causal connections and subsequent undesirability widely accepted. Cutting corners with respect to these is discouraged and condemned in the culture of the organisation.
4. Establish a set of procedures that specify behaviours that guard against both precluded and precursor events
These procedures must be both formal and evolving. The members of the organisation at all levels must be committed to following procedures and the culture must embrace following them. At the same time procedures must be viewed as a process of perfecting the knowledge base of the organisation, not an end state.
5. Cultivate and reward sensitivity and attentiveness
This should be widely done throughout the organisation toward events or conditions that might also have a causal connection, even ambiguously understood, to precursor and precluded events.11 This sensitivity, unlike the case with core and precursor events, is not formally limited to a specified set of possibilities.
6. Find a way to pursue incompatible strategies simultaneously and buffer their paradox
These paradoxes lie in the reciprocal relationship between such things as following routine and yet being sensitive to the unexpected; being decisive and yet protecting against errors of commission; and in stressing both anticipation and resilience.12,13 One method of coping with the challenge of simultaneously protecting against errors of omission and commission, for example, is to automate actions in one domain (such as automatic reactor shutdowns) while steadfastly refusing to automate, and requiring joint clearances and sign offs in another (such as in resetting valve adjustments).
7. Develop a formal structure of roles, responsibilities, and reporting relationships that can be transformed under conditions of emergency or stress
Formal hierarchy, for example, may be decentralised on behalf of the development of informal teams in which formally independent roles can be locked together and coordinated in emergencies. This allows the organisation to reconfigure itself to match the full variance of issues with which it might be confronted.
8. Recognise that key features of its strategy and structure are unstable and subject to decay
Reliability at the level of precluded events does not exist in a steady state. Routines will numb mindfulness; trust or shared understandings will erode. Problems once solved will have to be solved again. Reliability is really the management of fluctuations in performance, rather than the guarantee of invariance. Therefore, a continuous monitoring of the organisation is necessary. Frequently restlessness can be observed among key personnel, who believe that if they are not continually watchful and striving for improvement, what they have achieved is perishable.
9. Have external supports and regulation that allow for all of the above
A key component of reliability at this level is in the organisation’s relationship with its environment. Actually, social dread of the precluded events (a reactor meltdown, the collision of planes under air traffic control) is an important resource. It allows the organisation to gain support for its failure priorities and resources for the enormous costs involved in this type of reliability. Close regulation is also an important element. It not only imposes constraints on behalf of reliability, it transfers learning across similar organisations and protects these organisations from the full demands of competitive market efficiencies.14
As challenging as they are, high reliability research suggests that these above elements may well be necessary if an organisation is to hope to achieve reliability on the order of precluding events. Further, this research cannot establish that even these are a sufficient set for achieving that reliability. On the other hand, more optimistically, there is no evidence that indicates that discrete reliability thresholds exist with respect to these features; that is, having more of them rather than fewer may well improve an organisation’s quest to reduce (if not preclude) adverse events.
The resilience model of high reliability
Another approach to reliability that might be considered for medical organisations is that which focuses on strategies of resilience—ways to circumscribe, cope with, and contain failure rather than preclude it.15 Here the organisation in effect accepts the inevitability of precursor events but hopes to contain them before they escalate or ramify into more severe ones.
Recent research into electricity grid management as well as telecommunication systems and large scale water flow systems offers a clearer picture of what is involved in the resilience strategy of high reliability.16 Table 2 describes some of the properties of the resilience approach in contrast to the precluded event model.
One major difference in the two approaches lies in the materials associated with the internal processes of the organisations. Precluded event reliability can be attempted because there can be great standardisation and control over inputs—the raw materials, equipment, and external demand—as well as over throughput processes of an organisation. In nuclear power plants, for example, special nuclear grade standards apply to all critical materials and replacement parts can be expected to be nearly identical to originals. Similar physical systems can be expected to behave more or less the same, and one system can be expected to repeat its behaviour over time. Many of the problems encountered will be similar to previous ones and standardised procedures can be employed to handle them.
But in other reliability settings such standardisation and control cannot be counted upon. In electricity grids, managers must deal with a variety of power producers and users. Different power pathways have different characteristics, such as load and heat tolerances, that must be known and allowed for by dispatchers. Telecommunications systems must accommodate a variety of input devices; in water systems every dam, channel, or spillway may be unique. Further, these systems due to their complexity are formally underdetermined; that is, they are capable of assuming more conditions or system states than can be planned for or anticipated in formal designs. This means they have the capacity to confront managers with problems of high variety and significant novelty.
Related to these contrasts are differences in the character of the knowledge base of the organisation. In organisations of the first type, formal deductive principles govern the design and operation of key systems. In resilient organisations, however, because they are underdetermined, formal knowledge gives an incomplete representation of what is happening at times. Idiosyncratic components, unanticipated events, and the unexpected outcomes of planned activities can place a premium on experiential knowledge or tacit understandings, even intuitions, that don’t reduce to formal generalisations. The ability of grid operators and dispatchers to fashion work arounds to compensate for the gaps or crashes that can occur in automated systems, for example, is an important resource for reliability. Usually these abilities stem from long experience with how things work (often from the days prior to automation) gained over a wide variety of jobs and tasks.
Organisations that seek a precluded event standard for reliability are quite likely to have strong command and control over both inputs and outputs to their production processes. Nuclear power plants, as mentioned, have nuclear grade controls over their material inputs. They have guns and guards to give them extensive physical control over environmental inputs. At the same time these organisations also have control over production outputs and schedules. It is understood that nuclear power plants can go off line anytime their reliability needs require it. Air traffic controllers can deny planes take off clearances or entry into sectors when conditions threaten the reliability of separations. By controlling both inputs and outputs, they can keep both within narrow bandwidths and thus put less pressure on internal production processes.
Resilience focused organisations, on the other hand, rarely have the luxury of controlling demand or inputs. Electrical load comes from consumers who are not under the direct control of dispatchers; nor are the generators in a de-regulated market oriented electricity system. In addition, storms and external factors can shower the grid management organisation with a variety of challenging inputs. At the same time the grid managers are responsible for the balance of load and generation at all times on the grid. This is their reliability output and they must often use a variety of means to produce it. The challenge for reliability in the electricity grid and in other resilience focused organisations is to cope with a relatively wide band width of inputs and transform them into a very narrow band width of outputs. It is this reduction of variance, from high to low, that is the special reliability challenge of these organisations.
While the deductive principles that comprise the knowledge base of precluded event organisations allow careful forward analysis of proposed actions, this is not as likely to be the case in resilience focused organisations. Because the technical systems with which they deal are formally underdetermined, these organisations confront more cases that may not be covered by prior analysis. Further, the wide range of inputs may well require a wider internal variety of potential responses in order to produce the narrow output required for reliability.17 These conditions can lead to the necessity for improvisation to create the variety of options necessary for coping with diverse inputs. Electrical grid managers may improvise strategies to re-route power flow in order to bypass a troublesome section of line, or they may find sources of generation outside of their control area to patch in during peak load periods.
In one prominent theory pertaining to reliability it has been argued that complexity (the potential for non-linear interactivity among many component parts) and tight coupling (the close interdependency among these parts) in technical systems makes them highly prone to catastrophic failure.18 Yet complexity and tight coupling, while definitely a challenge to high reliability, can also be positive elements in support of the improvisational strategy often required in resilience focused organisations. The complexity of a technical system can provide more raw material for improvisation. The tight coupling can place managers in a location from which they can see the larger picture and string together elements to create a variety of options. With many generators, line links, and interties across control areas, grid managers have a variety of elements with which to work. The tight coupling of these elements means that it is actually easier to link them together in creating options for balancing load and generation.
Finally, because of the difficulty of covering actions under formal planning, anticipation, and forward analysis, a significant amount of the coping activity of resilience focused reliability organisations must occur in real time. Some of this activity may actually be just in time decisions and actions. Associated with real time coping are the formation of informal groups, a great deal of crosscutting communication across departments and hierarchical levels, and sometimes the bending of procedures to effect needed solutions. While this real time reliability seems decidedly dicey, even haphazard, it represents a functional adaptation to the reality of the systems being managed. But there is one additional feature that adds immeasurably to the reliability of this approach.
The importance of the reliability professional
A critical element in the success of resilience focused organisations is the existence of individuals who are able to function in a world between the deductive principles of formal models and the idiosyncratic frame of case by case experience. They also mediate between the broad scope of design level generalisations and the myopia of the single case. Because of their unusual cognitive perspective and skills, and the fact that they frequently display an unusual commitment to reliability issues as their central focus or client, they could be termed reliability professionals.
Reliability professionals can be anywhere in an organisation but many of them can be found at the middle levels of resilience focused organisations—as department managers, shift supervisors, technical group leaders, or control operators. They are professionals not necessarily as degree holders in specific disciplines but as a consequence of careers that have given them familiarity with a variety of jobs and roles in their organisation or industry. In electrical grid management, they often have long careers spanning virtually every aspect of power generation and transmission.
They are important because they supplement the design gaps of underdetermined systems. They focus on real time challenges but do not lose track of the larger picture. In the cognitive location between formal principles and experiential knowledge reliability professionals are often very good at pattern recognition. In addition they are frequently virtuosos of improvisation—fabricating, adjusting, working around as needed. They appear to be a critical element in the reliability of resilience focused organisations.
IMPLICATIONS FOR MEDICAL SAFETY
In these attributes and models of safe organisations, are there any useful lessons for medical organisations? What overall strategy should they pursue? What standard should they strive to achieve?
Firstly, the precluded event reliability standard does not seem to be a good fit for medicine. Its requirements are daunting for any organisation to achieve. Many of them would be quite difficult for medical organisations. It is not clear, for example, that society views medical mistakes with the same dread as those of other precluded event organisations.19 Medical failures tend to end with the individual patient, as opposed to ramifying widely outward to threaten large numbers of innocent third parties or society collectively. Also, it is not clear that medicine can identify a formal set of precluded events, protection against which can constitute a highest priority. Even addressing sentinel events, a recent focus among medical organisations,20 does not begin to exhaust potential safety threats to individual patients that inhere in a wide variety of activities.
Because there is no social dread surrounding medical reliability, other values applied to medicine, such as efficiency, overall cost control, and timeliness seem to intrude in ways they do not in precluded event organisations. Further, in medicine, there is neither the close regulatory attention, nor the regulatory supports that are applied to high reliability organisations in other industries. All of this does not mean, however, that adopting some practices of precluded event organisations could not contribute to advancing reliability in medicine.
At the same time, there are some apparent correspondences between the reliability challenges in medicine and those in resilience focused organisations. The raw material in medicine is also underdetermined with respect to formal knowledge. As in resilience focused organisations, there are wider behavioural variances among patients and how they respond to treatment than among the standardised materials in precluded event organisations. The role of experiential knowledge, even intuition, is important in medicine, especially in the diagnosis and appraisal of idiosyncratic cases.21 Further, there are limits to control of medical processes under planning and anticipation; important actions, adjustments, and decisions must be undertaken in real time.
In comparing reliability objectives for medical organisations with the two models of reliability described here, and in locating medical organisations along the continuum they form, the following adaptations from each model might be worth considering.
From precluded event high reliability organisations:
Promote the negative concept of safety prevalent in these organisations. Safety can never be established on the basis of past performance. These organisations focus instead on the ever present possibility of adverse events, particularly perceptual and conceptual failures attendant to performance mistakes, and the need for constant attention to detect and correct them. A focus on safety as a positive attribute is likely to lead to hubris, founded on a set of specified measures that address the wrong kind of excellence.22
Promote a generalised concern about misestimation, mis-specification, and misunderstanding throughout all activities and widely among organisational personnel. Reward the identification of procedural or rule failures or gaps, and encourage wide participation in ongoing procedural revision processes. Reward all personnel who act as partisans of the neglected perspective regarding reliability. This is especially important for medical organisations given the large number of things that could potentially harm individuals.
Recognise that there is a multiplicity of failure types possible within a medical organisation. These are not identical in cause and correction, and some, such as failures of misguided action versus inaction, may have a reciprocal or paradoxical relation to one another. Therefore, it is unlikely that a single failure suppression regime will adequately address failures of each type.
No matter how narrow a band width of variation (in inputs, outputs, and production processes) may be sought in high reliability organisations, there will always be variance. High reliability is not about establishing invariance, it is about the management of fluctuations. In this sense, it is important to recognise that high reliability cannot be locked in by perfecting the structure of an organisation—its job descriptions, reporting relationships, rules, and procedures. It must be actively and continuously managed. Mindfulness and extra care will erode over time, inter-departmental cooperation and agreements will unravel, and reliability commitments may be chipped away in conflict with other organisational values. Managers must be sensitive to failure signals about their own reliability regimes, and be prepared to find new ways to, in the words of one nuclear power plant manager, restore the fervor.
From resilience focused high reliability organisations:
Recruit, train, and promote individuals to function as reliability professionals throughout the organisation. They are a critical resource for coping with the variance associated with managing underdetermined systems. Their cognitive orientation—combining principles with experience—seems well suited to the character of the knowledge base drawn upon in medical practice. The scope of their focus and their ability to discern patterns seems well suited to keeping track of the whole patient amid institutional pressures either to see only a piece of the patient under treatment, or aggregate patients into the larger picture of patient loads and throughput.23 Reliability professionals are the ones best suited to preserve the value of reliability in relation to other organisational values in the crucible where trade offs are clearest and where they matter most—actual cases in real time.
The role of these professionals should be identified, encouraged, and rewarded among all ranks and specialties (technicians, nurses, doctors and residents, laboratory directors, and department heads) in a wide variety of medical organisations. That role should include the identification and reporting of problems in real time, without punishment for this reporting.
Reliability professionals should be encouraged to communicate with one another about concerns pertaining to reliability, and its proxies. This would allow them to bridge gaps in communication across departments and specialties. They should also be encouraged to take ownership of organisational procedures, drafting and revising them from the perspective of experiential insights about what is likely to work in practice.
Reliability professionals should be consulted in appraising new technologies and proposed reorganisations and their likely impact on organisational reliability. All such innovations should be subject to a reliability test: does the proposed change make it easier or harder for medical personnel to understand what they are doing in real time? Does the change increase or decrease options for containing the consequences of single failures and coping with the unexpected?
The adaptations above are only suggested possibilities. Their implementation may be quite difficult. But a prior issue is for system designers, medical practitioners, and managers alike to understand that the reliability challenge for medicine is different from that of other industries. Medical organisations will need to craft their own model of reliability and safety based upon the special features of medical technology and tasks. To attempt to mimic reliability models from other industries without taking these differences into account would be a major medical failure in its own right.
Safety in organisations is an indeterminate and even illusory concept. It can even be a risky objective if detached from an ongoing concern about overall organisational reliability.
At the highest level, organisational reliability is less about the valuation of safety and more about a widespread organisational disvaluation of mis-specifying, misidentifying, and misunderstanding organisational activities and processes.
Two distinctive models of high reliability can be identified: precluded event and resilience focused reliability. These models vary significantly in organisational strategy and structure.
Although both models of reliability have some features of potential usefulness to medical organisations, the character of medical technology and tasks will likely require a distinctive medical model if high reliability, and ultimately safety, is to be a dominant feature of medical organisations.