Elsevier

Safety Science

Volume 50, Issue 6, July 2012, Pages 1422-1430

Are safety investigations pro-active?

https://doi.org/10.1016/j.ssci.2011.03.004

Abstract

This paper elaborates on the debate on whether safety investigations are obsolete and should be replaced by more modern safety assessment approaches. Despite their past performance, in particular in aviation, accident investigations are criticized for their reactive nature and the limited learning potential they provide. Although safety management systems are considered a modern method with a more prospective potential, they too are hard to judge by their quantitative performance. Instead of measuring both concepts along the lines of their output, this contribution explores the origin, context and notions behind both approaches. Both approaches prove to be adaptive to new developments and have the ability to shift their focus towards learning and cognition. In assessing their potential, accident investigations prove to cover a specific domain of application in the risk domain of low probability and major consequences, fulfilling a mission as public safety assessor. In order to make optimal use of their analytic and diagnostic potential, investigations should mobilize more complex and sophisticated scientific theories and notions, in particular of a non-linear nature. Consequently, they are neither reactive nor proactive, but provide a specific approach to safety issues.

Highlights

  • Differences between accident investigation and safety management.

  • Their role as niche tools in safety enhancement.

  • Their relation to non-linear decision making.

Introduction

Over the past two decades, the concept of ‘investigation’ has seen a broadening of its applications beyond the scope of crime scenes and transportation accident investigations. New organizational models and legal institutions have been created to further disseminate the concept of independent investigations worldwide, while new missions have been added to existing investigation agencies. Simultaneously, the concept of single-event investigation is criticized for its lack of statistical relevance and low cost-effectiveness. Moreover, safety boards are said to provide no learning potential, due to their reactive nature or the lack of an adequate learning methodology (Ale, 2003, De Bruijn, 2007). Even in aviation, safety investigations are criticized, despite their long-lasting performance and proven value. Investigations, it is argued, have become obsolete and should be replaced by more ‘modern’ concepts.

In contrast, ‘modern’ concepts such as safety management systems, audits and quality assurance claim to be proactive. The dichotomy created by such a ‘pro-active versus reactive’ contradiction cannot be evaluated, because there are no criteria for measuring the success of an investigation apart from the follow-up rate of its recommendations. Moreover, safety management systems likewise lack a proven track record based on criteria that would facilitate a comparison with other safety enhancement approaches (Hale, 2006, Swuste, 2009, Johnson, 2007).

In order to facilitate a comparison between these two safety enhancement approaches, their characteristics should be established and the context in which these instruments could develop should be assessed.

Safety investigations have a history of decades in aviation. In analyzing the historical development of safety investigations in aviation, three phases can be identified:

First, a technological phase in which the notion of technical failure was dominant. After the Second World War, technical harmonization was selected as a flywheel for progress in aviation, focusing on navigation, communication and reliability (Freer, 1986, Freer, 1994). Many resources had to be invested in improving the technical reliability of the aircraft, because new technologies were in their infancy, causing teething troubles in various areas. These new technologies included pressurized cabins, jet engine technology, radar and all-metal airframes. In order to maintain public trust in the aviation industry, a common and public process of timely learning without allocating blame was deemed necessary. This system has been successful for 60 years, solving many knowledge deficiencies in aviation. This development was supported by a first series of accident models, starting as early as 1928. The US Army model derived from Thorndyke in 1951 can be considered the first epidemiological/stochastic model, reinforced by Haddon, Suchman and Klein’s classic book on accident causation in transport in 1964.

Some examples of showcase aviation investigations after the Second World War are:

  • the De Havilland Comet crashes initiated research into metal fatigue and crack propagation

  • the Tenerife disaster initiated human failure research and crew resource management

  • the crashes at Mt Erebus and Tenerife led to victim identification

  • several accidents in the USA led to family assistance and trauma care for survivors

  • the Boeing 747 crash in the Bijlmermeer caused the development of an external safety policy and an integral airport safety management system.

A second phase in accident investigation started at the beginning of the nineties, when independence from state interference was deemed necessary. This development also focused on multi-modal safety boards, in order to learn across transport modes. The initiative for this development was taken in the USA, where in 1967 the National Transportation Safety Board became the first independent multi-modal investigation agency in the world (Benner, 2009). Based on a visionary approach, the concept of multi-modal and systemic learning was developed, supported by arguments of economy of scale, critical mass in investigative resources and organizational efficiency. These more pragmatic arguments have played a role in particular in smaller countries such as the Netherlands. In addition to technological issues, operational issues emerged in investigation practices, dealing with human error and organizational failure (ETSC, 2001).

A third phase in accident investigation has emerged over the past decade, due to a breakthrough of independent investigations in the public eye after a series of major events outside the transportation sector, such as the discotheque fire in Göteborg in Sweden, the explosion of a fireworks storage facility in Enschede and the discotheque fire in Volendam in the Netherlands (Kahan, 1998, Stoop, 2004, Stoop, 2009).

Transportation safety boards now face new missions: dealing with public trust, serving as a public safety assessor, supporting victims and relatives through family assistance, and focusing on rescue and emergency services in the aftermath of major accidents (Roed-Larsen et al., 2005). Consequently, governance and policy making issues emerged on the agenda of investigation agencies, captured in the statement ‘Independent Accident Investigation: a Citizen’s Right and Society’s Duty’ (Van Vollenhoven, 2006). This phase characterizes the transition from accident investigation into safety investigation: a shift from event analysis towards systems analysis.

The first attempts in industrial society to explore human behavior go back to the 18th century and can be attributed to La Mettrie. With his ‘L’Homme machine’ he introduced the metaphor of man as an unpredictable machine. According to the mechanistic concepts of his time, the human body was described as complex clockwork which could perform certain tasks based on pre-described movements. Later medical and behavioral scientific research focused on how this ‘machine’ behaved under physical and mental loads.

The first generation of accident causation models, as derived by Heinrich, referred to accident analysis by metaphors such as the Iceberg Principle and the Domino Theory. Bird and Loftus applied a linear causality, while Kjellen introduced the deviation concept. The MORT approach explicitly put the responsibility for risk control at the corporate level. Multi-causality was introduced by Reason, defining an accident as an interaction between latent and active failures and, in order to avoid such interaction, calling for pro-active involvement of top management. Based on attribution theory, Hale and Glendon were concerned with how people process information in determining the causality of events. They focused on the non-observable elements of the system: perceptions and decisions. After Wildavsky developed the concept of risk as a social construct, Reason developed his model of organizational accident causation. A next step was taken by Hollnagel, who identified the system as the full context in which errors and accidents occur. A gradual development of accident modeling shows three generations of human error modeling, from a sequential accident model, via human information processing accident models, towards systemic accident models (Katsakiori et al., 2008). This evolution expands the scope of the investigation from sequencing events towards a representation of the whole system (Roelen et al., in press). In practice, however, such accident modeling based on the Reason model proved difficult to apply, resulting in an increasing number of variants and simplifications (Sklet, 2004).

Most of the models restrict themselves to the work level and technological systems. Sklet concludes that investigators who include the government and the regulators in their accident investigations need, to a great extent, to base their analysis on experience and practical judgment rather than on the results of formal analytical methods. Much of the accident data are conceptually inadequate and flawed because of the inadequacies of the accident models underlying existing programs (Benner, 1985). Due to these pragmatic objections, the limitations of, and mutual dependence between, causation model and investigation methods should be explicitly taken into account during the conduct of an investigation (Kletz, 1991, Sklet, 2004, Katsakiori et al., 2008).

Finally, such modeling and perceptions of the accident phenomenon do not comply with the needs of investigators: the translation of human error models into practical investigation tools is still in an early phase of development (Benner, 1996, Strauch, 2002, Dekker, 2006). Investigation methods should support the visualization of the accident sequence, providing a structured collection, organization and integration of collected evidence, and an identification of information gaps, in order to facilitate communication among investigators (Sklet, 2004, Benner and Rimson, 2009, Braut and Nja, 2010).

Creating a second generation of safety management systems, Rasmussen takes this modeling issue one step further (Rasmussen, 1997). He contrasts the stable conditions of the past with a present dynamic society, characterized by a very fast change of technology, the steadily increasing scale of industrial installations, the rapid development of information and communication technology, and an aggressive and competitive environment which focuses the incentives of decision makers on short-term financial and survival criteria. In answering the basic question of whether we actually have adequate models of accident causation in the present dynamic society, he states that modeling is done by generalizing across systems and their particular hazard sources.

His risk management concept is a control structure, embedded in an adaptive socio-technical system. Since decisions made in a complex and dynamic environment are not only rational and cannot be separated from the social context and value system, a convergence occurs of the economic concept of decision making, the sociological concept of management and the psychological concept of cognitive control. Modeling task sequences and errors is not considered effective for understanding behavior. One has to dig deeper, to understand the basic behavior-shaping mechanisms. Rather than striving to control behavior by fighting deviations, the focus should be on making the boundaries explicit and known and on giving opportunities to develop coping skills at those boundaries.

Task analysis, focused on action sequences and occasional deviations in terms of human errors, should be replaced by a model of behavior-shaping mechanisms in terms of work system constraints, boundaries of acceptable performance and subjective criteria guiding adaptation to change. System models should be built not by a bottom-up aggregation of models derived from research in the individual disciplines, but top-down, by a systems-oriented approach based on control-theoretic concepts. A convergence of the research paradigms of the human sciences should be guided by cognitive science concepts (italics added).

According to Rasmussen, the fast pace of technology has led to the introduction of the ‘general duty clause’ and has enhanced the regulator’s ability to protect workers. Each employer ‘shall furnish to each of his employees a place of employment which is free from recognized hazards that may cause death or serious harm’. By stating safety performance objectives, safety becomes just another criterion of multi-criteria decision making and an integrated part of normal operational decision making. In this way, the safety organization is merged with the line organization. This requires an explicit formulation of value criteria and effective means of communicating values down through society and organizations. The impact of decisions on the objectives and values of all relevant stakeholders is to be adequately and formally considered by ‘ethical accounting’ (Rasmussen, 1997).

Design and operation should be based on reliable predictive models of accident processes and probabilities of occurrence. A full-scale accident then involves simultaneous violations of all the designed defenses. The assumption is that the probability of failure of each defense individually can and will be verified empirically during operations, even though the probability of a stochastic coincidence has to be extremely low. Monitoring the performance of the staff during work is derived from the system design assumptions, not from empirical evidence of past performance.
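The defense-in-depth reasoning above can be sketched numerically. The following is an illustrative sketch, not from the paper: the function name and probability values are assumptions, and the common simplifying assumption of independent defense failures is made explicit in the code.

```python
# Hedged sketch: IF the designed defenses fail independently, a full-scale
# accident requires simultaneous violation of all of them, so its probability
# is the product of the individual failure probabilities. All numbers below
# are illustrative assumptions, not data from the paper.

def accident_probability(defense_failure_probs):
    """Probability that every defense fails at once (independence assumed)."""
    p = 1.0
    for q in defense_failure_probs:
        p *= q
    return p

# Three illustrative defenses, each individually verifiable during operations:
p = accident_probability([1e-2, 1e-3, 1e-2])  # ~1e-07: extremely low, hence
# hard to verify empirically from accident data alone.
```

This makes the paper's point concrete: each individual defense fails often enough to be measured in operation, while the combined coincidence is far too rare to be confirmed from accident statistics.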

Rasmussen identifies a limited series of hazards: loss of control of large accumulations of energy, ignition of accumulations of inflammable material, and loss of containment of hazardous material. When the anatomy of the accident is well bounded by the functional structure of a stable system, protection against major accidents can be based on terminating the flow of events after release of the hazard. When particular circumstances are at stake, protection should be based on eliminating the causes of release of the hazard.

Preconditions and assumptions should be explicitly stated in a Probabilistic Risk Assessment. In this view, Rasmussen states, it is fortunately not necessary to predict the performance of operators and management. When a plant is put in operation, data on human performance in operation, maintenance and management can be collected and used for a ‘live’ risk analysis. Thus, predictive risk analysis for operational management should be much simpler than the analysis for a priori acceptance of the design. Such performance data can be collected through sources other than accident investigations; incident analysis and expert opinion extraction may compensate for the lack of abundant accident data.
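The idea of a ‘live’ risk analysis fed by operational data can be illustrated with a standard conjugate (Gamma-Poisson) rate update. This is a hedged sketch: the technique, function name and all numbers are assumptions chosen for illustration, not a method specified by Rasmussen.

```python
# Illustrative sketch (assumption, not from the paper): update a failure-rate
# estimate as operational data accumulate, instead of relying only on the
# a priori design figure. A Gamma prior on the event rate is updated with a
# Poisson count of observed events over an exposure period.

def update_rate(alpha, beta, events, exposure):
    """Return the Gamma posterior (alpha, beta) after observing
    `events` occurrences during `exposure` operating hours."""
    return alpha + events, beta + exposure

# A priori design estimate: roughly 1 event per 1000 operating hours.
alpha, beta = 1.0, 1000.0
# Operational feedback: 2 events observed in 5000 hours.
alpha, beta = update_rate(alpha, beta, 2, 5000.0)
print(alpha / beta)  # posterior mean rate: 0.0005 events per hour
```

The posterior mean shifts smoothly from the design assumption towards the observed operational frequency, which is the sense in which the risk picture stays ‘live’ during operations.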

This second generation of safety management systems comprises several functional elements: training and communication should provide awareness, while adequate feedback and monitoring should facilitate assessment of the safety performance and of the company’s ability to learn and improve its management system (Rasmussen and Svedung, 2000). In his overview of the merits of safety management systems, Swuste concludes that scientific proof for the importance of all these elements is sparse. There is only limited empirical evidence linking management and horizontal characteristics to safety performance. Investigations of major disasters indicate inadequacies in these systems: deficiencies in hazard recognition, an inability to focus on accident scenarios in decision making, and complacency after a long accident-free period (Swuste, 2009). Vaughan’s analysis of organizational failures reveals that it is not only difficult for organizations to learn, but that they are also resistant to lessons from the past (Vaughan, 1999). Research indicates that the learning capacity of safety management systems in the process industry and nuclear power plants is further hampered by a gap between the designer’s assumptions on the system’s functioning and experienced feedback on hazards and risks from operational systems. Workers are not in the loop and cannot respond rapidly and adequately when required (Swuste, 2009).

At present, the contours of a third generation of safety management concepts are emerging: the resilience concept (Hollnagel et al., 2008a, Hollnagel et al., 2008b).

This concept defines failure as a normal phenomenon, the flipside of success, since both are outcomes of normal performance variability. Hollnagel defines a resilient system by its ability to effectively adjust its functioning prior to or following changes and disturbances, so that it can continue functioning after a disruption or a major mishap, and in the presence of continuous stress. Four essential abilities are identified: to cope with the actual, to flexibly monitor the critical, to anticipate the potential in dealing with disruptions and their consequences, and to learn from the factual. An increased availability and reliability of functioning at all levels will not only improve safety, but also enhance control (Hollnagel et al., 2008a).

This concept differs fundamentally from the second generation of safety management systems. The allocation of scarce safety resources is not exclusively designated to a specific entity in the organization – such as a safety management system at the corporate level or rescue and emergency resources in public safety – but is made flexible, creating discretionary competences at the operator level to adapt responses in different system states. Resilience facilitates a merger of staff and line responsibilities for safety, reducing redundancy in a modern concept of ‘lean and mean’ and ‘cheaper, faster and better’ resource management. Originating from the military, the resilience concept facilitates evolutionary change and provides improved business continuity in crisis situations. The concept, however, requires a series of new system characteristics in order to provide the necessary transparency on the actual and factual functioning of the system. Accident investigations may serve as a problem provider for systems development by providing timely transparency into the factual functioning of a system. Empirical information and operational experiences simultaneously provide feedback for systems redesign and the engineering of safety enhancement strategies (Stoop and Dekker, 2008).

Some doubt the concept (Roelen et al., in press). Such more advanced models are said not to connect to the current practice of safety data collection and analysis, and it is questionable whether safety managers can be made aware of new insights and modeling approaches: ‘The air transport industry is quite conservative, regulatory changes are very slow and Reason’s Swiss Cheese concept is still relatively new to most people within the industry without going a step further. It makes more sense to develop an event chain model that fits current practice rather than to develop models with a completely different concept, however correct these concepts might be’ (Roelen et al., in press, italics added). In accident modeling, the reliance on a linear sequence of events is considered a main disadvantage, yet for preventing relatively simple major accidents such modeling seems to suffice. The recently developed resilience engineering of dynamic modeling is questioned on its merits. Only time will tell whether the drift into failure will remain another nice metaphor or whether it can be turned into a practical tool offering new insights into risk prevention, as Hale concluded in his valedictory lecture (Hale, 2006).

New risk management models developed at the level of the aviation industry are derived from the assumption that aviation could benefit from the nuclear power industry and the process industry, which have demonstrated the benefits of a unified, probabilistic model. The question, however, is whether this assumption complies with the specific characteristics of the aviation industry, with its technology, multi-actor involvement and continuous, open-access network configuration.

When many actors take part in the same complex and dynamic environment, they must all achieve consensus in order to share their learning (ETSC, 2001). Such self-organising and learning behavior of each of the participants should guarantee the continued safe performance of the system.

Rosenthal wonders: ‘If we are not capable of coming to an agreement on the causes of disasters, should we restrict ourselves to dealing with the consequences or the resilience capability after a disaster? If we cannot analyse the complex reality and cannot achieve consensus, are we doomed to restrict ourselves to a battlefield of subjective opinions?’ (Rosenthal, 1999). If so, the debate on safety needs arbitration between contradictory positions. Do safety investigations provide a solution in their role as a public safety assessor?

Some dispute the usefulness of such a role for accident investigations. This is not only an academic but also a practical debate, which dates back to the early days of the process industry. In those days, safety experts in the rapidly developing process industry wondered whether the sector was suitable for a safety approach that was also applied in more conventional sectors.

According to Frank Lees, it was necessary to distinguish between occupational accidents on the one hand – which were quite common, but not representative of the modern process industry – and on the other hand a class of accidents which were critical for the business continuity of the company and its primary processes (Lees, 1960). Because this second class was unacceptable and very uncommon, it was held to have little learning potential. Accident investigations should therefore be replaced by either modeling and incident analysis or other, more common performance indicators. In this view, the top management of a company should have the responsibility for quantifying the risk and for developing a separate safety management system.

Lees formulated his doctrine under the conditions which were valid at that time:

  • the equipment is fixed-site and stand-alone

  • production processes are almost completely automated

  • there is one central management taking all safety-critical decisions

  • decision making is restricted to the private company

  • the state of technology is assumed constant

  • only rational, mathematical decision-making models have scientific validity.

The Lees doctrine applies a design concept in which humans are fallible factors, eliminated from the system by design through automation. Their remaining role is restricted to complying with rules imposed by management. There is no room for the operator to take critical decisions. Learning in practice is replaced by modeling and by a centralized assessment of all interests by a single party: the corporate management. This Lees doctrine has become the role model for safety management systems as postulated by Rasmussen. Rasmussen’s assumptions in his modeling, however, raise questions about the generalizability of the concept: is it valid to bridge the differences across industrial domains by creating this general model, with a limited category of hazards, without a relation to the technological state of the art and without a reflection on the role of the human operator?

Section snippets

Differences across industrial domains: the aviation case

Transport systems apply completely different design principles than the process industry or nuclear power supply. First, at the control level the system is designed as a support for the operator; it is a human centered design with delegated responsibilities. Second, there is a strict separation between the planning and control level with respect to capacity management and traffic control. It is a distributed responsibility. Finally, there are differences in concepts of the human operator with

New challenges and contexts

Are Rasmussen’s assumptions still valid if the concept is expanded from the process industry and nuclear power supply to aviation and from the corporate level into higher systems levels? Do these assumptions create new dilemmas?

Are safety investigations proactive?

In selecting risk decision making tools, we should take into account the goal and validity of the available tools rather than stigmatizing tools as ‘obsolete’ or ‘modern’.

Conventional rational risk assessments are part of a wider spectrum of the way in which societal risk acceptance can be expressed. Group risk decision making is the linear part of the spectrum in which acceptance decreases proportionally with the number of victims. This part has a better fit for mathematical modeling than the
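The linear part of the group-risk spectrum described in this snippet can be written as an F–N limit line. The formula below is an illustrative reconstruction, not taken from the paper; the constant C and exponent α are policy-chosen parameters.

```latex
% Group-risk (F-N) limit line: the frequency F of accidents with N or more
% fatalities must stay below a criterion line. With \alpha = 1 acceptance
% decreases proportionally with the number of victims (the linear part of
% the spectrum); \alpha > 1 expresses risk aversion toward accidents with
% major consequences. C is a constant set by policy.
\[
  F(N) \;\le\; \frac{C}{N^{\alpha}}, \qquad \alpha \ge 1
\]
```

On log-log axes this criterion is a straight line; the low-probability, major-consequence domain that the paper assigns to safety investigations lies at the far right of such a diagram, where statistical evidence is scarcest.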

Conclusions

In answering the question – Are safety investigations pro-active? – the answer is a combined yes and no. Yes, because of the indispensable feedback that is required from empirical data and operator experiences in order to develop new knowledge and insights into the performance of complex systems. Such knowledge will be incorporated in the next generation of design and operations. A no is a formally correct answer, because this approach is complementary to other approaches in enhancing safety of

Acknowledgement

The authors thank W.R. Beukenkamp for his permission to publish the diagram on the non-linearity of value of life in risk decision making.

References (50)

  • Braut, A., Nja, O., 2010. Learning from accidents (incidents) – theoretical perspectives on investigation reports as...
  • De Bruijn, H., 2007. Een gemakkelijke waarheid. Waarom we niet leren van onderzoekcommissies [An easy truth. Why we do not learn from investigation committees]. NSOB,...
  • Dekker, S., 2006. The Field Guide to Understanding Human Error.
  • Edwards, E., 1972. Man and Machine: Systems for Safety. Loughborough: University of Technology, United...
  • ERTMS, 2007. Een onafhankelijk onderzoek naar nut en noodzaak van de aanpassing van het HSL-beveiligingssysteem ERTMS [An independent investigation into the usefulness and necessity of adapting the HSL train protection system ERTMS]....
  • ETSC, 2001. Transport Accident and Incident Investigations in the European Union. European Transport Safety Council....
  • ETSC, 2005. Europe and its road safety vision – how far to zero? In: The 7th European Transport Safety Lecture. Prof....
  • Freer, R., 1986. The Roots of Internationalism. 1783–1903. ICAO Bulletin 41...
  • Freer, R., 1994. ICAO at 50 years: riding the flywheel of technology. ICAO Journal.
  • Hale, A., 2006. Method in Your Madness: System in Your Safety. Valedictory Lecture, 15th September 2006, Delft...
  • Hendrickx, L., 1991. How Versus How Often. The Role of Scenario Information and Frequency Information in Risk Judgement...
  • Hollnagel, E., Nemeth, C., Dekker, S., 2008. Remaining Sensitive to the Possibility of Failure. Resilience Engineering...
  • Hollnagel, E., Pieri, F., Rigaud, E., 2008. In: Proceedings of the Third Resilience Engineering Symposium. October...
  • Johnson, K., 2007. Key Note Address to ITSA Meeting, 16th May,...
  • Kahan, J., 1998. Safety Board Methodology. In: Hengst, Smit, Stoop 1998, Second World Congress on Safety of...