Organization of event reporting data for sense making and system improvement
Correspondence to: H S Kaplan MD, College of Physicians and Surgeons, Columbia University, Harkness Pavilion HP417, 622 West 168th Street, New York, NY, USA
Feedback and demonstrable local usefulness are critical determinants of whether an organization adopts event reporting. The classification schemes used by an organization determine whether an event is recognized or ignored. Near miss events, by their frequency and the information they contain about recovery, merit recognition. “Just” cultures are learning cultures that provide a safe haven in which errors may be reported without fear of disciplinary action, provided the event did not involve reckless behavior. As event report databases grow, selecting and prioritizing events for in depth investigation become critical issues. Risk assessment tools and similarity matching approaches, such as those used in case based reasoning, are useful in this regard. Root cause analysis provides a framework for the collection, analysis, and trending of event data. Internal and external risk communication are valuable components of a reporting system, but their importance may be overlooked.
If form follows function, the processes and structures constituting event reporting should collect and organize data in a way that facilitates sense making and system improvement from both single events and events in aggregate.
For an organization to adopt event reporting rather than simply comply with a requirement to report, there must be timely and effective feedback and demonstrable local usefulness. Unfortunately, compliance with the reporting requirements of an external regulatory agency is often perceived as the sole purpose of the system. In such instances there is slim chance of adoption, since the use of event data is limited to producing an external report. In addition, since scant attention is paid to events outside the regulator’s narrow classification boundaries, a number of potentially informative events may not even be recognized.
This paper discusses the elements and structure necessary to translate event reporting data into actionable knowledge.
“People only see what they are prepared to see” R W Emerson, 1863
Classification and definition
Detection, the first phase of event reporting, is sensitive to the way events are defined and classified. “You see what you expect to see. You see what you have labels to see.”1 Event classification affects the availability of information for learning: organizations tend to disregard events outside their classification schemes.2 In a mail survey of 53 hospital transfusion services, to which 77% responded, 91% of technologists said they would report mistakes resulting in patient harm, while only 27% would report mistakes that they caught and corrected themselves.3
Classification and definition trigger information processing routines that channel the decision maker’s attention.4 The following proposed model of this process underscores the critical triggering role of event definition:
Event definitions trigger → Routines for data gathering and analysis which produce → Information available to support → Decision making
While being mindful that definitions may limit what is classified as an event, clear standardized definitions are obviously necessary to share data and benchmark against others, as well as to study oneself over time.
An event without harm is one in which an act of omission or commission may have had the potential for harm but, through luck or a robust physiology, had no ill effect on the patient.
A near miss is defined as an act of commission or omission that could have harmed the patient but was prevented from completion through a planned or unplanned recovery.
No harm and near miss events
Events without harm and near miss events provide a rich source of useful information. The importance of including them within an error classification scheme derives from their similarity to, and greater frequency than, events with harm. Their study affords some sense of the relative proportions of the categories of failures, and therefore provides better insight into system vulnerabilities than the often atypical misadventure.5 No harm and near miss events also carry less “baggage” in terms of repercussion to the reporter and, as such, may receive more open and honest investigations. This can lead to a more productive learning experience.
Significantly, near miss events allow us to learn why something didn’t happen and, in particular, provide a means to study human recovery. They allow us to recognize the action(s) taken to prevent harm—or to prevent the event from escalating to the point of harm—and to study rescue. Focusing on recovery and rescue as well as on failure brings about an important change in thinking about safety. It represents a shift from an exclusive emphasis on prevention to an equally important emphasis on the promotion of recovery. As van der Schaaf and Kanse have observed: “what is actually desired is the prevention of harm, not errors per se”.6 Consistent with this, data capture should include near miss and no harm events and focus on the factors supporting recovery and rescue. These factors may be organized into categories that reflect whether they contribute to (a) error or failure detection, (b) error localization or diagnosis of what went wrong, or (c) the actual correction of the problem.7
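As a minimal sketch of how recovery factors might be captured and summarized, the Python fragment below tags each factor from a near miss report with one of the three categories named above and counts them across reports. The report contents, identifiers, and the `RecoveryPhase` and `recovery_profile` names are hypothetical; they are not part of any reporting system described here.

```python
from collections import Counter
from enum import Enum


class RecoveryPhase(Enum):
    """The three categories of recovery factors described in the text."""
    DETECTION = "error or failure detection"
    LOCALIZATION = "error localization or diagnosis"
    CORRECTION = "correction of the problem"


# Hypothetical near miss reports, each listing the recovery factors an
# investigator identified and the phase each factor supported.
near_misses = [
    {"id": 101, "recovery": [(RecoveryPhase.DETECTION, "bedside ID check")]},
    {"id": 102, "recovery": [
        (RecoveryPhase.DETECTION, "label mismatch noticed"),
        (RecoveryPhase.LOCALIZATION, "traced to relabeling step"),
        (RecoveryPhase.CORRECTION, "sample redrawn"),
    ]},
    {"id": 103, "recovery": [
        (RecoveryPhase.DETECTION, "pharmacist query"),
        (RecoveryPhase.CORRECTION, "dose changed before administration"),
    ]},
]


def recovery_profile(reports):
    """Count how often each recovery phase appears across the reports."""
    counts = Counter()
    for report in reports:
        for phase, _factor in report["recovery"]:
            counts[phase] += 1
    return counts


profile = recovery_profile(near_misses)
for phase in RecoveryPhase:
    print(f"{phase.name:<13} {profile[phase]}")
```

A profile weighted heavily toward detection, with few localization entries, would be one way to see where recovery support is thin; that reading is an illustration, not a finding.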
Detection of events by an organization is also delineated by its culture. Organizational cultures have been described as reflecting characteristics of three types: pathological, bureaucratic, and generative.7 The “shoot the messenger” approach of a pathological culture obviously inhibits reporting. The bureaucratic culture may focus on the rule that was violated or on the need for a new rule, with little resultant learning or consideration of the generalizability of the information for broader system improvement. This attention to opportunities for the broader application of knowledge is characteristic of the generative (or learning) culture, which also goes beyond a focus on single events to look for patterns in aggregate data.
The term “just culture” has been applied to describe learning cultures that provide a safe haven in which errors may be reported without the fear of disciplinary action in events where there was no intent to harm. Accountability and professional responsibility for reporting are maintained, but the “bright line” for punitive action is limited to reckless behavior. Reckless behavior is defined as a conscious disregard of an unwarranted risk to a patient.8
If the culture is supportive of reporting and the classification scheme is not overly narrow, the rate of event reporting may be quite high. This is especially true if reporting is easy and if timely feedback assures reporters that their efforts do not end up in an administrative black hole. A change to a reporting culture with these supportive characteristics may result in as much as a tenfold increase in reporting rates.9 As the number of reports multiplies, the ability to triage events and to select the appropriate level of investigation for each becomes critical.
Even if the resources to handle the increased volume of reports were available and the effort worthwhile, the prioritization of events for analysis and investigation would remain a limiting factor in the effectiveness of the analysis. It has been said that, immediately after an event occurs, there exists the naked truth and, in a few days, the event becomes dressed in a full suit of clothes. Studies of human recall show that memories of events are not snapshots but segments of recollection reconstructed upon demand, with gaps that may be filled by bias or other extraneous information.10 Also, because memory degrades over time, particularly over the first few days after an event, there is an additional advantage to establishing a priority for timely in depth investigation, including performance of root cause analysis.
In general, events with harm, and sometimes new or unique events, are given a priority for investigation. This includes harm to a patient as well as organizational harm (financial loss, loss of reputation, and/or legal ramifications). However, near miss events and events without harm (but with the potential for harm) may require prioritization as well. For this significantly larger group it is useful to establish a means by which prioritization can be carried out. Use of a risk matrix has proved helpful for such purposes in a number of hazardous settings.
The risk matrix framework provides an objective structure for subjective judgments of relative risk. The two matrix factors are the probability of recurrence of the event, and the probability of severe harm should the event recur. The point of matrix intersection or the product of multiplying the two probabilities establishes a level of priority for the event—or at least aids in prioritization—since it is ultimately the reporting system “operator” who defines the level of investigation and order of priority.
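The arithmetic of the product form can be sketched in Python. The probabilities, the band cut-offs in `BANDS`, and the names `EventRisk` and `priority` are assumptions for illustration; the system operator would still set the final level and order of investigation.

```python
from dataclasses import dataclass


@dataclass
class EventRisk:
    event_id: str
    p_recurrence: float   # subjective estimate that the event will recur
    p_severe_harm: float  # estimate of severe harm should it recur

    def score(self) -> float:
        """Product of the two matrix factors."""
        return self.p_recurrence * self.p_severe_harm


# Hypothetical cut-offs, highest first; each organization would set its own.
BANDS = [(0.10, "high"), (0.02, "medium"), (0.0, "low")]


def priority(risk: EventRisk) -> str:
    """Map a risk score onto a priority band."""
    s = risk.score()
    for cutoff, label in BANDS:
        if s >= cutoff:
            return label
    return "low"


events = [
    EventRisk("NM-1", p_recurrence=0.5, p_severe_harm=0.3),
    EventRisk("NM-2", p_recurrence=0.9, p_severe_harm=0.01),
    EventRisk("NH-3", p_recurrence=0.2, p_severe_harm=0.2),
]

# Rank by score so the highest-risk events are considered first.
for e in sorted(events, key=EventRisk.score, reverse=True):
    print(e.event_id, round(e.score(), 3), priority(e))
```

A frequent but nearly harmless event (NM-2) ends up below a rarer event with a real chance of severe harm, which is the kind of distinction the matrix is meant to support.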
If data are captured in a format that allows ready retrieval and manipulation and lessens the burden on the reporter, there is the dual advantage of easier reporting and a more dynamic database.
Use of forms with check boxes and limited narrative lessens reporting time and complexity and enhances subsequent data retrieval/analysis. Each event report represents an occurrence that can be characterized by:
indexes: descriptive features of a situation (surface, in depth or both);
values: symbolic (“technician”), numerical (“103 rpm”), sets (Monday, Tuesday, Wednesday), other (text, images, ...); and
weights: descriptive significance of the index.
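One way to represent an event report as indexes with typed values, and to compare a single index between two reports, is sketched below. The per-type rules (exact match for symbolic values, scaled distance for numbers, set overlap for sets), the `value_range` parameter, and the field names are assumptions, not part of the scheme described in the text; weights would then scale these per-index scores when they are combined into an overall match.

```python
from dataclasses import dataclass, field
from typing import Union

# Assumed value types: symbolic (str), numerical (float) or a set of symbols.
Value = Union[str, float, frozenset]


@dataclass
class EventReport:
    report_id: str
    indexes: dict = field(default_factory=dict)  # index name -> Value


def local_similarity(a: Value, b: Value, value_range: float = 1.0) -> float:
    """Similarity in [0, 1] for one index; the rule depends on the value type."""
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        union = a | b
        return len(a & b) / len(union) if union else 1.0  # Jaccard overlap
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return max(0.0, 1.0 - abs(a - b) / value_range)  # scaled distance
    return 1.0 if a == b else 0.0  # symbolic values: exact match only


r1 = EventReport("R1", {"role": "technician", "speed_rpm": 103.0,
                        "days": frozenset({"Monday", "Tuesday", "Wednesday"})})
r2 = EventReport("R2", {"role": "technician", "speed_rpm": 98.0,
                        "days": frozenset({"Tuesday", "Wednesday"})})

# value_range (assumed 50 rpm) only affects the numerical index.
for name in ("role", "speed_rpm", "days"):
    sim = local_similarity(r1.indexes[name], r2.indexes[name], value_range=50.0)
    print(name, round(sim, 2))
```

With a range of 50 rpm, readings of 103 and 98 rpm score 0.9, and the two sets of days share two of their three distinct members.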
Merely entering data into an electronic database does not, by itself, make the data substantially more accessible or useful than data stored in a filing cabinet. As a database grows, memory regarding events may be limited to the most recent or the most dramatic events, with little else readily accessible. There needs to be a method for converting data into meaningful information and, subsequently, information into actionable knowledge.
One approach to avoiding an electronic “data attic” is the use of similarity matching derived from case based reasoning (CBR), a technique of artificial intelligence that assists in solving problems on the basis of previous experience. CBR is the ubiquitous tool employed by help desks, which retrieve similar past cases to find the approaches that resolved them. We can use the CBR concept of “similarity” to identify related reports, report clusters, and frequencies.
In contrast to database queries, which look for exact matches within the fields of the database, CBR looks for similarities, or degrees of matching. The similarity function identifies the reports in the database that are most related to a selected or new event. The basic similarity function uses a vector of weights that corresponds to the vector of variables in a selected event report; the weights are set by experts according to the importance of each field to the matching scheme.
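A weighted similarity function of this kind can be sketched as follows, assuming each report is a set of coded fields and that a field either matches or does not. The field names, the weights in `WEIGHTS`, and the reports are hypothetical; real weights would come from domain experts.

```python
# Expert-assigned weights for each coded field (hypothetical names and values).
WEIGHTS = {"event_code": 3.0, "product": 2.0, "location": 1.0, "shift": 0.5}

database = [
    {"id": "E1", "event_code": "wrong_label", "product": "RBC", "location": "ward_4", "shift": "night"},
    {"id": "E2", "event_code": "wrong_label", "product": "RBC", "location": "ICU", "shift": "day"},
    {"id": "E3", "event_code": "late_delivery", "product": "FFP", "location": "ward_4", "shift": "night"},
    {"id": "E4", "event_code": "wrong_label", "product": "PLT", "location": "ward_4", "shift": "day"},
]


def similarity(query, case, weights=WEIGHTS):
    """Weighted share of fields on which the two reports agree, in [0, 1]."""
    matched = sum(w for f, w in weights.items() if query.get(f) == case.get(f))
    return matched / sum(weights.values())


def most_similar(query, cases, k=3):
    """Return the k stored reports closest to the query, best first."""
    scored = [(round(similarity(query, c), 3), c["id"]) for c in cases]
    return sorted(scored, reverse=True)[:k]


new_event = {"event_code": "wrong_label", "product": "RBC", "location": "ward_4", "shift": "night"}
print(most_similar(new_event, database))
```

An exact-match query on all four fields would return only E1; the weighted score also ranks E2 (same event code and product) and E4 (same event code and location) as related reports.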
This ability to look for similar cases and thereby not lose cases in the database is of obvious importance for the analysis of aggregate data. Equally useful is its effectiveness in evaluating the relative cumulative significance of new cases. It allows the system operator to be comfortable with limited investigation of an individual case because, once entered in the database, it is not lost but is reliably retrieved when a similar case is encountered. Cases that, by themselves, do not warrant in depth investigation and analysis may, in aggregate, be found to do so. For example, events that do not pose a significant threat of harm to a patient may require so much reworking or correction in aggregate that they compromise limited staff resources and so merit further study and possible intervention.
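The aggregate argument can be made concrete with a short sketch: each new report pulls in earlier reports with the same event code, and the cluster is flagged when it grows large or consumes too much staff time in rework. The `Report` fields, the thresholds, and the matching rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Report:
    report_id: str
    event_code: str
    rework_minutes: int  # staff time spent correcting the event


# Hypothetical thresholds chosen by the reporting system operator.
MIN_CLUSTER_SIZE = 3
MAX_REWORK_MINUTES = 120


def cluster_review(new, stored):
    """Gather earlier reports similar to `new` and judge the cluster as a whole.

    Similarity is reduced to a shared event code here; a weighted match
    over several fields could be used instead.
    """
    cluster = [r for r in stored if r.event_code == new.event_code] + [new]
    rework = sum(r.rework_minutes for r in cluster)
    return {
        "size": len(cluster),
        "rework_minutes": rework,
        "investigate": len(cluster) >= MIN_CLUSTER_SIZE or rework > MAX_REWORK_MINUTES,
    }


stored = [
    Report("R1", "missing_armband", 30),
    Report("R2", "missing_armband", 45),
    Report("R3", "late_sample", 10),
]
print(cluster_review(Report("R4", "missing_armband", 50), stored))
```

None of the missing armband reports would warrant in depth investigation alone, but together they meet both hypothetical thresholds (three reports, 125 minutes of rework).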
Initial studies have compared weighted coded field matching against the use of narrative alone and narrative coupled with coding, and have verified that weighted coded fields alone provide a useful retrieval capability when compared with cases matched by domain experts. Narrative text adds specificity at the cost of sensitivity. Narrative has some limitations in aggregate analysis but does capture context and nuance; expanded coding may narrow this advantage of text.
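Comparing a retrieval method against expert judgment comes down to sensitivity (the share of expert-matched cases retrieved) and specificity (the share of non-matching cases correctly left out). The sketch below computes both; the case sets are invented to illustrate the trade-off and are not data from the studies mentioned.

```python
def retrieval_accuracy(retrieved, expert_matches, all_cases):
    """Sensitivity and specificity of a retrieval, with expert matches as reference."""
    tp = len(retrieved & expert_matches)
    fn = len(expert_matches - retrieved)
    non_matches = all_cases - expert_matches
    tn = len(non_matches - retrieved)
    fp = len(non_matches & retrieved)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


cases = {f"E{i}" for i in range(1, 21)}               # 20 stored reports
expert = {"E1", "E2", "E3", "E4", "E5"}               # judged similar by experts
coded = {"E1", "E2", "E3", "E4", "E9", "E12"}         # weighted coded fields
coded_plus_text = {"E1", "E2", "E3"}                  # narrative narrows the set

for name, found in [("coded", coded), ("coded+text", coded_plus_text)]:
    print(name, retrieval_accuracy(found, expert, cases))
```

In this made-up example, coded fields retrieve four of five expert matches plus two false matches (sensitivity 0.8, specificity about 0.87), while adding narrative removes the false matches but misses two true ones (sensitivity 0.6, specificity 1.0).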
DESCRIPTION AND CAUSAL CLASSIFICATION
Root cause analysis and causal trees
A useful way to display information derived from root cause analysis of events warranting an in depth investigation is the use of causal trees,11 which are based on fault trees. Causal trees provide a graphic display of the logical relationships among the various antecedent actions and decisions identified in an event. They aid in configuring an investigation and its findings by displaying the chronological evolution of the antecedents that led to the top or consequent event (what was discovered to have happened or almost happened). Recognizing the value of recovery, causal trees have both a failure side and a recovery side. As in a fault tree, the causal tree proceeds downward (backward in time) from the consequent event to the antecedent events that preceded it. The tree analogy is appropriate since the emphasis is not only on going backward chronologically, but also on establishing the lateral branches of the tree. The operative questions in the analysis are “and what else?”, repeated in order to explore each horizontal level of antecedents thoroughly, and then “why?”, asked many times to reach the next antecedents down the tree until the final antecedents are reached. The stopping rule in this process applies when, in asking “why?” of an antecedent, the answer lies outside the span of control of those conducting the causal analysis, or going further would not shed additional light on the event. The result of this exercise is a “picture” of the event in its entirety, from discovery (consequent event) backward in time to the occurrences (antecedent events) that led to it.
Furthermore, if the elements of the event reporting system include a comprehensive event coding system, a “box” on the causal tree can be coded. This allows for comparison and categorization of events within the database based on both consequent and antecedent events. Recovery steps can be coded, compared, and trended in a similar manner.
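A coded causal tree can be held in a simple recursive structure. The Python sketch below assumes each box carries a text description, an optional code, and a failure or recovery tag; the codes (prefixed EV, CC, and RC) and the example event are hypothetical placeholders rather than codes from any published scheme. It lists the lowest antecedents and gathers codes by side so that they could be compared and trended across events.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    """One box on a causal tree; antecedents are earlier boxes (down the tree)."""
    text: str
    code: str = ""         # optional event or causal code (hypothetical codes below)
    side: str = "failure"  # "failure" or "recovery"
    antecedents: list = field(default_factory=list)


def lowest_antecedents(box):
    """Yield the boxes at which the repeated 'why?' questioning stopped."""
    if not box.antecedents:
        yield box
    for child in box.antecedents:
        yield from lowest_antecedents(child)


def codes_by_side(box, acc=None):
    """Collect the codes on every coded box, grouped by failure or recovery side."""
    if acc is None:
        acc = {"failure": [], "recovery": []}
    if box.code:
        acc[box.side].append(box.code)
    for child in box.antecedents:
        codes_by_side(child, acc)
    return acc


# "And what else?" adds siblings at one level; "why?" adds the next level down.
tree = Box("Wrong unit issued, caught at bedside", code="EV-ISSUE", antecedents=[
    Box("Sample mislabelled", code="EV-LABEL", antecedents=[
        Box("Labels printed in batch away from patient", code="CC-PROTOCOL"),
        Box("Two patients with similar names on ward", code="CC-PATIENT"),
    ]),
    Box("Nurse checked armband against unit", code="RC-DETECT", side="recovery"),
])

print([b.text for b in lowest_antecedents(tree)])
print(codes_by_side(tree))
```

The lowest antecedents are the two failure boxes beneath the mislabelling and the nurse’s bedside check on the recovery side.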
The quality of an investigation is critical in building a causal tree. However well constructed, the tree is only one of a number of possible reconstructions of the event, and reflects the investigative group’s collective biases and, in particular, hindsight bias. Hindsight bias reflects the knowledge gained by the investigator after the event.
As long as we do not use hindsight to affix blame, it has some distinct benefits. First, since error is context dependent and the context is fixed, we usually have a well defined starting point for an investigation.12 Second, hindsight has an important corrective cognitive function, since it is a means by which people revise their incorrect assumptions.13 Third, in a meta-analysis of 122 studies, the effects of hindsight bias were found to be significantly smaller than anticipated, especially where the investigators had experience and knowledge in the field.14 Finally, and most importantly, the reconstruction of an event, albeit biased, often provides a useful plan for moving forward even if it makes for “lousy history”. This follows the principle that truth may arise more quickly from error than from confusion.1
An important part of causal analysis is differentiating “what happened” from “why it happened”. Describing the “why” occurs at the ‘stopping point’ of causal tree development, at the lowest level of the tree. This is where causal codes can be ascribed to the earliest antecedent events. An important issue is how finely granular the causal coding scheme should be. Very finely granular coding probably does describe each event most uniquely. It is, however, a potential weakness when studying events in aggregate and, in particular, when attempting to identify trends in causal data: if each event is described uniquely at the causal level, attempts at trending the causal data are generally not fruitful. The coding scheme developed by van der Schaaf11 and used in our event reporting system, MERS (Medical Event Reporting System),9 has only 20 codes. It has proved to be a useful tool in sense making of individual and aggregate causal data. The causal codes are divided into three categories: latent (technical, organizational), active (human factors), and other (including patient related factors). When applying these root cause codes to an event, it is recommended that the analyst first decide whether an antecedent event has latent causes before considering human factors. Applying human factors first often results in other factors being overlooked.
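The benefit of a short code list for trending can be illustrated with a sketch that maps each causal code to its category and tallies categories per period. The codes in `CODE_CATEGORY` are placeholders invented for this example, not the MERS codes, and the `review_checklist` helper simply encodes the recommendation to consider latent causes before human factors.

```python
from collections import Counter

# Illustrative placeholder codes grouped into the three categories named in
# the text; this is NOT the actual 20-code MERS list.
CODE_CATEGORY = {
    "T-DESIGN": "latent", "T-EQUIPMENT": "latent",                    # technical
    "O-PROTOCOL": "latent", "O-CULTURE": "latent",                    # organizational
    "H-SLIP": "active", "H-VERIFY": "active", "H-MONITOR": "active",  # human factors
    "P-PATIENT": "other",                                             # patient related
}

# Review order recommended in the text: latent causes before human factors.
REVIEW_ORDER = ("latent", "active", "other")


def review_checklist():
    """Codes in the order an analyst would consider them for an antecedent."""
    return sorted(CODE_CATEGORY, key=lambda c: REVIEW_ORDER.index(CODE_CATEGORY[c]))


def trend(coded_events):
    """Tally root cause categories per period across coded events."""
    table = {}
    for period, codes in coded_events:
        table.setdefault(period, Counter()).update(CODE_CATEGORY[c] for c in codes)
    return table


coded_events = [
    ("2003-01", ["O-PROTOCOL", "H-SLIP"]),
    ("2003-01", ["T-EQUIPMENT"]),
    ("2003-02", ["O-PROTOCOL", "H-VERIFY", "P-PATIENT"]),
]
print(review_checklist()[:4])
for period, counts in trend(coded_events).items():
    print(period, dict(counts))
```

Because each code collapses into one of three categories, the monthly tallies stay comparable; with a unique code per event, the same table would contain only ones.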
The “what” and “why” components of event description can be captured by coding but, as previously discussed, narrative is the best means of capturing the context and nuances of an event which are often quite important. A linkage between narrative and coding in which the narrative can serve to amplify the coding therefore has distinct benefits.9
The expansion of coding schemes may also serve to capture context and nuance. For instance, a coding system that includes contributing factors would allow for the systematic collection of information on underlying concerns such as communication and handoff issues, staff fatigue, distractions, or even poor lighting. Once coded, these issues can be tracked within the database. Furthermore, allowing reporters to choose from this type of code list gives them the opportunity to describe additional influences that came into play during the development of the event (in addition to the chronological event description).
Event reports and their causal analysis help to provide the impetus for system change directed at correcting underlying causes rather than the symptoms themselves. Although single event reports are important for effective change, they may also create pressure for continual and possibly excessive change. Deming15 recognized that “tampering” occurs when too many “corrective” system changes are made, causing system instability. In addition, a change to the system intended to correct a target risk may bring about unintended consequences.16 Use of a risk matrix or decision table17 for taking action on the basis of a single event provides a framework for deciding whether the event by itself warrants proposing a change, considering a change while gathering further data by a focused audit, or merely monitoring the database for similar occurrences. Monitoring may be considered an “action” when the database is accessible, searchable, and dynamic. For instance, similarity matching or CBR may be used to find similar cases in the database each time a new event is entered into the system.
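A decision table of this kind can be expressed as a simple lookup. In the sketch below, the inputs (a priority from a risk matrix and whether similar reports were already found in the database) and every table entry are hypothetical; an organization would define its own rows.

```python
# Hypothetical decision table: (priority from a risk matrix, similar earlier
# reports found in the database?) -> action. Entries are illustrative only.
DECISION_TABLE = {
    ("high", True): "propose a system change",
    ("high", False): "propose a system change",
    ("medium", True): "propose a system change",
    ("medium", False): "consider a change; gather data by focused audit",
    ("low", True): "consider a change; gather data by focused audit",
    ("low", False): "monitor the database for similar occurrences",
}


def action_for(priority: str, similar_found: bool) -> str:
    """Look up the response to a single event; unknown priorities raise KeyError."""
    return DECISION_TABLE[(priority, similar_found)]


for key in [("high", False), ("medium", False), ("low", False)]:
    print(key, "->", action_for(*key))
```

Because every combination maps to exactly one response, an unrecognized priority fails loudly rather than silently defaulting to a change.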
Pointers for future research
Differentiating noise from signal: defining characteristics of events warranting further detailed investigation.
Use of near miss data for the study of recovery.
Determination of best practices appropriate for dissemination of information to targeted audiences (public, regulators, legislators).
COMPUTATION AND INTERPRETATION
The generation of information from aggregate data occurs during the computation and interpretation phase of event reporting. Although causal analysis and the modelling of concerning, new, or unique events are essential to sense making, the potentially valuable analysis of aggregate data is often limited to a minimal summary report. Ideally, aggregate event data can be organized in a number of ways, including frequency distributions for selected date ranges based on risk, root cause codes/categories, and event codes. In addition, date ranges coupled with local rate data (e.g. nursing station specific) provide trend information which is useful in directing safety efforts. The generation of intelligence from aggregate data includes monitoring for weak spots in the system, evaluating the effectiveness of changes to the system, and identifying event clusters. Using the method of “conjunctive query”, the database can also be searched for events meeting a broad range of user specified parameters. Any combination of fields may be selected as search parameters, resulting in an output of events that are exact matches within defined boundaries, and allowing for specific and focused decision making.
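The three kinds of output described above (a conjunctive query, frequency distributions over a date range, and rates using local denominators) can be sketched in a few lines of Python. The event records, field names, and patient day denominators are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical coded event records.
events = [
    {"date": date(2003, 1, 5), "unit": "4N", "risk": "high", "cause": "latent", "code": "wrong_label"},
    {"date": date(2003, 1, 9), "unit": "4N", "risk": "low", "cause": "active", "code": "late_sample"},
    {"date": date(2003, 1, 20), "unit": "ICU", "risk": "low", "cause": "latent", "code": "wrong_label"},
    {"date": date(2003, 2, 2), "unit": "4N", "risk": "medium", "cause": "latent", "code": "wrong_label"},
]

# Hypothetical local denominators: patient days per nursing station in January.
PATIENT_DAYS_JAN = {"4N": 900, "ICU": 300}


def conjunctive_query(records, start, end, **criteria):
    """Events dated within [start, end] that match every criterion exactly."""
    return [r for r in records
            if start <= r["date"] <= end
            and all(r.get(k) == v for k, v in criteria.items())]


def frequency(records, key):
    """Frequency distribution of one field (e.g. risk, cause, code)."""
    return Counter(r[key] for r in records)


def rate_per_1000(records, unit, patient_days):
    """Events per 1000 patient days for one nursing station."""
    n = sum(1 for r in records if r["unit"] == unit)
    return 1000 * n / patient_days[unit]


jan = conjunctive_query(events, date(2003, 1, 1), date(2003, 1, 31))
print(frequency(jan, "cause"))
print(round(rate_per_1000(jan, "4N", PATIENT_DAYS_JAN), 2))
latent_4n = conjunctive_query(events, date(2003, 1, 1), date(2003, 12, 31),
                              unit="4N", cause="latent")
print([str(r["date"]) for r in latent_4n])
```

For this toy data, January shows two latent and one active root cause category, station 4N has about 2.22 events per 1000 patient days, and the query for latent causes on 4N returns the events of 5 January and 2 February.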
The organization of data in support of both internal and external risk communication is critical to the effectiveness of event reporting. The role of risk communication in event reporting is often overlooked, however, and its value disregarded. As discussed below, risk communication is more effective when the receiver is also considered a sender of information.
Internal risk communication underlies early and effective feedback to staff. Sharing the identification of risks, the rationale behind resultant procedural changes, and mutual identification of at risk behaviors and the factors encouraging them reinforces the engagement of staff. Routine actions frequently performed without a conscious active engagement are particularly vulnerable to unanticipated changes in the operational environment.18
The organization and communication of event reporting data should provide for sense making and learning through support of the three “Ms”—modelling, monitoring, and mindfulness.19 The internal risk communication of events as exemplar cases helps increase awareness of risks and re-establishes the active “mindfulness” characteristic of high reliability organizations (HROs).1
Effective risk communication with the external environment is of at least equal importance. Since reports of “medical error” are not viewed as a good thing, the idea that “the more errors reported, the better the reporting system” is counterintuitive to the non-expert public. Explaining it effectively demands insight that can be gained only by eliciting the public’s beliefs and understanding, and it depends on two way dialogue rather than simple one way communication to inform.20
The classification schemes defined by an organization determine whether an event will be recognized or ignored.
Timely feedback and local usefulness of reporting systems determine user adoption or mere compliance.
An organizational culture that supports learning and does not punish employees for errors (unless there is reckless behavior) is a necessary component of an effective event reporting system.
Critical elements of data organization that go beyond report generation are risk assessment for prioritization, similarity matching for accessibility of data, and standardized classification and coding for analysis and trending of data.
A particular challenge is effective risk communication with regulatory agencies, and understanding their perceptions of the uses of event reporting information and their obligations in communication to the public. Of special concern is the dampening effect of the inappropriate use of event reporting data as a public report card for a specific healthcare facility or an individual practitioner.
Finally, event reporting should be organized to monitor whether system changes directed at correcting target risks have improved the system or have caused even less desirable and unintended contravening risks.
When considering the implementation of an “event reporting system”, it is all too easy to develop a simple data collection form which asks questions that require narrative responses, and then to file the reports in a drawer or enter them into a spreadsheet. Subsequent report generation is limited and reflects the relative inaccessibility of the data. While this meets the requirements for having a system, it does not provide the means necessary for making sense of the data in ways that lead to system improvement.
Factors determining the success and usefulness of an event reporting system range from the culture of an organization, to the provision of standardized methodologies, classification systems, and tools for analysis, to the feedback given to staff. It is also essential to have a means (such as CBR) to monitor the information in the database so that it does not become an “electronic attic”, and a means to monitor operational changes made on the basis of event data. Attention to these features will lead staff to become active participants in the event reporting system and subsequent process improvement, rather than simply complying with the mandate to submit reports.