Lessons learned in blood: We keep failing to prevent man-made disasters

By Catherine H. Tinsley

Despite decades of research on how to prevent organizational disasters, they still occur all too frequently. To wit:

On April 20, 2010, the Deepwater Horizon oil rig, owned by Transocean and leased by British Petroleum (BP), exploded, killing 11 crewmembers and injuring several more. Millions of gallons of oil spilled into the Gulf of Mexico, with devastating consequences for the environment and significant economic impacts on the fishing and tourism industries. Just three weeks earlier, at Massey Energy’s Upper Big Branch mine, a coal dust explosion killed 29 of the 31 miners at the site. It was the worst mining accident in the United States since 1970.

More recently, in June 2021, the Champlain Towers South Condominium in Surfside, Florida, partially collapsed, killing 98 people and injuring and displacing dozens of other residents. And this February, a Norfolk Southern train carrying toxic chemicals derailed near East Palestine, Ohio. The chemicals and combustible materials ignited, raising alarms about an explosion, forcing thousands to flee their homes, and possibly contaminating the air, soil, and groundwater.

These man-made catastrophes caused needless, tragic deaths, contaminated environmental systems, cost millions of dollars in clean-up, and, of course, damaged the public’s trust in both corporations and governmental regulators.

Although these large-scale failures may result in some corporate learning, most company leaders, politicians, and public bystanders agree that these are lessons learned in blood. It would behoove everyone if potential dangers were identified in advance. To this end, scholars and practitioners across a wide range of disciplines have proposed that the most viable approach to preventing such catastrophes is to observe near-misses—small failures that could have resulted in larger failures had conditions been different—and use these smaller failures to identify and eliminate problems before they produce large failures.

The notion is that large-scale organizational failures are often created by the same antecedents that produce near-misses, with the difference in outcome severity attributable to random error: good luck in the case of near-misses and bad luck in the case of disastrous failures. The hope is that organizations can learn things from near-misses, such as insights about design flaws or unanticipated interactions between system components. Future disasters might then be prevented by eliminating errors, system design flaws, and other problems that can produce both near-misses and failures. Assuming a common genesis for near-misses and failures within a particular system means that organizations that ignore near-misses could be said to be incubating disasters.

An appreciation for learning from near-misses to avoid failure, and indeed even enhance organizational performance, has led to the adoption and use of incident reporting systems across many industries. These incident reporting systems collect and catalog information on any sort of anomaly or “less than desirable” outcome, and are widespread across the transportation, manufacturing, energy, and healthcare sectors. They are regarded as a foundational quality and safety improvement activity to which firms and industry groups devote considerable resources.

One of the best of these is the Federal Aviation Administration’s Aviation Safety Reporting System, used in US commercial aviation, which receives more than 100,000 incident reports per year of events with varying severity, all of which are read, codified, and cataloged. To be more specific, a pilot misunderstanding air traffic control and taxiing onto an active runway has the potential for an accident, but in the near-miss, the situation is recognized and corrected before any collision occurs. In a medical context, a surgeon accidentally leaving a sponge inside a patient who is about to be sewn up can be a near-miss when a nurse notices it before the stitching begins.

The goal is for firms and industries to learn from all the incidents reported, so as to avoid the human and financial costs associated with future serious failures. Capturing and later analyzing these incidents can show systemic patterns of behavior or processes that may need to be changed. In the first instance, pilots may need to double-check their understanding before taxiing; in the second instance, counting sponges used in any procedure might become an instituted policy.

Unfortunately, my work over the past two decades has shown that certain characteristics of near-miss experiences thwart organizational learning. In some cases, near-misses may be ignored because of organizational culture and toxic leadership. But the reality is that humans may just be hardwired to overlook near-misses.

Hindsight is, of course, 20/20, yet all the disasters mentioned at the beginning of this article appear to have been preceded by near-miss warnings. For example, a post-mortem on the Deepwater Horizon disaster issued by the National Academy of Engineering found evidence of several near-miss warnings that BP, Transocean, or their contractors ignored. The rig had experienced near-miss “kicks”—mini-blow-outs that occur when the upward pressure of the gas exceeds the downward pressure. Though near-miss kicks commonly occur, they are an indicator that gas is entering the well, which can create a dangerous condition. On the morning of the explosion, there had been abnormalities in critical tests indicating a possible influx of hazardous gases.

Similarly, the explosion at Massey’s Upper Big Branch mine was likely caused by high levels of methane, a mining byproduct. It is critical that this methane is properly vented, and indeed Massey had been cited and fined several times for violating this venting requirement. Near-misses occurred each day there were unsafe levels of methane but no spark to ignite it.

The collapse of the Champlain Towers South Condominium was preceded by a 2018 engineering report that found major structural damage to the concrete slab below the pool deck and waterproofing that was “beyond it’s (sic) useful life,” and that included pictures of several ominous cracks in the concrete columns. These cracks and other existing structural damage should have served as near-miss warnings of a potential collapse.

Finally, although the investigation of the Norfolk Southern derailment is ongoing, certainly past derailments have been an issue for the company.

So the question again is why these near-miss warning signs are ignored when they offer clear indications of errors, materials, or processes that need attention. Research on organizational learning makes clear that decision-makers should interpret failures as evidence that existing practices and routines are inadequate. Although searching for solutions and instituting new organizational routines is costly, failure should motivate organizations and leaders to become more open to change, deploy resources to search for solutions, and adopt new practices. Whereas experience with successes should stabilize organizational practices and routines, failures should stimulate innovation and experimentation toward novel practices and routines.

Organizational practices that stymie learning from near-misses

When organizations send a clear message that productivity is more important than safety, workers within these organizations will ignore near-miss warnings because they likely believe that addressing the warning will slow down work and negatively impact the bottom line. It is not difficult to imagine that this pressure to perform is behind many organizational accidents, such as the Deepwater Horizon oil spill (the rig was already behind production schedule) and the Massey Energy Upper Big Branch mine explosion.

In fact, the US Coast Guard’s report on the causes of the Deepwater Horizon disaster noted that Transocean had safety processes, such as a:

“Time out for Safety” (TOFS), which “occurs when an observation made by personnel requires the task be stopped for the purpose of addressing an unplanned hazard or a change in expected results.” According to the Transocean Health and Safety Policy Statement, “Each employee has the obligation to interrupt an operation to prevent an incident from occurring.” Transocean, however, did not provide the onboard management with a risk assessment tool or other means by which to assess the risks arising from well conditions and the safety-related deficiencies onboard Deepwater Horizon. Not surprisingly, prior to April 20, no crew members took action to institute a safety time-out.

More pernicious organizational practices could invoke fear of reprisals if near-misses were reported to slow down production. For example, that same US Coast Guard report noted:

Transocean also did not create a climate conducive to such analysis and reporting of safety concerns. In March 2010, Transocean hired Lloyd’s Register, a classification society, to conduct a SMS Culture/Climate Review which included auditors conducting surveys at Transocean offices and vessels over a two-week period. The results indicated that “a significant proportion (43.6 percent) of the personnel participating in the perception survey reported that they worked with a fear of reprisal if an incident or near hit occurred….” At a company where employees fear reprisal for whatever reason and when there are significant costs associated with any unscheduled shutdown or delay of drilling activities, it is unlikely that the crew would report safety issues even if it identified risks.

My research with colleagues has shown that when people believe the organizational culture values safety they are more likely to notice near-miss events, whereas when the organizational culture values risk and exploration, people are more likely to categorize near-misses as successes. That is, in the latter culture they differentiate clear failures that incur significant costs, but for near-miss incidents—even those that are clearly a product of good luck—they make no distinction between these incidents and successes that occurred without luck. Notably, this tendency to categorize near-misses as successes in risk-tolerant cultures was exhibited by managers evaluating a subordinate’s decisions and outcomes. In other words, in these studies, managers’ disregard of near-miss incidents cannot be explained as a natural tendency to judge their own decisions and actions in a favorable light.

Cognitive reasons why near-misses may be ignored

That toxic cultures and leadership can encourage employees to ignore safety warnings embedded in near-miss incidents may seem conventional wisdom. Yet my research shows that, even in more enlightened workplaces, near-misses may be ignored or discounted because of two features of their outcomes. First, these failures impose only small costs on the organization. Second, appreciating that those costs could have been far larger requires probabilistic thinking, which does not come naturally to people.

Imagine you are texting while driving—something you know is not only illegal but personally risky. You swerve a bit into the next lane, but there are no cars there, your car’s lane-departure system beeps to get your attention, and you gently steer back into your lane. You have just experienced a near-miss—the outcome could have been much worse had there been another car, had your car lacked a lane-departure warning, or had you panicked when you looked up. Has this experience taught you not to text while you drive? Probably not, because of two cognitive biases.

The first is “outcome bias,” or the tendency to evaluate events based on their ultimate outcome, not based on decisions or processes that led to said outcome. Because decisions and processes are less visible than outcomes, they tend to be discounted. Moreover, we all seem to operate in ways that are a bit Machiavellian, in that the ends become more important than the means.

The second is “probability neglect,” which happens when people or organizations treat probabilistic outcomes as deterministic. In other words, even though we know there was a chance of having another car in the lane, we tend to discount our own good luck and the fact that this outcome was simply one draw from a whole distribution of possible outcomes.
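To make probability neglect concrete: a single lucky draw does not lower the underlying risk, and repeated exposure compounds it. The sketch below is purely illustrative—the per-trip probability is an invented number, not a figure from the research.

```python
# Illustrative sketch of why one lucky outcome says little about the risk.
# The 1-in-100 per-trip probability is a made-up number for illustration.

def prob_at_least_one_accident(p_per_trip: float, n_trips: int) -> float:
    """Chance of at least one bad outcome over n independent exposures."""
    return 1 - (1 - p_per_trip) ** n_trips

p = 0.01  # hypothetical per-trip chance of an accident while texting
# A single trip feels negligible...
print(round(prob_at_least_one_accident(p, 1), 3))    # 0.01
# ...but repeatedly "getting away with it" compounds the exposure:
print(round(prob_at_least_one_accident(p, 100), 3))  # ~0.634
```

The point of the arithmetic is that each near-miss is one draw from the same distribution; surviving a draw changes nothing about the next one.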

For decades, my colleagues and I have been analyzing how and why most near-miss experiences teach people to become more complacent about the risk of an activity and to engage in it more and more—a tendency we call “risk creep.” We find this to have been true in the production, launch, and operation of spacecraft; the operation and maintenance of oil-drilling rigs; computing practices around malware and other cyber threats; evacuation decisions in the face of an impending hurricane; the operation and maintenance of mines; and even personal protection against COVID-19.

What we find is that when people engage in behavior they know to be risky but “get away with it” in terms of no significant consequences, they become complacent about the risk and are more likely to engage in the same risky behavior in the future, compared to if they had no near-miss experiences. This happens even when they explicitly know they got away with it because of luck. They have not necessarily revised their probability estimate of the risk (i.e., that there is a 10 percent chance of something bad happening), merely how they feel about that probability. After experiencing or having information about near-misses in their decision context, that same probability of failure starts to feel less dangerous.

For example, in a set of studies in a controlled laboratory environment, we had participants assume operations of a rover that had been on Mars for five days driving to an observation point eight travel days away. If they arrived within 11 days they received a (real) cash reward and a bonus for every day they were early. However, they were further told that if the rover drove through a sandstorm there was a 40 percent chance of catastrophic wheel damage, and they would not make it to the observation point. Half the participants were then told that there had been three sandstorms on Mars just before the rover landed. The other half of the participants were told there had been three sandstorms through which the rover traveled (on autopilot). All participants were then given the same weather forecast for day 6 (the first day they assumed operational control from the autopilot) for a 95 percent chance of a sandstorm. They then elected to drive or stop and deploy wheel guards.

The second set of participants, who received near-miss information, were significantly more likely to drive on day 6, despite the projected sandstorm, than the first set of participants, who had no near-miss information and who mostly elected to stop and deploy wheel guards. To be specific, in this study, 75 percent of participants with near-miss information chose to drive through the sandstorm, whereas only 13 percent of participants without this information chose to drive. Notably, in both groups, participants reported that the risk of driving was the same—it was just that the second group of participants felt more lackadaisical about that 40 percent risk of failure. We have run similar studies on novices and experts who show the same near-miss bias.
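The risk arithmetic both groups faced can be made explicit. The sketch below simply multiplies the two probabilities given to participants (the payoff details are omitted); it is an illustration of the scenario, not the study’s actual materials.

```python
# Risk arithmetic from the rover scenario: the probabilities below are the
# ones given to participants; everything else is illustrative.

P_STORM = 0.95   # forecast chance of a sandstorm on day 6
P_DAMAGE = 0.40  # chance of catastrophic wheel damage if driving through a storm

def p_catastrophe_if_drive(p_storm: float = P_STORM,
                           p_damage: float = P_DAMAGE) -> float:
    """Chance that choosing to drive on day 6 ends the mission."""
    return p_storm * p_damage

# Identical for both groups -- near-miss information changed how the
# risk *felt*, not what it was.
print(round(p_catastrophe_if_drive(), 2))  # 0.38
```

The calculation underscores the finding: a 38 percent chance of losing the mission faced every participant equally, yet prior near-miss information made most of them willing to take it.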

To discount any explanation that perhaps participants in the second condition justifiably thought they had a particularly strong rover, we have run studies with a similar structure, but this time participants anticipate an upcoming cruise, get information about a hurricane forecast, and have to decide whether to forgo the cruise. Half get “near-miss” information that a friend went on a cruise last year with a hurricane forecast and nothing happened; half get no additional information. Participants with the near-miss information elect to go on the cruise significantly more often than those without it. Here, it is unrealistic to infer that one’s own cruise ship is particularly resilient just because a friend had a lucky experience.

Most recently, we found a similar pattern of results in a five-month longitudinal field study that traced people’s activities during the COVID-19 pandemic. Following lockdown and prior to vaccines, we tracked what people did when they left their homes. We found that people’s level of non-discretionary activities (errands for things such as groceries or a drug store) was unchanged during the time period. But people who said they took part in riskier public activities one week (such as a social gathering or going out to dinner) gradually engaged in more subsequent discretionary activities the following week. Again, people show a creeping tolerance of risk from the inconsequential outcomes (near-miss experiences) of their own experimentation with activities they knew to be risky.

What can organizations and leaders do to improve learning from near-misses?

Aside from the obvious cultural prescriptions—to value safety over production and reward rather than punish reporting of near-miss incidents—my research points to two general ways to improve responses to near-miss experiences.

The first is to make near-miss reporting systems as simple and costless as possible. Employees should not be questioning themselves as to whether or not it is worth it to report any anomalies they witness or experience. The system for inputting these incidents should be easily accessible and not time-consuming. Organizations may even think about daily reporting times (or other routine instances that make sense for the operations), during which an employee who witnesses no incidents can proactively put in that entry. In this way, reporting habits can be formed, and barriers such as time and opportunity constraints are alleviated.
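As a sketch of what “simple and costless” reporting might look like in practice, the snippet below models a two-field entry where logging “nothing to report” takes a single call. All names here are hypothetical; real systems, such as the FAA’s Aviation Safety Reporting System, are far richer.

```python
# Hypothetical sketch of a minimal, low-friction near-miss log following
# the design goals above. Every name here is invented for illustration.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class IncidentEntry:
    reporter: str
    description: str  # "" records an explicit, proactive "nothing to report"
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

log: list[IncidentEntry] = []

def report(reporter: str, description: str = "") -> None:
    """One call, two fields: reporting 'no incidents' costs as little
    as reporting an anomaly, so a daily habit can form."""
    log.append(IncidentEntry(reporter, description))

report("crew-07")                                     # routine daily all-clear
report("crew-12", "near-miss: valve pressure spike")  # witnessed anomaly
```

The design choice worth noting is the default empty description: making the no-incident entry the cheapest possible action is what turns reporting into a routine rather than a judgment call.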

Second, organizations should re-frame what failure means and change their narrative accordingly. For most organizations, failure is associated with significant events and noteworthy costs, so fear of failure is a real threat. Alternatively, failure could be used to capture any less-than-optimal outcome, regardless of how small. Seen this way, failure becomes an outcome of experimentation that simply is not (yet) working: a step on the road to success.
