Open Access Original Article

Why Learning Opportunities From Aviation Incidents Are Lacking

The Impact of Active and Latent Failures and Confidential Reporting

Published Online: https://doi.org/10.1027/2192-0923/a000204

Abstract

The rising trend of fatal aircraft accidents since 2018 suggests a limited safety capability of airlines in terms of learning from incidents (LFI). We evaluated 2,208 voluntary incident reports from commercial European pilots using qualitatively driven mixed methods to investigate LFI “bottlenecks.” The results showed that the report frequency depends on the type of pilots’ active failure causing the incident (performance-based errors, judgment and decision-making errors, and violations). Learning opportunities were lacking, especially for incidents caused by pilots’ inadequate decision-making. Confidential reporting has positive effects on LFI, as these reports contained more information about latent failures. Furthermore, we identified several latent failures that are risk factors for certain unsafe acts. Our results may support airlines in various LFI activities.

The latest safety report of the International Civil Aviation Organization (ICAO) shows an increasing accident rate since 2018, after a steady decline in the previous years (ICAO, 2019). In 2018, 11 fatal accidents occurred in scheduled commercial operations – the highest number in 5 years (ICAO, 2019). These disturbing statistics show that, despite substantial safety management and organizational learning processes, (preventable) accidents continue to occur, claiming lives, causing financial losses, and impairing the competitiveness of airlines (Stemn et al., 2018). One explanation for this trend may be that in the past, airlines have failed to learn their necessary lessons from accidents, as well as from minor incidents (Drupsteen et al., 2013; Stemn et al., 2018).

The “Value” of Incidents

The latest issue of the World Safety Journal describes the management of safety as an integral part of any organization (Bo, 2020). To maintain safety in environments characterized by change and uncertainty, organizations need a “safety capability” that includes identifying and controlling the system’s destabilizing threats and continually adapting operational routines (Griffin et al., 2015). In particular, high-reliability organizations (HROs), such as airlines, require a constant awareness of emerging threats and of factors that threaten this understanding (Hayes & Maslen, 2015; Weick & Sutcliffe, 2007). Airlines also belong to a class of organizations in which learning from fatal accidents only is not a sufficient strategy (Hayes & Maslen, 2015). This requires a substantial ability and willingness for organizational adaptation, so that in the course of learning processes a change in declarative and procedural knowledge can take place (Argote, 2012; Fiol & Lyles, 1985). Organizational learning is intended to change the behavior of organization members by learning from mistakes (single-loop learning) or by modifying the values and norms underlying the behavior (double-loop learning; Argyris & Schön, 1996; Putz et al., 2013). The ability to convert experiences from past incidents into behavior and practices in order to prevent similar events in the future can be described as “learning from incidents” (LFI; Drupsteen & Guldenmund, 2014; Jacobsson et al., 2011). The ICAO (2013) defines an incident as an “occurrence, other than an accident, associated with the operation of an aircraft which affects or could affect the safety of operation”. Since LFI is possible “regardless of the severity of the consequences” of an incident (Drupsteen & Guldenmund, 2014, p. 83), “learning from weak signals” (precursor signals that help to anticipate occurrences) can be understood as an even more proactive approach to enable organizations to learn successfully (Brizon & Wybo, 2009; Drupsteen & Wybo, 2015). For example, the use of an unapproved departure route would be an incident; the programming of an unapproved departure route during flight preparation, favored by a similarity of the departure route designations, would instead be called a weak signal if the error was corrected in time. Drupsteen and Wybo (2015) formulate “dealing with everything that may be wrong, from weak signals to incidents” (p. 35) as a fundamental principle of HROs and as an important prerequisite for organizations to learn from their experiences.

In this study, we use the term “incident” as an umbrella term for all types of incidents, near-misses, or weak signals that can provide input for learning and affect the safety of an organization (Drupsteen & Guldenmund, 2014; Rasmussen et al., 2013).

In most cases, incidents are caused by human errors or violations (Wiegmann & Shappell, 2017a). More recent perspectives on human error extend this view to the effect that errors are symptoms of “trouble deeper in the system” of an organization (Dekker, 2006, p. 18). Errors can be defined as “a deliberate action (or the deliberate omission of an action) characterized by the unintended failure to achieve personal goals and/or the unintended deviation from organizational norms and goals which could have been avoided by alternative behaviors of the acting person” (Putz et al., 2013, p. 513). Violations are likewise intentional acts, but involve deliberate non-compliance with known rules, procedures, or organizational norms (Reason, 2016). To initiate error-related learning processes, the detection of errors in the course of collecting information (the so-called learning product) is an essential step toward being able to learn from them (Argyris & Schön, 1996; Cannon & Edmondson, 2001).

Six consecutive phases can be distinguished in the LFI process: “Reporting Incidents, Investigating Incidents, Developing Incident Alerts, Disseminating Information, Contextualizing Information and Implementing Actions” (Littlejohn et al., 2017, p. 82). Even one phase executed improperly can lead to learning becoming ineffective or failing to occur (Drupsteen & Hasle, 2014). Above all, the phases Reporting Incidents and Investigating Incidents have been identified as bottlenecks in several studies (cf., e.g., Drupsteen et al., 2013; Drupsteen & Hasle, 2014; Stemn et al., 2018). We will therefore investigate these two phases more closely in this study.

Reporting Incidents

Within the framework of organizational learning processes, data are derived, for example, from accident investigations, flight data monitoring, crew checks during scheduled flights, or line operations safety audits (LOSA), where safety-relevant findings are generated by various methods such as cockpit observations or crew interviews (Helmreich et al., 2017). The traditional data source for the LFI process of airlines is, however, the written report by an organization member, usually a pilot, about an incident (Margaryan et al., 2017). An aviation safety reporting system was introduced by the National Aeronautics and Space Administration (NASA) as early as the mid-1970s (NASA, 1976). The objectives described at that time, namely, to create a reporting system for all members of the organization in which data are stored, evaluated as part of operational routines, and communicated to various stakeholders, continue to form the basis of reporting systems implemented in almost all high-reliability sectors, such as nuclear power technology or health care (NASA, 1976; Van der Westhuizen & Stanz, 2017).

Safety management systems (SMS) include a systematic approach to managing safety, including the necessary organizational structures, accountabilities, policies, and procedures (ICAO, 2018b). Within this framework incident reports are used as a source of data to identify risks, develop mitigation measures, and monitor safety (Rasmussen et al., 2013; Van der Westhuizen & Stanz, 2017). The “stories” contained in the reports help operating staff to build their “safety imagination” and evaluate the safety of decisions and are also relevant within the framework of “story-based” learning (Hayes & Maslen, 2015).

Incident reporting can be considered to be a form of change-oriented safety citizenship behavior (SCB; Conchie, 2013). SCB is influenced by person-related antecedents, such as affective engagement and psychological ownership, and situation-related antecedents, such as the safety climate within the organization (Curcuruto & Griffin, 2018; Parker et al., 2010). The safety climate encompasses shared perceptions of safety policies, procedures, and practices among organization members and, in addition to safety behavior mediated by individual motivation, influences forms of safety participation at the discretion of the individual, such as incident reporting (Griffin & Curcuruto, 2016; Griffin & Neal, 2000; Zohar, 2003). The assessment of climate-related aspects in relation to LFI can be conceptualized, for example, in terms of the influence of various environmental factors on learning levels, as in the concept of the error-related learning climate (Putz et al., 2013).

In the context of LFI, safety-cultural aspects are often discussed as influencing factors that guide the behavior of organizations and their members through underlying assumptions and values (Curcuruto & Griffin, 2018; Reason, 1998). In a learning culture, errors and the resulting incidents are accepted and explicitly seen as an opportunity for learning (Littlejohn et al., 2014). To achieve this, it is a fundamental requirement that incidents are reported by organization members (Reason, 1998). Incident reporting is encouraged by a just culture, which includes an atmosphere of trust, where employees are encouraged and even rewarded for sharing safety information, but where a clear distinction is also made between acceptable and unacceptable behavior (Dekker, 2018).

Impairments of climate-related and cultural aspects, such as a lack of trust and openness, but also fear and shame on the part of organization members, are among the factors that limit safety participation in terms of incident reporting (cf., e.g., Curcuruto & Griffin, 2018; Drupsteen & Guldenmund, 2014; Gilbey et al., 2016; Jausan et al., 2017; Zabari & Southern, 2018). In order to reduce the influence of these hindering factors and to increase the reporting rate, confidential reporting systems are widely implemented (Jausan et al., 2017; Langer, 2016; Merry & Henderson, 2017).

Too Few Incidents Are Reported

There are numerous reasons why incidents are not reported (cf. Jausan et al., 2017). In this study, we address the results of a survey conducted by Sieberichs and Kluge (2017) among commercial pilots, which showed that the probability of an incident being reported depends on the type of incident. We thereby also follow a recommendation of Hayes and Maslen (2015) to focus further research on the types of incidents reported. This is relevant in the context of LFI, as less frequently reported or unreported types of incidents limit the learning opportunities that arise from them in the course of single-loop learning and reduce the accuracy of the site risk assessment (Argyris & Schön, 1996; Drupsteen & Guldenmund, 2014; Stemn et al., 2018). For safety management tasks, it is also relevant whether the frequency of various incidents differs across flight phases and route segments (Stolzer et al., 2015; Wheeler et al., 2019). Therefore, we ask:

Research Question (RQ1):

Are there incidents that are reported less frequently depending on their causal unsafe acts?

Research Question (RQ1.1):

Does the frequency of reported incidents differ between flight phases and route segments depending on their causal unsafe acts?

Investigating Incidents

When investigating incidents, the “immediate and underlying causes of the incident” should be determined (Littlejohn et al., 2017, p. 82). A frequently applied linear framework for investigation is the Swiss cheese model, which serves as the basis for the Human Factors Analysis and Classification System (HFACS; Reason, 1990; Wiegmann & Shappell, 2017b). In this model, accidents or incidents are caused by a chain of organizational and personal factors, which are divided into mishap-level factors, such as organizational influences and unsafe supervision, and person-level factors, such as preconditions and unsafe acts. Organizational influences, unsafe supervision, and preconditions represent latent failures, whereas unsafe acts are considered active failures (Littlejohn et al., 2017; Wiegmann & Shappell, 2017b). In contrast to this linear view, the Systems-Theoretic Accident Model and Processes (STAMP) approach incorporates uncertain interactions of different system components and treats safety more as a dynamic control problem (Leveson, 2015).

Even though the Swiss cheese model has been criticized for not being able to capture the real world because it is too static and not sufficiently specified, the advantages of the system have been repeatedly highlighted in civil, commercial, and military aviation, but also in other organizations such as hospitals (Cohen et al., 2015; Hollnagel et al., 2006; Larouzee & Le Coze, 2020; Sunaryo et al., 2019). HFACS has also emerged as a reliable system in the framework of incident or weak-signal analysis (e.g., Lee et al., 2017; Li & Harris, 2006; Madigan et al., 2016; Miranda, 2018; Munene, 2016). Miranda (2017), for example, was able to identify latent failures that are particularly conducive to certain types of unsafe acts when evaluating major accidents in a military context.

In this study we summarize errors and violations as immediate incident causes with the term “unsafe acts” (cf. Littlejohn et al., 2017; Wiegmann & Shappell, 2017b).

Too Little Information About the Incident Is Given

As stated earlier, confidential reporting systems are widely used to increase the reporting rate (Langer, 2016; Merry & Henderson, 2017). Since critical information about an incident is sometimes withheld in reports (Jausan et al., 2017), in this study we ask whether the use of confidential reporting depends on the type of unsafe act that caused the incident and whether confidential reporting also increases the level of information about latent failures. With respect to the aforementioned results of Sieberichs and Kluge (2017), we also ask whether the level of information about latent failures in reports depends on whether an error or a violation caused the incident.

Research Question (RQ2):

Are there incidents that are more frequently reported confidentially depending on their causal unsafe acts?

Research Question (RQ2.1):

Do confidential reports contain more information about latent failures than non-confidential reports?

Research Question (RQ2.2):

Does the level of information about latent failures differ in reports where errors or violations caused the incident?

Latent Conditions Are Not Identified

The identification of latent failures is important to reduce the likelihood of recurrence of similar incidents, to facilitate double-loop learning, and to improve safety imagination (Argyris & Schön, 1996; Drupsteen & Guldenmund, 2014; Drupsteen & Hasle, 2014; Hayes & Maslen, 2015; Madigan et al., 2016). In addition, identifying the complexity of an incident is important for selecting the necessary learning solutions (Littlejohn et al., 2017). To support airlines in identifying underlying incident causes, we investigate whether there are latent failures that can be considered risk factors for certain unsafe acts.

Research Question (RQ3):

Are there latent failures that are risk factors for the various unsafe acts and the incidents they cause?

With these six research questions we take up three reasons given by Drupsteen and Hasle (2014) on why organizations are not effectively learning from incidents.

Method

Data Selection

To answer the research questions, we evaluated voluntary written reports from pilots to their airline (originating situation), which contained the description of incidents caused by pilot errors or violations. These were exported from the digital safety database of a European airline operating short- and long-haul flights. The evaluation of reports corresponds to a common procedure of the airline in the context of various SMS activities and is in line with its policy. Permission from the airline for the evaluation and publication of the data was obtained.

The database contained no information about the identity of the author. In addition, in the course of an “absolute anonymization,” neither the date of the report nor the author’s rank was recorded, and the information regarding the aircraft type on which the incident occurred was summarized as short- or long-haul (cf. Medjedovic & Witzel, 2010).

The export included all reports that were stored in the database between 2002 and summer 2019 under the internally used designation “pilot error.” In addition to the description of the incident in text form, each report contained the attributes flight phase (ground, take-off, flight, approach/landing), route segment (short- or long-haul), and report type (confidential or non-confidential). The data export comprised 2,208 incident reports.
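
For illustration only, the structure of one exported report record could be represented as in the following minimal sketch; the field names are our own and do not correspond to the airline’s actual database schema.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class IncidentReport:
    """Illustrative record structure for one exported report (hypothetical field names)."""
    text: str  # free-text description of the incident
    flight_phase: Literal["ground", "take-off", "flight", "approach/landing"]
    route_segment: Literal["short-haul", "long-haul"]
    report_type: Literal["confidential", "non-confidential"]
```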

Research Design

The methodology and the step-by-step methods to be applied were defined in advance in a binding research plan (Mayring, 2020). A sequentially linked, qualitatively driven mixed-methods content analysis design was defined as the methodology (cf. Kansteiner & König, 2020; Kelle, 2019; Mayring, 2020). A method triangulation of qualitative and quantitative document analyses was used (cf. Baur et al., 2017). The aim of the qualitative steps was to prepare a structuring description of the documents, which were to be classified according to theoretically meaningful ordering aspects (Mayring, 2015). The aim of the quantitative steps was to answer the research questions by using descriptive and inferential statistical evaluation procedures. The description dimensions were quantitative variables transformed from the coded reports and the aforementioned attributes. We justify the methodology with the complementarity of the procedures used, since we expected further clarification of the qualitatively obtained results through quantitative evaluation steps (Kansteiner & König, 2020; Schoonenboom & Johnson, 2017). According to Kuckartz (2019) and Mayring (2015), qualitatively oriented classifications are also a good starting point for quantitative analyses. We consider reports to be an adequate data basis for answering the research questions, as previous experience with a comparable methodology is available, for example, from Miranda (2018). Epistemologically, we base the methodology on a pragmatic position that follows the paradigm of dialectical pluralism (Baur et al., 2017; Mayring, 2007).

Procedure: Qualitative Steps

For the classification of the reports, a content-analytical process model according to Mayring was chosen as the analysis technique (Mayring, 2015, p. 62). In accordance with a content-analytical communication model (Mayring, 2015, p. 59), the description of the subject matter included in the reports was determined as the direction of the analysis. A complete report was defined as the coding unit and context unit. The analysis unit consisted of the 2,208 reports, which were evaluated consecutively. Although the STAMP approach allows for the modeling of nonlinear relationships and is more appropriate for complex socio-technical systems, we used HFACS because we expected it to be more reliable due to its taxonomic structure and because this system has proven to be more useful when analyzing a larger number of case studies (cf. Salmon et al., 2012).

The assignment of categories to the reports was done deductively with categories of the DoD-HFACS 7.0 from the “DoD-HFACS 7.0 Guide” (Air Force Safety Center, 2016). To facilitate the use of these categories, which were designed for the military context, Scott Shappell provided us on request with an overview of anchor examples for the civil context. DoD-HFACS 7.0 contains Mishap-Level Factors with categories for Organizational Influences and Supervision, and Person-Level Factors with categories for Preconditions and Unsafe Acts (Air Force Safety Center, 2016).

To test the suitability of the category system, 200 randomly selected reports were initially coded in a pilot phase. Since the Mishap-Level Factors turned out to be unsuitable for the coding units, we only used the Person-Level Factors. In the course of this pilot phase, anchor examples and coding rules were added to the categories and definitions given in the “DoD-HFACS 7.0 Guide,” and a separate analytical scheme in the form of a coding manual was created to ensure a stable perspective for researchers during analysis (cf. American Psychological Association, 2020). Following the recommendation of Jacobsson et al. (2009), the coding rules were formulated in such a way that factors that “were not directly stated in a (…) report, but that can be deduced following the description of the event” (p. 197) could also be considered.

The coding manual contained 13 categories (including three superordinate categories) within the factor Unsafe Acts and 61 categories within the factor Preconditions.

The category Procedure Not Followed Correctly is an example of an unsafe act within the superordinate category Performance-Based Errors. This category describes a factor when a procedure is performed incorrectly or accomplished in the wrong sequence. It is assigned if a procedural error is explicitly mentioned in the report or the report contains a text passage to which the definition applies (Anchor examples: “We did not apply the Oceanic-Crossing-Procedure correctly and did not check the updated route clearance” [Report 207]; “Cleared for SOBRA 3L, we programmed SOBRA 1S, briefing for SOBRA 3L, we flew SOBRA 1S” [Report 505]).

The category Complacency is an example of a precondition. This category describes a factor when the individual has a false sense of security, is unaware of or ignores hazards, and is inattentive to risks. It is assigned if complacency or an equivalent term is explicitly mentioned in the report or the report contains a text passage to which the definition applies (Anchor examples: “I will never again delegate important call-outs to other than cockpit crew members! Complacency at its best!” [Report 1771]; “The main factor was Complacency in the clearance review” [Report 1311]).

Each coding unit was assigned one category of the factor Unsafe Acts and the applicable categories of the factor Preconditions. This procedure was repeated for the entire evaluation unit by the main coder. To ensure consistency in the analysis process in terms of developing a stable perspective of the researchers, we defined a period of 12 consecutive weeks for the coding process. The time units for content analysis were limited to 45 min. Between two and four units per day were carried out on at least 5 days per week.

We will illustrate the procedure with the following example:

Report 999:

Ramp agent (RA): “Please apply brake, pushback completed.” I answer: “brake set” after looking out of the window to see if the plane is standing still. The RA replies: “Then the yellow light on the nose gear seems to be defect.” I look at the triple indicator and see no brake pressure! After a look at the Parking Brake Selector I see that I have indeed not set the brake. I cannot explain why. Probably classic distraction and fatigue. Fortunately, we had a very alert RA.

The unsafe act Procedure Not Followed Correctly was assigned to this report, because the push-back procedure stipulates that verbal confirmation of the set parking brake may only be given after checking the triple indicator. Distraction and Fatigue were assigned as Preconditions.

Of the 2,208 exported reports, 464 (21.01%) were excluded from further evaluation:

  • 278 reports that were incomplete or in the wrong category;
  • 113 reports about errors of others (e.g., the author was travelling as a passenger and describes that the pilots did not remove ice from a wing that clearly had to be de-iced);
  • 29 reports with measurable damage (these reports are mandatory);
  • 25 duplicate reports; and
  • 19 reports in which the author was asked to write the report (background: If extreme deviations are detected during flight data monitoring, the airline can require the causing pilot to write a report).

Thus, the description field consisted of 1,744 incident reports.

After the coding of all reports, a second coder was introduced to the coding manual. Both coders jointly coded 14 boundary cases in which the assignment of more than one category of the factor Unsafe Acts would have been possible. An example of a boundary case is a report in which a pilot did not correctly apply the wind-shear escape maneuver and in addition over-controlled the aircraft. In this case the category Procedure Not Followed Correctly was selected, because this unsafe act would have the largest relative contribution to the most credible accident scenario if the same incident happened again (cf. ICAO, 2018a). To determine inter-rater reliability for both coders, a second evaluation unit was created from a randomly selected 10% of the reports, which was coded by the second coder with the described content-analytical model using the coding manual.

The main coder is an active civil airline pilot. Moreover, he has experience in the training of pilots and works in the safety department of a major European airline, where he deals with risk assessment, root cause analysis, and the evaluation of safety-related reports. The second coder works full-time in the safety department of the aforementioned airline and is responsible for the processing of all safety-related reports. Both coders have attended a training course on aviation accident investigation, which included training in HFACS. Since specific instruction is recommended for the reliable use of HFACS, the coders also completed a preparatory, web-based HFACS training (Clemson University, 2018a, 2018b; Ergai et al., 2016).

Procedure: Quantitative Steps

To prepare the qualitative results of the content analysis for a quantitative evaluation with SPSS (Version 26), the codes were transformed into binary variables. For each coding unit, a data record with 13 binary variables of the factor Unsafe Acts and 61 binary variables of the factor Preconditions was created. In addition, the aforementioned attributes of the reports from the database export were assigned to the data records as categorical variables. The quantitative analysis was based on 1,744 data records.
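
To illustrate this transformation step, the following sketch converts coded reports into binary variables using Python and pandas; the report IDs, category assignments, and variable names are hypothetical stand-ins and do not reproduce the SPSS procedure actually used.

```python
import pandas as pd

# Hypothetical coded reports: each report carries exactly one unsafe-act category
# and the applicable precondition categories (IDs and assignments are invented).
coded_reports = [
    {"report_id": 1, "unsafe_act": "Procedure Not Followed Correctly",
     "preconditions": ["Distraction", "Fatigue"]},
    {"report_id": 2, "unsafe_act": "Work-Around Violation",
     "preconditions": ["Complacency"]},
]

rows = []
for report in coded_reports:
    row = {"report_id": report["report_id"]}
    row["UA_" + report["unsafe_act"]] = 1      # one binary variable per unsafe-act category
    for precondition in report["preconditions"]:
        row["PC_" + precondition] = 1          # one binary variable per precondition category
    rows.append(row)

# Categories not assigned to a report become 0, yielding the binary data set for the statistics.
binary_data = pd.DataFrame(rows).fillna(0)
print(binary_data)
```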

To determine the frequency of reported incidents depending on the causal unsafe acts, the absolute frequencies of the variables of the factor Unsafe Acts were calculated in response to RQ1. To answer RQ1.1 and RQ2, chi-square tests were calculated to investigate whether the observed frequencies of reports in the flight phases, route segments, and report types deviated from the expected frequencies. To determine the level of information about latent failures in the reports, we conducted t tests for independent samples to answer RQ2.1 and RQ2.2. To answer RQ3, logistic regression models for dichotomous dependent variables were calculated (cf. Eid et al., 2015): The variables of the factor Unsafe Acts were each defined as dependent variables, and the variables of the factor Preconditions were defined as independent variables that were simultaneously included in the model calculation.
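
The inferential steps can be sketched schematically as follows, using Python (SciPy) with simulated counts and assumed variable names; the original analyses were run in SPSS on the coded report data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Chi-square test of association (RQ1.1/RQ2): observed counts of one unsafe act
# versus all other unsafe acts across the two report types (counts are invented).
observed = np.array([[90, 40],      # incidents caused by the unsafe act: confidential / non-confidential
                     [949, 665]])   # all other incidents
chi2, p_chi2, dof, expected = stats.chi2_contingency(observed, correction=False)
phi = np.sqrt(chi2 / observed.sum())        # effect size phi for a 2x2 table

# Welch t test for independent samples (RQ2.1/RQ2.2): number of coded latent
# failures per report in two groups (simulated counts, group sizes as in the study).
confidential = rng.poisson(1.8, size=1039)
nonconfidential = rng.poisson(1.2, size=705)
t, p_t = stats.ttest_ind(confidential, nonconfidential, equal_var=False)

# Cohen's d from the group means and the pooled standard deviation.
n1, n2 = len(confidential), len(nonconfidential)
s1, s2 = confidential.std(ddof=1), nonconfidential.std(ddof=1)
pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (confidential.mean() - nonconfidential.mean()) / pooled_sd
print(round(phi, 2), round(d, 2))
```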

Results

The incidents described in the reports occurred in four different flight phases: 289 incidents (17%) occurred on ground, 223 (13%) during take-off, 461 (26%) in flight, and 771 (44%) during approach or landing. Altogether, 1,117 (64%) occurred on short-haul and 627 (36%) on long-haul flights. Of the 1,744 reports, 705 (40%) were non-confidential and 1,039 (60%) confidential.

Qualitative Results: Result of the Content Analysis

Each coding unit (N = 1,744) was assigned one category of the factor Unsafe Acts. Within the superordinate category Performance-Based Errors (n = 1,310), 76 coding units were assigned to the category Unintended Operation of Equipment, 43 to the category Checklist Not Followed Correctly, 733 to the category Procedure Not Followed Correctly, 431 to the category Over-Controlled/Under-Controlled Aircraft, 21 to the category Breakdown in Visual Scan, and six to the category Rushed or Delayed a Necessary Action. Within the superordinate category Judgment and Decision-Making Errors (n = 152), 56 coding units were assigned to the category Inadequate Real-Time Risk Assessment, four to the category Failure to Prioritize Tasks Adequately, 30 to the category Ignored a Caution/Warning, and 62 to the category Wrong Choice of Action During an Operation. Within the superordinate category Violations (n = 282), 113 coding units were assigned to the category Work-Around Violation, 130 to the category Widespread/Routine Violation, and 39 to the category Extreme Violation – Lack of Discipline.

Each coding unit (N = 1,744) was also assigned the applicable categories of the factor Preconditions. In total, the reports contained 2,691 text passages that were coded with a category of the factor Preconditions. Figure 1 shows how often each category was assigned. For example, 361 reports were coded with the category Not Paying Attention.

Figure 1 Frequency of preconditions categories.

The following categories from the factor Preconditions of DoD-HFACS 7.0 could not be assigned to any coding unit: Psychological Problem, Turning/Balance Illusion – Vestibular, Temporal/Time Distortion, Substance Effects (alcohol, supplements, medications, drugs), Loss of Consciousness (sudden or prolonged onset), Trapped Gas Disorders, Evolved Gas Disorders, Hypoxia/Hyperventilation, Inadequate Adaptation to Darkness, Dehydration, Body Size/Movement Limitations, Physical Strength and Coordination (inappropriate for task demands), Vibration Affects Vision or Balance, External Force or Object Impeded an Individual’s Movement, Seat and Restraint System Problems.

Cohen’s κ was calculated to determine the inter-rater reliability of the two coders and amounted to κ = .95. According to Landis and Koch (1977), this indicates almost perfect reliability; according to Altman (1990), very good reliability.
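
For completeness, this reliability check can be reproduced with any standard implementation of Cohen’s κ; the following sketch uses scikit-learn with invented category assignments for the two coders (shown here only for the unsafe-act assignment), so the resulting value is purely illustrative.

```python
from sklearn.metrics import cohen_kappa_score

# Unsafe-act categories assigned by the main coder and the second coder to the same
# randomly selected subsample of reports (assignments shown here are invented).
main_coder = ["Procedure Not Followed Correctly", "Work-Around Violation",
              "Procedure Not Followed Correctly", "Ignored a Caution/Warning"]
second_coder = ["Procedure Not Followed Correctly", "Work-Around Violation",
                "Procedure Not Followed Correctly", "Inadequate Real-Time Risk Assessment"]

kappa = cohen_kappa_score(main_coder, second_coder)
print(f"Cohen's kappa = {kappa:.2f}")
```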

Mixed Methods Results: Addressing the Research Questions

For a better overview, when answering research questions RQ1.1 and RQ2.1, we only report results with cell frequencies greater than 5 and at least small effects (φ ≥ 0.1; Cohen, 1988). When interpreting the effect sizes, a small effect can be assumed for φ = 0.1 or d = 0.2, a medium effect for φ = 0.3 or d = 0.5, and a large effect for φ = 0.5 or d = 0.8 (Cohen, 1988). When interpreting pseudo-determination measures (Cox–Snell R2 and Nagelkerke R2), the model fit can be considered acceptable for R2 > 0.2 and good for R2 > 0.4 (Backhaus et al., 2016).

RQ1: Are There Incidents That Are Reported Less Frequently Depending on Their Causal Unsafe Acts?

The frequency of reported incidents varied depending on the causal unsafe acts. The results showed that incidents caused by judgment and decision-making errors were reported less frequently. Incidents caused by judgment and decision-making errors (n = 152) accounted for less than 9% of all reports. Within this superordinate category, for example, there were only four reports of incidents in which a failure to prioritize tasks adequately was the cause. About 16% of the reported incidents were caused by violations (n = 282). Within this superordinate category, extreme violations (n = 39) were reported least frequently. Although performance-based errors caused more than half of the reported incidents, within this superordinate category there were only six reports of incidents caused by a rushed or delayed necessary action.

RQ1.1: Does the Frequency of Reported Incidents Differ Between Flight Phases and Route Segments Depending on Their Causal Unsafe Acts?

The frequency of reported incidents differed in different flight phases for some of the unsafe acts. The results of the chi-square tests (associations between unsafe acts and flight phases) are presented in Table 1.

Table 1 Unsafe acts by flight phases

If an incident was caused, for example, by an incorrectly followed procedure, the observed frequency of reports in the flight phase Approach was higher than the expected frequency. The opposite applied to the other flight phases.

The results of the chi-square tests (associations between unsafe acts and route segments) showed that the frequency of reported incidents in different route segments did not differ from the expected frequency depending on the causal unsafe acts.

RQ2: Are There Incidents That Are More Frequently Reported Confidentially Depending on Their Causal Unsafe Acts?

The frequency of confidentially reported incidents differed for incidents caused by widespread or routine violations. The results of the chi-square tests (associations between unsafe acts and report type) showed that the observed frequency of confidential reports on incidents caused by widespread or routine violations is higher than the expected frequency, χ2(1) = 59.59, p < .001, φ = −0.19.

RQ2.1: Do Confidential Reports Contain More Information About Latent Failures Than Nonconfidential Reports?

In nonconfidential reports (M = 1.22, SD = 0.89), fewer latent failures were reported than in confidential reports (M = 1.76, SD = 1.20). Nonconfidential reports contained on average 0.54 fewer latent failures than confidential reports (95% CI [−0.64, −0.44]), t(1729.26) = −10.73, p < .001, d = −0.51.

RQ2.2: Does the Level of Information About Latent Failures Differ in Reports Where Errors or Violations Caused the Incident?

The level of information on latent failures differed between reports in which violations caused an incident (M = 2.00, SD = 1.35) and reports in which errors caused an incident (M = 1.46, SD = 1.04). Reports in which violations caused an incident contained on average 0.54 more latent failures (95% CI [0.40, 0.68]), t(1742) = 7.56, p < .001, d = 0.45.

RQ3: Are There Latent Failures That Are Risk Factors for Various Unsafe Acts and the Incidents They Cause?

We calculated 13 logistic regression models with all categories of the factor Unsafe Acts as dependent variables and all variables of the factor Preconditions as independent variables. Four models failed the omnibus test of the model coefficients and four models had an unacceptable model-fit. Therefore, we calculated logistic regression models with dependent variables summarized at the superordinate category level (Performance-Based Errors, Judgment and Decision-Making Errors and Violations). Tables 2 to 4 show the results of the regression calculations. For a better overview, only the independent variables with Wald value p < .05 are shown.

Table 2 Model summary for performance-based errors
Table 3 Model summary for judgment and decision-making errors
Table 4 Model summary for violations

In all three models, the Hosmer–Lemeshow test of model quality was not significant.

The omnibus test for the model with dependent variable Performance-Based Errors was significant, χ2(46) = 604.54, p < .001. The Cox–Snell index was CS = .293 and the Nagelkerke index was NK = .434.

The omnibus test for the model with dependent variable Judgment and Decision-Making Errors was significant, χ2(46) = 160.62, p < .001. The Cox–Snell index was CS = .088 and the Nagelkerke index was NK = .197.

The omnibus test for the model with dependent variable Violations was significant, χ2(46) = 420.87, p < .001. The Cox–Snell index was CS = .214 and the Nagelkerke index was NK = .365.

Since the estimated regression coefficients cannot be interpreted meaningfully due to nonlinear relationships, we follow the recommendation of Best and Wolf (2010) and explain the results in terms of the direction and strength of the odds ratios. As a summary of the results of the regression analyses, Figure 2 shows the odds ratio values of the preconditions whose estimated regression coefficients had a positive sign, for each of the three dependent variables.

Figure 2 Risk factors for unsafe acts. The numbers shown correspond to the odds ratio values (Exp(B)) of the preconditions in the three regression analyses; the letters indicate the respective dependent variables (P = Performance-Based Errors, J = Judgment & Decision-Making Errors, V = Violations).

The preconditions (latent failures) presented can be considered risk factors for the various unsafe acts and the incidents they cause. Except for fixation and lack of assertiveness, each precondition is a risk factor for only one unsafe act. For example, not paying attention increases the odds of an incident caused by a performance-based error by a factor of up to 4.55.
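
As a schematic illustration of how such odds ratios (Exp(B)) arise, the following sketch fits a logistic regression with a few hypothetical binary precondition variables in Python (statsmodels) on simulated data; the actual models were estimated in SPSS with all 61 precondition variables entered simultaneously.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1744

# Hypothetical binary precondition variables (predictors) for each report.
X = pd.DataFrame({
    "not_paying_attention": rng.integers(0, 2, n),
    "misinterpreted_instrument": rng.integers(0, 2, n),
    "fixation": rng.integers(0, 2, n),
})

# Simulated outcome: incident caused by a performance-based error (1) or not (0).
logit = -1.0 + 1.5 * X["not_paying_attention"] + 1.0 * X["misinterpreted_instrument"]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
odds_ratios = np.exp(model.params)  # Exp(B): factor by which the odds change per precondition
print(odds_ratios.round(2))
```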

Discussion

The aim of this study was to further analyze why organizations cannot learn effectively from incidents. By looking at hindering factors in the Reporting Incidents phase, we found that airlines suffer from a lower number of reports on some types of incidents depending on the type of active failure causing the incident (pilots’ unsafe acts), particularly with regard to judgment and decision-making errors. The frequency of reported incidents for some causal unsafe acts also differed between flight phases. Confidential reporting had a positive effect on LFI, as these reports contained more information about latent failures than nonconfidential reports; this also applied to reports about incidents caused by violations. Furthermore, confidential reports were more often used to report incidents caused by widespread or routine violations. By looking at hindering factors in the Investigating Incidents phase, we identified a total of 17 person-related latent failures, which can be considered risk factors for various unsafe acts and the incidents they cause.

The identified unequal distribution of incident reports depending on the type of active failure causing the incident (pilots’ unsafe acts) suggests that a not insignificant proportion of errors remains undetected – today just as 30 years ago (cf. Reason, 1992). A distribution of causal errors and violations partially comparable to our results was, for example, demonstrated by Munene (2016) when analyzing accident reports. In our study, reports of incidents caused by judgment and decision-making errors account for less than 10% of all reports evaluated. This relatively small number of reports is already an indication of limited learning opportunities in this area. In addition, the number of reports does not necessarily correspond to the number of actual incidents: In a study by Haslbeck et al. (2015), the highest number of unreported incidents was calculated for incidents caused by poor decision-making (e.g., a landing with less residual fuel than legally required). These results were also confirmed by the aforementioned survey by Sieberichs and Kluge (2017): Here, pilots stated that they are rather unlikely to report incidents caused by operational decision errors. This limited willingness to file a report on this type of unsafe act suggests that learning opportunities are lacking, even though more learning opportunities could be provided by the pilots.

Furthermore, there is the risk that the few reports on judgment and decision-making errors are overshadowed by the disproportionately high number of reports on performance-based errors, thereby distorting the safety imagination of the operating staff (cf. Hayes & Maslen, 2015). This effect is reinforced by the fact that more than half of all incidents recorded in the database were caused by incorrectly applied procedures. However, this preponderance is not surprising given the high number of procedures (normal, supplementary, abnormal, etc.) and has already been highlighted in other research studies (cf., e.g., Shappell et al., 2017). Even though incidents are suitable for providing an overview of the site’s risk (Stemn et al., 2018), the identified unequal distribution suggests that an airline will likely not be aware of all incidents and that, in addition, different frequencies in different flight phases must be taken into account.

In the mid-1970s, NASA emphasized the importance of a confidential reporting option as the first feature of a reporting system (NASA, 1976). The results of our study underline that this feature is still important today, as confidential reports contain more information about latent failures. Our results also support Langer’s (2016) findings.

In identifying risk factors for various errors and violations, we were able to confirm certain results of Miranda (2017) at a higher level of abstraction: A misinterpreted instrument is the strongest risk factor for a performance-based error. When evaluating the related reports, it became clear that misread instruments may also occur in connection with so-called mixed-fleet-flying (pilots have the license for aircraft types that differ only slightly from each other). Our results thus underline the analysis of Soo et al. (2016) that, “even the smallest shift in instrument location can cause errors in performances” (p. 454). Lack of attention is the second strongest risk factor for performance-based errors and also the most frequently mentioned risk factor in all evaluated reports. Other studies, such as an investigation of accidents caused by loss of control inflight, also identified issues with flight crew attention as a significant contributing factor (Stephens et al., 2017). To address these “attention-related human performance limitations,” special training courses for attention management are being developed (Stephens et al., 2017, p. 1). Furthermore, results of attention studies with eye-trackers are being used for the human-centered design of flight decks (cf. Li et al., 2016).

We were able to expand the state of research by showing that lack of assertiveness is not only a risk factor for judgment and decision-making errors, but also for violations (cf. Miranda, 2017). This result is not surprising, considering that decisions in commercial cockpits are mainly made jointly by the pilots. However, lack of assertiveness is the strongest risk factor for judgment and decision-making errors and the second strongest for violations. This would not be expected given the results of a NASA (2004) survey, in which about three quarters of the commercial pilots interviewed stated that they had a high degree of assertiveness. In addition, we were able to identify complacency as a risk factor for incidents caused by violations – a factor that is usually discussed in conjunction with automation surprise (AS; de Boer & Hurts, 2017). One suggested explanation might be that complacency limits the ability of pilots to actively assess risk, thereby increasing their propensity to commit deliberate deviations (Rascher & Schröder, 2016).

Limitations

We have placed particular emphasis on a transparent design and explication of the research process, but due to the large number of reports we could only partially document an empirical anchoring through textual evidence. The criterion of grounding the findings in the data is therefore only fulfilled to a limited extent (APA, 2020). In addition, in terms of methodological rigor, intersubjective plausibility is thus partially limited (Renner & Jacob, 2020). When classifying the reports with preconditions, it became clear that the categories of DoD-HFACS 7.0 were not exhaustive, as we identified latent failures in some reports that were not covered by a coding rule in the coding manual. The distinction we have chosen between active and latent failures is based on the Swiss cheese model, which has been increasingly criticized in recent years for being too linear, static, and unspecified to capture the real world (Drupsteen & Hasle, 2014; Hollnagel et al., 2006; Larouzee & Le Coze, 2020). Particularly with regard to the formation of a safety imagination through stories, the evaluation carried out here is therefore less suitable (cf. Hayes & Maslen, 2015). The inter-rater reliability (Cohen’s κ = .95) is very satisfactory, but Rädiker and Kuckartz (2019) argue that the use of reliability coefficients is not necessarily appropriate in the context of qualitative content analyses. For the most part, the effect sizes were only in the small to medium range, which limits the validity of most results obtained. When considering the frequencies of the incidents in different flight phases, our evaluation assumed that all four flight phases are of equal length in terms of time – again, limited validity must be expected. Also, the generalizability seems to be limited in this area, because Wheeler et al. (2019) observed a different distribution of incidents across flight phases. A presentation of the results for RQ3 with conditional effect plots, as recommended by Best and Wolf (2010), was not realized due to the high number of independent variables. The assessment of the model quality (RQ3) was limited with the indices presented, as these indices do not allow for an interpretation of the explained variance. Although we evaluated the appropriateness of the selected method in advance (cf. Steinke, 2019), we were unable to determine any temporal changes in the aspects investigated, as the date of the incident was not available. Therefore, we were not able to detect any change in the level of information of the reports about latent failures due to cultural aspects, such as just culture; moreover, the generalizability of the results is limited in this respect. An absolute anonymization (cf. Medjedovic & Witzel, 2010) can be seen as positive from the perspective of research ethics, but prevents further analyses, such as of differences in the reports depending on the author’s rank.

Despite the limitations mentioned our findings are informative and meaningful in relation to the current literature and the study objectives (cf. APA, 2020). In particular, the high number of evaluated, mostly confidential, reports over a period of almost 20 years is a special feature of our research.

Implications for Research

We have shown that the frequency of reported incidents varies depending on the causal unsafe acts, but in the context of this research we are unable to explain the reasons why or to quantify the actual number of unreported cases. Further research should therefore focus on the factors that influence the overall reporting behavior of pilots. To this end, following preparatory expert interviews in early 2020, we conducted a survey with civil pilots, supported by a European pilot association. Future research should also investigate the impact of just culture on the frequency of incident reports and on the level of information about latent failures. For a classification of incidents and accidents in civil aviation, we propose the following extension of HFACS:

  • Time pressure during daily operations – is a factor where time pressure is caused by external, nonorganizational factors such as predetermined takeoff times or irregularities during ground handling.
  • Not relying on gut feeling – is a factor in which the crew has an unarticulated gut feeling regarding a potentially dangerous situation but does not take this into account in the decision-making process.

We suggest checking the transferability of the identified risk factors to other HROs and, due to the aforementioned limitations of linear incident analysis, also by using narrative forms such as storytelling (cf. Maslen & Hayes, 2020).

Implications for Airlines

Airlines should pay particular attention to incidents caused by judgment and decision-making errors in the course of learning and safety management. In the formation of a safety imagination and in assessing the site’s risk, the risk of bias due to the high frequency of incidents caused by performance-based errors should be considered. Due to the described weaknesses of linear evaluation methods, the value of the stories contained in the reports should also be considered in the context of airlines’ organizational learning processes. In addition, sharing stories can be seen as an effective tool against complacency (Hayes & Maslen, 2015) and, according to our findings, could thus reduce the probability of incidents caused by violations. Our research shows that a confidential reporting system – despite a just culture that has emerged in many airlines – has a positive effect on the Reporting Incidents phase, and we therefore advocate retaining these reporting options. The risk factors we have identified can serve airlines in the Investigating Incidents phase as a basis for identifying latent failures and as focal points for crew training.

The basic idea of a “zero accident vision” is that all (serious) accidents are avoidable (Zwetsloot et al., 2017). If airlines bear in mind the learning potential of less frequently reported incidents and latent failures and recognize the value of confidential reporting, bottlenecks in learning from incidents can be widened and the trend of the aforementioned accident statistics may be reversed.

Sebastian Sieberichs is a doctoral candidate at the Department of Industrial and Organizational Psychology at Ruhr University Bochum. In addition to his work as a commercial pilot for a major European airline, he is employed in the airline’s flight safety department. His expertise is in human factors and safety and fatigue management.

Annette Kluge is Full Professor for Industrial and Organizational Psychology at Ruhr University Bochum. She obtained her Diploma in I/O Psychology at RWTH Aachen and her doctorate in ergonomics and vocational training at University of Kassel in 1994. Her expertise is in human factors, training science, skill retention, and safety management.


Sebastian Sieberichs, Department of Work, Organizational, and Business Psychology, Ruhr University Bochum, Universitätsstraße 150, 44801 Bochum, Germany, E-mail