Open Access Editorial

The Replication Crisis and Open Science in Psychology

Methodological Challenges and Developments

Published Online: https://doi.org/10.1027/2151-2604/a000389

If you were to ask two psychologists, a pessimist and an optimist, when psychology entered what we now know as the “replication crisis,” the former might say “decades ago,” whereas the latter might be inclined to answer “2011.” Despite this disagreement, both are likely to agree that two key events in 2011 that received considerable attention, namely the scientific fraud case of Diederik Stapel (e.g., Levelt, Drenth, & Noort, 2012; Vogel, 2011) and Daryl Bem’s study on extrasensory perception (Bem, 2011), marked the beginning of a broader awareness that something was not quite right in the realm of psychology. In particular, following failed replications and criticisms of Bem’s findings (e.g., Ritchie, Wiseman, & French, 2012; Wagenmakers, Wetzels, Borsboom, & van der Maas, 2011), pervasive problems in the way we report, analyze, and selectively publish data surfaced with a previously unseen intensity. Consequently, many classical findings and an increasing number of recent studies published in high-ranking journals have come under scrutiny in replication studies: Camerer et al. (2018), for instance, found that only 13 of 21 social and behavioral science studies published between 2010 and 2015 in Nature and Science could be successfully replicated. With estimated replication rates of roughly 25% for social psychology and 50% for cognitive psychology (Open Science Collaboration, 2015), it became obvious that psychology suffers from a severe replicability problem.

In response to such findings, discussions arose about why such disappointingly low replication rates were being observed. Some reasons, such as publication bias, had been known for decades (Sterling, Rosenbaum, & Weinkam, 1995) but were largely ignored. Others, such as questionable research practices (QRPs), were certainly known but had either never been systematically investigated (John, Loewenstein, & Prelec, 2012; Simmons, Nelson, & Simonsohn, 2011) or were likely not regarded as “questionable” within the research community. Moreover, perverse incentive structures of journals and grant organizations were identified as contributing factors (e.g., Fanelli, 2010; Fanelli, Costas, & Larivière, 2015), in addition to statistical misconceptions and misuses (e.g., Gigerenzer, 2018; Greenland et al., 2016; Martinson, Anderson, & De Vries, 2005).

Fortunately, parts of the field neither fell into despair nor adopted self-destructive behavior, but instead started developing various initiatives to improve the way we conduct, analyze, and publish studies. These activities are now commonly known as the open science movement, which regards research transparency as a safeguard against human error, sloppiness, publication bias, and fraud in science. Successful initiatives and implementations of research transparency and replicability include open data and materials, registered reports, and published replication studies. To illustrate, an ever-growing number of journals (currently 204) offer registered reports as a publishing format (Center for Open Science, n.d.), and replication studies have become notably more common in psychology (e.g., Klein et al., 2018; LeBel, McCarthy, Earp, Elson, & Vanpaemel, 2018; Open Science Collaboration, 2015). These initiatives have also resulted in technical implementations that facilitate open science practices, such as the widely known Open Science Framework (Foster & Deardorff, 2017) and related projects such as PsychArchives (Weichselgartner & Ramthun, 2019). These platforms offer open, centralized workflows for preregistered studies and the research process in general, from developing a research idea and designing a study, to storing and analyzing empirical data, to writing and publishing articles (e.g., Spellman, Gilbert, & Corker, 2018).

The open science movement has not only led to the implementation of transparency-related initiatives, but has also stimulated a large amount of new research. Much of this research concerns three major topics:

  1. Growing awareness of the relevance of publication biases, QRPs, and p-hacking, and the development, refinement, and extensive discussion of methods to detect these problems and to correct for their effects in meta-analyses (e.g., Alinaghi & Reed, 2018; Carter, Schönbrodt, Gervais, & Hilgard, 2019; Simonsohn, Nelson, & Simmons, 2014; Van Assen, Van Aert, & Wicherts, 2015).
  2. Meta-scientific interdisciplinary research studying how psychological research is conducted, scrutinizing accepted research paradigms, and monitoring the scientific progress of psychology as a whole (e.g., Bishop, 2019; Nuijten, Van Assen, Hartgerink, Epskamp, & Wicherts, 2017; Van Aert & Van Assen, 2018).
  3. An emergent field of research focused on the issue of how to best design, conduct, and analyze replication studies (e.g., Etz & Vandekerckhove, 2016; Patil, Peng, & Leek, 2016; Simonsohn, 2015).

This Topical Issue

This topical issue does not aim to give an extensive overview of the replication crisis and all the developments that have followed from it. Rather, it is intended to provide a snapshot of recent developments, with sample papers related to each of the topics mentioned above. It also asks what the open science movement has achieved so far and what lies ahead of us.

The first contribution in this issue is a review article by Crüwell and colleagues (2019). The authors introduce seven major facets of methodological reform that are subsumed under the umbrella term “open science.” Within this issue, the article serves a twofold purpose: It illustrates the current state of the debate about best practices with regard to topics such as open data and materials, preregistration, reproducible analyses, and replication designs, and in doing so highlights to what extent research practices have already changed. At the same time, the article is meant to provide a starting point for researchers who are interested in implementing open science practices. Taking the form of an annotated reading list, it includes many suggestions for further reading and is a valuable teaching resource.

The two following original articles deal with methods to detect and correct for publication bias and QRPs. Erdfelder and Heck (2019) focus on the p-curve tool (Simonsohn et al., 2014), a method that was developed in response to the debate around replication problems and rapidly gained popularity. P-curve relies on the distribution of statistically significant p-values in a set of studies to assess the evidential value of these studies. The method infers the presence of a true underlying effect from a right-skew in the distribution of p-values; a left-skew, in contrast, is interpreted as an indication of selective reporting of significant results and p-hacking. Erdfelder and Heck demonstrate that both of these interpretations may be overly simplistic: First, they review previous research that identified conditions under which a right-skew in the distribution of p-values can be observed even though there is no true effect. They then use a simulation study to illustrate that some plausible assumptions about the research and publication process can suffice to generate a left-skew even when all studies reflect a true effect and no p-hacking is involved. The authors conclude that no shape of the distribution of p-values allows for an unambiguous diagnosis of evidential value (or the lack thereof), and that the results of p-curve analyses should be interpreted much more cautiously than originally suggested by Simonsohn and colleagues.
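
For readers unfamiliar with the basic logic that Erdfelder and Heck scrutinize, the following minimal simulation is our own illustration (not the p-curve software itself, and with arbitrary sample and effect sizes): it shows why significant p-values pile up near zero when a true effect exists, but are roughly uniform when it does not.

```python
# Minimal sketch of the logic behind p-curve (not the official implementation):
# simulate two-sample t-tests and keep only the significant p-values,
# once without and once with a true effect. All parameters are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2019)

def significant_p_values(true_d, n_per_group=30, n_studies=10_000, alpha=.05):
    """Return all p-values below alpha from simulated two-group studies."""
    ps = []
    for _ in range(n_studies):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(true_d, 1.0, n_per_group)
        _, p = stats.ttest_ind(treatment, control)
        if p < alpha:
            ps.append(p)
    return np.asarray(ps)

null_ps = significant_p_values(true_d=0.0)  # only false positives survive
real_ps = significant_p_values(true_d=0.5)  # medium true effect

# Under the null, significant p-values are roughly uniform on (0, .05);
# under a true effect, they cluster near zero (a right-skewed p-curve).
print("Share of significant p-values below .025")
print(f"  no true effect: {np.mean(null_ps < .025):.2f} (about .50 expected)")
print(f"  true effect:    {np.mean(real_ps < .025):.2f} (well above .50)")
```

Erdfelder and Heck’s point is precisely that this textbook pattern can break down: under some research and publication processes a right-skew can emerge without a true effect, and a left-skew can emerge without p-hacking.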

Renkewitz and Keiner (2019) employ a large Monte Carlo simulation to evaluate the performance of six statistical tools in detecting publication biases in meta-analytic results. Their findings show that some of these tests for bias achieve high statistical power even in small study sets and work exceedingly well as long as nonsignificant results are completely excluded from the literature and all included primary studies have the same underlying true effect size. However, the performance of all methods deteriorates strongly when true effect sizes are heterogeneous or when publication bias is less severe, suggesting that in many actual meta-analyses in psychology, relevant biases will remain undetected. Accordingly, the authors conclude that statistical methods for the detection and correction of bias may be helpful to identify findings that deserve further scrutiny, but will not be able to ensure that the available evidence is trustworthy and valid.
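
As a rough illustration of the kind of Monte Carlo setup involved, the sketch below is written under our own simplified assumptions and uses one classic funnel-asymmetry approach (an Egger-style regression) rather than the authors’ actual design or code: studies estimating a true effect of zero are “published” only when they yield a significant positive result, and the resulting asymmetry is then tested.

```python
# Sketch of a publication-bias simulation (simplified; not the authors' code):
# studies with a true effect of zero are "published" only if they show a
# significant positive result, and an Egger-style regression of z-scores on
# precision flags the resulting funnel-plot asymmetry.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

def published_studies(true_d=0.0, k=40, n_range=(20, 100), bias=True):
    """Simulate k published effect sizes and their standard errors."""
    effects, ses = [], []
    while len(effects) < k:
        n = int(rng.integers(*n_range))        # per-group sample size
        se = np.sqrt(2.0 / n)                  # rough SE of Cohen's d near 0
        d = rng.normal(true_d, se)             # observed standardized effect
        if not bias or d / se > 1.96:          # selective publication filter
            effects.append(d)
            ses.append(se)
    return np.asarray(effects), np.asarray(ses)

d_obs, se_obs = published_studies(true_d=0.0, bias=True)

# Naive inverse-variance pooling overestimates the (zero) true effect ...
w = 1.0 / se_obs**2
print(f"pooled estimate under selective publication: "
      f"{np.sum(w * d_obs) / np.sum(w):.2f}")

# ... and the Egger-style regression tends to detect the asymmetry via an
# intercept far from zero.
fit = sm.OLS(d_obs / se_obs, sm.add_constant(1.0 / se_obs)).fit()
print(f"Egger intercept: {fit.params[0]:.2f}, p = {fit.pvalues[0]:.3f}")
```

The article’s central message, however, is that such tests become far less reliable once selection is weaker or true effect sizes vary across studies.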

The article by Steiner, Wong, and Anglin (2019) is a contribution to the ongoing debate concerning the design, analysis, and interpretation of replication studies. The authors introduce the causal replication framework, which considers “replication” as a research design meant to test whether two studies produce the same causal effect. The framework formalizes the conditions under which a successful replication can be expected. Based on these conditions, the authors discuss why it might often be difficult to interpret the results of (post hoc) replication studies that primarily aim to repeat the methods and procedures of an original study, and they illustrate how the framework can be used to design (prospective) replication studies that allow for causal interpretations of replication failures. In sum, the article presents an alternative view on replication that emphasizes the causal interpretability of replication studies over the mere repetition of procedures.

Finally, the contribution by Kossmeier and colleagues (2019) gives an example of a meta-scientific study of psychological research. Previous meta-scientific studies investigated the median sample size of studies published in a journal, termed this statistic the N-pact factor (NF), and suggested using the NF as an important indicator of journal quality (e.g., Fraley & Vazire, 2014). Kossmeier and colleagues track the NFs of two personality psychology journals over 38 years and thus present the first long-term NF analysis. Their results show that the sample sizes in the scrutinized journals compared favorably with NFs in fields such as social and sports psychology and increased gradually over time. Furthermore, the rising NFs in both journals were accompanied by growing impact factors. Thus, we tentatively end on a positive note, with findings suggesting progress in psychological research practices.
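
Because the N-pact factor is simply the median sample size of the studies a journal published in a given period (Fraley & Vazire, 2014), it is straightforward to compute; the short sketch below does so for entirely hypothetical journals, years, and sample sizes.

```python
# Minimal sketch of computing N-pact factors (NF = median sample size of the
# studies a journal published in a period; Fraley & Vazire, 2014).
# Journal names, years, and sample sizes below are entirely hypothetical.
from statistics import median

study_sample_sizes = {
    ("Journal A", 1980): [24, 40, 36, 58, 31],
    ("Journal A", 2017): [120, 210, 95, 340, 180],
    ("Journal B", 1980): [45, 30, 62, 28],
    ("Journal B", 2017): [150, 260, 410, 190, 220],
}

for (journal, year), ns in sorted(study_sample_sizes.items()):
    print(f"{journal}, {year}: NF = {median(ns)}")
```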

Taken together, the diversity of the contributions to this topical issue reflects the lively developments currently unfolding in psychological methods, meta-science, and research on open science practices. At the same time, it shows that there is still much work ahead of us. Research practices are changing rapidly, but this process is accompanied by research revealing many problems in psychology that have yet to be solved. Whether improvements in research practices find their way into psychology on a larger scale, or merely lead to a split in research and publication culture between proponents of open science and those of the traditional way of conducting research, remains an open question. Nevertheless, we believe that the articles in this topical issue provide a valuable contribution to the discussion on how we conduct psychological science, and we hope they help pave the way toward what remains to be done.

We thank Steffi Pohl (the Editor-in-Chief for this issue) and Christina Sarembe (the Editorial Production Manager) for their advice and support during the preparation process. We also thank Michael Bosnjak (Director of ZPID – Leibniz Institute for Psychology Information at the University of Trier) for hosting the associated conference “Open Science 2019,” which took place in Trier, Germany, March 12–14, 2019. We point interested readers to the online repository PsychArchives (https://www.psycharchives.org/), where materials from this conference can be found. In closing, we would like to thank Hogrefe, the publisher of the Zeitschrift für Psychologie, for supporting open science practices and granting open access to all contributions in this issue.

References

  • Alinaghi, N., & Reed, W. R. (2018). Meta-analysis and publication bias: How well does the FAT-PET-PEESE procedure work? Research Synthesis Methods, 9, 285–311. https://doi.org/10.1002/jrsm.1298

  • Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100, 407–425. https://doi.org/10.1037/a0021524

  • Bishop, D. V. M. (2019). The psychology of experimental psychologists: Overcoming cognitive constraints to improve research. https://doi.org/10.31234/osf.io/hnbex

  • Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., … Pfeiffer, T. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2, 637–644. https://doi.org/10.1038/s41562-018-0399-z

  • Carter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J. (2019). Correcting for bias in psychology: A comparison of meta-analytic methods. Advances in Methods and Practices in Psychological Science, 2, 115–144. https://doi.org/10.1177/2515245919847196

  • Center for Open Science. (n.d.). Registered Reports: Peer review before results are known to align scientific values and practices. Retrieved from https://cos.io/rr/

  • Crüwell, S., van Doorn, J., Etz, A., Makel, M. C., Moshontz, H., Niebaum, J. C., … Schulte-Mecklenbeck, M. (2019). Seven easy steps to open science: An annotated reading list. Zeitschrift für Psychologie, 227, 237–248. https://doi.org/10.1027/2151-2604/a000387

  • Erdfelder, E., & Heck, D. W. (2019). Detecting evidential value and p-hacking with the p-curve tool: A word of caution. Zeitschrift für Psychologie, 227, 249–260. https://doi.org/10.1027/2151-2604/a000383

  • Etz, A., & Vandekerckhove, J. (2016). A Bayesian perspective on the reproducibility project: Psychology. PLoS One, 11, e0149794. https://doi.org/10.1371/journal.pone.0149794

  • Fanelli, D. (2010). Do pressures to publish increase scientists’ bias? An empirical support from US States Data. PLoS One, 5, e10271. https://doi.org/10.1371/journal.pone.0010271

  • Fanelli, D., Costas, R., & Larivière, V. (2015). Misconduct policies, academic culture and career stage, not gender or pressures to publish, affect scientific integrity. PLoS One, 10, e0127556. https://doi.org/10.1371/journal.pone.0127556

  • Foster, E. D., & Deardorff, A. (2017). Open Science Framework (OSF). Journal of the Medical Library Association, 105, 203–206. https://doi.org/10.5195/jmla.2017.88

  • Fraley, R. C., & Vazire, S. (2014). The N-pact factor: Evaluating the quality of empirical journals with respect to sample size and statistical power. PLoS One, 9, e109019. https://doi.org/10.1371/journal.pone.0109019

  • Gigerenzer, G. (2018). Statistical rituals: The replication delusion and how we got there. Advances in Methods and Practices in Psychological Science, 1, 198–218. https://doi.org/10.1177/2515245918771329

  • Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31, 337–350. https://doi.org/10.1007/s10654-016-0149-3

  • John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23, 524–532. https://doi.org/10.1177/0956797611430953

  • Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B. Jr., Alper, S., … Bahník, Š. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1, 443–490. https://doi.org/10.1177/2515245918810225

  • Kossmeier, M., Vilsmeier, J., Dittrich, R., Fritz, T., Kolmanz, C., Plessen, C. Y., … Voracek, M. (2019). Long-term trends (1980–2017) in the N-pact factor of journals in personality psychology and individual differences research. Zeitschrift für Psychologie, 227, 293–302. https://doi.org/10.1027/2151-2604/a000384

  • LeBel, E. P., McCarthy, R. J., Earp, B. D., Elson, M., & Vanpaemel, W. (2018). A unified framework to quantify the credibility of scientific findings. Advances in Methods and Practices in Psychological Science, 1, 389–402. https://doi.org/10.1177/2515245918787489

  • Levelt, W. J. M., Drenth, P. J. D., & Noort, E. (2012). Flawed science: The fraudulent research practices of social psychologist Diederik Stapel. Retrieved from http://hdl.handle.net/11858/00-001M-0000-0010-2590-A

  • Martinson, B. C., Anderson, M. S., & De Vries, R. (2005). Scientists behaving badly. Nature, 435, 737–738. https://doi.org/10.1038/435737a

  • Nuijten, M. B., Van Assen, M. A. L. M., Hartgerink, C. H. J., Epskamp, S., & Wicherts, J. (2017). The validity of the tool “statcheck” in discovering statistical reporting inconsistencies. Retrieved from https://osf.io/3cvkn/#!

  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, aac4716. https://doi.org/10.1126/science.aac4716

  • Patil, P., Peng, R. D., & Leek, J. T. (2016). What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspectives on Psychological Science, 11, 539–544. https://doi.org/10.1177/1745691616646366

  • Renkewitz, F., & Keiner, M. (2019). How to detect publication bias in psychological research: A comparative evaluation of six statistical methods. Zeitschrift für Psychologie, 227, 261–279. https://doi.org/10.1027/2151-2604/a000386

  • Ritchie, S. J., Wiseman, R., & French, C. C. (2012). Failing the future: Three unsuccessful attempts to replicate Bem’s “Retroactive Facilitation of Recall” effect. PLoS One, 7, e33423. https://doi.org/10.1371/journal.pone.0033423

  • Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology. Psychological Science, 22, 1359–1366. https://doi.org/10.1177/0956797611417632

  • Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26, 559–569. https://doi.org/10.1177/0956797614567341

  • Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143, 534–547. https://doi.org/10.1037/a0033242

  • Spellman, B. A., Gilbert, E. A., & Corker, K. S. (2018). Open science. In J. T. Wixted (Ed.), Stevens’ handbook of experimental psychology and cognitive neuroscience (4th ed., Vol. 5, pp. 729–775). Hoboken, NJ: Wiley.

  • Steiner, P. M., Wong, V. C., & Anglin, K. (2019). A causal replication framework for designing and assessing replication efforts. Zeitschrift für Psychologie, 227, 280–292. https://doi.org/10.1027/2151-2604/a000385

  • Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa. American Statistician, 49, 108–112. https://doi.org/10.2307/2684823

  • Van Aert, R. C., & Van Assen, M. A. (2018). Examining reproducibility in psychology: A hybrid method for combining a statistically significant original study and a replication. Behavior Research Methods, 50, 1515–1539. https://doi.org/10.3758/s13428-017-0967-6

  • Van Assen, M. A., Van Aert, R., & Wicherts, J. M. (2015). Meta-analysis using effect size distributions of only statistically significant studies. Psychological Methods, 20, 293–309. https://doi.org/10.1037/met0000025

  • Vogel, G. (2011). Psychologist accused of fraud on “astonishing scale”. Science, 334, 579. https://doi.org/10.1126/science.334.6056.579

  • Wagenmakers, E.-J., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: The case of psi: Comment on Bem (2011). Journal of Personality and Social Psychology, 100, 426–432. https://doi.org/10.1037/a0022790

  • Weichselgartner, E., & Ramthun, R. (2019, February). PsychArchives: A trustworthy repository for psychology. Poster presented at the Research Data Alliance (RDA) Germany conference 2019, Potsdam, Germany.

Frank Renkewitz, Department of Psychology, University of Erfurt, Nordhäuser Straße 63, 99089 Erfurt, Germany,