Original Article

Zur Methodologie von Replikationsstudien

Published Online: https://doi.org/10.1026/0033-3042/a000387

Zusammenfassung. Replication studies serve different goals in the empirical sciences, depending on whether we are operating in the context of theory development or in the context of theory testing (context of discovery vs. context of justification in the sense of Reichenbach, 1938). Conceptual replication studies aim at generalization and can be useful in the context of discovery. Direct replication studies, by contrast, aim to demonstrate the replicability of a specific research finding under independent conditions and are indispensable in the context of justification. Without the assumption of direct replicability, it is hardly possible to agree on generally accepted empirical facts, which are a necessary precondition for theory tests in the empirical sciences. Against this background, standards for replication studies are proposed and justified. A peculiarity of psychology is that the replicandum is, as a rule, a statistical hypothesis about which decisions can only be made probabilistically. This raises follow-up problems concerning the formulation of the replicability hypothesis, the control of statistical error probabilities in decisions about the replicability hypothesis, the determination of the to-be-detected effect size when available results are distorted by publication bias, the specification of the sample size, and the correct interpretation of the replication rate, for which solutions are proposed and discussed.


On the Methodology of Replication Studies

Abstract. Replication studies are associated with different goals in the empirical sciences, depending on whether research aims at developing new theories or at testing existing theories (context of discovery vs. context of justification, cf. Reichenbach, 1938). Conceptual replications strive for generalization and can be useful in the context of discovery. Direct replications, by contrast, target the replicability of a specific empirical research result under independent conditions and are thus indispensable in the context of justification. Without assuming replicability, it is impossible to reach a consensus about generally accepted empirical facts. However, such accepted facts are mandatory for testing theories in the empirical sciences. On the basis of this framework, we suggest and motivate standards for replication studies. A characteristic feature of psychological science is the probabilistic nature of the to-be-replicated empirical claim, which typically takes the form of a statistical hypothesis. This raises a number of methodological problems concerning the nature of the replicability hypothesis, the control of error probabilities in statistical decisions about the replicability hypothesis, the determination of the to-be-detected effect size given distortions of published effect sizes by publication bias, the a priori determination of sample sizes for replication studies, and the correct interpretation of the replication rate (i.e., the success rate in a series of replication studies). We propose and discuss solutions for all these problems.
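
To make two of these problems concrete, the following sketch illustrates one possible approach in Python (assuming the scipy and statsmodels packages); it is a minimal illustration under assumed numbers, not the procedure developed in the article. The sketch plans the sample size of a direct replication around a "safeguard" effect size, that is, the lower limit of a two-sided 60 % confidence interval around the published Cohen's d rather than the possibly bias-inflated point estimate (cf. Perugini, Gallucci & Costantini, 2014), using the approximate standard error of d from Borenstein, Hedges, Higgins and Rothstein (2009). It then shows why an observed replication rate must be judged against the error probabilities of the individual studies. The original effect size, the group sizes, and the proportion of true effects are hypothetical values.

    import math
    from scipy import stats
    from statsmodels.stats.power import TTestIndPower

    # Hypothetical original result (all numbers are for illustration only):
    d_orig = 0.60        # published Cohen's d
    n1, n2 = 30, 30      # group sizes of the original study

    # Approximate standard error of Cohen's d (cf. Borenstein et al., 2009):
    se_d = math.sqrt((n1 + n2) / (n1 * n2) + d_orig ** 2 / (2 * (n1 + n2)))

    # "Safeguard" effect size: lower limit of a two-sided 60 % confidence
    # interval around d, hedging against inflation of the published estimate
    # by publication bias (cf. Perugini et al., 2014).
    z = stats.norm.ppf(0.80)          # 80th percentile -> two-sided 60 % CI
    d_safeguard = d_orig - z * se_d

    # A priori sample size for a one-sided two-sample t test on the
    # safeguard effect, with alpha = beta = .05, i.e., power = .95:
    n_per_group = TTestIndPower().solve_power(
        effect_size=d_safeguard, alpha=0.05, power=0.95,
        ratio=1.0, alternative='larger')
    print(f"Safeguard effect size: d = {d_safeguard:.3f}")
    print(f"Required n per group:  {math.ceil(n_per_group)}")

    # Interpreting the replication rate: even if every original effect were
    # real and of the assumed size, only about 1 - beta of direct
    # replications should succeed. With a (hypothetical) proportion pi of
    # true effects, the expected success rate is pi*(1-beta) + (1-pi)*alpha.
    pi_true = 0.5
    expected_rate = pi_true * 0.95 + (1 - pi_true) * 0.05
    print(f"Expected replication rate: {expected_rate:.2f}")

Under these assumptions the safeguard effect size shrinks to roughly d = .38, more than doubling the required sample size relative to planning for the published d = .60, which illustrates why replications planned for published point estimates are often underpowered.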

References

  • Albert, H. (1991). Traktat über kritische Vernunft (5. Aufl.). Stuttgart: UTB.

  • Anderson, S. F., Kelley, K. & Maxwell, S. E. (2017). Sample-size planning for more accurate statistical power: A method for adjusting sample effect sizes for publication bias and uncertainty. Psychological Science. Advance online publication. https://doi.org/10.1177/0956797617723724

  • Armitage, P., McPherson, C. K. & Rowe, B. C. (1969). Repeated significance tests on accumulating data. Journal of the Royal Statistical Society: Series A (General), 132, 235 – 244. https://doi.org/10.2307/2343787

  • Asendorpf, J. B., Conner, M., De Fruyt, F., De Houwer, J., Denissen, J. J. A. & Fiedler, K., et al. (2013). Recommendations for increasing replicability in psychology. European Journal of Personality, 27, 108 – 119. https://doi.org/10.1002/per.1919

  • Bargh, J. A., Chen, M. & Burrows, L. (1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71, 230 – 244. https://doi.org/10.1037/0022-3514.71.2.230

  • Berger, M. P. F. & Wong, W. K. (2009). An introduction to optimal designs for social and biomedical research. New York, NY: Wiley. https://doi.org/10.1002/9780470746912

  • Borenstein, M., Hedges, L. V., Higgins, J. P. T. & Rothstein, H. R. (2009). Introduction to meta-analysis. Chichester, UK: John Wiley & Sons.

  • Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J. & Giner-Sorolla, R. (2014). The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217 – 224. https://doi.org/10.1016/j.jesp.2013.10.005

  • Bredenkamp, J. (1969). Über die Anwendung von Signifikanztests bei theorie-testenden Experimenten. Psychologische Beiträge, 11, 275 – 285.

  • Bredenkamp, J. (1972). Der Signifikanztest in der psychologischen Forschung. Frankfurt: Akademische Verlagsgesellschaft.

  • Bredenkamp, J. (1980). Theorie und Planung psychologischer Experimente. Darmstadt: Steinkopff. https://doi.org/10.1007/978-3-642-85315-9

  • Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J. & Robinson, E. S. J. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365 – 376. https://doi.org/10.1038/nrn3502

  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

  • Coolin, A., Erdfelder, E., Bernstein, D. M., Thornton, A. & Thornton, W. (2016). Inhibitory control underlies individual differences in older adults’ hindsight bias. Psychology and Aging, 31, 224 – 238. https://doi.org/10.1037/pag0000088

  • Crandall, C. & Sherman, J. W. (2016). On the scientific superiority of conceptual replications for scientific progress. Journal of Experimental Social Psychology, 66, 93 – 99.

  • Deese, J. (1959). On the prediction of occurrence of particular verbal intrusions in immediate free recall. Journal of Experimental Psychology, 58, 17 – 22. https://doi.org/10.1037/h0046671

  • Doyen, S., Klein, O., Pichon, C.-L. & Cleeremans, A. (2012). Behavioral priming: It’s all in the mind, but whose mind? PLoS ONE, 7, e29081. https://doi.org/10.1371/journal.pone.0029081

  • Eklund, A., Nichols, T. E. & Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences, 113, 7900 – 7905. https://doi.org/10.1073/pnas.1602413113

  • Erdfelder, E. (1984). Zur Bedeutung und Kontrolle des beta-Fehlers bei der inferenzstatistischen Prüfung log-linearer Modelle. Zeitschrift für Sozialpsychologie, 15, 18 – 32.

  • Erdfelder, E. (1994). Erzeugung und Verwendung empirischer Daten. In T. Herrmann & W. H. Tack (Hrsg.), Methodologische Grundlagen der Psychologie (Enzyklopädie der Psychologie, Serie Forschungsmethoden der Psychologie, Bd. 1, S. 47 – 97). Göttingen: Hogrefe.

  • Erdfelder, E. (2004). Über Allgemeine und Differentielle Psychologie. In A. Kämmerer & J. Funke (Hrsg.), Seelenlandschaften. Streifzüge durch die Psychologie. 98 persönliche Positionen (S. 78 – 79). Göttingen: Vandenhoeck & Ruprecht.

  • Erdfelder, E. & Bredenkamp, J. (1994). Hypothesenprüfung. In T. Herrmann & W. H. Tack (Hrsg.), Methodologische Grundlagen der Psychologie (Enzyklopädie der Psychologie, Serie Forschungsmethoden der Psychologie, Bd. 1, S. 604 – 648). Göttingen: Hogrefe.

  • Feyerabend, P. (1976). Wider den Methodenzwang – Skizze einer anarchistischen Erkenntnistheorie. Frankfurt: Suhrkamp.

  • Fiedler, K. (2011). Voodoo correlations are everywhere—not only in neuroscience. Perspectives on Psychological Science, 6, 163 – 171. https://doi.org/10.1177/1745691611400237

  • Fiedler, K., Kutzner, F. & Krueger, J. (2012). The long way from α-error control to validity proper: Problems with a short-sighted false-positive debate. Perspectives on Psychological Science, 7, 661 – 669. https://doi.org/10.1177/1745691612462587

  • Francis, G. (2012a). Publication bias and the failure of replication in experimental psychology. Psychonomic Bulletin & Review, 19, 975 – 991. https://doi.org/10.3758/s13423-012-0322-y

  • Francis, G. (2012b). Too good to be true: Publication bias in two prominent studies from experimental psychology. Psychonomic Bulletin & Review, 19, 151 – 156. https://doi.org/10.3758/s13423-012-0227-9

  • Francis, G. (2014). The frequency of excess success for articles in Psychological Science. Psychonomic Bulletin & Review, 21, 1180 – 1187. https://doi.org/10.3758/s13423-014-0601-x

  • García-Pérez, M. A. (2017). Thou shalt not bear false witness against null hypothesis significance testing. Educational and Psychological Measurement, 77, 631 – 662. https://doi.org/10.1177/0013164416668232

  • Greve, W., Bröder, A. & Erdfelder, E. (2013). Result-blind peer reviews and editorial decisions: A missing pillar of scientific culture. European Psychologist, 18, 286 – 294. https://doi.org/10.1027/1016-9040/a000144

  • Henrich, J., Heine, S. J. & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61 – 135. https://doi.org/10.1017/S0140525X0999152X

  • Holling, H. & Schwabe, R. (2013). An introduction to optimal design: Some basic issues using examples from dyscalculia research. Zeitschrift für Psychologie, 221, 124 – 144. https://doi.org/10.1027/2151-2604/a000142

  • IJzerman, H., Brandt, M. & Wolferen, J. van (2017, September 18). Rejoice! In Replication. Unpublished manuscript. Retrieved from https://psyarxiv.com/cmdq8

  • IJzerman, H. & Koole, S. L. (2011). From perceptual rags to metaphoric riches: Bodily, social, and cultural constraints on sociocognitive metaphors. Psychological Bulletin, 137, 355 – 361. https://doi.org/10.1037/a0022373

  • Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2, 686 – 701. https://doi.org/10.1371/journal.pmed.0020124

  • Lindsay, D. S. (2015). Replication in Psychological Science. Psychological Science, 26, 1827 – 1832. https://doi.org/10.1177/0956797615616374

  • McShane, B. B. & Böckenholt, U. (2014). You cannot step into the same river twice: When power analyses are optimistic. Perspectives on Psychological Science, 9, 612 – 625. https://doi.org/10.1177/1745691614548513

  • Miller, J. & Schwarz, W. (2011). Aggregate and individual replication probability within an explicit model of the research process. Psychological Methods, 16, 337 – 360. https://doi.org/10.1037/a0023347

  • Miller, J. & Ulrich, R. (2016). Optimizing research payoff. Perspectives on Psychological Science, 11, 664 – 691. https://doi.org/10.1177/1745691616649170

  • Mook, D. (1983). In defense of external invalidity. American Psychologist, 38, 379 – 387. https://doi.org/10.1037/0003-066X.38.4.379

  • Moshagen, M. & Erdfelder, E. (2016). A new strategy for testing structural equation models. Structural Equation Modeling, 23, 54 – 60. https://doi.org/10.1080/10705511.2014.950896

  • Nosek, B. A. & Lakens, D. (2014). Registered reports. Social Psychology, 45, 137 – 141. https://doi.org/10.1027/1864-9335/a000192

  • Olejnik, S. & Algina, J. (2003). Generalized eta and omega squared statistics: Measures of effect size for some common research designs. Psychological Methods, 8, 434 – 447. https://doi.org/10.1037/1082-989X.8.4.434

  • Open Science Collaboration. (2012). An open, large-scale, collaborative effort to estimate the reproducibility of psychological science. Perspectives on Psychological Science, 7, 657 – 660. https://doi.org/10.1177/1745691612462588

  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, aac4716. https://doi.org/10.1126/science.aac4716

  • Pashler, H. & Harris, C. R. (2012). Is the replicability crisis overblown? Three arguments examined. Perspectives on Psychological Science, 7, 531 – 536. https://doi.org/10.1177/1745691612463401

  • Perugini, M., Gallucci, M. & Costantini, G. (2014). Safeguard power as a protection against imprecise power estimates. Perspectives on Psychological Science, 9, 319 – 332. https://doi.org/10.1177/1745691614528519

  • Quine, W. V. (1951). Main trends in recent philosophy: Two dogmas of empiricism. The Philosophical Review, 60, 20 – 43. https://doi.org/10.2307/2181906

  • Ramscar, M., Shaoul, C. & Baayen, R. H. (n.d.). Why many priming results don’t (and won’t) replicate: A quantitative analysis. Retrieved from www.sfs.uni-tuebingen.de/~mramscar/papers/Ramscar-Shaoul-Baayen_replication.pdf

  • Reichenbach, H. (1938). Experience and prediction. An analysis of the foundations and the structure of knowledge. Chicago, IL: University of Chicago Press. https://doi.org/10.1086/218075

  • Roediger, H. L. & McDermott, K. B. (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803 – 814. https://doi.org/10.1037/0278-7393.21.4.803

  • Rosenthal, R. (1979). The “file drawer problem” and tolerance for null results. Psychological Bulletin, 86, 638 – 641. https://doi.org/10.1037/0033-2909.86.3.638

  • Rosenthal, R. & Fode, K. L. (1963). The effect of experimenter bias on the performance of the albino rat. Behavioral Science, 8, 183 – 189. https://doi.org/10.1002/bs.3830080302

  • Schimmack, U. (2012). The ironic effect of significant results on the credibility of multiple-study articles. Psychological Methods, 17, 551 – 566. https://doi.org/10.1037/a0029487

  • Schwarz, N. & Clore, G. L. (2016). Evaluating psychological research requires more than attention to the N: A comment on Simonsohn’s (2015) “Small Telescopes”. Psychological Science, 27, 1407 – 1409. https://doi.org/10.1177/0956797616653102

  • Simmons, J. P., Nelson, L. D. & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359 – 1366. https://doi.org/10.1177/0956797611417632

  • Simonsohn, U., Nelson, L. D. & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143, 534 – 547. https://doi.org/10.1037/a0033242

  • Stoner, J. A. F. (1961). A comparison of individual and group decisions involving risk. Unpublished master’s thesis, Massachusetts Institute of Technology. Retrieved from http://dspace.mit.edu/handle/1721.1/11330

  • Strack, F. (2016). Reflection on the smiling registered replication report. Perspectives on Psychological Science, 11, 929 – 930. https://doi.org/10.1177/1745691616674460

  • Stroebe, W. & Strack, F. (2014). The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science, 9, 59 – 71. https://doi.org/10.1177/1745691613514450

  • Szucs, D. & Ioannidis, J. P. A. (2017). Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biology, 15 (3), 1 – 18. https://doi.org/10.1371/journal.pbio.2000797

  • Ulrich, R., Erdfelder, E., Deutsch, R., Strauß, B., Brüggemann, A. & Hannover, B., et al. (2016). Inflation von falsch-positiven Befunden in der psychologischen Forschung: Mögliche Ursachen und Gegenmaßnahmen. Psychologische Rundschau, 67, 163 – 174. https://doi.org/10.1026/0033-3042/a000296

  • Ulrich, R. & Miller, J. (2015). P-hacking by post hoc selection with multiple opportunities: Detectability by skewness test? Comment on Simonsohn, Nelson, and Simmons (2014). Journal of Experimental Psychology: General, 144, 1137 – 1145. https://doi.org/10.1037/xge0000086

  • Ulrich, R. & Miller, J. (2017). Some properties of p-curves, with an application to gradual publication bias. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000125

  • Ulrich, R., Miller, J. & Erdfelder, E. (2018). Effect size estimation from t-statistics in the presence of publication bias: A brief review of existing approaches with some extensions. Zeitschrift für Psychologie, 226 (1). https://doi.org/10.1027/2151-2604/a000319

  • Vul, E., Harris, C., Winkielman, P. & Pashler, H. (2009). Puzzling high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science, 4, 274 – 290. https://doi.org/10.1111/j.1745-6924.2009.01125.x

  • Wagenmakers, E.-J., Wetzels, R., Borsboom, D., Maas, H. L. J. van der & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 632 – 638. https://doi.org/10.1177/1745691612463078

  • Westfall, J., Judd, C. M. & Kenny, D. A. (2015). Replicating studies in which samples of participants respond to samples of stimuli. Perspectives on Psychological Science, 10, 390 – 399. https://doi.org/10.1177/1745691614564879

  • Westermann, R. (2017). Methoden psychologischer Forschung und Evaluation. Grundlagen, Gütekriterien und Anwendungen. Stuttgart: Kohlhammer.

  • Westermann, R. & Hager, W. (1986). Error probabilities in educational and psychological research. Journal of Educational Statistics, 11, 117 – 146. https://doi.org/10.2307/1164973

  • Whewell, W. (1840). The philosophy of the inductive sciences, founded upon their history (2nd ed.). London, UK: John W. Parker.