On the Methodology of Replication Studies
Abstract. Replication studies serve different goals in the empirical sciences, depending on whether research aims at developing new theories or at testing existing ones (context of discovery vs. context of justification, cf. Reichenbach, 1938). Conceptual replications strive for generalization and can be useful in the context of discovery. Direct replications, by contrast, target the replicability of a specific empirical result under independent conditions and are therefore indispensable in the context of justification. Without the assumption of direct replicability, it is hardly possible to reach a consensus about generally accepted empirical facts, yet such facts are a necessary precondition for testing theories in the empirical sciences. Against this background, we propose and justify standards for replication studies. A characteristic feature of psychological science is that the to-be-replicated claim (the replicandum) typically takes the form of a statistical hypothesis, about which one can decide only probabilistically. This raises several methodological problems: how to formulate the replicability hypothesis, how to control error probabilities in statistical decisions about it, how to determine the to-be-detected effect size when published effect sizes are distorted by publication bias, how to determine sample sizes for replication studies a priori, and how to interpret the replication rate (i.e., the success rate in a series of replication studies) correctly. We propose and discuss solutions to all of these problems.
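The abstract names several quantitative problems without showing the computations behind them. The following Python sketch illustrates three of them with generic textbook approximations, not with the solutions the paper proposes. The assumptions are ours, not the authors': a one-sided two-sample comparison with equal group sizes, a normal approximation in which the observed Cohen's d has standard error sqrt(2/n) for n participants per group, a replication counted as successful when it is significant in the predicted direction, and a strict publication filter that lets only significant studies through. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

Z = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_one_sided(d: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a one-sided two-sample test, n participants per group.

    Normal approximation (assumption): under H1 the test statistic is
    centred at d * sqrt(n / 2).
    """
    z_crit = Z.inv_cdf(1 - alpha)
    return Z.cdf(d * math.sqrt(n / 2) - z_crit)


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.95) -> int:
    """Smallest n per group whose approximate power reaches the target."""
    z_sum = Z.inv_cdf(1 - alpha) + Z.inv_cdf(power)
    return math.ceil(2 * (z_sum / d) ** 2)


def expected_replication_rate(prop_true: float, power: float,
                              alpha: float = 0.05) -> float:
    """Expected success rate in a series of replication studies.

    A true effect replicates with probability `power`; a null effect
    "replicates" only through a false positive in the predicted
    direction, i.e. with probability `alpha` (one-sided test).
    """
    return prop_true * power + (1 - prop_true) * alpha


def mean_published_effect(true_d: float, n: int, alpha: float = 0.05,
                          n_studies: int = 20_000, seed: int = 1) -> float:
    """Mean observed d among simulated studies that reached significance.

    Simplifying assumptions: observed d ~ Normal(true_d, sqrt(2 / n)),
    and only significant studies get published (a strict publication filter).
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / n)
    z_crit = Z.inv_cdf(1 - alpha)
    observed = (rng.gauss(true_d, se) for _ in range(n_studies))
    published = [d for d in observed if d / se > z_crit]
    return fmean(published)


if __name__ == "__main__":
    print(n_per_group(0.5))                               # 87
    print(round(expected_replication_rate(0.5, 0.8), 3))  # 0.425
    inflated = mean_published_effect(true_d=0.3, n=20)
    # Planning from the inflated estimate gives a much smaller n than
    # the true effect of 0.3 would require.
    print(round(inflated, 2), n_per_group(inflated), n_per_group(0.3))
```

Under these assumptions, a replication rate below 1 does not by itself show that the original effects are false, because even a true effect replicates only with probability equal to the power of the replication study. Likewise, planning n from a bias-inflated published effect size yields replication studies that are underpowered for the true effect.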
(1991). Traktat über kritische Vernunft (5th ed.). Stuttgart: UTB.
(2017). Sample-size planning for more accurate statistical power: A method for adjusting sample effect sizes for publication bias and uncertainty. Psychological Science. Advance online publication. https://doi.org/10.1177/0956797617723724
(1969). Repeated significance tests on accumulating data. Journal of the Royal Statistical Society: Series A (General), 132, 235 – 244. https://doi.org/10.2307/2343787
(2013). Recommendations for increasing replicability in psychology. European Journal of Personality, 27, 108 – 119. https://doi.org/10.1002/per.1919
(1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71, 230 – 244. https://doi.org/10.1037/0022-3514.71.2.230
(2009). An introduction to optimal designs for social and biomedical research. New York, NY: Wiley. https://doi.org/10.1002/9780470746912
(2009). Introduction to meta-analysis. Chichester, UK: Wiley & Sons.
(2014). The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217 – 224. https://doi.org/10.1016/j.jesp.2013.10.005
(1969). Über die Anwendung von Signifikanztests bei theorie-testenden Experimenten. Psychologische Beiträge, 11, 275 – 285.
(1972). Der Signifikanztest in der psychologischen Forschung. Frankfurt: Akademische Verlagsgesellschaft.
(1980). Theorie und Planung psychologischer Experimente. Darmstadt: Steinkopff. https://doi.org/10.1007/978-3-642-85315-9
(2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365 – 376. https://doi.org/10.1038/nrn3502
(1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
(2016). Inhibitory control underlies individual differences in older adults’ hindsight bias. Psychology and Aging, 31, 224 – 238. https://doi.org/10.1037/pag0000088
(2016). On the scientific superiority of conceptual replications for scientific progress. Journal of Experimental Social Psychology, 66, 93 – 99.
(1959). On the prediction of occurrence of particular verbal intrusions in immediate free recall. Journal of Experimental Psychology, 58, 17 – 22. https://doi.org/10.1037/h0046671
(2012). Behavioral priming: It’s all in the mind, but whose mind? PLoS ONE, 7, e29081. https://doi.org/10.1371/journal.pone.0029081
(2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences, 113, 7900 – 7905. https://doi.org/10.1073/pnas.1602413113
(1984). Zur Bedeutung und Kontrolle des beta-Fehlers bei der inferenzstatistischen Prüfung log-linearer Modelle. Zeitschrift für Sozialpsychologie, 15, 18 – 32.
Erzeugung und Verwendung empirischer Daten. In T. Herrmann & W. H. Tack (Eds.), Methodologische Grundlagen der Psychologie (Enzyklopädie der Psychologie, Serie Forschungsmethoden der Psychologie, Vol. 1, pp. 47 – 97). Göttingen: Hogrefe.
(2004). Über Allgemeine und Differentielle Psychologie. In A. Kämmerer & J. Funke (Eds.), Seelenlandschaften. Streifzüge durch die Psychologie. 98 persönliche Positionen (pp. 78 – 79). Göttingen: Vandenhoeck & Ruprecht.
(1994). Hypothesenprüfung. In T. Herrmann & W. H. Tack (Eds.), Methodologische Grundlagen der Psychologie (Enzyklopädie der Psychologie, Serie Forschungsmethoden der Psychologie, Vol. 1, pp. 604 – 648). Göttingen: Hogrefe.
(1976). Wider den Methodenzwang – Skizze einer anarchistischen Erkenntnistheorie. Frankfurt: Suhrkamp.
(2011). Voodoo correlations are everywhere—not only in neuroscience. Perspectives on Psychological Science, 6, 163 – 171. https://doi.org/10.1177/1745691611400237
(2012). The long way from α-error control to validity proper: Problems with a short-sighted false-positive debate. Perspectives on Psychological Science, 7, 661 – 669. https://doi.org/10.1177/1745691612462587
(2012a). Publication bias and the failure of replication in experimental psychology. Psychonomic Bulletin & Review, 19, 975 – 991. https://doi.org/10.3758/s13423-012-0322-y
(2012b). Too good to be true: Publication bias in two prominent studies from experimental psychology. Psychonomic Bulletin & Review, 19, 151 – 156. https://doi.org/10.3758/s13423-012-0227-9
(2014). The frequency of excess success for articles in Psychological Science. Psychonomic Bulletin & Review, 21, 1180 – 1187. https://doi.org/10.3758/s13423-014-0601-x
(2017). Thou shalt not bear false witness against null hypothesis significance testing. Educational and Psychological Measurement, 77, 631 – 662. https://doi.org/10.1177/0013164416668232
(2013). Result-blind peer-reviews and editorial decisions: A missing pillar of scientific culture. European Psychologist, 18, 286 – 294. https://doi.org/10.1027/1016-9040/a000144
(2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61 – 135. https://doi.org/10.1017/S0140525X0999152X
(2013). An introduction to optimal design: Some basic issues using examples from dyscalculia research. Zeitschrift für Psychologie, 221, 124 – 144. https://doi.org/10.1027/2151-2604/a000142
(2017). Rejoice! In replication. Unpublished manuscript, PsyArXiv, September 18. Retrieved from psyarxiv.com/cmdq8
(2011). From perceptual rags to metaphoric riches: Bodily, social, and cultural constraints on sociocognitive metaphors. Psychological Bulletin, 137, 355 – 361. https://doi.org/10.1037/a0022373
(2005). Why most published research findings are false. PLoS Medicine, 2, 686 – 701. https://doi.org/10.1371/journal.pmed.0020124
(2015). Replication in Psychological Science. Psychological Science, 26, 1827 – 1832. https://doi.org/10.1177/0956797615616374
(2014). You cannot step into the same river twice: When power analyses are optimistic. Perspectives on Psychological Science, 9, 612 – 625. https://doi.org/10.1177/1745691614548513
(2011). Aggregate and individual replication probability within an explicit model of the research process. Psychological Methods, 16, 337 – 360. https://doi.org/10.1037/a0023347
(2016). Optimizing research payoff. Perspectives on Psychological Science, 11, 664 – 691. https://doi.org/10.1177/1745691616649170
(1983). In defense of external invalidity. American Psychologist, 38, 379 – 387. https://doi.org/10.1037/14805-005
(2016). A new strategy for testing structural equation models. Structural Equation Modeling, 23, 54 – 60. https://doi.org/10.1080/10705511.2014.950896
(2014). Registered reports. Social Psychology, 45, 137 – 141. https://doi.org/10.1027/1864-9335/a000192
(2003). Generalized eta and omega squared statistics: Measures of effect size for some common research designs. Psychological Methods, 8, 434 – 447. https://doi.org/10.1037/1082-989X.8.4.434
(2012). An open, large-scale, collaborative effort to estimate the reproducibility of psychological science. Perspectives on Psychological Science, 7, 657 – 660. https://doi.org/10.1177/1745691612462588
(2015). Estimating the reproducibility of psychological science. Science, 349. https://doi.org/10.1126/science.aac4716
(2012). Is the replicability crisis overblown? Three arguments examined. Perspectives on Psychological Science, 7, 531 – 536. https://doi.org/10.1177/1745691612463401
(2014). Safeguard power as a protection against imprecise power estimates. Perspectives on Psychological Science, 9, 319 – 332. https://doi.org/10.1177/1745691614528519
(1951). Main trends in recent philosophy: Two dogmas of empiricism. The Philosophical Review, 60, 20 – 43. https://doi.org/10.2307/2181906
Why many priming results don’t (and won’t) replicate: A quantitative analysis. Retrieved from www.sfs.uni-tuebingen.de/~mramscar/papers/Ramscar-Shaoul-Baayen_replication.pdf
(1938). Experience and prediction. An analysis of the foundations and the structure of knowledge. Chicago, IL: University of Chicago Press. https://doi.org/10.1086/218075
(1995). Creating false memories – Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, & Cognition, 21, 803 – 814. https://doi.org/10.1037/0278-7393.21.4.803
(1979). The “file drawer problem” and tolerance for null results. Psychological Bulletin, 86, 638 – 641. https://doi.org/10.1037/0033-2909.86.3.638
(1963). The effect of experimenter bias on the performance of the albino rat. Behavioral Science, 8, 183 – 189. https://doi.org/10.1002/bs.3830080302
(2012). The ironic effect of significant results on the credibility of multiple-study articles. Psychological Methods, 17, 551 – 566. https://doi.org/10.1037/a0029487
(2016). Evaluating psychological research requires more than attention to the N: A comment on Simonsohn’s (2015) “Small Telescopes”. Psychological Science, 27, 1407 – 1409. https://doi.org/10.1177/0956797616653102
(2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359 – 1366. https://doi.org/10.1177/0956797611417632
(2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143, 534 – 547. https://doi.org/10.1037/a0033242
(1961). A comparison of individual and group decisions involving risk. Unpublished master’s thesis. Massachusetts Institute of Technology. Retrieved from http://dspace.mit.edu/handle/1721.1/11330
(2016). Reflection on the smiling registered replication report. Perspectives on Psychological Science, 11, 929 – 930. https://doi.org/10.1177/1745691616674460
(2014). The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science, 9, 59 – 71. https://doi.org/10.1177/1745691613514450
(2017). Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biology, 15 (3), 1 – 18. https://doi.org/10.1371/journal.pbio.2000797
(2016). Inflation von falsch-positiven Befunden in der psychologischen Forschung: Mögliche Ursachen und Gegenmaßnahmen. Psychologische Rundschau, 67, 163 – 174. https://doi.org/10.1026/0033-3042/a000296
(2015). P-hacking by post hoc selection with multiple opportunities: Detectability by skewness test? Comment on Simonsohn, Nelson, and Simmons (2014). Journal of Experimental Psychology: General, 144, 1137 – 1145. https://doi.org/10.1037/xge0000086
(2017). Some properties of p-curves, with an application to gradual publication bias. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000125
(2018). Effect size estimation from t-statistics in the presence of publication bias: A brief review of existing approaches with some extensions. Zeitschrift für Psychologie, 226 (1). https://doi.org/10.1027/2151-2604/a000319
(2009). Puzzling high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science, 4, 274 – 290. https://doi.org/10.1111/j.1745-6924.2009.01125.x
(2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 632 – 638. https://doi.org/10.1177/1745691612463078
(2015). Replicating studies in which samples of participants respond to samples of stimuli. Perspectives on Psychological Science, 10, 390 – 399. https://doi.org/10.1177/1745691614564879
(2017). Methoden psychologischer Forschung und Evaluation. Grundlagen, Gütekriterien und Anwendungen. Stuttgart: Kohlhammer.
(1986). Error probabilities in educational and psychological research. Journal of Educational Statistics, 11, 117 – 146. https://doi.org/10.2307/1164973
(1840). The philosophy of the inductive sciences, founded upon their history (2nd ed.). London, UK: John W. Parker.