Published Online: https://doi.org/10.1026/0033-3042/a000388

Zusammenfassung. Es gibt einen breiten Konsens, dass Replikation ein wichtiges Instrument ist, um valide Befunde und solide Forschung zu erkennen. Wenn sie aber wissenschaftlich bedeutsam ist, dann muss auch die Replikationsforschung an strengen methodischen Regeln und an klar artikulierten wissenschaftlichen Zielen gemessen werden. Eine kritische Beschäftigung mit der aktuellen Replikationsforschung – etwa im jüngst veröffentlichten Bericht der Open Science Collaboration – zeigt jedoch, dass eine strenge und forschungslogisch begründete Methodologie für Replikationsstudien bislang weder angewandt noch entwickelt wurde. Infolgedessen bleibt die Validität der Schlüsse, die aus Replikationsstudien gezogen werden dürfen, oftmals unklar. Dieses grundlegende Problem wird hier unter vier Gesichtspunkten diskutiert: Unklarheit des Gegenstandes der Replikation (Replicandum), Vernachlässigung einschlägiger methodischer Probleme (Regressivität; Reliabilität der Veränderungsmessung), einseitige Vermeidung von angeblich kostenträchtigen „Falsch-Positiven“ ohne Versuch einer systematischen Kosten-Nutzen-Messung sowie das vernachlässigte Ziel, Replikationsforschung so zu implementieren, dass sie echte Erkenntnisfortschritte bringt und als exzellente Forschung anerkannt werden kann.


Where Are the Scientific Standards for High-Quality Replication Studies?

Abstract. There is wide consensus that replication affords an important instrument for identifying valid findings and solid research approaches. However, if replication research serves a major scientific function, then it must be evaluated in terms of strict methodological rules and clearly articulated scientific criteria. A critical analysis of contemporary replication projects – such as the recently published report by the Open Science Collaboration – reveals, however, that no logically sound methodology for state-of-the-art replication research has been developed and applied so far. As a consequence, the validity of inferences drawn from many replication studies remains equivocal. Four aspects of this fundamental problem are discussed: uncertainty about the objective of replication (replicandum); neglect of specific methodological problems (regressiveness; reliability of change measurement); one-sided focus on the avoidance of allegedly expensive “false positives” in the absence of any serious attempt to run a cost–benefit analysis; and the sorely neglected goal of implementing excellent replication research that leads to new insights and genuine scientific progress.

Literatur

  • Albarracín, D., Durantini, M. R. & Earl, A. (2006). Empirical and theoretical conclusions of an analysis of outcomes of HIV-prevention interventions. Current Directions in Psychological Science, 15 (2), 73 – 78. https://doi.org/10.1111/j.0963-7214.2006.00410.x

  • Baltes, P. B., Nesselroade, J. R., Schaie, K. W. & Labouvie, E. W. (1972). On the dilemma of regression effects in examining ability-level-related differentials in ontogenetic patterns of intelligence. Developmental Psychology, 6 (1), 78 – 84. https://doi.org/10.1037/h0032329

  • Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J. & Giner-Sorolla, R. (2014). The Replication Recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217 – 224. https://doi.org/10.1016/j.jesp.2013.10.005

  • Campbell, D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54 (4), 297 – 312. https://doi.org/10.1037/h0040950

  • Campbell, D. T. & Kenny, D. A. (1999). A primer on regression artifacts. New York, NY: Guilford Press.

  • Crandall, C. S. & Sherman, J. W. (2016). On the scientific superiority of conceptual replications for scientific progress. Journal of Experimental Social Psychology, 66, 93 – 99. https://doi.org/10.1016/j.jesp.2015.10.002

  • Cronbach, L. J. & Furby, L. (1970). How we should measure “change”: Or should we? Psychological Bulletin, 74 (1), 68 – 80.

  • Denrell, J. (2005). Why most people disapprove of me: Experience sampling in impression formation. Psychological Review, 112 (4), 951 – 978.

  • Denrell, J. & Le Mens, G. (2012). Social judgments from adaptive samples. In J. I. Krueger (Ed.), Social judgment and decision making (pp. 151 – 169). New York, NY: Psychology Press.

  • Earp, B. D. & Trafimow, D. (2015). Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology, 6.

  • Erdfelder, E. & Bošnjak, M. (2016). Hotspots in psychology: A new format for special issues of the Zeitschrift für Psychologie. Zeitschrift für Psychologie, 224 (3), 141 – 144.

  • Erev, I., Wallsten, T. S. & Budescu, D. V. (1994). Simultaneous over- and underconfidence: The role of error in judgment processes. Psychological Review, 101, 519 – 527. https://doi.org/10.1037/0033-295X.101.3.519

  • Feld, G. B. & Born, J. (2015). Exploiting sleep to modify bad attitudes. Science, 348, 971 – 972. https://doi.org/10.1126/science.aab4048

  • Fiedler, K. (2011). Voodoo correlations are everywhere—Not only in neuroscience. Perspectives on Psychological Science, 6, 163 – 171. https://doi.org/10.1177/1745691611400237

  • Fiedler, K. (2016). Reproducibility and the regression trap: A note on a questionable piece of meta-science. Manuscript submitted for publication.

  • Fiedler, K. (2017). What constitutes strong psychological science? The (neglected) role of diagnosticity and a priori theorizing. Perspectives on Psychological Science, 12 (1), 46 – 61.

  • Fiedler, K. & Prager, J. (in press). The regression trap and other pitfalls of replication science – Illustrated by the report of the Open Science Collaboration. Basic and Applied Social Psychology.

  • Fiedler, K. & Schott, M. (2017). False negatives. In S. O. Lilienfeld & I. D. Waldman (Eds.), Psychological Science under Scrutiny: Recent Challenges and Proposed Remedies (pp. 53 – 72). Hoboken, NJ: Wiley-Blackwell.

  • Galton, F. (1877). Typical laws of heredity. Nature, 15, 492 – 495, 512 – 514, 532 – 533.

  • Gilbert, D. T., King, G., Pettigrew, S. & Wilson, T. D. (2016). Comment on “Estimating the reproducibility of psychological science”. Science, 351, 1037.

  • Gil-Gómez de Liaño, B., Stablum, F. & Umiltà, C. (2016). Can concurrent memory load reduce distraction? A replication study and beyond. Journal of Experimental Psychology: General, 145 (1), e1 – e12. https://doi.org/10.1037/xge0000131

  • Gollwitzer, M., Christ, O. & Lemmer, G. (2014). Individual differences make a difference: On the use and the psychometric properties of difference scores in social psychology. European Journal of Social Psychology, 44 (7), 673 – 682. https://doi.org/10.1002/ejsp.2042

  • Gollwitzer, P. M., Wicklund, R. A. & Hilton, J. L. (1982). Admission of failure and symbolic self-completion: Extending Lewinian theory. Journal of Personality and Social Psychology, 43, 358 – 371. https://doi.org/10.1037/0022-3514.43.2.358

  • Greenberg, J., Solomon, S. & Pyszczynski, T. (1997). Terror management theory of self-esteem and cultural worldviews: Empirical assessments and conceptual refinements. In M. P. Zanna (Ed.), Advances in experimental social psychology (Vol. 29, pp. 61 – 139). San Diego, CA: Academic Press.

  • Harris, C. W. (Ed.). (1963). Problems in measuring change. Madison, WI: University of Wisconsin Press.

  • Hüffmeier, J., Mazei, J. & Schultze, T. (2016). Reconceptualizing replication as a sequence of different studies: A replication typology. Journal of Experimental Social Psychology. https://doi.org/10.1016/j.jesp.2015.09.009

  • Ioannidis, J. P. (2005). Why most published research findings are false. Chance, 18 (4), 40 – 47.

  • Kassin, S. M. (2008). False confessions: Causes, consequences, and implications for reform. Current Directions in Psychological Science, 17, 249 – 253. https://doi.org/10.1111/j.1467-8721.2008.00584.x

  • Klauer, K. C., Becker, M. & Spruyt, A. (2016). Evaluative priming in the pronunciation task: A preregistered replication and extension. Experimental Psychology, 63 (1), 70 – 78. https://doi.org/10.1027/1618-3169/a000286

  • Labouvie, E. W. (1982). The concept of change and regression toward the mean. Psychological Bulletin, 92, 251 – 257.

  • Lilienfeld, S. O. & Waldman, I. D. (Eds.). (2017). Psychological Science under Scrutiny: Recent Challenges and Proposed Remedies. Hoboken, NJ: Wiley-Blackwell.

  • McCarthy, R. J. (2014). Close replication attempts of the heat priming-hostile perception effect. Journal of Experimental Social Psychology, 54, 165 – 169. https://doi.org/10.1016/j.jesp.2014.04.014

  • McNemar, Q. (1958). On growth measurement. Educational and Psychological Measurement, 18, 47 – 55.

  • Meehl, P. E. (1990). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1 (2), 108 – 141. https://doi.org/10.1207/s15327965pli0102_1

  • Monin, B. (2016). Concerns about Taylor Holubar & Mike Frank’s 2012 attempt to replicate Monin, Sawyer, & Marquez (2008) included in the 2015 reproducibility project science article. Unpublished comment, Stanford, CA: Stanford University.

  • Nesselroade, J. R., Stigler, S. M. & Baltes, P. B. (1980). Regression toward the mean and the study of change. Psychological Bulletin, 88, 622 – 637. https://doi.org/10.1037/0033-2909.88.3.622

  • Norton, M. I., Frost, J. H. & Ariely, D. (2007). Less is more: The lure of ambiguity, or why familiarity breeds contempt. Journal of Personality and Social Psychology, 92 (1), 97 – 105. https://doi.org/10.1037/0022-3514.92.1.97

  • Olson, R. K., Davidson, B. J., Kliegl, R. & Davies, S. E. (1984). Development of phonetic memory in disabled and normal readers. Journal of Experimental Child Psychology, 37, 187 – 206. https://doi.org/10.1016/0022-0965(84)90066-3

  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349 (6251), aac4716.

  • Osborn, A. F. (1957). Applied imagination (rev. ed.). New York: Scribner.

  • Overall, J. E. & Woodward, J. A. (1975). Unreliability of difference scores: A paradox for measurement of change. Psychological Bulletin, 82 (1), 85 – 86. https://doi.org/10.1037/h0076158

  • Pennebaker, J. W. (1989). Confession, inhibition, and disease. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 22, pp. 211 – 244). San Diego, CA: Academic Press. https://doi.org/10.1016/S0065-2601(08)60309-3

  • Pleskac, T. J. & Hertwig, R. (2014). Ecologically rational choice and the structure of the environment. Journal of Experimental Psychology: General, 143, 2000 – 2019.

  • Reichenbach, H. (1952/1938). Experience and prediction. Chicago, IL: University of Chicago Press.

  • Rogosa, D. R. & Willett, J. B. (1983). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20, 335 – 343. https://doi.org/10.1111/j.1745-3984.1983.tb00211.x

  • Rosenthal, R. (1987). Judgment studies: Design, analysis, and meta-analysis. Cambridge, UK: Cambridge University Press.

  • Rulon, P. J. (1941). Problems of regression. Harvard Educational Review, 11, 213 – 223.

  • Schaller, M. & Park, J. H. (2011). The behavioral immune system (and why it matters). Current Directions in Psychological Science, 20 (2), 99 – 103. https://doi.org/10.1177/0963721411402596

  • Schmidt, F. (2010). Detecting and correcting the lies that data tell. Perspectives on Psychological Science, 5, 233 – 242. https://doi.org/10.1177/1745691610369339

  • Simmons, J. P., Nelson, L. D. & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359 – 1366. https://doi.org/10.1177/0956797611417632

  • Simonsohn, U. (2014). Small Telescopes: Detectability and the Evaluation of Replication Results (SSRN Scholarly Paper No. ID 2259879). Rochester, NY: Social Science Research Network. Retrieved from http://papers.ssrn.com/abstract=2259879

  • Stanley, D. J. & Spence, J. R. (2014). Expectations for replications: Are yours realistic? Perspectives on Psychological Science, 9, 305 – 318. https://doi.org/10.1177/1745691614528518

  • Stelzl, I. (1982). Fehler und Fallen der Statistik. Bern: Huber.

  • Stroebe, W. (2016). Are most published social psychological findings false? Journal of Experimental Social Psychology, 66, 134 – 144. https://doi.org/10.1016/j.jesp.2015.09.017

  • Swets, J. A., Dawes, R. M. & Monahan, J. (2000). Psychological science can improve diagnostic decisions. Psychological Science in the Public Interest, 1 (1), 1 – 26. https://doi.org/10.1111/1529-1006.001

  • Verhagen, J. & Wagenmakers, E. (2014). Bayesian tests to quantify the result of a replication attempt. Journal of Experimental Psychology: General, 143 (4), 1457 – 1475. https://doi.org/10.1037/a0036731

  • Wason, P. C. (1960). On the failure to eliminate hypotheses in a conceptual task. The Quarterly Journal of Experimental Psychology, 12, 129 – 140. https://doi.org/10.1080/17470216008416717

  • Wells, G. L. & Windschitl, P. D. (1999). Stimulus sampling and social psychological experimentation. Personality and Social Psychology Bulletin, 25, 1115 – 1125. https://doi.org/10.1177/01461672992512005

  • Westfall, J., Kenny, D. A. & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143, 2020 – 2045. https://doi.org/10.1037/xge0000014

  • Wicklund, R. A. & Braun, O. L. (1987). Incompetence and the concern with human categories. Journal of Personality and Social Psychology, 53, 373 – 382. https://doi.org/10.1037/0022-3514.53.2.373

  • Yarkoni, T. (2009). Big correlations in little studies: Inflated fMRI correlations reflect low statistical power—Commentary on Vul et al. (2009). Perspectives on Psychological Science, 4, 294 – 298.