Abstract
There is wide consensus that replication is an important instrument for identifying valid findings and solid research approaches. If replication research serves a major scientific function, however, it must itself be evaluated against strict methodological rules and clearly articulated scientific criteria. A critical analysis of contemporary replication projects, such as the recently published report by the Open Science Collaboration, reveals that no logically sound methodology for state-of-the-art replication research has been developed or applied so far. As a consequence, the validity of inferences drawn from many replication studies remains equivocal. Four aspects of this fundamental problem are discussed: uncertainty about the object of replication (replicandum); neglect of specific methodological problems (regressiveness; reliability of change measurement); a one-sided focus on avoiding allegedly costly “false positives” in the absence of any serious attempt at a systematic cost–benefit analysis; and the sorely neglected goal of implementing excellent replication research that leads to new insights and genuine scientific progress.
References
(2006). Empirical and theoretical conclusions of an analysis of outcomes of HIV-prevention interventions. Current Directions in Psychological Science, 15 (2), 73 – 78. https://doi.org/10.1111/j.0963-7214.2006.00410.x
(1972). On the dilemma of regression effects in examining ability-level-related differentials in ontogenetic patterns of intelligence. Developmental Psychology, 6 (1), 78 – 84. https://doi.org/10.1037/h0032329
(2014). The Replication Recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217 – 224. https://doi.org/10.1016/j.jesp.2013.10.005
(1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54 (4), 297 – 312. https://doi.org/10.1037/h0040950
(1999). A primer on regression artifacts. New York, NY, US: Guilford Press.
(2016). On the scientific superiority of conceptual replications for scientific progress. Journal of Experimental Social Psychology, 66, 93 – 99. https://doi.org/10.1016/j.jesp.2015.10.002
(1970). How we should measure “change”: Or should we? Psychological Bulletin, 74 (1), 68.
(2005). Why most people disapprove of me: Experience sampling in impression formation. Psychological Review, 112 (4), 951 – 978.
(2012). Social judgments from adaptive samples. In J. I. Krueger (Ed.), Social judgment and decision making (pp. 151 – 169). New York, NY, US: Psychology Press.
(2015). Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology, 6.
(2016). Hotspots in psychology: A new format for special issues of the Zeitschrift für Psychologie. Zeitschrift für Psychologie, 224 (3), 141 – 144.
(1994). Simultaneous over- and underconfidence: The role of error in judgment processes. Psychological Review, 101, 519 – 527. https://doi.org/10.1037/0033-295X.101.3.519
(2015). Exploiting sleep to modify bad attitudes. Science, 348, 971 – 972. https://doi.org/10.1126/science.aab4048
(2011). Voodoo correlations are everywhere—Not only in neuroscience. Perspectives on Psychological Science, 6, 163 – 171. https://doi.org/10.1177/1745691611400237
(2016). Reproducibility and the regression trap: A note on a questionable piece of meta-science. Manuscript submitted for publication.
(2017). What constitutes strong psychological science? The (neglected) role of diagnosticity and a priori theorizing. Perspectives on Psychological Science, 12 (1), 46 – 61.
(in press). The regression trap and other pitfalls of replication science – Illustrated by the report of the Open Science Collaboration. Basic and Applied Social Psychology.
(2017). False negatives. In S. O. Lilienfeld & I. D. Waldman (Eds.), Psychological Science under Scrutiny: Recent Challenges and Proposed Remedies (pp. 53 – 72). Hoboken, NJ: Wiley-Blackwell.
(1877). Typical laws of heredity. Nature, 15, 492 – 495, 512 – 514, 532 – 533.
(2016). Comment on “Estimating the reproducibility of psychological science”. Science, 351, 1037 – 1037.
(2016). Can concurrent memory load reduce distraction? A replication study and beyond. Journal of Experimental Psychology: General, 145 (1), e1 – e12. https://doi.org/10.1037/xge0000131
(2014). Individual differences make a difference: On the use and the psychometric properties of difference scores in social psychology. European Journal of Social Psychology, 44 (7), 673 – 682. https://doi.org/10.1002/ejsp.2042
(1982). Admission of failure and symbolic self-completion: Extending Lewinian theory. Journal of Personality and Social Psychology, 43, 358 – 371. https://doi.org/10.1037/0022-3514.43.2.358
(1997). Terror management theory of self-esteem and cultural worldviews: Empirical assessments and conceptual refinements. In M. P. Zanna (Ed.), Advances in experimental social psychology (Vol. 29, pp. 61 – 139). San Diego, CA: Academic Press.
(1963). Problems in measuring change. Madison, WI: University of Wisconsin Press.
(2016). Reconceptualizing replication as a sequence of different studies: A replication typology. Journal of Experimental Social Psychology. https://doi.org/10.1016/j.jesp.2015.09.009
(2005). Why most published research findings are false. Chance, 18 (4), 40 – 47.
(2008). False confessions: Causes, consequences, and implications for reform. Current Directions in Psychological Science, 17, 249 – 253. https://doi.org/10.1111/j.1467-8721.2008.00584.x
(2016). Evaluative priming in the pronunciation task: A preregistered replication and extension. Experimental Psychology, 63 (1), 70 – 78. https://doi.org/10.1027/1618-3169/a000286
(1982). The concept of change and regression toward the mean. Psychological Bulletin, 92, 251 – 257.
(2017). Psychological Science under Scrutiny: Recent Challenges and Proposed Remedies. Hoboken, NJ: Wiley-Blackwell.
(2014). Close replication attempts of the heat priming-hostile perception effect. Journal of Experimental Social Psychology, 54, 165 – 169. https://doi.org/10.1016/j.jesp.2014.04.014
(1958). On growth measurement. Educational and Psychological Measurement, 18, 47 – 55.
(1990). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1 (2), 108 – 141. https://doi.org/10.1207/s15327965pli0102_1
(2016). Concerns about Taylor Holubar & Mike Frank’s 2012 attempt to replicate Monin, Sawyer, & Marquez (2008) included in the 2015 reproducibility project Science article. Unpublished comment, Stanford, CA: Stanford University.
(1980). Regression toward the mean and the study of change. Psychological Bulletin, 88, 622 – 637. https://doi.org/10.1037/0033-2909.88.3.622
(2007). Less is more: The lure of ambiguity, or why familiarity breeds contempt. Journal of Personality and Social Psychology, 92 (1), 97 – 105. https://doi.org/10.1037/0022-3514.92.1.97
(1984). Development of phonetic memory in disabled and normal readers. Journal of Experimental Child Psychology, 37, 187 – 206. https://doi.org/10.1016/0022-0965(84)90066-3
(2015). Estimating the reproducibility of psychological science. Science, 349 (6251), aac4716.
(1957). Applied imagination (rev. ed.). New York: Scribner.
(1975). Unreliability of difference scores: A paradox for measurement of change. Psychological Bulletin, 82 (1), 85 – 86. https://doi.org/10.1037/h0076158
(1989). Confession, inhibition, and disease. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 22, pp. 211 – 244). San Diego, CA: Academic Press. https://doi.org/10.1016/S0065-2601(08)60309-3
(2014). Ecologically rational choice and the structure of the environment. Journal of Experimental Psychology: General, 143, 2000 – 2019.
(1952/1938). Experience and prediction. Chicago, Ill.: University of Chicago Press.
(1983). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20, 335 – 343. https://doi.org/10.1111/j.1745-3984.1983.tb00211.x
Rosenthal, R. (1987). Judgment studies: Design, analysis, and meta-analysis. Cambridge, UK: Cambridge University Press.
(1941). Problems of regression. Harvard Educational Review, 11, 213 – 223.
(2011). The behavioral immune system (and why it matters). Current Directions in Psychological Science, 20 (2), 99 – 103. https://doi.org/10.1177/0963721411402596
(2010). Detecting and correcting the lies that data tell. Perspectives on Psychological Science, 5, 233 – 242. https://doi.org/10.1177/1745691610369339
(2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359 – 1366. https://doi.org/10.1177/0956797611417632
(2014). Small Telescopes: Detectability and the Evaluation of Replication Results (SSRN Scholarly Paper No. ID 2259879). Rochester, NY: Social Science Research Network. Retrieved from http://papers.ssrn.com/abstract=2259879
(2014). Expectations for replications: Are yours realistic? Perspectives on Psychological Science, 9, 305 – 318. https://doi.org/10.1177/1745691614528518
(1982). Fehler und Fallen der Statistik. Bern: Huber.
(2016). Are most published social psychological findings false? Journal of Experimental Social Psychology, 66, 134 – 144. https://doi.org/10.1016/j.jesp.2015.09.017
(2000). Psychological science can improve diagnostic decisions. Psychological Science in the Public Interest, 1 (1), 1 – 26. https://doi.org/10.1111/1529-1006.001
(2014). Bayesian tests to quantify the result of a replication attempt. Journal of Experimental Psychology: General, 143 (4), 1457 – 1475. https://doi.org/10.1037/a0036731
(1960). On the failure to eliminate hypotheses in a conceptual task. The Quarterly Journal of Experimental Psychology, 12, 129 – 140. https://doi.org/10.1080/17470216008416717
(1999). Stimulus sampling and social psychological experimentation. Personality and Social Psychology Bulletin, 25, 1115 – 1125. https://doi.org/10.1177/01461672992512005
(2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143, 2020 – 2045. https://doi.org/10.1037/xge0000014
(1987). Incompetence and the concern with human categories. Journal of Personality and Social Psychology, 53, 373 – 382. https://doi.org/10.1037/0022-3514.53.2.373
(2009). Big correlations in little studies: Inflated fMRI correlations reflect low statistical power—Commentary on Vul et al. (2009). Perspectives on Psychological Science, 4, 294 – 298.