Inflation of False-Positive Findings in Psychological Research
Possible Causes and Countermeasures
Abstract
There is growing concern that scientific publications increasingly report false-positive findings, so that the research literature reflects a distorted view of reality. The Psychology Review Board of the German Research Foundation (DFG) has taken up this problem and discussed possible sources of false-positive results. This article conveys the content of those discussions and urges applicants to give this problem greater attention in funding proposals. We also call upon applicants, reviewers, and editors to give greater weight to negative results and replication studies in both funding proposals and scientific publications, including clinical studies.