Why Figures with Error Bars Should Replace p Values
Some Conceptual Arguments and Empirical Demonstrations
Abstract
Null-hypothesis significance testing (NHST) is the primary means by which data are analyzed and conclusions drawn, particularly in the social sciences, but in other sciences as well (notably ecology and economics). Despite this supremacy, however, NHST suffers from numerous problems as a means of interpreting and understanding data. These problems have been articulated by various observers over the years, but researchers are taking them seriously only slowly, if at all, as evidenced by the continuing emphasis on NHST in statistics classes, statistics textbooks, editorial policies, and, of course, the day-to-day practices reported in empirical articles themselves (Cumming et al., 2007). Over the past several decades, observers have suggested a simpler approach – plotting the data with appropriate confidence intervals (CIs) around relevant sample statistics – to supplement or replace hypothesis testing. This article offers conceptual arguments and empirical demonstrations in support of that approach.
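As a minimal sketch of the approach the abstract recommends – reporting a sample statistic with a confidence interval rather than a p value – the following computes a 95% CI for a sample mean. The data and the hardcoded t critical value are hypothetical illustrations, not taken from the article; in practice the critical value would come from a statistics package (e.g., scipy.stats.t.ppf).

```python
import math
import statistics

def mean_ci(sample, t_crit):
    """Return (mean, lower, upper): a confidence interval for the sample mean.

    t_crit is the two-tailed Student's t critical value for the desired
    confidence level and n - 1 degrees of freedom.
    """
    n = len(sample)
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return m, m - t_crit * se, m + t_crit * se

# Hypothetical data: n = 10, so df = 9; t_crit = 2.262 gives a 95% CI.
data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 4.7, 5.0, 4.1, 4.6]
m, lo, hi = mean_ci(data, t_crit=2.262)
print(f"mean = {m:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The resulting interval, drawn as an error bar around the plotted mean, conveys both the estimate and its precision at a glance – the information a bare p value omits.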
References
(1997). On the surprising longevity of flogged horses: Why there is a case for the significance test. Psychological Science, 8, 12–15.
(1995). Absence of evidence is not evidence of absence. British Medical Journal, 311, 485.
(1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
(1978). The case against statistical significance testing. Harvard Educational Review, 48, 378–399.
(1962). The statistical power of abnormal–social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.
(2007). Statistical reform in psychology: Is anything changing? Psychological Science, 18, 230–232.
(2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574.
(2005). Inference by eye: Confidence intervals, and how to read pictures of data. American Psychologist, 60, 170–180.
(1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory and Psychology, 5, 75–98.
(2001). Reporting of statistical inference in the Journal of Applied Psychology: Little evidence of reform. Educational and Psychological Measurement, 61, 181–210.
(2006). Impact of criticism of null hypothesis significance testing on statistical reporting practices in conservation biology. Conservation Biology, 20, 1539–1544.
(2005). Toward improved statistical reporting in the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology, 73, 136–143.
(1996). Effect sizes and p values: What should be reported and what should be replicated? Psychophysiology, 33, 175–183.
(2002). Misinterpretations of significance: A problem students share with their teachers? Methods of Psychological Research, 7, 1–20.
(2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55, 19–24.
(2004). Beyond significance testing: Reforming data-analysis methods in behavioral research. Washington, DC: American Psychological Association.
(2003). Even statisticians are not immune to misinterpretations of null hypothesis significance tests. International Journal of Psychology, 38, 37–45.
(1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology, 36, 102–105.
(1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 161–171.
(1994). Using confidence intervals in within-subjects designs. Psychonomic Bulletin & Review, 1, 476–490.
(1968). Statistical significance in psychological research. Psychological Bulletin, 70, 131–139.
(1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.
(1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, Monograph Supplement, 1–V66.
(2000). Proportions and their differences. In Statistics with confidence (pp. 39–50). London: BMJ Books.
(1986). Statistical inference: A commentary for the social and behavioural sciences. Chichester: Wiley.
(1963). The interpretation of levels of significance by psychological researchers. Journal of Psychology, 55, 33–38.
(1985). Statistical analysis: Summarizing evidence versus establishing facts. Psychological Bulletin, 97, 527–529.
(1990). Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58, 646–656.
(1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57, 416–428.
(1994). Data analysis methods and cumulative knowledge in Psychology: Implications for the training of researchers. APA (Division 5) Presidential Address.
(1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115–129.
(1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–315.