Normality and Sample Size Do Not Matter for the Selection of an Appropriate Statistical Test for Two-Group Comparisons
Abstract
Many applied researchers are taught to use the t-test when distributions appear normal and/or sample sizes are large, and non-parametric tests otherwise, and they fear inflated error rates if the “wrong” test is used. In a simulation study (four tests: t-test, Mann-Whitney test, robust t-test, permutation test; seven sample sizes between 2 × 10 and 2 × 500; four distributions: normal, uniform, log-normal, bimodal; under both the null and the alternative hypotheses), we show that Type I errors are well controlled in all conditions. The t-test is most powerful under the normal and the uniform distributions, the Mann-Whitney test under the log-normal distribution, and the robust t-test under the bimodal distribution. Importantly, even the t-test was more powerful under asymmetric distributions than under the normal distribution for the same effect size. It appears that normality and sample size do not matter for the selection of a test to compare two groups of equal size and variance. The researcher can choose the test that best fits the scientific hypothesis without fear of poor test performance.
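The simulation design summarized above can be illustrated with a small Monte Carlo sketch. The Python code below is not the authors' implementation. It is a minimal, standard-library-only sketch under stated assumptions: the effect is modeled as a pure location shift (so group sizes and variances stay equal), only two of the four tests are covered (the classical pooled-variance t-test and the Mann-Whitney test), and only two of the four distributions are included. The Mann-Whitney p-value uses the normal approximation with a continuity correction rather than an exact distribution. The names `rejection_rates`, `SAMPLERS`, `student_t_p` and `mann_whitney_p`, and the sampler parameters, are hypothetical. A shift of 0 estimates the Type I error rate, and a nonzero shift estimates power.

```python
"""Monte Carlo sketch: rejection rates of two-sample tests (stdlib only)."""
import math
import random
import statistics


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last (odd-step) factor close to 1
            break
    return h


def reg_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t with df degrees of freedom."""
    return reg_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def student_t_p(x, y):
    """Classical pooled-variance two-sample t-test (equal variances assumed)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / df
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t_two_sided_p(t, df)


def mann_whitney_p(x, y):
    """Mann-Whitney U test, normal approximation with continuity correction.

    Assumes continuous data (no ties), so no tie correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    r1 = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    u = r1 - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(abs(u - n1 * n2 / 2) - 0.5, 0.0) / sigma
    return math.erfc(z / math.sqrt(2))


# Hypothetical samplers; the uniform and bimodal cases of the study are omitted.
SAMPLERS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "log-normal": lambda rng: rng.lognormvariate(0.0, 1.0),
}


def rejection_rates(sampler, n, shift, nsim=2000, alpha=0.05, seed=1):
    """Share of simulated data sets in which each test rejects H0 at level alpha.

    Assumption: the effect is a pure location shift, so shift == 0 estimates
    the Type I error rate and shift != 0 estimates power.
    """
    rng = random.Random(seed)
    hits = {"t-test": 0, "Mann-Whitney": 0}
    for _ in range(nsim):
        x = [sampler(rng) for _ in range(n)]
        y = [sampler(rng) + shift for _ in range(n)]
        hits["t-test"] += student_t_p(x, y) < alpha
        hits["Mann-Whitney"] += mann_whitney_p(x, y) < alpha
    return {name: count / nsim for name, count in hits.items()}


if __name__ == "__main__":
    for dist, sampler in SAMPLERS.items():
        for shift in (0.0, 0.5):
            print(dist, shift, rejection_rates(sampler, n=10, shift=shift))
```

Varying `n` and `shift`, or adding samplers for the uniform and bimodal cases, moves the sketch toward the full design. The robust t-test and the permutation test would each need their own implementation.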