Effect Sizes and F Ratios < 1.0
Abstract
Standard statistics texts indicate that in a completely balanced fixed-effects ANOVA the expected value of the F ratio is approximately 1.0 (more precisely: df_error/(df_error − 2)) when the null hypothesis is true. Even though some authors suggest that the null hypothesis is rarely true in practice (e.g., Meehl, 1990), F ratios < 1.0 are reported quite frequently in the literature. However, standard effect size statistics (e.g., Cohen's f) often yield positive values when F < 1.0, which appears to create confusion about the meaningfulness of effect size statistics when the null hypothesis may be true. Given the repeated emphasis on reporting effect sizes, it is shown that when F < 1.0 it is misleading to report only sample effect size estimates, as is often recommended. Causes of F ratios < 1.0 are reviewed and illustrated by a short simulation study. The calculation and interpretation of corrected and uncorrected effect size statistics under these conditions are discussed. Computing adjusted measures of association strength and incorporating effect size confidence intervals help reduce confusion surrounding results when sample sizes are small. Detailed recommendations are directed to authors, journal editors, and reviewers.
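The abstract's central point can be illustrated with a minimal simulation. This is not the article's own simulation study; the design here (four groups of n = 10 drawn from a single normal population, i.e., a true null hypothesis) is an arbitrary illustrative choice. Under the null, F falls below 1.0 in a large share of replications, the uncorrected eta-squared is always positive, while an adjusted measure such as Hays' omega-squared is negative exactly when F < 1.0:

```python
import numpy as np

rng = np.random.default_rng(42)

def one_way_anova(groups):
    """Return F, eta-squared, and omega-squared for a one-way fixed-effects ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n_total - k
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    f = ms_between / ms_within
    # Uncorrected effect size: always > 0 whenever the group means differ at all
    eta2 = ss_between / (ss_between + ss_within)
    # Hays' omega-squared (adjusted): negative precisely when F < 1
    omega2 = (ss_between - df_between * ms_within) / (ss_between + ss_within + ms_within)
    return f, eta2, omega2

# True null: 4 groups of n = 10, all drawn from the same N(0, 1) population
results = [one_way_anova([rng.normal(size=10) for _ in range(4)]) for _ in range(5000)]
fs, eta2s, omega2s = map(np.array, zip(*results))

print(f"mean F          : {fs.mean():.3f}  (theory: df_e/(df_e - 2) = 36/34 = {36/34:.3f})")
print(f"share of F < 1.0: {(fs < 1).mean():.2%}")
print(f"mean eta^2      : {eta2s.mean():.3f}  (positive even under a true null)")
print(f"mean omega^2    : {omega2s.mean():.3f}  (near 0; negative whenever F < 1)")
```

Note the design choice: omega-squared subtracts the expected chance-level between-groups variation, which is why it can go negative; a reader who sees only the always-positive eta-squared alongside F < 1.0 gets the mixed message the abstract describes.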
References
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Balluerka, N., Gómez, J., & Hidalgo, D. (2005). The controversy over null hypothesis significance testing revisited. Methodology, 1, 55–70.
Carroll, R. M., & Nordholm, L. A. (1975). Sampling characteristics of Kelley's ε² and Hays' ω². Educational and Psychological Measurement, 35, 541–554.
Cohen, J. (1962). The statistical power of abnormal-social psychological research. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574.
Edgington, E. S. (1974). A new tabulation of statistical procedures used in APA journals. American Psychologist, 29, 25–26.
Elmore, P. B., & Woehlke, P. L. (1998, April). Twenty years of research methods employed in American Educational Research Journal, Educational Researcher, and Review of Educational Research. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA. (ERIC Document Reproduction Service No. 420 701)
Ezekiel, M. (1930). Methods of correlational analysis. New York: Wiley.
Finch, S., Cumming, G., & Thomason, N. (2001). Reporting of statistical evidence in the Journal of Applied Psychology: Little evidence of reform. Educational and Psychological Measurement, 61, 181–210.
Fleischman, A. I. (1980). Confidence intervals for correlation ratios. Educational and Psychological Measurement, 40, 659–670.
Gigerenzer, G. (2000). Adaptive thinking: Rationality in the real world. Oxford: Oxford University Press.
Goodwin, L. D., & Goodwin, W. L. (1985). An analysis of statistical techniques used in the Journal of Educational Psychology. Educational Psychologist, 20, 13–21.
Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.). (1997). What if there were no significance tests? Mahwah, NJ: Erlbaum.
Hays, W. L. (1994). Statistics (5th ed.). Orlando: Harcourt Brace.
Herzberg, P. A. (1969). The parameters of cross-validation. Psychometrika Monograph Supplement, 16, 1–67.
Huberty, C. J., & Mourad, S. A. (1980). Estimation in multiple correlation/prediction. Educational and Psychological Measurement, 40, 101–112.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Thousand Oaks, CA: Sage.
Hyde, J. S. (2001). Reporting effect sizes: The roles of editors, textbook authors, and publication manuals. Educational and Psychological Measurement, 61, 225–228.
Kelley, T. L. (1935). An unbiased correlation ratio measure. Proceedings of the National Academy of Sciences, 21, 554–559.
Kellow, T. J. (1998). Beyond statistical significance tests: The importance of using other estimates of treatment effects to interpret evaluation results. The American Journal of Evaluation, 19(1), 123–134.
Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, B., et al. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386.
Kieffer, K. M., Reese, R. J., & Thompson, B. (2001). Statistical techniques employed in AERJ and JCP articles from 1988 to 1997: A methodological review. Journal of Experimental Education, 69, 280–309.
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746–759.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.
Kromrey, J. D., & Hines, C. V. (1996). Estimating the coefficient of cross-validity in multiple regression: A comparison of analytical and empirical methods. Journal of Experimental Education, 64, 240–266.
Larson, S. C. (1931). The shrinkage of the coefficient of multiple correlation. Journal of Educational Psychology, 22, 45–55.
Lord, F. M. (1950). Efficiency of prediction when a regression equation from one sample is used in a new sample (Research Bulletin 50-110). Princeton, NJ: Educational Testing Service.
Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: Causes, consequences, and remedies. Psychological Methods, 9, 147–163.
Maxwell, S. E., Camp, C. J., & Arvey, R. D. (1981). Measures of strength of association: A comparative examination. Journal of Applied Psychology, 66, 525–534.
Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data: A model comparison perspective. Belmont, CA: Wadsworth.
McLean, J. E., & Ernest, J. M. (1998). The role of statistical significance testing in educational research. Research in the Schools, 5, 15–22.
Meehl, P. E. (1990). Appraising and amending theories: The strategy of Lakatosian defence and two principles that warrant it. Psychological Inquiry, 1(2), 108–141.
Montgomery, D. B., & Morrison, D. G. (1973). A note on adjusting R². Journal of Finance, 28, 1009–1013.
Moore, D. S., & McCabe, G. P. (1999). Introduction to the practice of statistics (3rd ed.). New York: Freeman.
O'Grady, K. E. (1982). Measures of explained variance: Cautions and limitations. Psychological Bulletin, 92, 766–777.
Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25, 241–286.
Pierce, C. A., Block, R. A., & Aguinis, H. (2004). Cautionary note on reporting η² values from multifactor ANOVA designs. Educational and Psychological Measurement, 64, 916–924.
Popper, K. R. (1989). Logik der Forschung [The logic of scientific discovery] (9th ed.). Tübingen, Germany: Mohr.
R Development Core Team. (2004). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Roberts, J. K., & Henson, R. K. (2002). Correction for bias in estimating effect sizes. Educational and Psychological Measurement, 62, 241–253.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. New York: Cambridge University Press.
Rosnow, R. L., & Rosenthal, R. (1996). Computing contrasts, effect sizes, and counternulls on other people's published data: General procedures for research consumers. Psychological Methods, 1, 331–340.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1, 115–129.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61, 605–632.
Smithson, M. (2002). Confidence intervals. Thousand Oaks, CA: Sage.
Snyder, P. A., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education, 61, 334–349.
Snyder, P. A., & Thompson, B. (1998). Use of tests of statistical significance and other analytic choices in a school psychology journal: Review of practices and suggested alternatives. School Psychology Quarterly, 13, 335–348.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182.
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Mahwah, NJ: Erlbaum.
Thompson, B. (1999a). Journal editorial policies regarding statistical significance tests: Heat is to fire as p is to importance. Educational Psychology Review, 11, 157–169.
Thompson, B. (1999b). Why "encouraging" effect size reporting is not working: The etiology of researcher resistance to changing practices. The Journal of Psychology, 133, 133–140.
Thompson, B. (2002a). "Statistical," "practical," and "clinical": How many kinds of significance do counselors need to consider? Journal of Counseling and Development, 80(1), 64–71.
Thompson, B. (2002b). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 24–31.
Thompson, B., & Snyder, P. A. (1997). Statistical significance testing practices in the Journal of Experimental Education. Journal of Experimental Education, 66, 75–83.
Vacha-Haase, T., & Nilsson, J. E. (1998). Statistical significance reporting: Current trends and usages within MECD. Measurement and Evaluation in Counseling and Development, 31, 46–57.
Vacha-Haase, T., Nilsson, J. E., Reetz, D. R., Lance, T. S., & Thompson, B. (2000). Reporting practices and APA editorial policies regarding statistical significance and effect size. Theory and Psychology, 10, 413–425.
Vacha-Haase, T., & Thompson, B. (2004). How to estimate and interpret various effect sizes. Journal of Counseling Psychology, 51, 473–481.
Wherry, R. J. (1931). A new formula for predicting the shrinkage of the coefficient of multiple correlation. Annals of Mathematical Statistics, 2, 440–451.
Wilkinson, L., & the APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Yin, P., & Fan, X. (2001). Estimating R² shrinkage in multiple regression: A comparison of different analytical methods. Journal of Experimental Education, 69, 203–224.