Effect Sizes and F Ratios < 1.0
Abstract
Standard statistics texts indicate that in a completely balanced fixed-effects ANOVA the expected value of the F ratio is approximately 1.0 (more precisely: df_error/(df_error − 2)) when the null hypothesis is true. Even though some authors suggest that the null hypothesis is rarely true in practice (e.g., Meehl, 1990), F ratios < 1.0 are reported quite frequently in the literature. However, standard effect size statistics (e.g., Cohen's f) often yield positive values when F < 1.0, which appears to create confusion about the meaningfulness of effect size statistics when the null hypothesis may be true. Given the repeated emphasis on reporting effect sizes, it is shown that when F < 1.0 it is misleading to report only sample effect size estimates, as is often recommended. Causes of F ratios < 1.0 are reviewed and illustrated by a short simulation study. The calculation and interpretation of corrected and uncorrected effect size statistics under these conditions are discussed. Computing adjusted measures of association strength and incorporating effect size confidence intervals help reduce confusion surrounding results when sample sizes are small. Detailed recommendations are directed to authors, journal editors, and reviewers.
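The abstract's central point can be illustrated with a minimal simulation. This is not the article's own simulation study; the design here (four groups of n = 10 drawn from a single normal population, i.e., a true null hypothesis) is an arbitrary illustrative choice. Under the null, F falls below 1.0 in a large share of replications, the uncorrected eta-squared is always positive, while an adjusted measure such as Hays' omega-squared is negative exactly when F < 1.0:

```python
import numpy as np

rng = np.random.default_rng(42)

def one_way_anova(groups):
    """Return F, eta-squared, and omega-squared for a one-way fixed-effects ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n_total - k
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    f = ms_between / ms_within
    # Uncorrected effect size: always > 0 whenever the group means differ at all
    eta2 = ss_between / (ss_between + ss_within)
    # Hays' omega-squared (adjusted): negative precisely when F < 1
    omega2 = (ss_between - df_between * ms_within) / (ss_between + ss_within + ms_within)
    return f, eta2, omega2

# True null: 4 groups of n = 10, all drawn from the same N(0, 1) population
results = [one_way_anova([rng.normal(size=10) for _ in range(4)]) for _ in range(5000)]
fs, eta2s, omega2s = map(np.array, zip(*results))

print(f"mean F          : {fs.mean():.3f}  (theory: df_e/(df_e - 2) = 36/34 = {36/34:.3f})")
print(f"share of F < 1.0: {(fs < 1).mean():.2%}")
print(f"mean eta^2      : {eta2s.mean():.3f}  (positive even under a true null)")
print(f"mean omega^2    : {omega2s.mean():.3f}  (near 0; negative whenever F < 1)")
```

Note the design choice: omega-squared subtracts the expected chance-level between-groups variation, which is why it can go negative; a reader who sees only the always-positive eta-squared alongside F < 1.0 gets the mixed message the abstract describes.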
References
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Balluerka, N., Gómez, J., & Hidalgo, D. (2005). The controversy over null hypothesis significance testing revisited. Methodology, 1, 55–70.
Carroll, R. M., & Nordholm, L. A. (1975). Sampling characteristics of Kelley's ε² and Hays' ω². Educational and Psychological Measurement, 35, 541–554.
Cohen, J. (1962). The statistical power of abnormal-social psychological research. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574.
Edgington, E. S. (1974). A new tabulation of statistical procedures used in APA journals. American Psychologist, 29, 25–26.
Elmore, P. B., & Woehlke, P. L. (1998, April). Twenty years of research methods employed in American Educational Research Journal, Educational Researcher, and Review of Educational Research. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA. (ERIC Document Reproduction Service No. 420 701)
Ezekiel, M. (1930). Methods of correlational analysis. New York: Wiley.
Finch, S., Cumming, G., & Thomason, N. (2001). Reporting of statistical evidence in the Journal of Applied Psychology: Little evidence of reform. Educational and Psychological Measurement, 61, 181–210.
Fleischman, A. I. (1980). Confidence intervals for correlation ratios. Educational and Psychological Measurement, 40, 659–670.
Gigerenzer, G. (2000). Adaptive thinking: Rationality in the real world. Oxford: Oxford University Press.
Goodwin, L. D., & Goodwin, W. L. (1985). An analysis of statistical techniques used in the Journal of Educational Psychology. Educational Psychologist, 20, 13–21.
Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.). (1997). What if there were no significance tests? Mahwah, NJ: Erlbaum.
Hays, W. L. (1994). Statistics (5th ed.). Orlando: Harcourt Brace.
Herzberg, P. A. (1969). The parameters of cross-validation. Psychometrika Monograph Supplement, 16, 1–67.
Huberty, C. J., & Mourad, S. A. (1980). Estimation in multiple correlation/prediction. Educational and Psychological Measurement, 40, 101–112.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Thousand Oaks, CA: Sage.
Hyde, J. S. (2001). Reporting effect sizes: The roles of editors, textbook authors, and publication manuals. Educational and Psychological Measurement, 61, 225–228.
Kelley, T. L. (1935). An unbiased correlation ratio measure. Proceedings of the National Academy of Sciences, 21, 554–559.
Kellow, T. J. (1998). Beyond statistical significance tests: The importance of using other estimates of treatment effects to interpret evaluation results. The American Journal of Evaluation, 19(1), 123–134.
Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, B., et al. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386.
Kieffer, K. M., Reese, R. J., & Thompson, B. (2001). Statistical techniques employed in AERJ and JCP articles from 1988 to 1997: A methodological review. Journal of Experimental Education, 69, 280–309.
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746–759.
Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.
Kromrey, J. D., & Hines, C. V. (1996). Estimating the coefficient of cross-validity in multiple regression: A comparison of analytical and empirical methods. Journal of Experimental Education, 64, 240–266.
Larson, S. C. (1931). The shrinkage of the coefficient of multiple correlation. Journal of Educational Psychology, 22, 45–55.
Lord, F. M. (1950). Efficiency of prediction when a regression equation from one sample is used in a new sample (Research Bulletin 50-110). Princeton, NJ: Educational Testing Service.
Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: Causes, consequences, and remedies. Psychological Methods, 9, 147–163.
Maxwell, S. E., Camp, C. J., & Arvey, R. D. (1981). Measures of strength of association: A comparative examination. Journal of Applied Psychology, 66, 525–534.
Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data: A model comparison perspective. Belmont, CA: Wadsworth.
McLean, J. E., & Ernest, J. M. (1998). The role of statistical significance testing in educational research. Research in the Schools, 5, 15–22.
Meehl, P. E. (1990). Appraising and amending theories: The strategy of Lakatosian defence and two principles that warrant it. Psychological Inquiry, 1(2), 108–141.
Montgomery, D. B., & Morrison, D. G. (1973). A note on adjusting R². Journal of Finance, 28, 1009–1013.
Moore, D. S., & McCabe, G. P. (1999). Introduction to the practice of statistics (3rd ed.). New York: Freeman.
O'Grady, K. E. (1982). Measures of explained variance: Cautions and limitations. Psychological Bulletin, 92, 766–777.
Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25, 241–286.
Pierce, C. A., Block, R. A., & Aguinis, H. (2004). Cautionary note on reporting η² values from multifactor ANOVA designs. Educational and Psychological Measurement, 64, 916–924.
Popper, K. R. (1989). Logik der Forschung [The logic of scientific discovery] (9th ed.). Tübingen, Germany: Mohr.
R Development Core Team. (2004). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Roberts, J. K., & Henson, R. K. (2002). Correction for bias in estimating effect sizes. Educational and Psychological Measurement, 62, 241–253.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. New York: Cambridge University Press.
Rosnow, R. L., & Rosenthal, R. (1996). Computing contrasts, effect sizes, and counternulls on other people's published data: General procedures for research consumers. Psychological Methods, 1, 331–340.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1, 115–129.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61, 605–632.
Smithson, M. (2002). Confidence intervals. Thousand Oaks, CA: Sage.
Snyder, P. A., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education, 61, 334–349.
Snyder, P. A., & Thompson, B. (1998). Use of tests of statistical significance and other analytic choices in a school psychology journal: Review of practices and suggested alternatives. School Psychology Quarterly, 13, 335–348.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182.
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Mahwah, NJ: Erlbaum.
Thompson, B. (1999a). Journal editorial policies regarding statistical significance tests: Heat is to fire as p is to importance. Educational Psychology Review, 11, 157–169.
Thompson, B. (1999b). Why "encouraging" effect size reporting is not working: The etiology of researcher resistance to changing practices. The Journal of Psychology, 133, 133–140.
Thompson, B. (2002a). "Statistical," "practical," and "clinical": How many kinds of significance do counselors need to consider? Journal of Counseling and Development, 80(1), 64–71.
Thompson, B. (2002b). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 24–31.
Thompson, B., & Snyder, P. A. (1997). Statistical significance testing practices in the Journal of Experimental Education. Journal of Experimental Education, 66, 75–83.
Vacha-Haase, T., & Nilsson, J. E. (1998). Statistical significance reporting: Current trends and usages within MECD. Measurement and Evaluation in Counseling and Development, 31, 46–57.
Vacha-Haase, T., Nilsson, J. E., Reetz, D. R., Lance, T. S., & Thompson, B. (2000). Reporting practices and APA editorial policies regarding statistical significance and effect size. Theory and Psychology, 10, 413–425.
Vacha-Haase, T., & Thompson, B. (2004). How to estimate and interpret various effect sizes. Journal of Counseling Psychology, 51, 473–481.
Wherry, R. J. (1931). A new formula for predicting the shrinkage of the coefficient of multiple correlation. Annals of Mathematical Statistics, 2, 440–451.
Wilkinson, L., & the APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Yin, P., & Fan, X. (2001). Estimating R² shrinkage in multiple regression: A comparison of different analytical methods. Journal of Experimental Education, 69, 203–224.