Original Article

Pooling ANOVA Results From Multiply Imputed Datasets

A Simulation Study

Simon Grund

Leibniz Institute for Science and Mathematics Education, Kiel, Germany

Centre for International Student Assessment, Germany

Oliver Lüdtke

Leibniz Institute for Science and Mathematics Education, Kiel, Germany

Centre for International Student Assessment, Germany

Alexander Robitzsch

Leibniz Institute for Science and Mathematics Education, Kiel, Germany

Centre for International Student Assessment, Germany

Published Online: October 05, 2016
https://doi.org/10.1027/1614-2241/a000111

Abstract

The analysis of variance (ANOVA) is frequently used to examine whether a number of groups differ on a variable of interest. The global hypothesis test of the ANOVA can be reformulated as a regression model in which all group differences are simultaneously tested against zero. Multiple imputation offers reliable and effective treatment of missing data; however, recommendations differ with regard to which procedures are suitable for pooling ANOVA results from multiply imputed datasets. In this article, we compared several procedures (known as D₁, D₂, and D₃) using Monte Carlo simulations. Even though previous recommendations have advocated that D₂ should be avoided in favor of D₁ or D₃, our results suggest that all procedures provide a suitable test of the ANOVA’s global null hypothesis in many plausible research scenarios. In more extreme settings, D₁ was most reliable, whereas D₂ and D₃ suffered from different limitations. We provide guidelines on how the different methods can be applied in one- and two-factorial ANOVA designs and information about the conditions under which some procedures may perform better than others. Computer code is supplied for each method to be used in freely available statistical software.
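To make the simplest of the three procedures concrete, the following is a minimal sketch of D₂, which pools the Wald-type χ² statistics obtained from each imputed dataset. It follows the formula as it is commonly stated from Li, Meng, Raghunathan, and Rubin (1991, Statistica Sinica); it is not the code supplied with the article. The function name `pool_d2` and the example values are hypothetical, and the sketch returns only the pooled statistic and its degrees of freedom, leaving the p-value lookup in the F distribution to the reader's statistics library.

```python
import math
from statistics import mean, variance


def pool_d2(chisq_stats, k):
    """Pool chi-square statistics from m imputed datasets (D2 sketch).

    chisq_stats: one Wald-type chi-square statistic per imputed dataset.
    k: number of parameters tested (e.g., groups minus one in a one-way ANOVA).
    Returns (d2, df2, r2); d2 is referred to an F distribution with
    k and df2 degrees of freedom.
    """
    m = len(chisq_stats)
    if m < 2:
        raise ValueError("at least two imputed datasets are required")
    if any(d < 0 for d in chisq_stats):
        raise ValueError("chi-square statistics must be non-negative")

    d_bar = mean(chisq_stats)

    # Relative increase in variance due to nonresponse, estimated from the
    # sample variance of the square roots of the statistics.
    r2 = (1 + 1 / m) * variance([math.sqrt(d) for d in chisq_stats])

    d2 = (d_bar / k - (m + 1) / (m - 1) * r2) / (1 + r2)

    # Denominator degrees of freedom; infinite if the statistics do not vary.
    if r2 > 0:
        df2 = k ** (-3 / m) * (m - 1) * (1 + 1 / r2) ** 2
    else:
        df2 = math.inf
    return d2, df2, r2


if __name__ == "__main__":
    # Made-up statistics from five imputed datasets, three groups (k = 2).
    stat, df2, r2 = pool_d2([5.1, 6.3, 4.8, 7.0, 5.6], k=2)
    print(f"D2 = {stat:.3f}, df = (2, {df2:.1f}), r2 = {r2:.3f}")
```

Because D₂ uses only the test statistics and not the parameter estimates or their covariance matrices, it is the easiest of the three procedures to apply by hand, which is one reason it remains of practical interest despite earlier recommendations against it.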

Volume 12, Issue 3, July 2016

ISSN: 1614-1881, eISSN: 1614-2241

History

Received: June 8, 2015
Revised: November 30, 2015
Accepted: May 9, 2016
Published online: October 5, 2016
