Abstract
The analysis of variance (ANOVA) is frequently used to examine whether a number of groups differ on a variable of interest. The global hypothesis test of the ANOVA can be reformulated as a regression model in which all group differences are simultaneously tested against zero. Multiple imputation offers reliable and effective treatment of missing data; however, recommendations differ on which procedures are suitable for pooling ANOVA results from multiply imputed datasets. In this article, we compared several procedures (known as D1, D2, and D3) using Monte Carlo simulations. Although previous recommendations have advised avoiding D2 in favor of D1 or D3, our results suggest that all procedures provide a suitable test of the ANOVA’s global null hypothesis in many plausible research scenarios. In more extreme settings, D1 was most reliable, whereas D2 and D3 suffered from different limitations. We provide guidelines on how the different methods can be applied in one- and two-factorial ANOVA designs and information about the conditions under which some procedures may perform better than others. Computer code is supplied for each method to be used in freely available statistical software.
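The reformulation of the global ANOVA test as a regression model can be illustrated with a small sketch. The code below is not the code supplied with the article; it is a minimal illustration for complete data, using only the Python standard library. It assumes dummy coding with the first group as the reference category, and the data values and function names (`anova_f`, `regression_f`, `residual_ss`, `solve_linear_system`) are hypothetical. It computes the one-way ANOVA F statistic once from between- and within-group sums of squares and once by comparing a regression with k − 1 dummy predictors against an intercept-only model.

```python
from statistics import mean


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def residual_ss(design, y):
    """Residual sum of squares from an OLS fit of y on the design matrix."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def anova_f(groups):
    """Classical one-way ANOVA F from between/within sums of squares."""
    all_y = [v for g in groups for v in g]
    grand = mean(all_y)
    k, n = len(groups), len(all_y)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def regression_f(groups):
    """F statistic testing all k-1 dummy coefficients against zero."""
    k = len(groups)
    y = [v for g in groups for v in g]
    n = len(y)
    # Full model: intercept plus k-1 dummies (group 0 is the reference).
    full = [[1.0] + [1.0 if j == gi else 0.0 for j in range(1, k)]
            for gi, g in enumerate(groups) for _ in g]
    # Null model: intercept only, i.e. all group differences fixed at zero.
    null = [[1.0] for _ in y]
    rss_full = residual_ss(full, y)
    rss_null = residual_ss(null, y)
    return ((rss_null - rss_full) / (k - 1)) / (rss_full / (n - k))


# Hypothetical data for three groups of four observations each.
groups = [[4.1, 5.0, 5.6, 4.8], [6.2, 5.9, 7.1, 6.5], [5.0, 5.4, 4.7, 5.9]]
f_anova = anova_f(groups)
f_regression = regression_f(groups)
print(round(f_anova, 4), round(f_regression, 4))
```

Both routes give the same F statistic. With multiply imputed data, such a test would be computed in each imputed dataset and then combined with a pooling procedure such as D1, D2, or D3; that step is not shown here.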