Abstract
Cognitive diagnostic modeling has been adopted to support a variety of diagnostic measurement processes. Specifically, this approach allows practitioners and researchers to investigate an individual's status with respect to certain latent variables of interest. However, the diagnostic information provided by traditional estimation approaches often suffers from low accuracy, especially under small-sample conditions. This paper adopts the AdaBoost technique, popular in the field of machine learning, to estimate latent variables. The proposed approach constructs a simple iterative algorithm, based on AdaBoost, that maximizes the area under the receiver operating characteristic curve (AUC). The algorithmic details are presented as pseudocode with line-by-line verbal explanations. Simulation studies were conducted to examine the improvement in latent variable estimates achieved by the proposed approach.
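To illustrate the kind of procedure described above, the following is a minimal sketch (not the paper's algorithm) of fitting an AdaBoost classifier to dichotomous item responses and scoring its latent-attribute predictions by AUC, using scikit-learn. The synthetic data-generating setup (a single binary mastery attribute driving item success probabilities of 0.8 vs. 0.3) is an assumption for demonstration only.

```python
# Hedged sketch: AdaBoost classification of a latent binary attribute
# from item responses, evaluated via AUC. All data below are simulated
# stand-ins, not the paper's design.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_examinees, n_items = 500, 20

# Latent binary mastery status (the target of diagnostic classification).
alpha = rng.integers(0, 2, size=n_examinees)

# Item responses depend noisily on mastery: masters answer correctly
# with probability 0.8, non-masters with probability 0.3 (assumed values).
p_correct = np.where(alpha[:, None] == 1, 0.8, 0.3)
responses = rng.binomial(1, p_correct, size=(n_examinees, n_items))

# Fit AdaBoost to predict the latent attribute from the response pattern.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(responses, alpha)

# AUC summarizes classification quality; 0.5 is chance, 1.0 is perfect,
# so an estimation procedure would seek to maximize it.
scores = clf.predict_proba(responses)[:, 1]
auc = roc_auc_score(alpha, scores)
print(f"AUC = {auc:.3f}")
```

In practice the AUC would be computed against known simulated mastery states (as in a simulation study) or on held-out data, since in-sample AUC overstates accuracy.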