Abstract
Cognitive diagnostic modeling has been adopted to support a variety of diagnostic measurement processes. Specifically, this approach allows practitioners and researchers to investigate an individual's status with respect to certain latent variables of interest. However, the diagnostic information provided by traditional estimation approaches often suffers from low accuracy, especially under small-sample conditions. This paper adopts the AdaBoost technique, popular in the field of machine learning, to estimate latent variables. The proposed approach constructs a simple iterative algorithm, based on AdaBoost, that maximizes the area under the receiver operating characteristic curve (AUC). The algorithmic details are presented as pseudocode with line-by-line verbal explanations. Simulation studies were conducted to examine the improvement in latent variable estimates achieved by the proposed approach.
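To illustrate the kind of procedure described above, the following is a minimal sketch (not the paper's algorithm) of fitting an AdaBoost classifier to dichotomous item responses and scoring its latent-attribute predictions by AUC, using scikit-learn. The synthetic data-generating setup (a single binary mastery attribute driving item success probabilities of 0.8 vs. 0.3) is an assumption for demonstration only.

```python
# Hedged sketch: AdaBoost classification of a latent binary attribute
# from item responses, evaluated via AUC. All data below are simulated
# stand-ins, not the paper's design.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_examinees, n_items = 500, 20

# Latent binary mastery status (the target of diagnostic classification).
alpha = rng.integers(0, 2, size=n_examinees)

# Item responses depend noisily on mastery: masters answer correctly
# with probability 0.8, non-masters with probability 0.3 (assumed values).
p_correct = np.where(alpha[:, None] == 1, 0.8, 0.3)
responses = rng.binomial(1, p_correct, size=(n_examinees, n_items))

# Fit AdaBoost to predict the latent attribute from the response pattern.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(responses, alpha)

# AUC summarizes classification quality; 0.5 is chance, 1.0 is perfect,
# so an estimation procedure would seek to maximize it.
scores = clf.predict_proba(responses)[:, 1]
auc = roc_auc_score(alpha, scores)
print(f"AUC = {auc:.3f}")
```

In practice the AUC would be computed against known simulated mastery states (as in a simulation study) or on held-out data, since in-sample AUC overstates accuracy.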