Original Article

Performance of Combined Models in Discrete Binary Classification

Published Online: https://doi.org/10.1027/1614-2241/a000117

Abstract. Diverse Discrete Discriminant Analysis (DDA) models perform differently in different samples. This fact has encouraged research into combined models, which seem particularly promising when the a priori classes are not well separated or when small or moderately sized samples are considered, as often occurs in practice. In this study, we evaluate the performance of a convex combination of two DDA models: the First-Order Independence Model (FOIM) and the Dependence Trees Model (DTM). We use simulated data sets with two classes and consider diverse data complexity factors that may influence the performance of the combined model: the separation of classes, class balance, and the number of missing states, as well as the sample size and the number of parameters to be estimated in DDA. We resort to cross-validation to evaluate the precision of classification. The results illustrate the advantage of the proposed combination over FOIM and DTM alone: it yields the best results, especially when very small samples are considered. The experimental study also provides a ranking of the data complexity factors according to their relative impact on classification performance, by means of a regression model; it leads to the conclusion that the separation of classes is the most influential factor. The ratio between the number of degrees of freedom and the sample size, along with the proportion of missing states in the minority class, also has a significant impact on classification performance. An additional gain of this study, also derived from the estimated regression model, is the ability to successfully predict the precision of classification in a real data set from its data complexity factors.
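To make the combination scheme concrete, the following is a minimal sketch (not the authors' implementation) of a convex combination of class-conditional likelihoods in the two-class discrete setting. The FOIM part is implemented as Laplace-smoothed per-class marginals multiplied under the independence assumption; the DTM likelihood, which factorizes over a dependence tree instead, is assumed to be supplied as a second likelihood dictionary of the same shape. All function names and the smoothing parameter `alpha` are illustrative assumptions.

```python
import numpy as np

def fit_foim(X, y, n_states, alpha=1.0):
    """First-Order Independence Model: Laplace-smoothed per-class marginals.

    Returns, for each class c, an array of shape (n_features, n_states)
    holding the estimates of P(x_j = s | class c).
    """
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        models[c] = np.stack([
            (np.bincount(Xc[:, j], minlength=n_states) + alpha)
            / (Xc.shape[0] + alpha * n_states)
            for j in range(X.shape[1])
        ])
    return models

def foim_likelihood(models, x):
    """P(x | class) under the independence assumption (product of marginals)."""
    return {c: float(np.prod(m[np.arange(len(x)), x]))
            for c, m in models.items()}

def combine(lik_a, lik_b, beta):
    """Convex combination beta * P_a(x | c) + (1 - beta) * P_b(x | c)."""
    return {c: beta * lik_a[c] + (1.0 - beta) * lik_b[c] for c in lik_a}

def classify(lik, priors):
    """Bayes rule: pick the class maximizing P(x | c) * P(c)."""
    return max(lik, key=lambda c: lik[c] * priors[c])

# Tiny illustrative run on binary features; lik_dtm would come from the
# dependence-tree model in the actual study.
X = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
y = np.array([0, 0, 1, 1])
foim = fit_foim(X, y, n_states=2)
lik_foim = foim_likelihood(foim, np.array([0, 0]))
lik_dtm = lik_foim  # stand-in for the tree-factorized likelihood
lik = combine(lik_foim, lik_dtm, beta=0.5)
label = classify(lik, priors={0: 0.5, 1: 0.5})
```

With `beta = 1` the rule reduces to pure FOIM and with `beta = 0` to pure DTM, so the weight can be tuned by cross-validation exactly as the precision of classification is evaluated in the study.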
