Performance of Combined Models in Discrete Binary Classification
Abstract
Diverse Discrete Discriminant Analysis (DDA) models perform differently on different samples. This fact has encouraged research into combined models, an approach that seems particularly promising when the a priori classes are not well separated or when small or moderate-sized samples are considered, as often occurs in practice. In this study, we evaluate the performance of a convex combination of two DDA models: the First-Order Independence Model (FOIM) and the Dependence Trees Model (DTM). We use simulated data sets with two classes and consider several data complexity factors that may influence the performance of the combined model: the separation of classes, class balance, and the number of missing states, as well as the sample size and the number of parameters to be estimated in DDA. We resort to cross-validation to evaluate the precision of classification. The results illustrate the advantage of the proposed combination over FOIM and DTM alone: it yields the best results, especially when very small samples are considered. The experimental study also provides a ranking of the data complexity factors, according to their relative impact on classification performance, by means of a regression model. It leads to the conclusion that the separation of classes is the most influential factor in classification performance. The ratio between the number of degrees of freedom and the sample size, along with the proportion of missing states in the minority class, also has a significant impact on classification performance. An additional gain of this study, also derived from the estimated regression model, is the ability to successfully predict the precision of classification in a real data set from its data complexity factors.
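The combination evaluated in the abstract can be sketched in code. The following is a minimal illustration, not the authors' implementation: FOIM is fitted as independent per-feature Bernoulli rates (with Laplace smoothing, an assumed choice), and the dependence tree is replaced by a fixed chain over the features for brevity; a real DTM would select the tree structure that maximizes pairwise mutual information (Chow-Liu). The combination weight `beta` and the toy data are likewise hypothetical.

```python
def fit_foim(X):
    """First-Order Independence Model: per-feature Bernoulli rates,
    with Laplace smoothing (an assumed choice for this sketch)."""
    n, d = len(X), len(X[0])
    return [(sum(x[j] for x in X) + 1) / (n + 2) for j in range(d)]

def foim_prob(x, theta):
    """P(x | class) under first-order independence: product of marginals."""
    p = 1.0
    for xj, t in zip(x, theta):
        p *= t if xj else 1 - t
    return p

def fit_chain_dtm(X):
    """Dependence-tree likelihood with a FIXED chain x0 -> x1 -> ...
    (illustrative stand-in: a real DTM picks the tree by mutual information)."""
    n, d = len(X), len(X[0])
    p0 = (sum(x[0] for x in X) + 1) / (n + 2)
    cond = []  # cond[j][v] = P(x_{j+1} = 1 | x_j = v), Laplace-smoothed
    for j in range(d - 1):
        tbl = []
        for v in (0, 1):
            num = sum(1 for x in X if x[j] == v and x[j + 1] == 1) + 1
            den = sum(1 for x in X if x[j] == v) + 2
            tbl.append(num / den)
        cond.append(tbl)
    return p0, cond

def dtm_prob(x, model):
    """P(x | class) under the chain: root marginal times edge conditionals."""
    p0, cond = model
    p = p0 if x[0] else 1 - p0
    for j, tbl in enumerate(cond):
        q = tbl[x[j]]
        p *= q if x[j + 1] else 1 - q
    return p

def combined_classify(x, models, priors, beta=0.5):
    """Convex combination P(x|c) = beta*P_FOIM + (1-beta)*P_DTM;
    assign x to the class maximizing prior times the mixture."""
    scores = {c: priors[c] * (beta * foim_prob(x, theta)
                              + (1 - beta) * dtm_prob(x, dtm))
              for c, (theta, dtm) in models.items()}
    return max(scores, key=scores.get)

# Toy two-class sample (hypothetical data, binary features).
X0 = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 0, 0)]
X1 = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 1, 1)]
models = {0: (fit_foim(X0), fit_chain_dtm(X0)),
          1: (fit_foim(X1), fit_chain_dtm(X1))}
priors = {0: 0.5, 1: 0.5}
```

With equal priors and `beta = 0.5`, `combined_classify((0, 0, 0), models, priors)` assigns the all-zeros pattern to class 0 and the all-ones pattern to class 1. In the study, `beta` would be tuned (e.g., by cross-validation) rather than fixed.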