Effect of the Number of Response Categories on the Reliability and Validity of Rating Scales
The Likert-type format is one of the most widely used response formats for scales of all kinds in the social sciences. Nevertheless, there is no definitive agreement on the number of response categories that optimizes a scale's psychometric properties. The aim of the present work is to determine systematically the number of response alternatives that maximizes the two fundamental psychometric properties of a scale: reliability and validity. The study uses data simulated with the Monte Carlo method. We simulate responses to 30 items with inter-item correlations ranging from 0.2 to 0.9. We also manipulate sample size, analyzing four sizes: 50, 100, 200, and 500 cases. The number of response options ranges from two to nine. The results show that both reliability and validity improve as the number of response alternatives increases. The optimum number of alternatives is between four and seven: with fewer than four alternatives, reliability and validity decrease, and beyond seven alternatives the psychometric properties of the scale scarcely improve further. Some applied implications of the results are discussed.
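The simulation design summarized above can be illustrated with a minimal sketch of a single condition. The abstract does not specify the generating model, the category thresholds, or the reliability and validity indices, so everything below is an assumption for illustration and not the authors' procedure. Continuous item responses come from a one-factor model with a common inter-item correlation `rho`. They are cut into `n_categories` options at equal-width cut points on [-3, 3]. Reliability is Cronbach's alpha, and "validity" is a hypothetical proxy: the correlation between the categorized total score and the continuous total score. All helper names (`simulate_continuous`, `categorize`, `run_condition`) are hypothetical.

```python
import math
import random
import statistics


def simulate_continuous(n_cases, n_items, rho, rng):
    """Simulate standard-normal item responses with common inter-item correlation rho.

    Assumption: a one-factor model, x = sqrt(rho) * f + sqrt(1 - rho) * e,
    which gives a population correlation of rho between any two items.
    """
    data = []
    for _ in range(n_cases):
        f = rng.gauss(0.0, 1.0)  # common factor shared by all items of this case
        row = [math.sqrt(rho) * f + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
               for _ in range(n_items)]
        data.append(row)
    return data


def categorize(value, n_categories, low=-3.0, high=3.0):
    """Map a continuous score to 1..n_categories using equal-width cut points.

    Assumption: the cut points split [low, high] into equal intervals;
    values outside that range fall into the extreme categories.
    """
    width = (high - low) / n_categories
    cuts = [low + width * j for j in range(1, n_categories)]
    return 1 + sum(value > c for c in cuts)


def cronbach_alpha(data):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(data[0])
    item_vars = [statistics.variance([row[j] for row in data]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in data])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def run_condition(n_cases, rho, n_categories, n_items=30, seed=0):
    """One replication of one simulation cell: returns (alpha, validity_proxy)."""
    rng = random.Random(seed)
    cont = simulate_continuous(n_cases, n_items, rho, rng)
    cat = [[categorize(v, n_categories) for v in row] for row in cont]
    alpha = cronbach_alpha(cat)
    # Hypothetical validity proxy: categorized total vs. continuous total.
    validity = pearson([sum(r) for r in cat], [sum(r) for r in cont])
    return alpha, validity


if __name__ == "__main__":
    for m in range(2, 10):
        alpha, validity = run_condition(n_cases=200, rho=0.3, n_categories=m)
        print(f"{m} categories: alpha={alpha:.3f}, validity={validity:.3f}")
```

Looping `run_condition` over the sample sizes (50, 100, 200, 500), correlations (0.2 to 0.9), and category counts (2 to 9) given in the abstract, and averaging over many seeds, is one way to trace how both indices change with the number of categories; the single seed in the `__main__` loop shows only one replication.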