Published Online: https://doi.org/10.1027/1614-2241/a000112

Abstract. Using two meta-analytic datasets, we investigated the effects of two scale-item characteristics – the number of item response categories and the item response-category label format – on the reliability of multi-item rating scales. The first dataset contained 289 reliability coefficients harvested from 100 samples that measured Big Five traits. The second dataset contained 2,524 reliability coefficients harvested from 381 samples that measured a wide variety of constructs in psychology, marketing, management, and education. We performed moderator analyses on the two datasets using the two item characteristics and their interaction. As expected, reliability increased with the number of item response categories; more importantly, the number of response categories interacted significantly with the label format. Increasing the number of response categories raised reliability more for items with all response categories labeled than for items with other label formats. We argue that this interaction may reflect both statistical and psychological factors. These results help explain why previous findings on the relationships between these two item characteristics and reliability have been mixed.
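The moderator analysis described above is, in essence, a weighted meta-regression of reliability coefficients on the two item characteristics and their product term. Below is a minimal sketch of that design, assuming hypothetical toy data and simple sample-size weights; the variable names and values are illustrative, not taken from the paper's datasets, and the paper's actual weighting and multilevel adjustments are not reproduced here.

    # Hedged sketch: weighted meta-regression with a category-count x
    # label-format interaction. All data below are made up for illustration.
    import numpy as np
    import statsmodels.api as sm

    # Toy data: one row per harvested reliability coefficient.
    n_categories = np.array([3, 5, 5, 7, 7, 9])               # response categories per item
    all_labeled  = np.array([0, 1, 0, 1, 0, 1])               # 1 = every category verbally labeled
    alpha        = np.array([.72, .80, .76, .86, .79, .88])   # coefficient alpha
    sample_n     = np.array([120, 240, 180, 300, 150, 260])   # sample size per coefficient

    # Predictors: the two moderators plus their interaction term.
    X = sm.add_constant(np.column_stack([
        n_categories,
        all_labeled,
        n_categories * all_labeled,   # the interaction tested in the paper
    ]))

    # Weight each coefficient by sample size (a crude inverse-variance proxy).
    fit = sm.WLS(alpha, X, weights=sample_n).fit()
    print(fit.params)   # the last coefficient asks whether adding categories
                        # helps reliability more when all categories are labeled

A positive, significant interaction coefficient in such a model corresponds to the abstract's finding: the reliability gain from additional response categories is larger when every category carries a verbal label.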
