A Meta-Analytic Investigation of the Relationship Between Scale-Item Length, Label Format, and Reliability
Abstract
Using two meta-analytic datasets, we investigated the effect of two scale-item characteristics, the number of item response categories and the item response-category label format, on the reliability of multi-item rating scales. The first dataset contained 289 reliability coefficients harvested from 100 samples measuring Big Five traits. The second contained 2,524 reliability coefficients harvested from 381 samples measuring a wide variety of constructs in psychology, marketing, management, and education. We performed moderator analyses on both datasets using the two item characteristics and their interaction. As expected, reliability increased with the number of item response categories; more importantly, the number of response categories interacted significantly with the label format. Increasing the number of response categories raised reliability more for items with all response categories labeled than for items with other label formats. We argue that this interaction may reflect both statistical and psychological factors. The present results help explain why prior findings on the relationships between these two scale-item characteristics and reliability have been mixed.
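The moderator analysis described above can be sketched as a weighted meta-regression of reliability coefficients on the two item characteristics and their product term. The data, weights, and variable names below are illustrative assumptions for exposition only, not the article's actual dataset; sample size stands in for the inverse-variance weights a full meta-analysis would use.

```python
import numpy as np

# Hypothetical rows: one harvested reliability coefficient each.
# categories  = number of item response categories
# all_labeled = 1 if all response categories are labeled, 0 otherwise
# n           = sample size, used here as the meta-analytic weight
alphas      = np.array([0.70, 0.74, 0.78, 0.72, 0.80, 0.84, 0.68, 0.71, 0.73])
categories  = np.array([3, 5, 7, 3, 5, 7, 3, 5, 7])
all_labeled = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])
n           = np.array([120, 150, 130, 140, 160, 110, 100, 90, 125])

# Design matrix: intercept, each moderator, and their interaction.
X = np.column_stack([
    np.ones_like(categories, dtype=float),
    categories.astype(float),
    all_labeled.astype(float),
    (categories * all_labeled).astype(float),
])

# Weighted least squares: solve (X'WX) beta = X'W y.
W = np.diag(n.astype(float))
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ alphas)

coefs = dict(zip(["intercept", "categories", "all_labeled", "interaction"], beta))
print(coefs)
```

With these illustrative data, the positive `interaction` coefficient mirrors the abstract's finding: the slope of reliability on number of response categories is steeper when all categories are labeled.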