A Meta-Analytic Investigation of the Relationship Between Scale-Item Length, Label Format, and Reliability
Abstract
Using two meta-analytic datasets, we investigated the effect of two scale-item characteristics, the number of item response categories and the item response-category label format, on the reliability of multi-item rating scales. The first dataset contained 289 reliability coefficients harvested from 100 samples measuring Big Five traits. The second contained 2,524 reliability coefficients harvested from 381 samples measuring a wide variety of constructs in psychology, marketing, management, and education. We performed moderator analyses on both datasets using the two item characteristics and their interaction. As expected, reliability increased with the number of item response categories; more importantly, the number of response categories interacted significantly with the label format. Increasing the number of response categories raised reliability more for items with all response categories labeled than for items with other label formats. We argue that this interaction may reflect both statistical and psychological factors. The present results help explain why prior findings on the relationships between these two scale-item characteristics and reliability have been mixed.
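The moderator analysis described above can be sketched as a weighted meta-regression of reliability coefficients on the two item characteristics and their product term. The data, weights, and variable names below are illustrative assumptions for exposition only, not the article's actual dataset; sample size stands in for the inverse-variance weights a full meta-analysis would use.

```python
import numpy as np

# Hypothetical rows: one harvested reliability coefficient each.
# categories  = number of item response categories
# all_labeled = 1 if all response categories are labeled, 0 otherwise
# n           = sample size, used here as the meta-analytic weight
alphas      = np.array([0.70, 0.74, 0.78, 0.72, 0.80, 0.84, 0.68, 0.71, 0.73])
categories  = np.array([3, 5, 7, 3, 5, 7, 3, 5, 7])
all_labeled = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])
n           = np.array([120, 150, 130, 140, 160, 110, 100, 90, 125])

# Design matrix: intercept, each moderator, and their interaction.
X = np.column_stack([
    np.ones_like(categories, dtype=float),
    categories.astype(float),
    all_labeled.astype(float),
    (categories * all_labeled).astype(float),
])

# Weighted least squares: solve (X'WX) beta = X'W y.
W = np.diag(n.astype(float))
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ alphas)

coefs = dict(zip(["intercept", "categories", "all_labeled", "interaction"], beta))
print(coefs)
```

With these illustrative data, the positive `interaction` coefficient mirrors the abstract's finding: the slope of reliability on number of response categories is steeper when all categories are labeled.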