Editorial

On the Changing Role of Cronbach’s α in the Evaluation of the Quality of a Measure

Published online: https://doi.org/10.1027/1015-5759/a000069

Cronbach’s (1951) α was proposed long ago for investigating the quality of a measure within the framework of classical test theory. Cronbach’s α is considered an indicator of internal consistency, one of the four major methods of estimating reliability (the others being the parallel-test, test-retest, and split-half methods). The concept of consistency emerged from the idea that items representing the same construct should stimulate a consistent way of responding. Accordingly, Cronbach’s α is a statistic that reflects the degree of consistency in responses to the items of a measure. Its value ranges between 0 and 1, with 1 indicating perfect consistency. Measures of good psychometric quality are expected to exceed a specific threshold: Nunnally (1978) suggests .70 for an acceptable degree of consistency and .80 for a good one.
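For reference, the coefficient can be written as follows (a standard rendering of Cronbach’s 1951 formula, given here in our notation): for a measure of $k$ items with item variances $\sigma_i^2$ and variance $\sigma_X^2$ of the sum score,

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right),$$

so that α approaches 1 as the covariances among the items come to dominate the variance of the sum score.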

For a long time Cronbach’s α was one of the major assets of classical test theory, a well-established statistic that could be found in virtually every paper reporting the evaluation of a measure. Checking the papers published in the 2010 issues of the European Journal of Psychological Assessment reveals that 80.5% of them report consistency on the basis of Cronbach’s α. In some cases, however, Cronbach’s α appears not in the results section but in the method section, where it characterizes an established measure rather than a newly evaluated one. Counting only newly computed coefficients, 66.7% of the papers actually report a new Cronbach’s α.

Given the large number of reports that include αs, it is more interesting to examine the papers without such a report. The paper by Alonso-Arbiol and van de Vijver (2010) reports a meta-analysis, so that Cronbach’s α does not apply. Ayis, Paul, and Ebrahim (2010) are concerned with the search for a cutpoint and therefore need no index of consistency. The search for the best response format characterizes Kubinger and Wolfsbauer’s (2010) paper, so there is also no need for Cronbach’s α. Schroeders and Wilhelm (2010) present omega results instead of α results. As already indicated, other authors include established measures in their papers and report the Cronbach’s αs of these measures in the method sections (Crocetti, Schwartz, Fermani, & Meeus, 2010; Daoud & Abojedi, 2010; Di Giunta, Eisenberg, Kupfer, Steca, Tramontano, & Caprara, 2010; Petermann, Petermann, & Schreyer, 2010; Vierhaus, Lohaus, & Shah, 2010). Furthermore, a few research reports omit any mention of Cronbach’s α (Blickle, Kramer, & Mierke, 2010; Molinengo & Testa, 2010; Newman, Limbers, & Varni, 2010).

The popularity of Cronbach’s α presumably has three reasons. First, it is easy to obtain, since it requires only the application of an additional statistical procedure that is nowadays part of every statistical program package. Second, favorable results can in most cases be accomplished relatively easily, since α can usually be improved by eliminating ill-fitting items; one must only take care to include some extra items so that the elimination has no negative consequences for the quality of the measure. Third, it can be used without special training and is rather robust, so the likelihood of an error in application is very low.
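As an illustration of the first point, computing α takes only a few lines. The following Python sketch is our own minimal illustration; the function name and the simulated data are assumptions made for the example, not part of any particular statistical package:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_persons x k_items) matrix of item scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Simulated example: 8 items, each driven by one common ability plus noise.
rng = np.random.default_rng(0)
ability = rng.normal(size=500)
items = ability[:, None] + rng.normal(scale=1.0, size=(500, 8))
print(round(cronbach_alpha(items), 2))  # roughly .89 for these settings
```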

However, many researchers no longer consider classical test theory state-of-the-art and are gradually replacing it by congeneric test theory (Lucke, 2005; McDonald, 1999; Raykov, 1997), although this shift does not yet seem to have come to the attention of all members of the relevant scientific public. Congeneric test theory emphasizes model fit with respect to the congeneric model of measurement (Jöreskog, 1971) as the standard model, instead of consistency. The major characteristic of this model is that it tends to produce measures that are truly one-dimensional. As a consequence, there is reason to expect that a measure constructed accordingly truly represents a specific construct. There has thus been an implicit move from emphasizing consistency to emphasizing homogeneity, a move in the direction of a property that characterizes many item response models (Fischer, 1974).

Behavioral consistency can result from a combination of related abilities or traits, so that a high degree of consistency does not necessarily mean homogeneity. The main ingredients of Cronbach’s α are the variances of the individual items and the variance of the sum of all items; because the latter includes the item covariances, the correlations between the items also contribute. These correlations may be high because of different combinations of underlying abilities or traits. Cronbach’s α is therefore a true measure of consistency, but it is not perfectly appropriate for the construction of measures that represent one specific ability or trait only. In contrast, items influenced by a combination of related abilities or traits are not likely to stand the test of model fit, since deviations from homogeneity are penalized when model fit is investigated on the basis of the congeneric model.
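To make this concrete, the following sketch (our own, with arbitrary settings; it reuses cronbach_alpha from the sketch above) simulates items driven by two distinct but correlated abilities. The resulting α is high even though the item set is clearly not homogeneous:

```python
import numpy as np  # cronbach_alpha as defined in the previous sketch

# Two related abilities, correlated .6; four items load on each ability.
rng = np.random.default_rng(1)
n = 1000
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])
abilities = rng.multivariate_normal(np.zeros(2), cov, size=n)
items = np.concatenate([
    abilities[:, [0]] + rng.normal(scale=1.0, size=(n, 4)),
    abilities[:, [1]] + rng.normal(scale=1.0, size=(n, 4)),
], axis=1)
print(round(cronbach_alpha(items), 2))  # around .83, despite two dimensions
```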

Another statistic has been developed that shows considerable similarity to Cronbach’s α but aims at homogeneity instead of consistency: the omega coefficient (McDonald, 1999, p. 90), which is in agreement with congeneric test theory. This coefficient is based on the factor loadings on the relevant dimension instead of on inter-item correlations, so that contributions from other dimensions are excluded and cannot lead to an overestimation. Unfortunately, only one of the papers published in the European Journal of Psychological Assessment in 2010 reports omega results (Schroeders & Wilhelm, 2010). More papers considering this coefficient would be highly desirable.
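In a common rendering (our notation; McDonald’s own presentation may differ in detail), for a one-factor congeneric model with loadings $\lambda_i$ and unique variances $\theta_i$,

$$\omega = \frac{\left(\sum_{i=1}^{k}\lambda_i\right)^{2}}{\left(\sum_{i=1}^{k}\lambda_i\right)^{2} + \sum_{i=1}^{k}\theta_i},$$

so that only variance attributable to the target dimension enters the numerator, and contributions from additional dimensions cannot inflate the estimate.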

The shift in the dominant test theory means that the role of Cronbach’s α is changing: it represents a test theory that is no longer considered state-of-the-art. This does not mean that Cronbach’s α produces wrong results, but there are methods providing results considered more precise and more appropriate, and that in itself represents a critique (Raykov, 2001; Schweizer, Altmeyer, Reiß, & Schreiner, 2010). What remains for the moment is the role of a link relating the test construction of the past to the test construction of the future. Cronbach’s α represents an established standard, and for a while it may serve as a kind of comparison level that helps to overcome distrust in the new developments.

References

  • Alonso-Arbiol, I., & van de Vijver, F. (2010). A historical analysis of the European Journal of Psychological Assessment. European Journal of Psychological Assessment, 26, 238–247.

  • Ayis, S., Paul, C., & Ebrahim, S. (2010). Psychological disorder in old age: Better identification for better treatment. European Journal of Psychological Assessment, 26, 39–45.

  • Blickle, G., Kramer, J., & Mierke, J. (2010). Telephone-administered intelligence testing for research in work and organizational psychology. European Journal of Psychological Assessment, 26, 154–161.

  • Crocetti, E., Schwartz, S. J., Fermani, A., & Meeus, W. (2010). The Utrecht-Management of Identity Commitments Scale (U-MICS): Italian validation and cross-national comparisons. European Journal of Psychological Assessment, 26, 172–186.

  • Cronbach, L. J. (1951). Coefficient α and the internal structure of tests. Psychometrika, 16, 297–334.

  • Daoud, F. S., & Abojedi, A. A. (2010). Equivalent factorial structure in the Brief Symptom Inventory (BSI) in clinical and nonclinical Jordanian populations. European Journal of Psychological Assessment, 26, 116–121.

  • Di Giunta, L., Eisenberg, N., Kupfer, A., Steca, P., Tramontano, C., & Caprara, G. V. (2010). Assessing perceived empathic and social self-efficacy across countries. European Journal of Psychological Assessment, 26, 77–86.

  • Fischer, G. (1974). Einführung in die Theorie psychologischer Tests: Grundlagen und Anwendungen [Introduction to the theory of psychological tests: Foundations and applications]. Bern: Huber.

  • Jöreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109–133.

  • Kubinger, K., & Wolfsbauer, C. (2010). On the risk of certain psycho-technological response options in multiple-choice tests: Does a particular personality handicap examinees? European Journal of Psychological Assessment, 26, 302–308.

  • Lucke, J. F. (2005). The alpha and the omega of congeneric test theory: An extension of reliability and internal consistency of heterogeneous tests. Applied Psychological Measurement, 29, 65–81.

  • McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.

  • Molinengo, G., & Testa, S. (2010). Analysis of the psychometric properties of an assessment tool for deviant behavior in adolescence. European Journal of Psychological Assessment, 26, 108–115.

  • Newman, D. A., Limbers, C. A., & Varni, J. W. (2010). Factorial invariance of child self-report across English and Spanish language groups in a Hispanic population utilizing the PedsQL™ 4.0 Generic Core Scales. European Journal of Psychological Assessment, 26, 194–202.

  • Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.

  • Petermann, U., Petermann, F., & Schreyer, I. (2010). The German Strengths and Difficulties Questionnaire: Validity of the teacher version for preschoolers. European Journal of Psychological Assessment, 26, 256–262.

  • Raykov, T. (1997). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21, 173–184.

  • Raykov, T. (2001). Bias in coefficient α for fixed congeneric measures with correlated errors. Applied Psychological Measurement, 25, 69–76.

  • Schroeders, U., & Wilhelm, O. (2010). Testing reasoning ability with handheld computers, notebooks, and paper and pencil. European Journal of Psychological Assessment, 26, 284–292.

  • Schweizer, K., Altmeyer, M., Reiß, S., & Schreiner, M. (2010). The c-bifactor model as a tool for the construction of semihomogeneous upper-level measures. Psychological Test and Assessment Modeling, 52, 298–312.

  • Vierhaus, M., Lohaus, A., & Shah, I. (2010). Internalizing behavior during the transition from childhood to adolescence: Separating age from retest effects. European Journal of Psychological Assessment, 26, 187–193.

Karl Schweizer, Department of Psychology, Goethe University Frankfurt, Mertonstr. 17, 60054 Frankfurt a.M., Germany, +49 69 798-22081, +49 69 798-23847