Estimation of and Confidence Interval Formation for Reliability Coefficients of Homogeneous Measurement Instruments
Abstract
The reliability of a composite score is a fundamental topic in the social and behavioral sciences. The most commonly used reliability estimate for a composite score is coefficient α. However, under regularity conditions, the population value of coefficient α is only a lower bound on the population reliability unless the items are essentially τ-equivalent, an assumption that is likely violated in most applications. A generalization of coefficient α, termed ω, is discussed and generally recommended. Furthermore, a point estimate almost certainly differs from the population value, so it is important to report confidence interval limits so that the point estimate is not overinterpreted. Analytic and bootstrap methods for constructing confidence intervals for ω are described in detail. We recommend the bias-corrected bootstrap approach for ω and provide open-source, freely available R functions via the MBESS package to implement the methods discussed.
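To make the idea of an interval around a reliability estimate concrete, the following is a minimal sketch in Python rather than the R functions the abstract refers to. It makes two simplifying assumptions that differ from the recommendation above: it estimates coefficient α rather than ω (estimating ω requires fitting a factor model), and it forms a plain percentile bootstrap interval rather than a bias-corrected one. The function names and the simulated data are hypothetical and are not part of MBESS.

```python
import random
import statistics


def cronbach_alpha(rows):
    """Sample coefficient alpha; rows is a list of per-respondent item-score lists."""
    k = len(rows[0])
    item_vars = [statistics.variance([r[j] for r in rows]) for j in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def percentile_bootstrap_ci(rows, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval (not bias-corrected) for stat(rows)."""
    rng = random.Random(seed)
    n = len(rows)
    reps = []
    for _ in range(n_boot):
        # Resample respondents (whole rows) with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        reps.append(stat(sample))
    reps.sort()
    tail = (1 - level) / 2
    lower = reps[int(tail * (n_boot - 1))]
    upper = reps[int((1 - tail) * (n_boot - 1))]
    return lower, upper


def simulate_items(n=200, loadings=(0.5, 0.6, 0.7, 0.8), seed=1):
    """Hypothetical one-factor data with unequal loadings (not tau-equivalent)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        factor = rng.gauss(0, 1)
        rows.append([lam * factor + rng.gauss(0, 1) for lam in loadings])
    return rows


data = simulate_items()
alpha_hat = cronbach_alpha(data)
ci_lower, ci_upper = percentile_bootstrap_ci(data, cronbach_alpha)
print(f"alpha = {alpha_hat:.3f}, 95% percentile CI = [{ci_lower:.3f}, {ci_upper:.3f}]")
```

Resampling whole respondents keeps the covariance structure among items intact within each bootstrap sample, which is what the reliability estimate depends on. The same resampling loop could wrap an ω estimator in place of `cronbach_alpha`.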