Published online: https://doi.org/10.1027/1015-5759/a000094

Whether errors can be correlated is an old question that remains unsettled. The literature reveals that the discussion of correlated errors began some 80 years ago (e.g., Aitken, 1934). In the framework of confirmatory factor analysis the question has become ubiquitous, since correlated errors have developed into a preferred means of overcoming problems of model fit. Virtually every ill-fitting model can be transformed into a well-fitting one by allowing pairs of error components to correlate with each other; the larger the number of such pairs, the better the model fit usually is.
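To make the mechanics concrete, the following is a minimal sketch using the third-party Python package semopy, which adopts lavaan-style model syntax; the item names x1 to x4, the data file, and the choice of which error pair to free are hypothetical and serve only as illustration.

```python
import pandas as pd
from semopy import Model, calc_stats

# Hypothetical data set with item columns x1..x4.
data = pd.read_csv("items.csv")

# The one-factor model specified on theoretical grounds.
original = Model("F =~ x1 + x2 + x3 + x4")
original.fit(data)

# The same model with one pair of error components allowed to correlate.
# Every freed error covariance consumes a degree of freedom and can only
# improve chi-square, whether or not it has any substantive meaning.
modified = Model("""
F =~ x1 + x2 + x3 + x4
x2 ~~ x3
""")
modified.fit(data)

print(calc_stats(original))  # fit indices of the theoretically specified model
print(calc_stats(modified))  # typically "better" fit, bought at a price
```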

Although I have not encouraged the use of correlated errors (Schweizer, 2010), a number of manuscripts reporting them have been submitted to the European Journal of Psychological Assessment in the past, and some of them were in fact published. The issues of the last two years include eight papers (De Carvalho Leite, Seminotti, Freitas, & de Lourdes Drachler, 2011; Derkman, Scholte, Van der Veld, & Markland, 2010; Di Giunta, Eisenberg, Kupfer, Steca, Tramontano, & Caprara, 2010; Höfling, Moosbrugger, Schermelleh-Engel, & Heidenreich, 2011; Mumma, 2011; Teubert & Pinquart, 2011; Veirman, Brouwers, & Fontaine, 2011; Zohar, Denollet, Ari, & Cloninger, 2011) that report models including correlated error components, although it should be added that the number of correlated error components reported in most of these articles is small. The publication of these papers is a gratifying sign of scientific honesty in a world that suffers from publication pressure. Furthermore, my investigation of the EJPA issues revealed one article explicitly stating that “correlated errors ... were not permitted” (Houghton, Durkin, Ang, Taylor, & Brandtman, 2011).

However, the use of correlated errors is risky with respect to validity because of the damage done to the model. The likelihood is high that such a model no longer sufficiently reflects the concept it is assumed to represent. If a model with correlated errors is used for test construction, the structural validity of the measure is likely to be impaired. Reasons ruling out such damage are seldom given. There may be cases in which a characteristic of a specific sample, or simply chance, justifies correlated errors; such cases, however, are genuinely rare (Zimmerman & Williams, 1977).

From the perspective of the philosophy of science, correlated errors are a disaster, since they in no way contribute to our knowledge. Including correlated errors in a model does not create a revised model that can be considered an alternative to the original model. If it did, the revised model, although post hoc, would be valuable, since it would be a specific model, that is, a model that contributes to knowledge. Instead, correlated errors simply weaken the testability of the model: one can no longer increase knowledge by disproving the major hypothesis underlying the model (Popper, 1934), or by contributing to the fair balance of favorable and unfavorable findings concerning this hypothesis that provides a reasonable basis for progress in science (Lakatos & Musgrave, 1970).

In many cases the necessity of correlated errors simply signifies that the model does not capture all the important characteristics of the data. There are established approaches to dealing with this problem. First, one can consider a possible substructure. For example, in the case of the Genos EI measure of emotional intelligence (recently published in EJPA, see Gignac, 2010), consideration of a substructure led to an acceptable degree of model fit. Second, measurement impurity can cause model misfit; representing the source of impurity as part of the model can yield a degree of model fit not otherwise attainable (Schweizer, 2007). Third, a method effect may have to be taken into account, for example, by means of method factors (Schroeders & Wilhelm, 2010). In each of these cases the original model is modified, and the modification increases our knowledge about human nature.
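As an illustrative sketch of the first and third approach, again in hypothetical semopy/lavaan-style syntax: the facet, trait, and item names are invented, and fixing the trait-method covariance to zero follows the usual convention for keeping a method factor orthogonal to the trait.

```python
from semopy import Model

# First approach: a substructure of two correlated facets instead of a
# single undifferentiated factor.
substructure = Model("""
Facet1 =~ x1 + x2 + x3
Facet2 =~ x4 + x5 + x6
Facet1 ~~ Facet2
""")

# Third approach: a method factor for items x1 and x4, which are assumed
# to share a presentation format; it is fixed to be orthogonal to the trait.
method_model = Model("""
Trait =~ x1 + x2 + x3 + x4 + x5 + x6
Method =~ x1 + x4
Trait ~~ 0*Method
""")

# Either model would then be fitted to the data as in the earlier sketch.
```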

Finally, one can avoid correlated errors by computing parcels. Including the crucial items in a parcel may solve the problem of model misfit, but it may also hide something we need to know when evaluating a measure, so computing parcels may not truly contribute to establishing structural validity. The computation of parcels can, however, be reasonable when a very large number of items has to be considered in an investigation of structural validity (e.g., Gorostiaga, Balluerka, Alonso-Arbiol, & Haranburu, 2011; Lehmann-Willenbrock, Grohmann, & Kauffeld, 2011).
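A minimal sketch of parcel computation by item averaging follows; the data file, the nine items, and their grouping into three parcels are hypothetical, and in practice the assignment of items to parcels requires a substantive rationale. The parcels then replace the items as indicators in the measurement model.

```python
import pandas as pd

# Hypothetical item-level data with columns i1..i9.
items = pd.read_csv("items.csv")

# Each parcel is the mean of three items.
parcels = pd.DataFrame({
    "p1": items[["i1", "i2", "i3"]].mean(axis=1),
    "p2": items[["i4", "i5", "i6"]].mean(axis=1),
    "p3": items[["i7", "i8", "i9"]].mean(axis=1),
})
```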

References

  • Aitken, A. C. (1934). On fitting polynomials to data with weighted and correlated errors. Proceedings of the Royal Society of Edinburgh, 54, 12–16.

  • De Carvalho Leite, J. C., Seminotti, N., Freitas, P. F., & de Lourdes Drachler, M. (2011). The Psychosocial Treatment Expectations Questionnaire (PTEQ) for alcohol problems: Development and early validation. European Journal of Psychological Assessment, 27, 228–236.

  • Derkman, M. M. S., Scholte, R. H. J., Van der Veld, W. M., & Markland, R. C. M. E. (2010). Factorial and construct validity of the Sibling Relationship Questionnaire. European Journal of Psychological Assessment, 26, 277–283.

  • Di Giunta, L., Eisenberg, N., Kupfer, A., Steca, P., Tramontano, C., & Caprara, G. V. (2010). Assessing perceived empathic and social self-efficacy across countries. European Journal of Psychological Assessment, 26, 77–86.

  • Gignac, G. (2010). Seven-factor model of emotional intelligence as measured by Genos EI: A confirmatory factor analytic investigation based on self- and rater-report data. European Journal of Psychological Assessment, 26, 309–316.

  • Gorostiaga, A., Balluerka, N., Alonso-Arbiol, I., & Haranburu, M. (2011). Validation of the Basque Revised NEO Personality Inventory (NEO PI-R). European Journal of Psychological Assessment, 27, 193–204.

  • Höfling, V., Moosbrugger, H., Schermelleh-Engel, K., & Heidenreich, T. (2011). Mindfulness or mindlessness? A modified version of the Mindful Attention and Awareness Scale (MAAS). European Journal of Psychological Assessment, 27, 59–64.

  • Houghton, S., Durkin, K., Ang, R. P., Taylor, M. F., & Brandtman, M. (2011). Measuring temporal self-regulation in children with and without attention deficit hyperactivity disorder: Sense of time in everyday contexts. European Journal of Psychological Assessment, 27, 88–94.

  • Lakatos, I., & Musgrave, A. (Eds.). (1970). Criticism and the growth of knowledge. Cambridge: Cambridge University Press.

  • Lehmann-Willenbrock, N., Grohmann, A., & Kauffeld, S. (2011). Task and relationship conflict at work: Construct validity of a German version of Jehn’s Intragroup Conflict Scale. European Journal of Psychological Assessment, 27, 171–178.

  • Mumma, G. H. (2011). Validity issues in cognitive behavioral case formulation. European Journal of Psychological Assessment, 27, 29–49.

  • Popper, K. (1934). Logik der Forschung [The logic of scientific discovery]. Vienna: Springer-Verlag.

  • Schroeders, U., & Wilhelm, O. (2010). Testing reasoning ability with handheld computers, notebooks, and paper and pencil. European Journal of Psychological Assessment, 26, 284–292.

  • Schweizer, K. (2007). Investigating the relationship of working memory tasks and fluid intelligence tests by means of the fixed-links model in considering the impurity problem. Intelligence, 35, 591–604.

  • Schweizer, K. (2010). Some guidelines concerning the modeling of traits and abilities in test construction. European Journal of Psychological Assessment, 26, 1–2.

  • Teubert, D., & Pinquart, M. (2011). The Coparenting Inventory for Parents and Adolescents (CI-PA): Reliability and validity. European Journal of Psychological Assessment, 27, 206–214.

  • Veirman, E., Brouwers, S. A., & Fontaine, J. (2011). The assessment of emotional awareness in children: Validation of the Levels of Emotional Awareness for Children. European Journal of Psychological Assessment, 27, 265–273.

  • Zimmerman, D. W., & Williams, R. H. (1977). The theory of test validity and correlated errors of measurement. Journal of Mathematical Psychology, 16, 135–152.

  • Zohar, A. H., Denollet, J., Ari, L. L., & Cloninger, C. R. (2011). The psychometric properties of the DS14 in Hebrew and the prevalence of Type D personality in Israeli adults. European Journal of Psychological Assessment, 27, 274–281.

Karl Schweizer, Department of Psychology, Goethe University Frankfurt, Mertonstr. 17, 60054 Frankfurt a. M., Germany, +49 69 798-22081, +49 69 798-23847