Assessing the Quality and Effectiveness of the Factor Score Estimates in Psychometric Factor-Analytic Applications
Abstract
This article proposes an approach, intended for factor-analytic (FA) psychometric applications, that assesses the extent to which FA-derived individual score estimates are accurate and allow respondents to be consistently ordered and effectively differentiated over the range of trait values appropriate to the purposes of the test. Three groups of properties are assessed: (a) fineness, (b) probability, and (c) range, and, within each group, different indices are proposed. Overall, the proposal is comprehensive in that it can be used with (a) different factor score estimates derived from both the linear model and the categorical-variable-methodology model, and (b) any type of unrestricted or restricted FA solution. All of the proposed indices have been implemented in a non-commercial and widely used program for exploratory factor analysis. The usefulness of the proposal is illustrated with a real-data example in the personality domain.