Abstract
In psychological research, most measurements used to assess constructs are affected by measurement error. Measurement error leads to biased estimates of population parameters and of their standard errors. Over the past few decades, the plausible values technique has become established in large-scale assessments as a procedure for correcting relationships between latent variables and observed covariates for measurement error. This article introduces this complex statistical technique using a simple example from classical test theory. It shows that alternative procedures for estimating person scores generally lead to biased estimates of relationships at the population level. A simulation study extends these findings to an IRT model for dichotomous indicators. From a diagnostic perspective, the article emphasizes that plausible values should not be used to estimate individual ability levels and are not appropriate for individual psychological assessment. Finally, it discusses methodological challenges in applying the plausible values technique and its potential for psychological research.
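The central claim of the abstract can be illustrated with a minimal sketch under classical test theory. This is not the article's own example or simulation design. All names and numbers are assumptions chosen for illustration: a latent score `theta` regressed on one observed covariate `z` with true slope `beta = 0.5`, an observed score `x = theta + error` with reliability 0.5, and population parameters treated as known (in practice they would be estimated from a latent regression model). Three person scores are compared: the observed score, a Kelley-type regressed score that ignores the covariate, and a single plausible value drawn from the posterior of `theta` given `x` and `z`.

```python
import random


def slope(y, z):
    """Ordinary least squares slope from regressing y on z."""
    n = len(y)
    mean_y, mean_z = sum(y) / n, sum(z) / n
    cov = sum((a - mean_y) * (b - mean_z) for a, b in zip(y, z))
    ss_z = sum((b - mean_z) ** 2 for b in z)
    return cov / ss_z


def variance(y):
    """Sample variance with denominator n - 1."""
    n = len(y)
    mean_y = sum(y) / n
    return sum((a - mean_y) ** 2 for a in y) / (n - 1)


def simulate(n=50_000, beta=0.5, var_eps=0.75, var_err=1.0, seed=1):
    """Compare three person scores (all parameter values are illustrative)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Latent score: theta = beta * z + residual, so Var(theta) = beta^2 + var_eps.
    theta = [beta * zi + rng.gauss(0.0, var_eps ** 0.5) for zi in z]
    # Observed score: latent score plus independent measurement error.
    x = [t + rng.gauss(0.0, var_err ** 0.5) for t in theta]

    var_theta = beta ** 2 + var_eps
    reliability = var_theta / (var_theta + var_err)

    # Kelley-type score: shrink x toward the population mean (0 here),
    # without using the covariate.
    kelley = [reliability * xi for xi in x]

    # Plausible value: one draw from the normal posterior of theta given x and z.
    # Prior theta | z ~ N(beta * z, var_eps); likelihood x | theta ~ N(theta, var_err).
    post_var = 1.0 / (1.0 / var_eps + 1.0 / var_err)
    pv = [
        rng.gauss(post_var * (beta * zi / var_eps + xi / var_err), post_var ** 0.5)
        for zi, xi in zip(z, x)
    ]

    scores = {"true": theta, "observed": x, "kelley": kelley, "plausible": pv}
    return {
        name: {"slope": slope(s, z), "variance": variance(s)}
        for name, s in scores.items()
    }


if __name__ == "__main__":
    for name, stats in simulate().items():
        print(f"{name:10s} slope={stats['slope']:.3f} variance={stats['variance']:.3f}")
```

Under these assumptions, the observed score recovers the slope but overstates the variance of the latent variable, the Kelley-type score understates both, and the plausible value recovers both at the population level. A single plausible value still contains random noise for each person, which is why such draws are unsuitable as individual ability estimates. Applications typically draw several plausible values per person and pool the results; the sketch uses one draw to stay short.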