Abstract
Abstract. In computerized adaptive testing (CAT), item parameter estimates are assumed to be known and valid for every position at which an item can be presented in the test. This assumption is problematic, since item parameter estimates have repeatedly been shown to depend on an item's position in the test. Neglecting existing item position effects in CAT administration would lead to suboptimal item selection and biased ability estimation. As a solution, a simple procedure for accounting for item position effects during CAT calibration is proposed. In this procedure, potential item position effects are identified by fitting a series of item response theory models of increasing complexity with respect to item position effects and selecting the most appropriate model on the basis of global goodness-of-fit criteria. The procedure is illustrated using empirical calibration data from three adaptive tests (N = 1 632). Test-specific item position effects were identified, differing in how finely they varied across the tests. By accounting for item position effects with an appropriate model, overestimation of variance and reliability was avoided. The use of the estimated item position effects in subsequent operational CAT administrations is explained.
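The model-selection step described in the abstract, comparing calibration models of increasing complexity via global information criteria, can be sketched as follows. This is a minimal illustration only: the log-likelihoods, parameter counts, and model labels are assumed values for demonstration, not results from the study; only the sample size (N = 1 632) is taken from the abstract.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical fit results for three calibration models of increasing
# complexity regarding item position effects (illustrative numbers only):
#   M0: no position effect
#   M1: one common linear position effect (+1 parameter)
#   M2: item-specific position effects (many extra parameters)
models = {
    "M0: no position effect":    {"loglik": -20450.0, "k": 30},
    "M1: common linear effect":  {"loglik": -20380.0, "k": 31},
    "M2: item-specific effects": {"loglik": -20365.0, "k": 60},
}
n = 1632  # calibration sample size reported in the abstract

for name, m in models.items():
    print(f"{name}: AIC = {aic(m['loglik'], m['k']):.1f}, "
          f"BIC = {bic(m['loglik'], m['k'], n):.1f}")

# Select the model minimizing BIC (a stricter criterion than AIC for
# large samples, since it penalizes each parameter by log n).
best_bic = min(models, key=lambda s: bic(models[s]["loglik"], models[s]["k"], n))
print("Selected by BIC:", best_bic)
```

In this toy comparison the intermediate model wins: the common position effect improves the likelihood enough to justify one extra parameter, while the fully item-specific model does not repay its much larger parameter count under the BIC penalty.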