Original Article

Handling Item Position Effects in the Development of Computerized Adaptive Tests

Published online: https://doi.org/10.1026/0012-1924/a000173

Abstract. In computerized adaptive testing (CAT), item parameter estimates are treated as known and valid for every position at which an item can be presented in the test. This assumption is problematic, because item parameter estimates have repeatedly been shown to depend on the presentation position. Neglecting existing item position effects in CAT administration leads to suboptimal item selection and biased ability estimation. As a solution, a simple procedure for handling item position effects during CAT calibration is proposed: a series of item response theory models of increasing complexity with respect to item position effects is fitted, and the most appropriate model is selected on the basis of global model-fit criteria. The procedure is illustrated with empirical calibration data from three adaptive tests (N = 1 632). Item position effects were found that differed in their degree of differentiation across the individual tests. By modeling the item position effects with an appropriate model, overestimation of variance and reliability is avoided. The use of the estimated item position effects in subsequent operational CAT administrations is explained.
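To make the model-comparison step concrete, the following is a minimal Python sketch, not the marginal maximum likelihood analysis reported in the article: it simulates dichotomous responses with a small linear position (fatigue) effect, fits a Rasch model with and without a single position parameter by joint maximum likelihood, and compares the two fits by AIC and BIC. The data layout, the simulated effect size, and the single parameter gamma are illustrative assumptions.

```python
# Minimal sketch: compare a plain Rasch model against a Rasch model with a
# linear item position effect via AIC/BIC. This is an illustrative stand-in
# for the article's calibration analysis; it uses joint ML for brevity, and
# all data below are simulated.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_persons, n_items = 500, 20

theta = rng.normal(0.0, 1.0, n_persons)  # person abilities
beta = rng.normal(0.0, 1.0, n_items)     # item difficulties
gamma_true = 0.03                        # items get slightly harder later in the test

# Randomized presentation order per person (as in a balanced calibration
# design), so position effects are separable from item difficulties:
# pos[i, j] = position at which person i saw item j.
pos = np.array([rng.permutation(n_items) for _ in range(n_persons)], dtype=float)

eta_true = theta[:, None] - beta[None, :] - gamma_true * pos
X = (rng.random((n_persons, n_items)) < 1 / (1 + np.exp(-eta_true))).astype(float)

def neg_loglik(params, with_position):
    th = params[:n_persons]
    b = params[n_persons:n_persons + n_items]
    g = params[-1] if with_position else 0.0
    eta = th[:, None] - b[None, :] - g * pos
    # Bernoulli log-likelihood: sum of x*eta - log(1 + exp(eta)), stabilized.
    # The Rasch location indeterminacy is ignored here for brevity; it is the
    # same in both models and does not affect the comparison.
    return -np.sum(X * eta - np.logaddexp(0.0, eta))

def fit(with_position):
    k = n_persons + n_items + (1 if with_position else 0)
    res = minimize(neg_loglik, np.zeros(k), args=(with_position,), method="L-BFGS-B")
    ll = -res.fun
    # Counting all joint-ML parameters in k is a rough heuristic; the shared
    # person parameters cancel out when comparing the two models.
    return ll, -2 * ll + 2 * k, -2 * ll + k * np.log(X.size)

for label, wp in [("Rasch", False), ("Rasch + linear position effect", True)]:
    ll, aic, bic = fit(wp)
    print(f"{label:32s} logLik={ll:9.1f} AIC={aic:9.1f} BIC={bic:9.1f}")
```

Under this simulation, the model including the position parameter should yield lower AIC and BIC values; in the article's procedure, the analogous comparison is what guides the choice among position-effect models of increasing complexity.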
