Abstract
Abstract. In computerized adaptive testing (CAT), item parameter estimates are assumed to be known and valid for every position at which an item can be presented in the test. This assumption is problematic, since item parameter estimates have repeatedly been shown to depend on an item's position in the test. Neglecting existing item position effects in CAT administration would lead to suboptimal item selection and biased ability estimation. As a solution, a simple procedure for accounting for item position effects during CAT calibration is proposed. In this procedure, potential item position effects are identified by fitting a series of item response theory models of increasing complexity with respect to item position effects and selecting the most appropriate model on the basis of global goodness-of-fit criteria. The procedure is illustrated using empirical calibration data from three adaptive tests (N = 1 632). Test-specific item position effects were identified, differing in how finely they varied across the tests. By accounting for item position effects with an appropriate model, overestimation of variance and reliability was avoided. The use of the estimated item position effects in subsequent operational CAT administrations is explained.
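The model-selection step described in the abstract, comparing calibration models of increasing complexity via global information criteria, can be sketched as follows. This is a minimal illustration only: the log-likelihoods, parameter counts, and model labels are assumed values for demonstration, not results from the study; only the sample size (N = 1 632) is taken from the abstract.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical fit results for three calibration models of increasing
# complexity regarding item position effects (illustrative numbers only):
#   M0: no position effect
#   M1: one common linear position effect (+1 parameter)
#   M2: item-specific position effects (many extra parameters)
models = {
    "M0: no position effect":    {"loglik": -20450.0, "k": 30},
    "M1: common linear effect":  {"loglik": -20380.0, "k": 31},
    "M2: item-specific effects": {"loglik": -20365.0, "k": 60},
}
n = 1632  # calibration sample size reported in the abstract

for name, m in models.items():
    print(f"{name}: AIC = {aic(m['loglik'], m['k']):.1f}, "
          f"BIC = {bic(m['loglik'], m['k'], n):.1f}")

# Select the model minimizing BIC (a stricter criterion than AIC for
# large samples, since it penalizes each parameter by log n).
best_bic = min(models, key=lambda s: bic(models[s]["loglik"], models[s]["k"], n))
print("Selected by BIC:", best_bic)
```

In this toy comparison the intermediate model wins: the common position effect improves the likelihood enough to justify one extra parameter, while the fully item-specific model does not repay its much larger parameter count under the BIC penalty.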