Original Article

Die Bedeutung der Itemauswahl und der Modellwahl für die längsschnittliche Erfassung von Kompetenzen

Lesekompetenzentwicklung in der Primarstufe

Alexander Robitzsch

Bundesinstitut für Bildungsforschung, Innovation & Entwicklung des österreichischen Schulwesens (BIFIE Salzburg)

Tobias Dörfler

Lehrstuhl für Empirische Bildungsforschung, Otto-Friedrich-Universität Bamberg

Maximilian Pfost

Lehrstuhl für Empirische Bildungsforschung, Otto-Friedrich-Universität Bamberg

Cordula Artelt

Lehrstuhl für Empirische Bildungsforschung, Otto-Friedrich-Universität Bamberg

Published online: November 10, 2011. https://doi.org/10.1026/0049-8637/a000052

Abstract

Relevance of item selection and model selection for assessing the development of competencies: The development in reading competence in primary school students

Abstract. This paper investigates the development of reading competence in primary school students using the ELFE test (Lenhard & Schneider, 2005). It discusses how the selection of items in the test (item sampling) and the choice of statistical models (multimodel inference) affect effect sizes of change, and it quantifies the resulting variability. It is argued that (at least) three facets play an important role in a concept of generalizability for tests: the sampling or selection of persons, the selection of items, and the choice of statistical models. The empirical findings show that the variation due to item sampling and model choice, which is mostly neglected in publications, is not negligible compared with the variation due to the sampling of persons.

References

Afflerbach, P., Pearson, D., Paris, S. (2008). Clarifying differences between reading skills and reading strategies. The Reading Teacher, 61, 364–373.
Algina, J., Keselman, H. J., Penfield, R. D. (2005). Effect sizes and their intervals: The two-level repeated measures case. Educational and Psychological Measurement, 65, 241–258.
Artelt, C., Dörfler, T. (2010). Förderung von Lesekompetenz als Aufgabe aller Fächer. Forschungsergebnisse und Anregungen für die Praxis. In Bayerisches Staatsministerium für Unterricht und Kultus (Ed.), ProLesen. Auf dem Weg zur Leseschule (pp. 13–36). Donauwörth: Auer.
Becker, M., Lüdtke, O., Trautwein, U., Baumert, J. (2006). Leistungszuwachs in Mathematik: Evidenz für einen Schereneffekt im mehrgliedrigen Schulsystem? Zeitschrift für Pädagogische Psychologie, 20, 233–242.
Berk, R., Brown, L., Zhao, L. (2010). Statistical inference after model selection. Journal of Quantitative Criminology, 26, 217–236.
Böhme, K., Robitzsch, A. (2009). Lesekompetenzdiagnostik. In A. Bremerich-Vos, D. Granzer, O. Köller (Eds.), Bildungsstandards Deutsch und Mathematik (pp. 250–289). Weinheim: Beltz Pädagogik.
Brennan, R. L. (2001). Generalizability theory. New York: Springer.
Brennan, R. L. (2011). Generalizability theory and classical test theory. Applied Measurement in Education, 24, 1–21.
Briggs, D. C., Weeks, J. P. (2009). The impact of vertical scaling decisions on growth interpretations. Educational Measurement: Issues and Practice, 28, 3–14.
Briggs, D. C., Wilson, M. (2007). Generalizability in item response modeling. Journal of Educational Measurement, 44, 131–155.
Brock, W. A., Durlauf, S. N., West, K. D. (2007). Model uncertainty and policy evaluation: Some theory and empirics. Journal of Econometrics, 136, 629–664.
Burnham, K. P., Anderson, D. R. (2002). Model selection and multimodel inference. New York: Springer.
Cameron, A. C., Trivedi, P. K. (2005). Microeconometrics. New York: Cambridge University Press.
Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A, 158, 419–466.
Clyde, M., George, E. I. (2004). Model uncertainty. Statistical Science, 19, 81–94.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73, 533–559.
de Jong, M. G., Steenkamp, J.-B. E. M., Fox, J.-P. (2007). Relaxing measurement invariance in cross-national consumer research using a hierarchical IRT model. Journal of Consumer Research, 34, 260–278.
Doran, H., Bates, D., Bliese, P., Dowling, M. (2007). Estimating the multilevel Rasch model: With the lme4 package. Journal of Statistical Software, 20 (2).
Douglas, J. (2001). Asymptotic identifiability of nonparametric item response models. Psychometrika, 66, 531–540.
Draper, D. (1995). Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society, Series B, 57, 45–97.
Draper, D. (2007). Bayesian multilevel analysis and MCMC. In J. de Leeuw, E. Meijer (Eds.), Handbook of multilevel analysis (pp. 77–140). New York: Springer.
Duke, N. K., Carlisle, J. (2011). The development of comprehension. In M. L. Kamil, P. D. Pearson, E. B. Moje, P. P. Afflerbach (Eds.), Handbook of reading research (Vol. IV, pp. 199–228). New York: Routledge.
Fischer, G. H. (1995a). Derivations of the Rasch model. In G. H. Fischer, I. W. Molenaar (Eds.), Rasch models. Foundations, recent developments, and applications (pp. 15–31). New York: Springer.
Fischer, G. H. (1995b). Linear logistic models for change. In G. H. Fischer, I. W. Molenaar (Eds.), Rasch models. Foundations, recent developments, and applications (pp. 157–180). New York: Springer.
Garthwaite, P. H., Mubwandarikwa, E. (2010). Selection of weights for weighted model averaging. Australian & New Zealand Journal of Statistics, 52, 362–382.
Gifi, A. (1990). Nonlinear multivariate analysis. New York: Wiley.
Goldstein, H. (1980). Dimensionality, bias, independence and measurement scale problems in latent trait test score models. British Journal of Mathematical and Statistical Psychology, 33, 234–246.
Haberman, S. (2005). Identifiability of parameters in item response models with unconstrained ability distributions (ETS Research Report RR05-24). Princeton: ETS.
Haberman, S. J. (2009). Linking parameter estimates derived from an item response model through separate calibrations (ETS Research Report RR09-40). Princeton: ETS.
Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9, 139–164.
Hill, C. J., Bloom, H. S., Black, A. R., Lipsey, M. W. (2008). Empirical benchmarks for interpreting effect sizes in research. Child Development Perspectives, 2 (3), 172–177.
Ho, A. D. (2009). A nonparametric framework for comparing trends and gaps across tests. Journal of Educational and Behavioral Statistics, 34, 201–228.
Holland, P. W. (1990). On the sampling theory foundations of item response theory models. Psychometrika, 55, 577–601.
Holland, P. W., Wainer, H. (Eds.). (1993). Differential item functioning: Theory and practice. Hillsdale, NJ: Erlbaum.
Husek, T. R., Sirotnik, K. (1967). Item sampling in educational research (CSEIP Occasional Report No. 2). Los Angeles: University of California.
Hutchison, D. (2008). On the conceptualisation of measurement error. Oxford Review of Education, 34, 443–460.
Ip, E. H. (2010). Empirically indistinguishable multidimensional IRT and locally dependent unidimensional item response models. British Journal of Mathematical and Statistical Psychology, 63, 395–416.
Junker, B. W. (1993). Conditional association, essential independence and monotone unidimensional item response models. Annals of Statistics, 21, 1359–1378.
Kane, M. (2011). The errors of our ways. Journal of Educational Measurement, 48, 12–30.
Kass, R. E. (2011). Statistical inference: The big picture. Statistical Science, 26, 1–9.
Klicpera, C., Gasteiger-Klicpera, B. (1993). Lesen und Schreiben – Entwicklung und Schwierigkeiten: Die Wiener Längsschnittuntersuchungen über die Entwicklung, den Verlauf und die Ursachen von Lese- und Schreibschwierigkeiten in der Pflichtschulzeit. Bern: Huber.
Kurz, K., Kratzmann, J., von Maurice, J. (2007). Die BiKS-Studie. Methodenbericht zur Stichprobenziehung. Bamberg. Retrieved February 8, 2011, from psydok.sulb.uni-saarland.de/volltexte/2007/990/index.html
Lenhard, W., Schneider, W. (2005). ELFE 1–6: Ein Leseverständnistest für Erst- bis Sechstklässler (1st ed.). Göttingen: Hogrefe.
Levin, A. T., Williams, J. C. (2003). Robust monetary policy with competing reference models. Journal of Monetary Economics, 50, 945–975.
Longford, N. T. (2003). An alternative to model selection in ordinary regression. Statistics and Computing, 13, 67–80.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Lord, F. M., Novick, M. R. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.
Lüdtke, O., Robitzsch, A. (2010). Umgang mit fehlenden Daten in der empirischen Bildungsforschung. In S. Maschke, L. Stecher (Eds.), Enzyklopädie Erziehungswissenschaft Online. Fachgebiet Methoden der empirischen erziehungswissenschaftlichen Forschung, Quantitative Forschungsmethoden. Weinheim: Juventa. Available at www.erzwissonline.de/.
Marais, I. (2009). Response dependence and the measurement of change. Journal of Applied Measurement, 10, 17–29.
Maris, G., Bechger, T. (2010). On interpreting the model parameters for the three parameter logistic model. Measurement: Interdisciplinary Research & Perspective, 7, 75–88.
Mazzeo, J., von Davier, M. (2008). Review of the Programme for International Student Assessment (PISA) test design: Recommendations for fostering stability in assessment results. EDU/PISA/GB (2008) 28.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum.
McDonald, R. P., Mulaik, S. A. (1979). Determinacy of common factors: A nontechnical review. Psychological Bulletin, 86, 297–306.
Mevik, B.-H., Wehrens, R. (2007). The pls package: Principal component and partial least squares regression in R. Journal of Statistical Software, 18 (2).
Michaelides, M. P. (2010). A review of the effects on IRT item parameter estimates with a focus on misbehaving common items in test equating. Frontiers in Psychology, 1, 167.
Molenaar, I. W. (1995). Some background for item response theory and the Rasch model. In G. H. Fischer, I. W. Molenaar (Eds.), Rasch models. Foundations, recent developments, and applications (pp. 3–14). New York: Springer.
Monseur, C., Berezner, A. (2007). The computation of equating errors in international surveys in education. Journal of Applied Measurement, 8, 323–335.
Morgan, S. L., Winship, C. (2007). Counterfactuals and causal inference. Cambridge: Cambridge University Press.
Muthén, L. K., Muthén, B. O. (1998–2010). Mplus user’s guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
Peress, M. (in press). Identification of a semiparametric item response model. Psychometrika.
Pfost, M., Karing, C., Lorenz, C., Artelt, C. (2010). Schereneffekte im ein- und mehrgliedrigen Schulsystem. Differenzielle Entwicklung sprachlicher Kompetenzen am Übergang von der Grund- in die weiterführende Schule? Zeitschrift für Pädagogische Psychologie, 24, 259–273.
R Development Core Team (2011). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. www.R-project.org.
Ramsay, J. O. (1996). A geometric approach to item response theory. Behaviormetrika, 23, 3–16.
Retelsdorf, J., Möller, J. (2008). Entwicklungen von Lesekompetenz und Lesemotivation. Schereneffekte in der Sekundarstufe? Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 40, 179–188.
Richter, T., Christmann, U. (2002). Lesekompetenz: Prozessebenen und interindividuelle Unterschiede. In N. Groeben, B. Hurrelmann (Eds.), Lesekompetenz: Bedingungen, Dimensionen, Funktionen (pp. 25–58). Weinheim: Juventa.
Rupp, A. A., Zumbo, B. D. (2006). Understanding parameter invariance in unidimensional IRT models. Educational and Psychological Measurement, 66, 63–84.
Särndal, C.-E., Swensson, B., Wretman, J. (1992). Model assisted survey sampling. New York: Springer.
Scheerer-Neumann, G. (1997). Lesen und Leseschwierigkeiten. In F. E. Weinert (Ed.), Psychologie des Unterrichts und der Schule (pp. 279–325). Göttingen: Hogrefe.
Schneider, W., Stefanek, J. (2004). Entwicklungsveränderungen allgemeiner kognitiver Fähigkeiten und schulbezogener Fertigkeiten im Kindes- und Jugendalter. Evidenz für einen Schereneffekt? Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 36, 147–159.
Schulz, E. M., Nicewander, W. A. (1997). Grade equivalent and IRT representation of growth. Journal of Educational Measurement, 34, 315–331.
Spiegelhalter, D., Thomas, A., Best, N., Lunn, D. (2003). WinBUGS version 1.4 user manual. MRC Biostatistics Unit.
Steyer, R., Eid, M. (2001). Messen und Testen. Berlin: Springer.
Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.
Strathmann, A. M., Klauer, K. J. (2010). Lernverlaufsdiagnostik: Ein Ansatz zur längerfristigen Lernfortschrittsmessung. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 42, 111–122.
Stukel, T. A. (1988). Generalized logistic models. Journal of the American Statistical Association, 83, 426–431.
Tryon, R. C. (1957). Reliability and behavior domain validity: Reformulation and historical critique. Psychological Bulletin, 54, 229–249.
van Buuren, S., Groothuis-Oudshoorn, K. (in press). MICE: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software.
van den Berg, S. M., Glas, C. A. W., Boomsma, A. (2007). Variance decomposition using an IRT measurement model. Behavior Genetics, 37, 604–616.
van den Heuvel-Panhuizen, M., Robitzsch, A., Treffers, A., Köller, O. (2009). Large-scale assessment of change in student achievement: Dutch primary school students’ results on written division in 1997 and 2004 as an example. Psychometrika, 74, 351–365.
van der Leeden, R., Meijer, E., Busing, F. M. T. A. (2007). Resampling multilevel models. In J. de Leeuw, E. Meijer (Eds.), Handbook of multilevel analysis (pp. 401–434). New York: Springer.
van der Linden, W. J. (2001). Book review: Applying the Rasch model by Bond & Fox. International Journal of Testing, 1, 319–326.
van der Maas, H. L. J., Molenaar, D., Maris, G., Kievit, R. A., Borsboom, D. (2011). Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences. Psychological Review, 118, 339–356.
Wainer, H. (2010). Schrödinger’s cat and the conception of probability in item response theory. Chance, 23, 53–56.
Wu, M. L. (2010). Measurement, sampling, and equating errors in large-scale assessments. Educational Measurement: Issues and Practice, 29, 15–27.
Wu, M. L., Adams, R. J., Wilson, M. R., Haldane, S. (2007). ACER ConQuest Version 2.0. Mulgrave.
Young, C. (2009). Model uncertainty in sociological research: An application to religion and economic growth. American Sociological Review, 74, 380–397.
Yucel, R. M. (2008). Multiple imputation inference for multivariate multilevel continuous data with ignorable non-response. Philosophical Transactions of the Royal Society A, 366, 2389–2404.
Zwick, R. (1992). Statistical and psychometric issues in the measurement of educational achievement trends: Examples from the National Assessment of Educational Progress. Journal of Educational Statistics, 17, 205–218.

Special issue: Kompetenzentwicklung (competence development)

Volume 43, Issue 4, October 2011

ISSN: 0049-8637, eISSN: 2190-6262
