The Importance of Item Selection and Model Choice for the Longitudinal Assessment of Competencies
Reading Competence Development in Primary School
Abstract
This paper investigates the development of reading competence in primary school students using the ELFE test (Lenhard & Schneider, 2005). It discusses how the selection of items in the test (item sampling) and the choice of statistical models (multimodel inference) affect effect sizes of change, and it quantifies the resulting variability. It is argued that (at least) three facets play an important role in a concept of generalizability for tests: the sampling or selection of persons, the selection of items, and the choice of statistical models. The empirical findings show that item sampling and model choice, two sources of variation that are mostly neglected in publications, are not negligible compared to the sampling of persons.
References
(2008). Clarifying differences between reading skills and reading strategies. The Reading Teacher, 61, 364–373.
(2005). Effect sizes and their intervals: The two-level repeated measures case. Educational and Psychological Measurement, 65, 241–258.
(2010). Förderung von Lesekompetenz als Aufgabe aller Fächer. Forschungsergebnisse und Anregungen für die Praxis. In , ProLesen. Auf dem Weg zur Leseschule (S. 13–36). Donauwörth: Auer.
(2006). Leistungszuwachs in Mathematik: Evidenz für einen Schereneffekt im mehrgliedrigen Schulsystem? Zeitschrift für Pädagogische Psychologie, 20, 233–242.
(2010). Statistical inference after model selection. Journal of Quantitative Criminology, 26, 217–236.
(2009). Lesekompetenzdiagnostik. In , Bildungsstandards Deutsch und Mathematik (S. 250–289). Weinheim: Beltz Pädagogik.
(2001). Generalizability theory. New York: Springer.
(2011). Generalizability theory and classical test theory. Applied Measurement in Education, 24, 1–21.
(2009). The impact of vertical scaling decisions on growth interpretations. Educational Measurement, 28, 3–14.
(2007). Generalizability in item response modeling. Journal of Educational Measurement, 44, 131–155.
(2007). Model uncertainty and policy evaluation: Some theory and empirics. Journal of Econometrics, 136, 629–664.
(2002). Model selection and multimodel inference. New York: Springer.
(2005). Microeconometrics. New York: Cambridge University Press.
(1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society. Series A, 158, 419–466.
(2004). Model uncertainty. Statistical Science, 19, 81–94.
(2008). Random item IRT models. Psychometrika, 73, 533–559.
(2007). Relaxing measurement invariance in cross-national consumer research using a hierarchical IRT model. Journal of Consumer Research, 34, 260–278.
(2007). Estimating the multilevel Rasch model: With the lme4 package. Journal of Statistical Software, 20 (2).
(2001). Asymptotic identifiability of nonparametric item response models. Psychometrika, 66, 531–540.
(1995). Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society, Series B, 57, 45–97.
(2007). Bayesian multilevel analysis and MCMC. In . Handbook of multilevel analysis (pp. 77–140). New York: Springer.
(2011). The Development of Comprehension. In . Handbook of Reading Research (Volume IV) (pp. 199–228). New York: Routledge.
(1995a). Derivations of the Rasch model. In , Rasch models. Foundations, recent developments, and applications (pp. 15–31). New York: Springer.
(1995b). Linear logistic models for change. In , Rasch models. Foundations, recent developments, and applications (pp. 157–180). New York: Springer.
(2010). Selection of weights for weighted model averaging. Australian & New Zealand Journal of Statistics, 52, 362–382.
(1990). Nonlinear multivariate analysis. New York: Wiley.
(1980). Dimensionality, bias, independence and measurement scale problems in latent trait test score models. British Journal of Mathematical and Statistical Psychology, 33, 234–246.
(2005). Identifiability of parameters in item response models with unconstrained ability distributions. (ETS Research Report RR05-24). Princeton: ETS.
(2009). Linking parameter estimates derived from an item response model through separate calibrations (ETS Research Report RR09-40). Princeton: ETS.
(1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9, 139–164.
(2008). Empirical benchmarks for interpreting effect sizes in research. Child Development Perspectives, 2 (3), 172–177.
(2009). A nonparametric framework for comparing trends and gaps across tests. Journal of Educational and Behavioral Statistics, 34, 201–228.
(1990). On the sampling theory foundations of item response theory models. Psychometrika, 55, 577–601.
(1993). Differential item functioning: Theory and practice. Hillsdale, NJ: Erlbaum.
(1967). Item sampling in educational research (CSEIP Occasional Report No. 2). Los Angeles: University of California.
(2008). On the conceptualisation of measurement error. Oxford Review of Education, 34, 443–460.
(2010). Empirically indistinguishable multidimensional IRT and locally dependent unidimensional item response models. British Journal of Mathematical and Statistical Psychology, 63, 395–416.
(1993). Conditional association, essential independence and monotone unidimensional item response models. Annals of Statistics, 21, 1359–1378.
(2011). The errors of our ways. Journal of Educational Measurement, 48, 12–30.
(2011). Statistical inference: The big picture. Statistical Science, 26, 1–9.
(1993). Lesen und Schreiben – Entwicklung und Schwierigkeiten: Die Wiener Längsschnittuntersuchungen über die Entwicklung, den Verlauf und die Ursachen von Lese- und Schreibschwierigkeiten in der Pflichtschulzeit. Bern: Huber.
(2007). Die BiKS-Studie. Methodenbericht zur Stichprobenziehung. Bamberg. Retrieved 08.02.2011 from psydok.sulb.uni-saarland.de/volltexte/2007/990/index.html
(2005). ELFE 1–6: Ein Leseverständnistest für Erst- bis Sechstklässler (1. Auflage). Göttingen: Hogrefe.
(2003). Robust monetary policy with competing reference models. Journal of Monetary Economics, 50, 945–975.
(2003). An alternative to model selection in ordinary regression. Statistics and Computing, 13, 67–80.
(1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
(1968). Statistical theories of mental test scores. Reading: Addison-Wesley.
(2010). Umgang mit fehlenden Daten in der empirischen Bildungsforschung. In , Enzyklopädie Erziehungswissenschaft Online. Fachgebiet Methoden der empirischen erziehungswissenschaftlichen Forschung, Quantitative Forschungsmethoden. Weinheim: Juventa. Accessed via www.erzwissonline.de/.
(2009). Response dependence and the measurement of change. Journal of Applied Measurement, 10, 17–29.
(2010). On interpreting the model parameters for the three parameter logistic model. Measurement: Interdisciplinary Research & Perspective, 7, 75–88.
(2008). Review of the Programme for International Student Assessment (PISA) test design: Recommendations for fostering stability in assessment results. EDU/PISA/GB (2008) 28.
(1999). Test theory: A unified treatment. Mahwah NJ: Lawrence Erlbaum.
(1979). Determinacy of common factors: A nontechnical review. Psychological Bulletin, 86, 297–306.
(2007). The pls package: Principal component and partial least squares regression in R. Journal of Statistical Software, 18 (2).
(2010). A review of the effects on IRT item parameter estimates with a focus on misbehaving common items in test equating. Frontiers in Psychology, 1, 167.
(1995). Some background for item response theory and the Rasch model. In , Rasch models. Foundations, recent developments, and applications (pp. 3–14). New York: Springer.
(2007). The computation of equating errors in international surveys in education. Journal of Applied Measurement, 8, 323–335.
(2007). Counterfactuals and causal inference. Cambridge: Cambridge University Press.
(1998–2010). Mplus user’s guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
(in press). Identification of a semiparametric item response model. Psychometrika.
(2010). Schereneffekte im ein- und mehrgliedrigen Schulsystem. Differenzielle Entwicklung sprachlicher Kompetenzen am Übergang von der Grund- in die weiterführende Schule? Zeitschrift für Pädagogische Psychologie, 24, 259–273.
(2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. www.R-project.org.
(1996). A geometric approach to item response theory. Behaviormetrika, 23, 3–16.
(2008). Entwicklungen von Lesekompetenz und Lesemotivation. Schereneffekte in der Sekundarstufe? Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 40, 179–188.
(2002). Lesekompetenz: Prozessebenen und interindividuelle Unterschiede. In , Lesekompetenz: Bedingungen, Dimensionen, Funktionen (S. 25–58). Weinheim: Juventa.
(2006). Understanding parameter invariance in unidimensional IRT models. Educational and Psychological Measurement, 66, 63–84.
(1992). Model assisted survey sampling. New York: Springer.
(1997). Lesen und Leseschwierigkeiten. In , Psychologie des Unterrichts und der Schule (S. 279–325). Göttingen: Hogrefe.
(2004). Entwicklungsveränderungen allgemeiner kognitiver Fähigkeiten und schulbezogener Fertigkeiten im Kindes- und Jugendalter. Evidenz für einen Schereneffekt? Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 36, 147–159.
(1997). Grade equivalent and IRT representation of growth. Journal of Educational Measurement, 34, 315–331.
(2003). WinBUGS version 1.4 user manual. MRC Biostatistics Unit.
(2001). Messen und Testen. Berlin: Springer.
(1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.
(2010). Lernverlaufsdiagnostik: Ein Ansatz zur längerfristigen Lernfortschrittsmessung. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 42, 111–122.
(1988). Generalized logistic models. Journal of the American Statistical Association, 83, 426–431.
(1957). Reliability and behavior domain validity: Reformulation and historical critique. Psychological Bulletin, 54, 229–249.
(in press). MICE: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software.
(2007). Variance decomposition using an IRT measurement model. Behavior Genetics, 37, 604–616.
(2009). Large-scale assessment of change in student achievement: Dutch primary school students’ results on written division in 1997 and 2004 as an example. Psychometrika, 74, 351–365.
(2007). Resampling multilevel models. In , Handbook of multilevel analysis (pp. 401–434). New York: Springer.
(2001). Book review: Applying the Rasch model by Bond & Fox. International Journal of Testing, 1, 319–326.
(2011). Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences. Psychological Review, 118, 339–356.
(2010). Schrödinger’s cat and the conception of probability in item response theory. Chance, 23, 53–56.
(2010). Measurement, sampling, and equating errors in large-scale assessments. Educational Measurement, 29, 15–27.
(2007). ACER ConQuest Version 2.0. Mulgrave.
(2009). Model uncertainty in sociological research: An application to religion and economic growth. American Sociological Review, 74, 380–397.
(2008). Multiple imputation inference for multivariate multilevel continuous data with ignorable non-response. Philosophical Transactions of the Royal Society A, 366, 2389–2404.
(1992). Statistical and psychometric issues in the measurement of educational achievement trends: Examples from the National Assessment of Educational Progress. Journal of Educational Statistics, 17, 205–218.