Test Purification and the Evaluation of Differential Item Functioning with Multinomial Logistic Regression
Abstract
We conducted a computer simulation study to determine the effect of using an iterative or noniterative multinomial logistic regression (MLR) analysis to detect differential item functioning (DIF) in polytomous items. A single-iteration MLR, in which ability is defined as the total observed test score, was compared with a two-step MLR in which the ability measure was purified by eliminating the DIF items. Data were generated to simulate several biased tests. The factors manipulated were DIF effect size (0.5, 1.0, and 1.5), percentage of DIF items in the test (0%, 10%, 20%, and 30%), DIF type (uniform and nonuniform), and sample size (500, 1000, and 2000). Item scores were generated using the graded response model. The MLR procedures were consistently able to detect both uniform and nonuniform DIF. When the two-step MLR procedure was used, the false-positive rate (the proportion of non-DIF items that were detected as DIF) decreased and the correct identification rate increased slightly. The purification process improved the correct detection rate only under conditions with uniform DIF, a large sample size, and a large amount of DIF. For nonuniform DIF, there was no difference between the MLR-WP and MLR-TP procedures.
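The data-generation and purification steps described above can be illustrated with a minimal Python sketch. It is not the authors' simulation code. The function names (`grm_scores`, `matching_score`), the thresholds, the discrimination of 1.0, the test length of 10 items, and the way uniform DIF is induced (a constant threshold shift for the focal group) are all assumptions made for illustration. The sketch also takes the flagged items as known, whereas in the two-step procedure they would come from a first MLR pass.

```python
import numpy as np


def grm_scores(theta, a, b, rng):
    """Sample item scores 0..K-1 from a graded response model.

    theta: (n,) abilities; a: discrimination; b: increasing thresholds (K-1,).
    """
    b = np.asarray(b, dtype=float)
    # Cumulative probabilities P(X >= k | theta) for k = 1..K-1.
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    # One uniform draw per person; the score is the number of thresholds passed.
    u = rng.uniform(size=(theta.shape[0], 1))
    return (u < p_star).sum(axis=1)


def matching_score(responses, flagged=()):
    """Total observed score, optionally excluding items flagged as DIF."""
    flagged = set(flagged)
    keep = [j for j in range(responses.shape[1]) if j not in flagged]
    return responses[:, keep].sum(axis=1)


# Hypothetical settings: 500 examinees per group, 10 items, 20% DIF items.
rng = np.random.default_rng(0)
n_per_group, n_items = 500, 10
dif_items = [0, 1]             # assumed known here, for illustration only
dif_size = 0.5                 # one of the effect sizes named in the abstract
thresholds = [-1.0, 0.0, 1.0]  # assumed thresholds for a 4-category item
group = np.repeat([0, 1], n_per_group)  # 0 = reference, 1 = focal
theta = rng.normal(size=group.size)

responses = np.empty((group.size, n_items), dtype=int)
for j in range(n_items):
    # Assumed uniform DIF: raising every threshold by dif_size for the focal
    # group is equivalent to lowering theta by dif_size on that item.
    shift = dif_size * group if j in dif_items else 0.0
    responses[:, j] = grm_scores(theta - shift, 1.0, thresholds, rng)

total = matching_score(responses)                        # unpurified criterion
purified = matching_score(responses, flagged=dif_items)  # purified criterion
```

In a full analysis, each item would then be regressed on the matching score, group membership, and their interaction with an MLR model, once using `total` and once using `purified` as the ability measure.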