Original Articles

Test Purification and the Evaluation of Differential Item Functioning with Multinomial Logistic Regression

M. Dolores Hidalgo-Montesinos

University of Murcia, Spain


and

Juana Gómez-Benito

University of Barcelona, Spain


Published Online: September 01, 2006. https://doi.org/10.1027//1015-5759.19.1.1

Abstract

We conducted a computer simulation study to determine the effect of using an iterative or a noniterative multinomial logistic regression (MLR) analysis to detect differential item functioning (DIF) in polytomous items. A single-pass MLR (MLR-WP), in which ability is defined as the total observed test score, is compared with a two-step MLR (MLR-TP), in which the ability measure is purified by removing the items flagged as DIF. Data were generated to simulate several biased tests. The factors manipulated were DIF effect size (0.5, 1.0, and 1.5), percentage of DIF items in the test (0%, 10%, 20%, and 30%), DIF type (uniform and nonuniform), and sample size (500, 1000, and 2000). Item scores were generated using the graded response model. The MLR procedures consistently detected both uniform and nonuniform DIF. With the two-step MLR procedure, the false-positive rate (the proportion of non-DIF items flagged as DIF) decreased and the correct identification rate increased slightly. Purification improved the correct detection rate only under conditions of uniform DIF, large sample size, and a large amount of DIF. For nonuniform DIF, there was no difference between the MLR-WP and MLR-TP procedures.
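The simulation design described in the abstract (graded response model data, a total-score matching variable, a two-step purified matching variable, and false-positive and correct-detection rates) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names `grm_probs`, `simulate`, `two_step_purification`, `detection_rates`, and the callable `flag_item` are hypothetical; the scaling constant `D = 1.7` is a common convention, assumed here; both groups are assumed to have abilities drawn from N(0, 1); and uniform DIF is assumed to be induced by shifting every threshold of a DIF item by `delta` in the focal group. The per-item DIF test itself (an MLR test in the paper) is left as a pluggable callable rather than reimplemented.

```python
import math
import random


def grm_probs(theta, a, thresholds, D=1.7):
    """Category probabilities 0..m under Samejima's graded response model."""
    # Boundary curves P*(X >= k | theta); P*(X >= 0) = 1 and P*(X >= m+1) = 0.
    # `thresholds` must be increasing so that every difference is non-negative.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-D * a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # P(X = k) is the gap between adjacent boundary curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def simulate(n_per_group, items, dif_items, delta, rng):
    """Generate item scores for a reference (0) and a focal (1) group.

    ASSUMPTION: both groups have theta ~ N(0, 1), and uniform DIF is induced
    by adding `delta` to every threshold of each item in `dif_items` for the
    focal group. This is an illustrative choice, not the authors' design.
    """
    scores, group = [], []
    for g in (0, 1):
        for _ in range(n_per_group):
            theta = rng.gauss(0.0, 1.0)
            row = []
            for j, (a, thresholds) in enumerate(items):
                if g == 1 and j in dif_items:
                    thresholds = [b + delta for b in thresholds]
                probs = grm_probs(theta, a, thresholds)
                row.append(rng.choices(range(len(probs)), weights=probs)[0])
            scores.append(row)
            group.append(g)
    return scores, group


def two_step_purification(scores, group, flag_item):
    """Run a DIF test before and after purifying the matching score.

    `flag_item(item_scores, matching, group) -> bool` is a hypothetical
    interface standing in for the per-item test (an MLR test in the paper).
    """
    n_items = len(scores[0])
    columns = [[row[j] for row in scores] for j in range(n_items)]

    # Step 1: match on the total observed score over all items.
    total = [sum(row) for row in scores]
    step1 = {j for j in range(n_items) if flag_item(columns[j], total, group)}

    # Step 2: rebuild the matching score from the items not flagged in step 1
    # (falling back to the total score if every item was flagged).
    keep = [j for j in range(n_items) if j not in step1] or list(range(n_items))
    purified = [sum(row[j] for j in keep) for row in scores]
    step2 = {j for j in range(n_items) if flag_item(columns[j], purified, group)}
    return step1, step2


def detection_rates(flagged, dif_items, n_items):
    """False-positive rate among non-DIF items and correct-detection rate."""
    n_clean = n_items - len(dif_items)
    false_pos = len(flagged - dif_items) / n_clean if n_clean else 0.0
    # With 0% DIF items the correct-detection rate is undefined (NaN).
    correct = len(flagged & dif_items) / len(dif_items) if dif_items else float("nan")
    return false_pos, correct
```

In this sketch an item flagged in step 1 is dropped from the purified matching score even when it is the item under study; some purification schemes keep the studied item in its own matching score, and the abstract does not say which variant the authors used. Whether `delta` corresponds to the paper's DIF effect-size metric (0.5, 1.0, 1.5) is likewise an assumption.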


Volume 19, Issue 1, January 2003

ISSN: 1015-5759; eISSN: 2151-2426


Acknowledgments:

We would like to acknowledge the helpful suggestions and comments made by two anonymous reviewers. This research was partially supported by Grant BSO2001-3751-C02-02 from the “Ministerio de Ciencia y Tecnología, DGI”, Spain.
