Original Article

Detection of Differential Item Functioning

Using Decision Rules Based on the Mantel-Haenszel Procedure and Breslow-Day Tests

Published Online: https://doi.org/10.1027/1614-2241/a000038

This study analyzes Differential Item Functioning (DIF) using three combined decision rules and compares their results with those of the variation of the Mantel-Haenszel procedure (vaMH) proposed by Mazor, Clauser, and Hambleton (1994). The first decision rule combines the Mantel-Haenszel procedure (MH) with the Breslow-Day test of trend in odds ratio heterogeneity (BDT), applying the Bonferroni adjustment, as proposed by Randall Penfield (2003). The second uses both MH and BDT without the Bonferroni adjustment. The third combines MH with the Breslow-Day test for homogeneity of the odds ratio, also without the Bonferroni adjustment. All three decision rules yielded satisfactory results and showed similar power, and none of them detected DIF erroneously. The second rule proved to be the most powerful in the presence of nonuniform DIF. vaMH showed evidence of superiority only in the presence of uniform DIF with the smallest difference in difficulty parameters.
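As a rough illustration of how a combined decision rule of this kind might operate, the following Python sketch computes the continuity-corrected MH chi-square from a set of 2x2 tables (one per matched score level) and then combines its p-value with a Breslow-Day p-value. Several points are assumptions not stated in the abstract: the "flag the item if either test is significant" logic, the Bonferroni threshold of alpha/2 for two tests, and the function names `mh_chi_square` and `combined_rule`, which are hypothetical. The Breslow-Day p-value is assumed to come from a separate routine.

```python
import math


def mh_chi_square(strata):
    """Continuity-corrected Mantel-Haenszel chi-square (1 df) and its p-value.

    strata: list of (a, b, c, d) tuples, one 2x2 table per matched score level:
        a, b = reference group correct / incorrect
        c, d = focal group correct / incorrect
    """
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        if t < 2:
            continue  # the variance term is undefined for fewer than 2 examinees
        n_ref, n_foc = a + b, c + d   # group sizes at this score level
        m1, m0 = a + c, b + d         # total correct / incorrect at this level
        sum_a += a
        sum_e += n_ref * m1 / t                                # E(a) under no DIF
        sum_v += n_ref * n_foc * m1 * m0 / (t * t * (t - 1))   # Var(a) under no DIF
    if sum_v == 0:
        return 0.0, 1.0
    # Continuity correction of 0.5, truncated at zero.
    chi2 = max(abs(sum_a - sum_e) - 0.5, 0.0) ** 2 / sum_v
    # Upper-tail probability of a chi-square with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def combined_rule(p_mh, p_bd, alpha=0.05, bonferroni=True):
    """Flag DIF if either test is significant (assumed combination logic).

    With bonferroni=True each test is evaluated at alpha / 2 (assumed adjustment
    for two tests); with bonferroni=False each test is evaluated at alpha.
    """
    threshold = alpha / 2 if bonferroni else alpha
    return p_mh < threshold or p_bd < threshold


# Hypothetical data: three score levels, reference group favoured at each level.
tables = [(30, 10, 20, 20), (35, 5, 25, 15), (38, 2, 30, 10)]
chi2, p_mh = mh_chi_square(tables)
flag_adjusted = combined_rule(p_mh, p_bd=0.20, bonferroni=True)
flag_unadjusted = combined_rule(p_mh, p_bd=0.20, bonferroni=False)
```

Under these assumptions, the adjusted and unadjusted rules differ only in the per-test significance threshold, so the unadjusted variant flags every item the adjusted one flags and possibly more.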

References

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

  • Aguerri, M. E., Galibert, M. S., Attorresi, H. F., & Prieto-Marañón, P. (2009). Erroneous detection of nonuniform DIF using the Breslow-Day test in a short test. Quality and Quantity: International Journal of Methodology, 43, 35–44.

  • Aguerri, M. E., Galibert, M. S., Lozzia, G. S., & Attorresi, H. F. (2004). Un estudio acerca del funcionamiento diferencial no uniforme del ítem [A study about nonuniform differential item functioning]. Metodología de las Ciencias del Comportamiento, Volumen Especial, 7–10.

  • Breslow, N. E., & Day, N. E. (1980). Statistical methods in cancer research. Volume I. The analysis of case-control studies. Lyon, France: International Agency for Research on Cancer (IARC Scientific Publication No. 32).

  • Camilli, G., & Shepard, L. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.

  • Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differential functioning test items. Educational Measurement: Issues and Practice, 17, 31–44.

  • Clauser, B. E., Mazor, K. M., & Hambleton, R. K. (1994). The effects of score group width on the Mantel-Haenszel procedure. Journal of Educational Measurement, 31, 67–78.

  • Ferreres, D., Fidalgo, A. M., & Muñiz, J. (2000). Detección del funcionamiento diferencial de los items no uniforme: Comparación de los métodos Mantel-Haenszel y regresión logística [Detection of nonuniform DIF: Mantel-Haenszel and logistic regression methods]. Psicothema, 12, 220–225.

  • Hanson, B. A. (1998). Uniform DIF and DIF defined by differences in item response functions. Journal of Educational and Behavioral Statistics, 23, 244–253.

  • Hidalgo-Montesinos, M. D., & López-Pina, J. A. (2004). Differential item functioning detection and effect size: A comparison between logistic regression and Mantel-Haenszel procedures. Educational and Psychological Measurement, 64, 903–915.

  • Holland, P., & Thayer, D. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Erlbaum.

  • Hosmer, D. W., & Lemeshow, S. (1989). Applied logistic regression. New York, NY: Wiley.

  • Kingston, N., Leary, L., & Wightman, L. (1988). An exploratory study of the applicability of item response theory methods to the Graduate Management Admissions Test (GMAC Occasional Papers). Princeton, NJ: Graduate Management Admissions Council.

  • Li, H., & Stout, W. F. (1993). A new procedure for detection of crossing DIF/bias. Paper presented at the annual meeting of the American Educational Research Association, Atlanta.

  • Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719–748.

  • Mazor, K., Clauser, B., & Hambleton, R. K. (1994). Identification of nonuniform differential item functioning using variation of the Mantel-Haenszel procedure. Educational and Psychological Measurement, 54, 284–291.

  • Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105–108.

  • Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.

  • Narayanan, P., & Swaminathan, H. (1996). Identification of items that show nonuniform DIF. Applied Psychological Measurement, 20, 252–274.

  • Penfield, R. (2003). Applying the Breslow-Day test of trend in odds ratio heterogeneity to the analysis of nonuniform DIF. The Alberta Journal of Educational Research, 49, 231–243.

  • Prieto-Marañón, P. (2005). Bday: Computational program for the detection of DIF by the Breslow-Day tests, the Mantel-Haenszel procedures, and combined decision rules. Unpublished manuscript.

  • Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 284–291.

  • Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17, 105–116.

  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.

  • Waller, N. G. (1998). EZDIF: Detection of uniform and nonuniform differential item functioning with Mantel-Haenszel and logistic regression procedures. Applied Psychological Measurement, 22, 391.

  • Yoes, M. (1997). PARDSIM: Parameter and Response Data Simulation [Software]. St. Paul, MN: Assessment System Corporation.

  • Zumbo, B. D. (2007). Three generations of DIF analysis: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4, 223–233.