Empirical Option Weights Improve the Validity of a Multiple-Choice Knowledge Test

Published Online: https://doi.org/10.1027/1015-5759/a000295

Abstract. Standard dichotomous scoring of multiple-choice test items grants no partial credit for partial knowledge. Empirical option weighting is an alternative, polychotomous scoring method that uses the point-biserial correlation between option choices and total score as the weight for each answer alternative. Extant studies demonstrate that the method increases the reliability of multiple-choice tests relative to conventional scoring. Most previous studies, however, employed a correlational validation approach and yielded mixed findings regarding the validity of empirical option weighting. The present study is the first to use an experimental approach to determine the reliability and validity of empirical option weighting. To obtain an external validation criterion, we experimentally induced various degrees of knowledge in a domain of which participants had no prior knowledge. We found that, compared with dichotomous scoring, empirical option weighting increased both the reliability and the validity of a multiple-choice knowledge test employing distractors that were appealing to test takers with different levels of knowledge. A potential application of the present results is the computation and publication of empirical option weights for existing multiple-choice knowledge tests that have previously been scored dichotomously.
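The scoring method described above can be sketched in code. The following is a minimal illustration, not the authors' exact procedure: for each item and each answer option, the 0/1 indicator of choosing that option is correlated (point-biserially) with the conventional dichotomous total score, and the resulting correlations serve as option weights for polychotomous rescoring. The function and variable names are illustrative assumptions.

```python
import numpy as np

def option_weights(responses, key):
    """Empirical option weights via point-biserial correlation.

    responses: (n_persons, n_items) array of chosen option indices.
    key: (n_items,) array of correct option indices, used only to
         form the conventional total score the weights are based on.
    Returns a dict mapping (item, option) -> weight.
    """
    n_persons, n_items = responses.shape
    # Conventional dichotomous total score per person.
    total = (responses == key).sum(axis=1).astype(float)
    weights = {}
    for i in range(n_items):
        for opt in np.unique(responses[:, i]):
            # 0/1 indicator: did this person choose option `opt`?
            chose = (responses[:, i] == opt).astype(float)
            if chose.std() == 0 or total.std() == 0:
                weights[(i, opt)] = 0.0  # correlation undefined without variance
                continue
            # Point-biserial r = Pearson correlation of a dichotomous
            # variable (the choice indicator) with a continuous one.
            weights[(i, opt)] = np.corrcoef(chose, total)[0, 1]
    return weights

def weighted_score(responses, weights):
    """Polychotomous score: sum of the weights of the chosen options."""
    n_persons, n_items = responses.shape
    return np.array([
        sum(weights.get((i, responses[p, i]), 0.0) for i in range(n_items))
        for p in range(n_persons)
    ])
```

Under this scheme, options chosen disproportionately by high scorers receive positive weights and attractive distractors receive negative ones, so test takers with partial knowledge who choose "near-miss" distractors lose less credit than those drawn to implausible options.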
