Abstract
Standard dichotomous scoring of multiple-choice test items grants no partial credit for partial knowledge. Empirical option weighting is an alternative, polychotomous scoring method that uses the point-biserial correlation between option choices and total score as the weight for each answer alternative. Extant studies demonstrate that the method increases the reliability of multiple-choice tests compared to conventional scoring. However, most previous studies employed a correlational validation approach and yielded mixed findings regarding the validity of empirical option weighting. The present study is the first to use an experimental approach to determine the reliability and validity of empirical option weighting. To obtain an external validation criterion, we experimentally induced various degrees of knowledge in a domain of which participants had no prior knowledge. We found that, in comparison to dichotomous scoring, empirical option weighting increased both the reliability and the validity of a multiple-choice knowledge test employing distractors that were appealing to test takers with different levels of knowledge. A potential application of the present results is the computation and publication of empirical option weights for existing multiple-choice knowledge tests that have previously been scored dichotomously.
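The core of the method can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation: each option of each item is weighted by the point-biserial correlation (i.e., the Pearson correlation with a 0/1 choice indicator) between choosing that option and the conventional dichotomous total score, and a test taker's polychotomous score is then the sum of the weights of the options he or she chose. Function and variable names are ours; in practice one would typically estimate weights on a large calibration sample and might correlate with the total score excluding the item in question.

```python
import numpy as np

def option_weights(responses, key):
    """Empirical option weights (illustrative sketch).

    responses: (n_persons, n_items) array of chosen option indices.
    key: (n_items,) array of correct option indices.
    Returns a dict mapping (item, option) -> point-biserial weight.
    """
    # Conventional dichotomous total score serves as the criterion.
    total = (responses == key).sum(axis=1).astype(float)
    weights = {}
    for i in range(responses.shape[1]):
        for opt in np.unique(responses[:, i]):
            chose = (responses[:, i] == opt).astype(float)  # 0/1 indicator
            if chose.std() == 0 or total.std() == 0:
                weights[(i, int(opt))] = 0.0  # degenerate case: no variance
            else:
                # Point-biserial = Pearson r between the 0/1 indicator
                # and the total score.
                weights[(i, int(opt))] = float(np.corrcoef(chose, total)[0, 1])
    return weights

def weighted_score(responses, weights):
    """Polychotomous score: sum of the weights of the chosen options."""
    n_persons, n_items = responses.shape
    return np.array([
        sum(weights.get((i, int(responses[p, i])), 0.0) for i in range(n_items))
        for p in range(n_persons)
    ])
```

On such weights, a correct option typically receives a positive weight, while distractors chosen mainly by low-scoring test takers receive negative weights, which is how partial knowledge (choosing a "better" distractor) earns partial credit.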