Abstract
Standard dichotomous scoring of multiple-choice test items grants no partial credit for partial knowledge. Empirical option weighting is an alternative, polychotomous scoring method that uses the point-biserial correlation between option choices and total score as the weight for each answer alternative. Extant studies demonstrate that the method increases the reliability of multiple-choice tests compared to conventional scoring. However, most previous studies employed a correlational validation approach and yielded mixed findings regarding the validity of empirical option weighting. The present study is the first to use an experimental approach to determine the reliability and validity of empirical option weighting. To obtain an external validation criterion, we experimentally induced various degrees of knowledge in a domain of which participants had no prior knowledge. We found that, in comparison to dichotomous scoring, empirical option weighting increased both the reliability and the validity of a multiple-choice knowledge test employing distractors that were appealing to test takers with different levels of knowledge. A potential application of the present results is the computation and publication of empirical option weights for existing multiple-choice knowledge tests that have previously been scored dichotomously.
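The core of the method can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation: each option of each item is weighted by the point-biserial correlation (i.e., the Pearson correlation with a 0/1 choice indicator) between choosing that option and the conventional dichotomous total score, and a test taker's polychotomous score is then the sum of the weights of the options he or she chose. Function and variable names are ours; in practice one would typically estimate weights on a large calibration sample and might correlate with the total score excluding the item in question.

```python
import numpy as np

def option_weights(responses, key):
    """Empirical option weights (illustrative sketch).

    responses: (n_persons, n_items) array of chosen option indices.
    key: (n_items,) array of correct option indices.
    Returns a dict mapping (item, option) -> point-biserial weight.
    """
    # Conventional dichotomous total score serves as the criterion.
    total = (responses == key).sum(axis=1).astype(float)
    weights = {}
    for i in range(responses.shape[1]):
        for opt in np.unique(responses[:, i]):
            chose = (responses[:, i] == opt).astype(float)  # 0/1 indicator
            if chose.std() == 0 or total.std() == 0:
                weights[(i, int(opt))] = 0.0  # degenerate case: no variance
            else:
                # Point-biserial = Pearson r between the 0/1 indicator
                # and the total score.
                weights[(i, int(opt))] = float(np.corrcoef(chose, total)[0, 1])
    return weights

def weighted_score(responses, weights):
    """Polychotomous score: sum of the weights of the chosen options."""
    n_persons, n_items = responses.shape
    return np.array([
        sum(weights.get((i, int(responses[p, i])), 0.0) for i in range(n_items))
        for p in range(n_persons)
    ])
```

On such weights, a correct option typically receives a positive weight, while distractors chosen mainly by low-scoring test takers receive negative weights, which is how partial knowledge (choosing a "better" distractor) earns partial credit.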