The Effect of Instructions on Multiple-Choice Test Scores
Abstract
Most standardized tests instruct examinees to guess, under scoring procedures that either do not correct for guessing or correct only for expected random guessing. Other scoring rules have been proposed, such as offering a small reward for omissions or penalizing errors more heavily than expected random guessing would warrant. This study was designed to test the effects of these four instruction/scoring conditions on performance indicators and on score reliability in multiple-choice tests. A total of 240 participants were randomly assigned to four conditions differing in how strongly they discourage guessing. Participants completed two computerized psychometric tests, which differed only in the instructions provided and the associated scoring procedure. For both tests, our hypotheses predicted (1) an increasing trend in omissions across conditions (showing that the instructions were effective); (2) decreasing trends in wrong and right responses; and (3) an increase in the reliability estimates of both number-right and formula scores. The predictions regarding performance indicators were mostly fulfilled, but the expected differences in reliability failed to appear. The discussion of results considers not only psychometric issues related to guessing, but also the potentially misleading educational implications of recommendations to guess in testing contexts.
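The four instruction/scoring conditions can be illustrated with a minimal sketch. The formulas below are standard textbook forms for k-option items (number right, the classical correction for expected random guessing, a small reward for omissions, and a penalty steeper than the random-guessing correction); the exact constants used in the study are not given in the abstract, so the reward and strong-penalty variants here are hypothetical.

```python
def score(right: int, wrong: int, omitted: int, k: int, rule: str) -> float:
    """Score a k-option multiple-choice test under one of four illustrative rules.

    Constants are conventional textbook values, not necessarily those
    used in the study described above.
    """
    if rule == "number_right":
        # Guessing is not penalized: the rule most tests use.
        return float(right)
    if rule == "formula":
        # Classical correction for guessing: expected gain from
        # random guessing on k options is cancelled out.
        return right - wrong / (k - 1)
    if rule == "reward_omission":
        # Small credit (1/k) for each item left blank, so omitting
        # matches the expected value of a random guess.
        return right + omitted / k
    if rule == "strong_penalty":
        # Hypothetical harsher rule: discount more than expected
        # from random guessing (here 1/(k-2) per error when k > 2).
        return right - wrong / (k - 2) if k > 2 else right - float(wrong)
    raise ValueError(f"unknown rule: {rule}")


# Example: 30 three-option items; 18 right, 6 wrong, 6 omitted.
for rule in ("number_right", "formula", "reward_omission", "strong_penalty"):
    print(rule, score(18, 6, 6, 3, rule))
```

Under rules that discourage guessing, the same raw responses yield progressively lower scores for the wrong answers, which is the incentive structure the four experimental conditions manipulate.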