On the Impact of the Response Options’ Position on Item Difficulty in Multiple-Choice Items
Abstract
The multiple-choice item format is widely used in test construction and large-scale assessment. So far, there has been little research on the impact of the position of the solution among the response options, and the few existing results are inconsistent. Because rearranging the response options would be an easy way to create parallel items for group settings, the influence of the response options’ position on item difficulty should be examined. The Linear Logistic Test Model (Fischer, 1972) was used to analyze data from 829 students aged 8–20 years who worked on general knowledge items. The position of the solution among the response options was found to influence item difficulty: items are easiest when the solution is in the first position and more difficult when the solution is placed in a middle position or at the end of the set of response options.
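The analysis rests on the Linear Logistic Test Model (LLTM; Fischer, 1972), which constrains Rasch item difficulties to a weighted sum of basic parameters. A sketch of that decomposition (the symbols here are chosen for illustration, not taken from the study itself):

```latex
% Rasch model: probability that person v solves item i,
% with person ability \theta_v and item difficulty \beta_i
P(X_{vi} = 1 \mid \theta_v, \beta_i)
  = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)}

% LLTM restriction: item difficulty as a weighted sum of
% basic parameters \eta_j (e.g., an effect of the solution's
% position among the response options), with known weights
% q_{ij} and a normalization constant c
\beta_i = \sum_{j=1}^{p} q_{ij}\,\eta_j + c
```

Under this restriction, a basic parameter capturing "solution in first position" versus "solution in a middle or last position" can be estimated and tested directly, which is how a position effect on item difficulty is quantified.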
References
(1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140. https://doi.org/10.1007/BF02291180
(2005). Automatic generation of Rasch-calibrated items: Figural matrices test GEOM and endless-loops test EC. International Journal of Testing, 5, 197–224. https://doi.org/10.1207/s15327574ijt0503_2
(2012). Using automatic item generation to meet the increasing item demands of high-stakes educational and occupational assessment. Learning and Individual Differences, 22, 112–117. https://doi.org/10.1177/0022022110397360
(2006). Automatic generation of quantitative reasoning items: A pilot study. Journal of Individual Differences, 27, 2–14. https://doi.org/10.1027/1614-0001.27.1.2
(2003). Guess where: The position of correct answers in multiple-choice test items as a psychometric variable. Journal of Educational Measurement, 40(2), 109–128. https://doi.org/10.1111/j.1745-3984.2003.tb01099.x
(2002). Seek whence: Answer sequences and their consequences in key-balanced multiple-choice tests. The American Statistician, 56(4), 299–303. https://doi.org/10.1198/000313002623
(2016). Preventing response elimination strategies improves the construct validity of figural matrices. Journal of Intelligence, 4(1), Article 2. https://doi.org/10.3390/jintelligence4010002
(1987). Open-ended versus multiple-choice response formats – it does make a difference for diagnostic purposes. Applied Psychological Measurement, 11, 385–395. https://doi.org/10.1177/014662168701100404
(1992). Effects of response format on diagnostic assessment of scholastic achievement. Applied Psychological Measurement, 16, 353–361. https://doi.org/10.1177/014662169201600406
(1980). A test of Lord’s assumption regarding examinee guessing behavior on multiple-choice tests using elementary school students. Journal of Educational Measurement, 17(2), 147–153.
(1993). To guess or not to guess: A decision-theoretic view of formula scoring. Journal of Educational Measurement, 30(4), 277–291.
(2001). A multiple choice test that rewards partial knowledge. Journal of Further and Higher Education, 25, 157–163. https://doi.org/10.1080/03098770120050828
(1994). The effect of altering the position of options in a multiple-choice examination. Educational and Psychological Measurement, 54, 8–20. https://doi.org/10.1177/0013164494054001002
(2013). Modeling item-position effects within an IRT framework. Journal of Educational Measurement, 50, 164–185. https://doi.org/10.1111/jedm.12009
(2011). Examination of the quality of multiple-choice items on classroom tests. The Canadian Journal for the Scholarship of Teaching and Learning, 2(2), 1–23. https://doi.org/10.5206/cjsotl-rcacea.2011.2.4
(1972). Conditional maximum-likelihood estimation of item parameters for a linear logistic test model (Research Bulletin No. 9). Psychological Institute, University of Vienna.
(1974). Einführung in die Theorie psychologischer Tests. Grundlagen und Anwendungen [Introduction to the theory of psychological tests: Basics and applications]. Huber.
Fischer, G. H., & Molenaar, I. W. (Eds.). (1995). Rasch models: Foundations, recent developments, and applications. Springer.
(1970). Algorithmen und Programme für das probabilistische Testmodell von Rasch [Algorithms and programs for the probabilistic Rasch model]. Psychologische Beiträge, 12, 23–51.
(2017). Developing, analyzing, and using distractors for multiple-choice tests in education: A comprehensive review. Review of Educational Research, 87(6), 1082–1116. https://doi.org/10.3102/0034654317726529
(2009). Conspiracies and test compromise: An evaluation of the resistance of test systems to small-scale cheating. International Journal of Testing, 9(4), 283–309.
(2008). Analyzing position effects within reasoning items using the LLTM for structurally incomplete data. Psychology Science Quarterly, 50, 379–390.
(2002). A review of multiple-choice item-writing guidelines. Applied Measurement in Education, 15(3), 309–334. https://doi.org/10.1207/S15324818AME1503_5
(2012). A multilevel item response model for item position effects and individual persistence. Psychological Test and Assessment Modeling, 54, 418–431.
(2017). Does the position of response options in multiple-choice tests matter? Psicológica, 38, 93–109.
(2009). On varying item difficulty by changing the response format for a mathematical competence test. Austrian Journal of Statistics, 38(4), 231–239.
(2008). Examining item-position effects in large-scale assessment using the Linear Logistic Test Model. Psychology Science Quarterly, 50, 391–402.
(2011). Analysing item position effects due to test booklet design within large-scale assessment. Educational Research and Evaluation, 17, 497–509.
(2000). Effects of response format on difficulty of SAT-mathematics items: It’s not the strategy. Journal of Educational Measurement, 37, 39–57. https://doi.org/10.1111/j.1745-3984.2000.tb01075.x
(2005). Psychological test calibration using the Rasch model – some critical suggestions on traditional approaches. International Journal of Testing, 5(4), 377–394. https://doi.org/10.1207/s15327574ijt0504_3
(2009). Applications of the Linear Logistic Test Model in psychometric research. Educational and Psychological Measurement, 69(2), 232–244. https://doi.org/10.1177/0013164408322021
(2007). Item difficulty of multiple choice tests dependent on different item response formats – an experiment in fundamental research on psychological assessment. Psychology Science, 49(4), 361–374.
(2019). Gruppentest zur Erfassung der Intelligenz auf Basis des AID (AID-G) [Intelligence diagnosticum for group administration, AID-G]. Hogrefe.
(2011). Branched adaptive testing with a Rasch-model-calibrated test: Analysing item presentation’s sequence effects using the Rasch-model-based LLTM. Educational Research and Evaluation, 17(5), 373–385. https://doi.org/10.1080/13803611.2011.630549
(2018). eRm: Extended Rasch modeling (Version 0.16-0). http://erm.r-forge.r-project.org/
(1945). The effect of choice placement on the difficulty of multiple choice questions. Journal of Educational Psychology, 36, 103–113.
(2008). Die verflixten Distraktoren. Über den Nutzen einer theoretischen Distraktorenanalyse bei Matrizentests (für besser Begabte und Hochbegabte) [The darn distractors: On the benefits of a theoretical distractor analysis in matrices tests (for the highly talented and gifted)]. Diagnostica, 54(4), 193–201. https://doi.org/10.1026/0012-1924.54.4.193
(2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
(1980). Probabilistic models for some intelligence and attainment tests. University of Chicago Press.
(2017). PP: Estimation of person parameters for the 1, 2, 3, 4-PL model and the GPCM (R package version 0.6.1). The R Foundation. https://cran.r-project.org/web/packages/pp
(1977). Formula scoring, number-right scoring and test-taking strategy. Journal of Educational Measurement, 14(1), 15–22. https://doi.org/10.1111/j.1745-3984.1977.tb00024.x
(2012). Investigating factors that influence item performance on ACS exams. Journal of Chemical Education, 89, 346–350.
(1954). Who is penalized by the penalty for guessing? The Journal of Educational Psychology, 45(2), 81–90. https://doi.org/10.1037/h0053756
(1982). Converging on correct answers: A peculiarity of multiple choice items. Journal of Educational Measurement, 19, 211–220. https://doi.org/10.1111/j.1745-3984.1982.tb00129.x
(2001). Asymptotic null distribution of person fit statistics with estimated person parameter. Psychometrika, 66, 331–342. https://doi.org/10.1007/BF02294437
(2018). Construct your own response: The cube construction task as a novel format for the assessment of spatial ability. European Journal of Psychological Assessment, 34(5), 304–311. https://doi.org/10.1027/1015-5759/a000342
(2006). Eye-movement analysis demonstrates strategic influences on intelligence. Intelligence, 34(3), 261–272. https://doi.org/10.1016/j.intell.2005.11.003
(2017). Item position effects are moderated by changes in test-taking effort. Applied Psychological Measurement, 41, 115–129. https://doi.org/10.1177/0146621616676791
(1970). Ideal multiple-choice items. Journal of the American Statistical Association, 65(329), 71–88. https://doi.org/10.1080/01621459.1970.10481063