The Answer-Until-Correct Item Format Revisited
Abstract
The current availability of computers has led to a new series of response formats that serve as alternatives to the classical dichotomous format, and to the recovery of other formats, such as the answer-until-correct (AUC) format, whose efficient administration requires this kind of technology. The goal of the present study is to determine whether the AUC format improves test reliability and validity compared with the classical dichotomous format. Three samples of 174, 431, and 1,446 Spanish students from secondary education, professional training, and high school, aged between 13 and 20 years, were used. A 100-item test and a 25-item test assessing knowledge of Universal History were administered over the Internet with the AUC format. There were 56 experimental conditions, resulting from the manipulation of eight scoring models and seven test lengths. The data were analyzed from the perspective of Classical Test Theory and also with Item Response Theory (IRT) models. Reliability and construct validity, analyzed from the classical perspective, did not improve significantly with the AUC format; however, when reliability is assessed with the Information Function obtained from IRT models, the advantages of the AUC format over the dichotomous format become clear: for low levels of the assessed trait, scores obtained with the AUC format provide more information than scores obtained with the dichotomous format. Finally, these results are discussed, and the possibilities and limits of the AUC format in highly computerized psychological and educational contexts are analyzed.
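To make the contrast between the two formats concrete, the sketch below implements one common AUC partial-credit rule alongside classical dichotomous scoring. The linear rule shown (credit decreasing with each failed attempt) and the function names are illustrative assumptions, not the eight scoring models manipulated in the study itself.

```python
def auc_item_score(attempts: int, n_options: int) -> float:
    """Partial credit for an AUC item answered correctly on the given attempt.

    Assumed linear rule: with k options, a first-try success earns 1 point,
    and each additional attempt reduces credit by 1/(k - 1), so exhausting
    all options earns 0.  This is one illustrative AUC scoring model.
    """
    if not 1 <= attempts <= n_options:
        raise ValueError("attempts must be between 1 and n_options")
    return (n_options - attempts) / (n_options - 1)


def dichotomous_item_score(attempts: int) -> int:
    """Classical dichotomous scoring: credit only for a first-try success."""
    return 1 if attempts == 1 else 0


# A four-option item answered correctly on the second attempt:
# AUC scoring retains partial information about the examinee's knowledge,
# while dichotomous scoring discards it entirely.
print(auc_item_score(2, 4))          # 2/3 of a point under the AUC rule
print(dichotomous_item_score(2))     # 0 points under dichotomous scoring
```

Under this rule an examinee who eliminates distractors quickly is distinguished from one who guesses through all the options, which is precisely the extra information the IRT analysis credits to the AUC format at low trait levels.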