Multistudy Report

Validity and Reliability of Automatically Generated Propositional Reasoning Items

A Multilingual Study of the Challenges of Verbal Item Generation

Published Online: https://doi.org/10.1027/1015-5759/a000616

Abstract. This study introduces a newly developed public-domain multilingual automatic item generator that creates propositional reasoning (PR) items belonging to 15 item families using various inference rules. Psychometric properties of the resulting written PR test were investigated in three diverse samples tested in English, simplified Chinese, and German, respectively. Internal consistency was good to excellent across samples. The ICAR16 short-form test of cognitive abilities (Condon & Revelle, 2014) was used to evaluate construct validity; correlations between ICAR16 and PR scores were high. Furthermore, items within families appeared to be equivalent, with only minor differential item functioning between the Chinese- and English-speaking samples. Performance on the PR test was shown to be reasonably stable over the course of 1 week. No differences in total scores were detected between test forms (paper-and-pencil vs. computerized administration). Findings suggest that the automatically generated PR test is a valuable instrument for the assessment of propositional reasoning ability.
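To make the reported analyses concrete, the following minimal R sketch illustrates how internal consistency and logistic-regression differential item functioning (DIF) of the kind summarized above could be estimated with two of the packages cited in the reference list (psych, Revelle, 2018; difR, Magis et al., 2010). It is not the authors' analysis script: the simulated item matrix, the variable names responses and language, and the use of difLogistic() with default settings are illustrative assumptions.

    # Minimal illustrative R sketch (assumed variable names; not the authors' script).
    # 'responses' is a data frame of dichotomously scored (0/1) PR items;
    # 'language' codes the sample, e.g., English- vs. Chinese-speaking respondents.
    library(psych)   # Revelle (2018): alpha() for internal consistency
    library(difR)    # Magis et al. (2010): difLogistic() for logistic-regression DIF

    set.seed(1)
    responses <- as.data.frame(matrix(rbinom(200 * 10, 1, 0.6), ncol = 10))
    language  <- rep(c("en", "zh"), each = 100)

    # Cronbach's alpha and related reliability estimates
    alpha(responses)

    # Logistic-regression DIF (Swaminathan & Rogers, 1990); items are flagged by
    # Nagelkerke R^2 effect-size thresholds (Zumbo & Thomas; Jodoin & Gierl)
    difLogistic(Data = responses, group = language, focal.name = "zh", type = "both")

With real data, responses would hold the scored PR items and language the sample indicator; the same two calls then yield the reliability and DIF statistics described in the abstract.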

References

  • Arendasy, M. E., & Sommer, M. (2005). The effect of different types of perceptual manipulations on the dimensionality of automatically generated figural matrices. Intelligence, 33, 307–324. https://doi.org/10.1016/j.intell.2005.02.002

  • Arendasy, M. E., & Sommer, M. (2010). Evaluating the contribution of different item features to the effect size of the gender difference in three-dimensional mental rotation using automatic item generation. Intelligence, 38, 574–581. https://doi.org/10.1016/j.intell.2010.06.004

  • Arendasy, M. E., Sommer, M., Gittler, G., & Hergovich, A. (2006). Automatic generation of quantitative reasoning items. Journal of Individual Differences, 27, 2–14. https://doi.org/10.1027/1614-0001.27.1.2

  • Arendasy, M. E., Sommer, M., & Mayr, F. (2012). Using automatic item generation to simultaneously construct German and English versions of a word fluency test. Journal of Cross-Cultural Psychology, 43, 464–479. https://doi.org/10.1177/0022022110397360

  • Baddeley, A. (1986). Working memory. Clarendon Press.

  • Barrouillet, P., & Lecas, J.-F. (1999). Mental models in conditional reasoning and working memory. Thinking & Reasoning, 5, 289–302.

  • Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1–48. https://doi.org/10.18637/jss.v067.i01

  • Becker, B. J. (1988). Synthesizing standardized mean-change measures. British Journal of Mathematical and Statistical Psychology, 41, 257–278. https://doi.org/10.1111/j.2044-8317.1988.tb00901.x

  • Bejar, I. I. (2002). Generative testing: From conception to implementation. In S. H. Irvine & P. C. Kyllonen (Eds.), Item generation for test development (pp. 199–217). Erlbaum.

  • Bejar, I. I., Lawless, R. R., Morley, M. E., Wagner, M. E., Bennett, R. E., & Revuelta, J. (2002). A feasibility study of on-the-fly item generation in adaptive testing. ETS Research Report Series, 2002, i–44.

  • Blum, D., Holling, H., Galibert, M. S., & Forthmann, B. (2016). Task difficulty prediction of figural analogies. Intelligence, 56, 72–81. https://doi.org/10.1016/j.intell.2016.03.001

  • Bonett, D. G. (2002). Sample size requirements for testing and estimating coefficient alpha. Journal of Educational and Behavioral Statistics, 27, 335–340. https://doi.org/10.3102/10769986027004335

  • Braine, M. D. S. (1978). On the relation between the natural logic of reasoning and standard logic. Psychological Review, 85, 1–21. https://doi.org/10.1037/0033-295X.85.1.1

  • Carriedo, N., Elosúa, M. R., & García-Madruga, J. A. (2011). Working memory, text comprehension, and propositional reasoning: A new semantic anaphora WM test. The Spanish Journal of Psychology, 14, 37–49. https://doi.org/10.5209/rev_SJOP.2011.v14.n1.3

  • Carroll, J. B. (1993). Human cognitive abilities. Cambridge University Press. https://doi.org/10.1017/CBO9780511571312

  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48, 1–29. https://doi.org/10.18637/jss.v048.i06

  • Chalmers, R. P. (2015). Extended mixed-effects item response models with the MH-RM algorithm. Journal of Educational Measurement, 52, 200–222.

  • Cheng, P. W., & Holyoak, K. J. (1985). Pragmatic reasoning schemas. Cognitive Psychology, 17, 391–416. https://doi.org/10.1016/0010-0285(85)90014-3

  • Cho, S.-J., de Boeck, P., Embretson, S., & Rabe-Hesketh, S. (2014). Additive multilevel item structure models with random residuals: Item modeling for explanation and item generation. Psychometrika, 79, 84–104. https://doi.org/10.1007/s11336-013-9360-2

  • Condon, D. M., & Revelle, W. (2014). The international cognitive ability resource: Development and initial validation of a public domain measure. Intelligence, 43, 52–64. https://doi.org/10.1016/j.intell.2014.01.004

  • Cosmides, L., & Tooby, J. (1989). Evolutionary psychology and the generation of culture, part II: Case study: A computational theory of social exchange. Ethology and Sociobiology, 10, 51–97.

  • Cosmides, L., & Tooby, J. (1992). Cognitive adaptations for social exchange. In J. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind (pp. 163–228). Oxford University Press.

  • Cosmides, L., & Tooby, J. (1997). Evolutionary psychology: A primer. Center for Evolutionary Psychology.

  • Embretson, S. (1999). Generating items during testing: Psychometric issues and models. Psychometrika, 64, 407–433.

  • Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359–374. https://doi.org/10.1016/0001-6918(73)90003-6

  • Freund, P. A., Hofer, S., & Holling, H. (2008). Explaining and controlling for the psychometric properties of computer-generated figural matrix items. Applied Psychological Measurement, 32, 195–210. https://doi.org/10.1177/0146621607306972

  • Freund, P. A., & Holling, H. (2011). How to get really smart: Modeling retest and training effects in ability testing using computer-generated figural matrix items. Intelligence, 39, 233–243.

  • Gadzella, B. M., Stacks, J., Stephens, R. C., & Masten, W. G. (2005). Watson-Glaser Critical Thinking Appraisal, Form-S for education majors. Journal of Instructional Psychology, 32, 9–12.

  • García-Madruga, J. A., Gutiérrez, F., Carriedo, N., Luzón, J. M., & Vila, J. O. (2007). Mental models in propositional reasoning and working memory's central executive. Thinking & Reasoning, 13, 370–393. https://doi.org/10.1080/13546780701203813

  • Geerlings, H., Glas, C. A. W., & van der Linden, W. J. (2011). Modeling rule-based item generation. Psychometrika, 76, 337–359. https://doi.org/10.1007/s11336-011-9204-x

  • Gierl, M. J., & Lai, H. (2012). The role of item models in automatic item generation. International Journal of Testing, 12, 273–298. https://doi.org/10.1080/15305058.2011.635830

  • Glas, C. A. W., & van der Linden, W. J. (2003). Computerized adaptive testing with item cloning. Applied Psychological Measurement, 27, 247–261. https://doi.org/10.1177/0146621603027004001

  • Gómez-Veiga, I., Vila Chaves, J. O., Duque, G., & García Madruga, J. A. (2018). A new look to a classic issue: Reasoning and academic achievement at secondary school. Frontiers in Psychology, 9, Article 400. https://doi.org/10.3389/fpsyg.2018.00400

  • Grotjahn, R. (2010). Gesamtdarbietung, Einzeltextdarbietung, Zeitbegrenzung und Zeitdruck: Auswirkungen auf Item- und Testkennwerte und C-Test-Konstrukt [Overall presentation, single text presentation, time limitation and time pressure: Effects on item and test characteristics and C-test construct]. In R. Grotjahn (Ed.), Der C-Test: Beiträge aus der aktuellen Forschung (pp. 265–296). Peter Lang.

  • Grotjahn, R., Schlak, T., & Aguado, K. (2010). S-C-Tests: Messung automatisierter sprachlicher Kompetenzen anhand von C-Tests mit massiver textspezifischer Zeitlimitierung [S-C-Tests: Measurement of automated linguistic competence using C-tests with massive text-specific time limitation]. In R. Grotjahn (Ed.), Der C-Test: Beiträge aus der aktuellen Forschung (pp. 297–319). Peter Lang.

  • Heine, S. (2017). Fremd- und Zweitsprachenlernerfolg und seine Erklärung durch Erwerbsalter, kognitive, affektiv-motivationale und sozio-kulturelle Variablen: Eine empirische Studie [Foreign and second language learning success and its explanation by age of acquisition, cognitive, affective-motivational, and socio-cultural variables: An empirical study] (PhD thesis). Kassel University Press. https://dx.medra.org/10.19211/KUP9783737602730

  • Holling, H., Bertling, J. P., & Zeuch, N. (2009). Automatic item generation of probability word problems. Studies in Educational Evaluation, 35, 71–76. https://doi.org/10.1016/j.stueduc.2009.10.004

  • Hox, J. J., Moerbeek, M., & van de Schoot, R. (2018). Multilevel analysis: Techniques and applications (3rd ed.). Routledge.

  • Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55. https://doi.org/10.1080/10705519909540118

  • Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Sage.

  • Irvine, S. H., & Kyllonen, P. C. (Eds.). (2002). Item generation for test development. Routledge. https://doi.org/10.4324/9781410602145

  • Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329–349. https://doi.org/10.1207/S15324818AME1404_2

  • Johnson-Laird, P. N. (1983). Mental models: Towards a cognitive science of language, inference, and consciousness. Cambridge University Press.

  • Johnson-Laird, P. N., Byrne, R. M. J., & Schaeken, W. (1992). Propositional reasoning by model. Psychological Review, 99, 418–439. https://doi.org/10.1037/0033-295X.99.3.418

  • Johnson-Laird, P. N., Legrenzi, P., & Legrenzi, M. S. (1972). Reasoning and a sense of reality. British Journal of Psychology, 63, 395–400.

  • Johnson-Laird, P. N., & Savary, F. (1996). Illusory inferences about probabilities. Acta Psychologica, 93, 69–90.

  • Klauer, K. C., Stegmaier, R., & Meiser, T. (1997). Working memory involvement in propositional and spatial reasoning. Thinking & Reasoning, 3, 9–47.

  • Klimusová, H., & Květon, P. (2016). Psychometric properties of the learning potential test. Procedia – Social and Behavioral Sciences, 217, 652–656. https://doi.org/10.1016/j.sbspro.2016.02.089

  • Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82, 1–26. https://doi.org/10.18637/jss.v082.i13

  • Loe, B. S., Sun, L., Simonfy, F., & Doebler, P. (2018). Evaluating an automated number series item generator using linear logistic test models. Journal of Intelligence, 6, 1–25. https://doi.org/10.3390/jintelligence6020020

  • Lord, F. M. (1965). Item sampling in test theory and in research design (Technical report). Educational Testing Service.

  • Magis, D., Béland, S., Tuerlinckx, F., & de Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847–862. https://doi.org/10.3758/BRM.42.3.847

  • McDonald, R. P. (1999). Test theory: A unified treatment. Erlbaum.

  • McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37, 1–10. https://doi.org/10.1016/j.intell.2008.08.004

  • Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin, 114, 449–458. https://doi.org/10.1037/0033-2909.114.3.449

  • Meiser, T., Klauer, K. C., & Naumer, B. (2001). Propositional reasoning and working memory: The role of prior training and pragmatic content. Acta Psychologica, 106, 303–327. https://doi.org/10.1016/S0001-6918(00)00055-X

  • Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78, 691–692. https://doi.org/10.1093/biomet/78.3.691

  • O’Brien, D. P., Braine, M. D. S., & Yang, Y. (1994). Propositional reasoning by mental models? Simple to refute in principle and in practice. Psychological Review, 101, 711–724. https://doi.org/10.1037/0033-295X.101.4.711

  • Piburn, M. D. (1989). Reliability and validity of the propositional logic test. Educational and Psychological Measurement, 49, 667–672. https://doi.org/10.1177/001316448904900320

  • R Core Team. (2018). R: A language and environment for statistical computing (Version 3.5.0). R Foundation for Statistical Computing. https://www.R-project.org

  • Revelle, W. (2018). psych: Procedures for psychological, psychometric, and personality research (Version 1.8.4). Northwestern University. https://cran.R-project.org/package=psych

  • Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2017). Web and phone based data collection using planned missing designs. In N. Fielding, R. M. Lee, & G. Blank (Eds.), The SAGE handbook of online research methods (2nd ed., pp. 578–595). Sage.

  • Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual differences in cognition: New methods for examining the personality-cognition link. In A. Gruszka, G. Matthews, & B. Szymura (Eds.), Handbook of individual differences in cognition: Attention, memory and executive control (pp. 27–49). Springer. https://doi.org/10.1007/978-1-4419-1210-7_2

  • Rheinberg, F., Vollmeyer, R., & Burns, B. D. (2001). FAM: Ein Fragebogen zur Erfassung aktueller Motivation in Lern- und Leistungssituationen [FAM: A questionnaire to assess current motivation in learning and performance situations]. Diagnostica, 47, 57–66. https://doi.org/10.1026//0012-1924.47.2.57

  • Rijmen, F., & de Boeck, P. (2001). Propositional reasoning: The differential contribution of “rules” to the difficulty of complex reasoning problems. Memory & Cognition, 29, 165–175. https://doi.org/10.3758/BF03195750

  • Rijmen, F., & de Boeck, P. (2002). The random weights linear logistic test model. Applied Psychological Measurement, 26, 271–285. https://doi.org/10.1177/0146621602026003003

  • Rips, L. J. (1983). Cognitive processes in propositional reasoning. Psychological Review, 90, 38–71. https://doi.org/10.1037/0033-295X.90.1.38

  • Roberge, J. J., & Flexer, B. K. (1979). Further examination of formal operational reasoning abilities. Child Development, 50, 478. https://doi.org/10.2307/1129426

  • Roberge, J. J., & Flexer, B. K. (1982). The formal operational reasoning test. The Journal of General Psychology, 106, 61–67. https://doi.org/10.1080/00221309.1982.9710973

  • Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48, 1–36. https://doi.org/10.18637/jss.v048.i02

  • Scharfen, J., Peters, J. M., & Holling, H. (2018). Retest effects in cognitive ability tests: A meta-analysis. Intelligence, 67, 44–66. https://doi.org/10.1016/j.intell.2018.01.003

  • Schneider, W. J., & McGrew, K. S. (2012). The Cattell-Horn-Carroll model of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 99–144). Guilford Press.

  • Schroeders, U., & Wilhelm, O. (2010). Testing reasoning ability with handheld computers, notebooks, and paper and pencil. European Journal of Psychological Assessment, 26, 284–292. https://doi.org/10.1027/1015-5759/a000038

  • Sinharay, S., Johnson, M. S., & Williamson, D. M. (2003). Calibrating item families and summarizing the results using family expected response functions. Journal of Educational and Behavioral Statistics, 28, 295–313. https://doi.org/10.3102/10769986028004295

  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x

  • Toms, M., Morris, N., & Ward, D. (1993). Working memory and conditional reasoning. The Quarterly Journal of Experimental Psychology Section A, 46, 679–699.

  • van der Linden, W. J. (Ed.). (2016). Handbook of item response theory: Three volume set. Chapman and Hall/CRC.

  • Vollmeyer, R., & Rheinberg, F. (2006). Motivational effects on self-regulated learning with different tasks. Educational Psychology Review, 18, 239–253. https://doi.org/10.1007/s10648-006-9017-0

  • Wilhelm, O. (2000). Psychologie des schlussfolgernden Denkens: Differentialpsychologische Prüfung von Strukturüberlegungen [Psychology of reasoning: Differential psychological testing of structural considerations]. Kovac.

  • Wilhelm, O., & McKnight, P. E. (2002). Ability and achievement testing on the World Wide Web. In B. Batinic, U.-D. Reips, & M. Bosnjak (Eds.), Online social sciences (pp. 151–180). Hogrefe & Huber.

  • Zumbo, B. D., & Thomas, D. R. (1997). A measure of effect size for a model-based approach for studying DIF (Working paper). Edgeworth Laboratory for Quantitative Behavioral Science, University of Northern British Columbia.