Abstract. This study introduces a newly developed public-domain multilingual automatic item generator that creates propositional reasoning (PR) items belonging to 15 item families by using various inference rules. Psychometric properties of the resulting written PR test were investigated in three diverse samples tested in English, simplified Chinese, and German, respectively. Internal consistency was good to excellent across samples. The ICAR16 short-form test of cognitive abilities (Condon & Revelle, 2014) was used to evaluate construct validity. Correlations between ICAR16 scores and PR scores were high. Furthermore, items within families appeared to be equivalent, with only minor differential item functioning between the Chinese- and English-speaking samples. Performance on the PR test was reasonably stable over the course of 1 week. No differences in total scores between test forms (pen and paper vs. computerized administration) were detected. Findings suggest that the automatically generated PR test is a valuable instrument for the assessment of propositional reasoning ability.

References

Arendasy, M. E., & Sommer, M. (2005). The effect of different types of perceptual manipulations on the dimensionality of automatically generated figural matrices. Intelligence, 33, 307–324. https://doi.org/10.1016/j.intell.2005.02.002
Arendasy, M. E., & Sommer, M. (2010). Evaluating the contribution of different item features to the effect size of the gender difference in three-dimensional mental rotation using automatic item generation. Intelligence, 38, 574–581. https://doi.org/10.1016/j.intell.2010.06.004
Arendasy, M. E., Sommer, M., Gittler, G., & Hergovich, A. (2006). Automatic generation of quantitative reasoning items. Journal of Individual Differences, 27, 2–14. https://doi.org/10.1027/1614-0001.27.1.2
Arendasy, M. E., Sommer, M., & Mayr, F. (2012). Using automatic item generation to simultaneously construct German and English versions of a word fluency test. Journal of Cross-Cultural Psychology, 43, 464–479. https://doi.org/10.1177/0022022110397360
Baddeley, A. (1986). Working memory. Clarendon Press.
Barrouillet, P., & Lecas, J.-F. (1999). Mental models in conditional reasoning and working memory. Thinking & Reasoning, 5, 289–302.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1–48. https://doi.org/10.18637/jss.v067.i01
Becker, B. J. (1988). Synthesizing standardized mean-change measures. British Journal of Mathematical and Statistical Psychology, 41, 257–278. https://doi.org/10.1111/j.2044-8317.1988.tb00901.x
Bejar, I. I. (2002). Generative testing: From conception to implementation. In S. H. Irvine & P. C. Kyllonen (Eds.), Item generation for test development (pp. 199–217). Erlbaum.
Bejar, I. I., Lawless, R. R., Morley, M. E., Wagner, M. E., Bennett, R. E., & Revuelta, J. (2002). A feasibility study of on-the-fly item generation in adaptive testing. ETS Research Report Series, 2002, i–44.
Blum, D., Holling, H., Galibert, M. S., & Forthmann, B. (2016). Task difficulty prediction of figural analogies. Intelligence, 56, 72–81. https://doi.org/10.1016/j.intell.2016.03.001
Bonett, D. G. (2002). Sample size requirements for testing and estimating coefficient alpha. Journal of Educational and Behavioral Statistics, 27, 335–340. https://doi.org/10.3102/10769986027004335
Braine, M. D. S. (1978). On the relation between the natural logic of reasoning and standard logic. Psychological Review, 85, 1–21. https://doi.org/10.1037/0033-295X.85.1.1
Carriedo, N., Elosúa, M. R., & García-Madruga, J. A. (2011). Working memory, text comprehension, and propositional reasoning: A new semantic anaphora WM test. The Spanish Journal of Psychology, 14, 37–49. https://doi.org/10.5209/rev_SJOP.2011.v14.n1.3
Carroll, J. B. (1993). Human cognitive abilities. Cambridge University Press. https://doi.org/10.1017/CBO9780511571312
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48, 1–29. https://doi.org/10.18637/jss.v048.i06
Chalmers, R. P. (2015). Extended mixed-effects item response models with the MH-RM algorithm. Journal of Educational Measurement, 52, 200–222.
Cheng, P. W., & Holyoak, K. J. (1985). Pragmatic reasoning schemas. Cognitive Psychology, 17, 391–416. https://doi.org/10.1016/0010-0285(85)90014-3
Cho, S.-J., de Boeck, P., Embretson, S., & Rabe-Hesketh, S. (2014). Additive multilevel item structure models with random residuals: Item modeling for explanation and item generation. Psychometrika, 79, 84–104. https://doi.org/10.1007/s11336-013-9360-2
Condon, D. M., & Revelle, W. (2014). The international cognitive ability resource: Development and initial validation of a public domain measure. Intelligence, 43, 52–64. https://doi.org/10.1016/j.intell.2014.01.004
Cosmides, L., & Tooby, J. (1989). Evolutionary psychology and the generation of culture, part II: Case study: A computational theory of social exchange. Ethology and Sociobiology, 10, 51–97.
Cosmides, L., & Tooby, J. (1992). Cognitive adaptations for social exchange. In J. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind (pp. 163–228). Oxford University Press.
Cosmides, L., & Tooby, J. (1997). Evolutionary psychology: A primer. Center for Evolutionary Psychology.
Embretson, S. (1999). Generating items during testing: Psychometric issues and models. Psychometrika, 64, 407–433.
Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359–374. https://doi.org/10.1016/0001-6918(73)90003-6
Freund, P. A., Hofer, S., & Holling, H. (2008). Explaining and controlling for the psychometric properties of computer-generated figural matrix items. Applied Psychological Measurement, 32, 195–210. https://doi.org/10.1177/0146621607306972
Freund, P. A., & Holling, H. (2011). How to get really smart: Modeling retest and training effects in ability testing using computer-generated figural matrix items. Intelligence, 39, 233–243.
Gadzella, B. M., Stacks, J., Stephens, R. C., & Masten, W. G. (2005). Watson-Glaser critical thinking appraisal, form-s for education majors. Journal of Instructional Psychology, 32, 9–12.
García-Madruga, J. A., Gutiérrez, F., Carriedo, N., Luzón, J. M., & Vila, J. O. (2007). Mental models in propositional reasoning and working memory’s central executive. Thinking & Reasoning, 13, 370–393. https://doi.org/10.1080/13546780701203813
Geerlings, H., Glas, C. A. W., & van der Linden, W. J. (2011). Modeling rule-based item generation. Psychometrika, 76, 337–359. https://doi.org/10.1007/s11336-011-9204-x
Gierl, M. J., & Lai, H. (2012). The role of item models in automatic item generation. International Journal of Testing, 12, 273–298. https://doi.org/10.1080/15305058.2011.635830
Glas, C. A. W., & van der Linden, W. J. (2003). Computerized adaptive testing with item cloning. Applied Psychological Measurement, 27, 247–261. https://doi.org/10.1177/0146621603027004001
Gómez-Veiga, I., Vila Chaves, J. O., Duque, G., & García Madruga, J. A. (2018). A new look to a classic issue: Reasoning and academic achievement at secondary school. Frontiers in Psychology, 9, Article 400. https://doi.org/10.3389/fpsyg.2018.00400
Grotjahn, R. (2010). Gesamtdarbietung, Einzeltextdarbietung, Zeitbegrenzung und Zeitdruck: Auswirkungen auf Item- und Testkennwerte und C-Test-Konstrukt [Overall presentation, single text presentation, time limitation and time pressure: Effects on item and test characteristics and C-test construct]. In R. Grotjahn (Ed.), Der C-Test: Beiträge aus der aktuellen Forschung (pp. 265–296). Peter Lang.
Grotjahn, R., Schlak, T., & Aguado, K. (2010). S-C-Tests: Messung automatisierter sprachlicher Kompetenzen anhand von C-Tests mit massiver textspezifischer Zeitlimitierung [S-C-Tests: Measurement of automated linguistic competence using C-tests with massive text-specific time limitation]. In R. Grotjahn (Ed.), Der C-Test: Beiträge aus der aktuellen Forschung (pp. 297–319). Peter Lang.
Heine, S. (2017). Fremd- und Zweitsprachenlernerfolg und seine Erklärung durch Erwerbsalter, kognitive, affektiv-motivationale und sozio-kulturelle Variablen: Eine empirische Studie [Foreign and second language learning success and its explanation by age of acquisition, cognitive, affective-motivational and socio-cultural variables: An empirical study] (PhD thesis). Kassel University Press. https://dx.medra.org/10.19211/KUP9783737602730
Holling, H., Bertling, J. P., & Zeuch, N. (2009). Automatic item generation of probability word problems. Studies in Educational Evaluation, 35, 71–76. https://doi.org/10.1016/j.stueduc.2009.10.004
Hox, J. J., Moerbeek, M., & van de Schoot, R. (2018). Multilevel analysis: Techniques and applications (3rd ed.). Routledge.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55. https://doi.org/10.1080/10705519909540118
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Sage.
Irvine, S. H., & Kyllonen, P. C. (2002). Item generation for test development. Routledge. https://doi.org/10.4324/9781410602145
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329–349. https://doi.org/10.1207/S15324818AME1404_2
Johnson-Laird, P. N. (1983). Mental models: Towards a cognitive science of language, inference, and consciousness. Cambridge University Press.
Johnson-Laird, P. N., Byrne, R. M. J., & Schaeken, W. (1992). Propositional reasoning by model. Psychological Review, 99, 418–439. https://doi.org/10.1037/0033-295X.99.3.418
Johnson-Laird, P. N., Legrenzi, P., & Legrenzi, M. S. (1972). Reasoning and a sense of reality. British Journal of Psychology, 63, 395–400.
Johnson-Laird, P. N., & Savary, F. (1996). Illusory inferences about probabilities. Acta Psychologica, 93, 69–90.
Klauer, K. C., Stegmaier, R., & Meiser, T. (1997). Working memory involvement in propositional and spatial reasoning. Thinking & Reasoning, 3, 9–47.
Klimusová, H., & Květon, P. (2016). Psychometric properties of the learning potential test. Procedia – Social and Behavioral Sciences, 217, 652–656. https://doi.org/10.1016/j.sbspro.2016.02.089
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82, 1–26. https://doi.org/10.18637/jss.v082.i13
Loe, B. S., Sun, L., Simonfy, F., & Doebler, P. (2018). Evaluating an automated number series item generator using linear logistic test models. Journal of Intelligence, 6, 1–25. https://doi.org/10.3390/jintelligence6020020
Lord, F. M. (1965). Item sampling in test theory and in research design (Technical report). Educational Testing Service.
Magis, D., Beland, S., Tuerlinckx, F., & de Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847–862. https://doi.org/10.3758/BRM.42.3.847
McDonald, R. P. (1999). Test theory: A unified treatment. Erlbaum.
McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37, 1–10. https://doi.org/10.1016/j.intell.2008.08.004
Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin, 114, 449–458. https://doi.org/10.1037/0033-2909.114.3.449
Meiser, T., Klauer, K. C., & Naumer, B. (2001). Propositional reasoning and working memory: The role of prior training and pragmatic content. Acta Psychologica, 106, 303–327. https://doi.org/10.1016/S0001-6918(00)00055-X
Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78, 691–692. https://doi.org/10.1093/biomet/78.3.691
O’Brien, D. P., Braine, M. D. S., & Yang, Y. (1994). Propositional reasoning by mental models? Simple to refute in principle and in practice. Psychological Review, 101, 711–724. https://doi.org/10.1037/0033-295X.101.4.711
Piburn, M. D. (1989). Reliability and validity of the propositional logic test. Educational and Psychological Measurement, 49, 667–672. https://doi.org/10.1177/001316448904900320
R Core Team. (2018). R: A language and environment for statistical computing (Version 3.5.0). R Foundation for Statistical Computing. https://www.R-project.org
Revelle, W. (2018). psych: Procedures for psychological, psychometric, and personality research (Version 1.8.4). Northwestern University. https://cran.R-project.org/package=psych
Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2017). Web and phone based data collection using planned missing designs. In N. Fielding, R. M. Lee, & G. Blank (Eds.), The SAGE handbook of online research methods (2nd ed., pp. 578–595). Sage.
Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual differences in cognition: New methods for examining the personality-cognition link. In A. Gruszka, G. Matthews, & B. Szymura (Eds.), Handbook of individual differences in cognition: Attention, memory and executive control (pp. 27–49). Springer. https://doi.org/10.1007/978-1-4419-1210-7_2
Rheinberg, F., Vollmeyer, R., & Burns, B. D. (2001). FAM: Ein Fragebogen zur Erfassung aktueller Motivation in Lern- und Leistungssituationen [FAM: A questionnaire to assess current motivation in learning and performance situations]. Diagnostica, 47, 57–66. https://doi.org/10.1026//0012-1924.47.2.57
Rijmen, F., & de Boeck, P. (2001). Propositional reasoning: The differential contribution of “rules” to the difficulty of complex reasoning problems. Memory & Cognition, 29, 165–175. https://doi.org/10.3758/BF03195750
Rijmen, F., & de Boeck, P. (2002). The random weights linear logistic test model. Applied Psychological Measurement, 26, 271–285. https://doi.org/10.1177/0146621602026003003
Rips, L. J. (1983). Cognitive processes in propositional reasoning. Psychological Review, 90, 38–71. https://doi.org/10.1037/0033-295X.90.1.38
Roberge, J. J., & Flexer, B. K. (1979). Further examination of formal operational reasoning abilities. Child Development, 50, 478. https://doi.org/10.2307/1129426
Roberge, J. J., & Flexer, B. K. (1982). The formal operational reasoning test. The Journal of General Psychology, 106, 61–67. https://doi.org/10.1080/00221309.1982.9710973
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48, 1–36. https://doi.org/10.18637/jss.v048.i02
Scharfen, J., Peters, J. M., & Holling, H. (2018). Retest effects in cognitive ability tests: A meta-analysis. Intelligence, 67, 44–66. https://doi.org/10.1016/j.intell.2018.01.003
Schneider, W. J., & McGrew, K. S. (2012). The Cattell-Horn-Carroll model of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 99–144). Guilford Press.
Schroeders, U., & Wilhelm, O. (2010). Testing reasoning ability with handheld computers, notebooks, and paper and pencil. European Journal of Psychological Assessment, 26, 284–292. https://doi.org/10.1027/1015-5759/a000038
Sinharay, S., Johnson, M. S., & Williamson, D. M. (2003). Calibrating item families and summarizing the results using family expected response functions. Journal of Educational and Behavioral Statistics, 28, 295–313. https://doi.org/10.3102/10769986028004295
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370. https://doi.org/10.1111/j.1745-3984.1990.tb00754.x
Toms, M., Morris, N., & Ward, D. (1993). Working memory and conditional reasoning. The Quarterly Journal of Experimental Psychology Section A, 46, 679–699.
van der Linden, W. J. (2016). Handbook of item response theory: Three volume set. Chapman and Hall/CRC.
Vollmeyer, R., & Rheinberg, F. (2006). Motivational effects on self-regulated learning with different tasks. Educational Psychology Review, 18, 239–253. https://doi.org/10.1007/s10648-006-9017-0
Wilhelm, O. (2000). Psychologie des schlussfolgernden Denkens: Differentialpsychologische Prüfung von Strukturüberlegungen [Psychology of reasoning: Differential psychological testing of structural considerations]. Kovac.
Wilhelm, O., & McKnight, P. E. (2002). Ability and achievement testing on the world wide web. In B. Batinic, U.-D. Reips, & M. Bosnjak (Eds.), Online social sciences (pp. 151–180). Hogrefe & Huber.
Zumbo, B. D., & Thomas, D. R. (1997). A measure of effect size for a model-based approach for studying DIF (Working paper). Edgeworth Laboratory for Quantitative Behavioral Science, University of Northern British Columbia.

Volume 37, Issue 4, July 2021

ISSN: 1015-5759, eISSN: 2151-2426

History

Received: April 3, 2019
Revised: July 27, 2020
Accepted: July 29, 2020
Published online: November 10, 2020

Publication Ethics:

All methods carried out in Study 1 were reviewed by the IRB of Northwestern University (Synthetic Aperture Personality Assessment Project; Reference STU00202975). The study was determined to be exempt as it did not qualify as human subjects research.

Approval for Study 2 was obtained from Cambridge Judge Business School Departmental Ethics Review Group (Reference: 17/023).

Study 3 was approved by the Joint Ethics Committee of Departments 12-16 of TU Dortmund University (Reference: 2017-4).

Open Data:

All data and code are freely available at https://osf.io/xa9gw/.

Funding:

The work of Fang Luo was supported by the Research on Key Technologies of Social Emergency Service System, National Key R&D Program of China (Grant No. 2018YFC0810600). The work by Daniela Gühne and Philipp Doebler was supported by grant DO 1789/1-1 from the German Research Foundation (DFG). The work of Luning Sun was supported by the Economic and Social Research Council, UK (ESRC; Grant No. ES/L016591/1).

Validity and Reliability of Automatically Generated Propositional Reasoning Items

A Multilingual Study of the Challenges of Verbal Item Generation
