Multistudy Report

Can’t Make It Better nor Worse

An Empirical Study About the Effectiveness of General Rules of Item Construction on Psychometric Properties

Published Online: https://doi.org/10.1027/1015-5759/a000471

Abstract. Some of the most popular psychological questionnaires violate general rules of item construction, such as writing precise, positively keyed items without negations and avoiding multiple aspects of content, absolute statements, and vague quantifiers. To investigate whether following these rules results in more desirable psychometric properties, 1,733 participants completed one of three versions online: the original NEO Five-Factor Inventory, an “improved” version whose items follow the rules of item construction, or a “deteriorated” version whose items strongly violate them. We compared reliability estimates, item-total correlations, Confirmatory Factor Analysis (CFA) model fit, and fit to the partial credit model across the three versions. Neither manipulation produced considerable or consistent effects on any of the psychometric indices. Our results call into question both the ability of standard analyses in test construction to distinguish good items from bad ones and the effectiveness of general rules of item construction. To increase the reproducibility of psychological science, more attention should be paid to improving psychological measures.
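The version comparisons summarized above (reliability, item-total correlations, CFA model fit) can be sketched in a few lines of R, the environment the study itself used. This is a minimal illustration under stated assumptions, not the authors’ actual pipeline: the data frame dat, its version column, and the item names N1–N12 (one 12-item scale) are hypothetical placeholders.

library(psych)   # alpha(): reliability and item-total statistics
library(lavaan)  # cfa(): confirmatory factor analysis

# One-factor CFA model for a single 12-item scale (item names are placeholders)
model <- 'Neuroticism =~ N1 + N2 + N3 + N4 + N5 + N6 +
                         N7 + N8 + N9 + N10 + N11 + N12'

for (v in c("original", "improved", "deteriorated")) {
  items <- subset(dat, version == v, select = N1:N12)

  # Cronbach's alpha and corrected item-total correlations for this version
  rel <- psych::alpha(items)
  cat(v, "alpha:", round(rel$total$raw_alpha, 2), "\n")
  print(round(rel$item.stats$r.drop, 2))

  # CFA fit indices commonly compared against conventional cutoffs
  fit <- lavaan::cfa(model, data = items)
  print(lavaan::fitMeasures(fit, c("cfi", "rmsea", "srmr")))
}

Comparing these indices side by side across the three versions mirrors the study’s design; the partial credit model analysis mentioned in the abstract would require an additional IRT package and is omitted from this sketch.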
