Can’t Make it Better nor Worse
An Empirical Study About the Effectiveness of General Rules of Item Construction on Psychometric Properties
Abstract
Some of the most popular psychological questionnaires violate general rules of item construction, such as writing precise, positively keyed items without negations and avoiding items with multiple aspects of content, absolute statements, or vague quantifiers. To investigate whether following these rules results in more desirable psychometric properties, 1,733 participants completed online either the original NEO Five-Factor Inventory, an “improved” version whose items follow the rules of item construction, or a “deteriorated” version whose items strongly violate these rules. We compared reliability estimates, item-total correlations, Confirmatory Factor Analysis (CFA) model fit, and fit to the partial credit model among the three versions. Neither of the manipulations resulted in considerable or consistent effects on any of the psychometric indices. Our results question the ability of standard analyses in test construction to distinguish good items from bad ones, as well as the effectiveness of general rules of item construction. To increase the reproducibility of psychological science, more focus should be placed on improving psychological measures.