Can’t Make it Better nor Worse
An Empirical Study About the Effectiveness of General Rules of Item Construction on Psychometric Properties
Abstract
Some of the most popular psychological questionnaires violate general rules of item construction, such as writing precise, positively keyed items without negations and avoiding items with multiple aspects of content, absolute statements, or vague quantifiers. To investigate whether following these rules results in more desirable psychometric properties, 1,733 participants completed online either the original NEO Five-Factor Inventory, an “improved” version whose items follow the rules of item construction, or a “deteriorated” version whose items strongly violate these rules. We compared reliability estimates, item-total correlations, Confirmatory Factor Analysis (CFA) model fit, and fit to the partial credit model among the three versions. Neither of the manipulations resulted in considerable or consistent effects on any of the psychometric indices. Our results question the ability of standard analyses in test construction to distinguish good items from bad ones, as well as the effectiveness of general rules of item construction. To increase the reproducibility of psychological science, more focus should be placed on improving psychological measures.