Not Very Powerful
The Influence of Negations and Vague Quantifiers on the Psychometric Properties of Questionnaires
Abstract
Several guidelines on how to construct questionnaire items exist, yet the literature lacks empirical evidence for their effectiveness. To investigate whether adding negations and vague quantifiers worsens the psychometric properties of an established questionnaire, 872 participants completed one of four versions of the Positive and Negative Affect Schedule (PANAS): the German original, a negated version, a version with vague quantifiers, or a version with both negations and vague quantifiers. Reliability estimates, item-total correlations, Confirmatory Factor Analysis (CFA) model fit, and fit to the Partial Credit Model (PCM) were compared across the four conditions. No PANAS version was clearly superior, as no systematic pattern in the psychometric properties emerged. Our findings call into question the general applicability of item-construction guidelines, as well as the effectiveness of widely used statistical analyses for assessing scale quality. The results should encourage researchers to focus more strongly on careful item construction, as relying on psychometric properties alone may not suffice to develop valid questionnaires.
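Two of the indices compared across conditions, Cronbach's alpha and corrected item-total correlations, can be computed directly from a person-by-item score matrix. The sketch below illustrates both on simulated 5-point responses; the data, sample size, and item count are hypothetical and do not reproduce the study's PANAS sample.

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's (1951) alpha for an n_persons x n_items score matrix."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def corrected_item_total(X):
    """Correlation of each item with the sum of the remaining items."""
    total = X.sum(axis=1)
    return np.array([np.corrcoef(X[:, i], total - X[:, i])[0, 1]
                     for i in range(X.shape[1])])

# Hypothetical data: 200 persons, 10 items loading on one latent factor,
# discretized to a 1-5 rating scale.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
raw = latent + rng.normal(scale=1.2, size=(200, 10))
X = np.clip(np.round(raw * 1.5 + 3), 1, 5)

print(round(cronbach_alpha(X), 3))
print(corrected_item_total(X).round(2))
```

Comparing questionnaire versions then amounts to computing these statistics separately for each condition's response matrix, which is how a "no systematic pattern" conclusion like the one above would be reached.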