Editorial

Comments on Item Selection Procedures

Published Online: https://doi.org/10.1027/1015-5759/a000196

Over the last 12 months I have had the pleasure of reading some extraordinarily good papers submitted to the European Journal of Psychological Assessment. Oftentimes, my very positive impression was mirrored by the reviews obtained for these papers. However, not all submissions are impressive and sound; sometimes the potential of a paper is visible but buried beneath methodological and analytical problems. From time to time I would like to use the editorial pages to provide an overview of issues often raised by reviewers and used as reasons to criticize or even reject a submission. My goal is not to patronize; rather, my aim is to address some recurring issues in the hope of eliminating them in the future and thereby improving the journal’s overall quality. In this Editorial I want to focus on the process of item selection.

To Explore or to Confirm?

Using modern software, it has become fairly easy to test the factorial validity of a measure. In fact, as the study by Alonso-Arbiol and van de Vijver (2010) shows, 40% of the papers published between 2005 and 2009 reported findings from a factor analysis. My personal impression is that this number has not decreased since then. Of course, factorial validity is important, and such findings are of interest to both researchers and practitioners. When a new measure is introduced or an existing one adapted, factor analysis is sometimes used to select items. The first important question here is always whether an exploratory or a confirmatory approach is used. In general, exploratory factor analysis (EFA) is used when the number of factors within a measure is unknown and the allocation of items to factors still needs to be determined. Thus, EFA is appropriate when little is known about a measure (Mussel, Spengler, Litman, & Schuler, 2012). In most cases, however, there is a clear idea about the number of factors as well as the relationships between factors and items. It is then more appropriate to employ a confirmatory approach to test these assumptions. Schulze and Roberts (2006) showed how item selection can be done within a confirmatory factor analytic framework. My impression is that many reviewers have grown impatient with the incorrect use of factor analytic methods, leading to outright rejections or at least considerable revision requests. Thus, choosing the right kind of factor analysis might save both authors and reviewers a lot of trouble. Gorsuch (1997) gives an excellent introduction to EFA and item analysis, which also covers issues not addressed here (e.g., rotation method, principal component analysis).
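
To make the distinction concrete, a minimal exploratory sketch in Python might look as follows; the placeholder data and the use of scikit-learn’s FactorAnalysis are my own illustrative choices, not a recommendation taken from the papers cited above. The confirmatory route would instead specify the expected factor-item structure in dedicated SEM software and test its fit.

```python
# Minimal EFA sketch with placeholder Likert-type data (illustration only).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(300, 10)).astype(float)  # 300 respondents, 10 items

efa = FactorAnalysis(n_components=2, rotation="varimax")  # number of factors is itself exploratory
efa.fit(X)

loadings = efa.components_.T  # items x factors: which items load on which factor?
print(np.round(loadings, 2))
```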

To Keep or to Eliminate?

Regardless of the analytical approach used, reviewers regularly mention certain issues when it comes to item selection. The most critical remark is that authors often select items solely on statistical grounds. Decision rules such as item discrimination < .x or factor loading < .y are a safe way to attract reviewer critique. Such decision rules are useless if the authors do not first state the aim of the measure in focus. But why is that?

Let us assume that a new measure is introduced to be used in a clinical setting as a screening instrument for a specific, potentially pathological trait. The aim might thus be to describe the trait standing of persons with subclinical values. However, it is also necessary to identify those test takers with clinically relevant trait standings. The items therefore need to cover a wide range of trait values. To this end, item difficulties need to vary, and consequently item distributions will vary as well. These differences in item difficulties or item distributions can lead to low loadings or low item-total discriminations for items with extreme difficulties and/or deviating item distributions. Gorsuch (1997) wrote:

To measure all variations of a construct well it is necessary to have items with means that vary across values of the response range. The items with means at the extremes usually have skewed distributions. The skewed distributions impact the correlations because a correlation can only be high and positive if the two items being correlated have the same distributions. Different distributions among the items, therefore, reduce the correlations among the items. (p. 538)

Thus, authors who only apply statistical cutoffs run the risk of eliminating exactly those items that are needed to fulfill the measurement aim, i.e., to identify test takers with high trait values.
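
A small numerical illustration of this point (my own toy example, not taken from Gorsuch): even when an item of extreme difficulty is perfectly consistent with an item of medium difficulty, the difference in their distributions caps the attainable correlation.

```python
# Two dichotomous items that are as consistent as they can possibly be:
# everyone who endorses the extreme item also endorses the medium item.
import numpy as np

n = 1000
medium = np.zeros(n)
extreme = np.zeros(n)
medium[:500] = 1   # endorsed by 50% of the sample
extreme[:50] = 1   # endorsed by 5%, all of whom also endorse the medium item

r = np.corrcoef(medium, extreme)[0, 1]
print(f"maximum attainable correlation: {r:.2f}")  # about .23, below a typical .30 cutoff
```

A fixed cutoff of .30 would eliminate the extreme item here even though it behaves exactly as intended.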

Reviewers repeatedly request that authors first clearly state the measurement aim of the instrument in focus. Only then is it possible to formulate an item-selection strategy that ensures this aim is reached while good psychometric quality is maintained. Obviously, such strategies require more than the application of a simple parameter cutoff. Authors need to pay attention to the actual purpose of a specific item and to the consequences this purpose might have for item difficulty and distribution.
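
What such a strategy could look like in practice is sketched below; the data, the purpose labels, and the flagging rule are entirely hypothetical and are only meant to illustrate reporting difficulty, distribution, and discrimination side by side instead of applying a single cutoff.

```python
# Purpose-aware item screening sketch (hypothetical data and labels).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta = rng.normal(size=400)                            # latent trait of 400 respondents
thresholds = np.array([0.0, 0.0, 0.0, 0.0, 1.3, 1.8])   # last two items target the upper range
X = (theta[:, None] + rng.normal(size=(400, 6)) > thresholds).astype(float)
purpose = ["core", "core", "core", "core", "upper range", "upper range"]

total = X.sum(axis=1)
for j in range(X.shape[1]):
    p = X[:, j].mean()                                  # item difficulty (endorsement rate)
    r_it = np.corrcoef(X[:, j], total - X[:, j])[0, 1]  # corrected item-total correlation
    sk = stats.skew(X[:, j])
    # Do not auto-drop low-discrimination items that exist to cover extreme trait levels.
    action = "review substantively" if r_it < .30 and purpose[j] != "core" else "usual criteria"
    print(f"item {j}: purpose={purpose[j]:<11} p={p:.2f} r_it={r_it:.2f} skew={sk:+.2f} -> {action}")
```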

To Know or Not to Know?

Reviewers also frequently point to missing item analyses that they judge fundamental to ensuring an instrument’s usability. The number one issue raised by reviewers is whether the items as formulated are actually understood by the test takers. Answering this question often requires qualitative techniques such as think-aloud studies or cognitive interviews. These approaches ask a sample from the envisioned population to work on the instrument while uttering their thoughts aloud (Krosnick, 1999). The idea is that problems in understanding items and instructions can be spotted and corrected before the larger data sets needed for statistical analyses are collected (Willis, Royston, & Bercini, 1991). Moreover, these techniques can also be used to analyze the answer process itself (Robie, Brown, & Beaty, 2007; Ziegler, 2011).

Finally, another issue that reviewers often bring up when commenting on item-selection procedures is the modality of an item’s distribution. Authors usually report item difficulties and standard deviations, sometimes skewness and kurtosis, but very rarely any information regarding modality. Usually, it is assumed that the analyzed items have only one mode. However, it is not unusual for items to be bimodal. This phenomenon can occur if an item is phrased ambiguously. An example I always use is: “I like singing and dancing.” The consequence of such an ambiguous item can be that one part of the sample rates only the first aspect (singing), another part focuses on the second aspect (dancing), and a final part rates both. The resulting item distribution then often has two or more modes, which do not represent important subgroups with respect to the measured trait but more likely subgroups with different understandings of the item. Without specific information regarding the modality of the item distribution, such problems in item understanding go unnoticed.
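
A quick check of this kind is easy to add to any item analysis. The following sketch is my own illustration with made-up response frequencies: it simply counts local maxima in the response-category frequencies and would flag the ambiguous item.

```python
# Rough bimodality check: count local maxima in the response-category frequencies.
import numpy as np

def n_modes(responses, categories=(1, 2, 3, 4, 5)):
    counts = np.array([np.sum(responses == c) for c in categories])
    peaks = 0
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i < len(counts) - 1 else -1
        if c > left and c > right:
            peaks += 1
    return peaks

# Made-up frequency profiles for two 5-point items.
unimodal_item = np.repeat([1, 2, 3, 4, 5], [20, 80, 200, 80, 20])
ambiguous_item = np.repeat([1, 2, 3, 4, 5], [120, 80, 0, 80, 120])

print("unimodal item:", n_modes(unimodal_item), "mode(s)")    # 1
print("ambiguous item:", n_modes(ambiguous_item), "mode(s)")  # 2
```

More than one mode in such a profile would prompt a closer look at the item wording before any statistical selection decision is made.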

To sum up, item selection still represents a major challenge in studies focusing on new measures or on the adaptation of existing measures. Problems within the review process occur due to the incorrect use of factor analytic approaches, the blind application of statistical cutoff rules, and a lack of attention to the test takers’ understanding of the items. Solutions may be found in stating a measurement aim and formulating a corresponding item-selection procedure. Moreover, qualitative analyses at an early stage can be as informative as close inspection of the actual item distributions. All of this, in my opinion as well as in the opinion of the reviewers I have summarized here, ensures better instrument quality than a narrow focus on internally consistent measures.

References

  • Alonso-Arbiol, I., & van de Vijver, F. J. R. (2010). A historical analysis of the European Journal of Psychological Assessment. European Journal of Psychological Assessment, 26, 238–247. https://doi.org/10.1027/1015-5759/a000032

  • Gorsuch, R. L. (1997). Exploratory factor analysis: Its role in item analysis. Journal of Personality Assessment, 68, 532–560.

  • Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537–567.

  • Mussel, P., Spengler, M., Litman, J. A., & Schuler, H. (2012). Development and validation of the German Work-Related Curiosity Scale. European Journal of Psychological Assessment, 28, 109–117. https://doi.org/10.1027/1015-5759/a000098

  • Robie, C., Brown, D. J., & Beaty, J. C. (2007). Do people fake on personality inventories? A verbal protocol analysis. Journal of Business and Psychology, 21, 489–509. https://doi.org/10.1007/s10869-007-9038-9

  • Schulze, R., & Roberts, R. D. (2006). Assessing the Big Five: Development and validation of the Openness Conscientiousness Extraversion Agreeableness Neuroticism Index Condensed (OCEANIC). Zeitschrift für Psychologie, 214, 133–149. https://doi.org/10.1026/0044-3409.214.3.133

  • Willis, G. B., Royston, P., & Bercini, D. (1991). The use of verbal report methods in the development and testing of survey questionnaires. Applied Cognitive Psychology, 5, 251–267.

  • Ziegler, M. (2011). Applicant faking: A look into the black box. The Industrial and Organizational Psychologist, 49, 29–36.

Matthias Ziegler, Institut für Psychologie, Humboldt Universität zu Berlin, Rudower Chaussee 18, 12489 Berlin, Germany, +49 30 2093-9447, +49 30 2093-9361,