Open Access Research Article

High Subset Homogeneity Impairs Structural Validity of the Barratt Impulsiveness Scale

Published Online: https://doi.org/10.1027/1614-0001/a000407

Abstract

Problems in providing evidence of structural validity of the Barratt Impulsiveness Scale (BIS) are addressed. As the German translation of a short version of the scale (BIS-15) included pairs of items with highly similar item statements, like the original English version, we hypothesized high subset homogeneity (HSH) as the source of these problems. HSH denotes a situation in which a subset of items shows a larger degree of homogeneity than the remainder of the items of the scale. In a sample of 287 university students, we investigated BIS data by means of confirmatory factor analysis (CFA) models including BIS factors only and models additionally including HSH factors. Whereas the models without HSH factors yielded model misfit, good model fit was observed for the models with HSH factors. These results suggested that BIS items basically showed structural validity, but this validity was impaired by HSH.

The Barratt Impulsiveness Scale, currently in its eleventh revision (BIS-11; Patton et al., 1995), is a popular scale for measuring impulsivity in educational settings. Different versions have been created, and the scale has been translated into various languages. The 30 items of BIS-11 give rise to six first-order factors and three second-order factors. However, this factor structure has shown a low degree of replicability (e.g., Krampen et al., 2016; Preuss et al., 2008; Stanford et al., 2009; Vasconcelos et al., 2012). A possible reason for the failure to replicate the assumed structure of impulsivity is an especially high degree of similarity among subsets of items. Since such similarity leads to a high degree of homogeneity restricted to subsets of items, we refer to it as high subset homogeneity (HSH). The present paper investigates whether this kind of homogeneity is responsible for problems in replicating the described structure of the BIS scale.

Impulsivity is a relatively stable personality trait comprising spontaneous reactions and a lack of planning abilities. Since the first-order structure seems hardly replicable, research has focused on the second-order factors (Stanford et al., 2009): (1) attentional impulsivity is characterized by very fast cognitive decisions, (2) motor impulsivity is obvious in acting without prior deliberation, and (3) non-planning impulsivity manifests itself in a lack of future orientation. Spinella (2007) presented an abbreviated English-language version of BIS consisting of 15 items from BIS-11 (BIS-15). It was translated into German (Krampen et al., 2016) following the original items of BIS-11 (Patton et al., 1995) and proved to be reliable in identifying the three broad factors of attentional, motor, and non-planning impulsivity in a regular student sample.

BIS-15 includes two pairs of items that are likely to lead to HSH: items 6 (“Ich rutsche auf dem Stuhl hin und her im Theater und bei Vorträgen.” [“I ‘squirm’ at plays or lectures.”]) and 14 (“Ich bin unruhig im Theater und bei Vorträgen.” [“I am restless at the theater or lectures.”]) evoke the same imagined situation, while items 8 (“Ich plane meine berufliche Zukunft.” [“I plan for job security.”]) and 15 (“Ich bin zukunftsorientiert.” [“I am future oriented.”]) both target orientation toward the future. In both cases, the items are similar in wording and meaning, especially in the German translation. For example, items 6 and 14 require the participant to imagine being at “theater/plays or lectures” and to focus on the same behavioral characteristics. Items 6 and 14 load on the second-order factor attentional impulsivity, and the reversely scored items 8 and 15 load on the second-order factor non-planning impulsivity (Krampen et al., 2016; Spinella, 2007). In the dataset selected for this study, the items of each pair (items 6 and 14; items 8 and 15) correlated highly (cf. Krampen et al., 2016; Spinella, 2007; Steinberg et al., 2013). For detailed information, see the Results section.

Investigations of the structures of scales are usually conducted on the assumption that there is one source of responding that creates systematic variation, which is captured by the factor of the confirmatory factor analysis (CFA) model employed in the investigation (Brown, 2015; Graham, 2006). However, there are alternative theories suggesting that the relationships among the items of a scale are constituted by a larger number of sources instead of only one. Examples are sampling theory (Thomson, 1916) and process overlap theory (Kovacs & Conway, 2016). In both cases, many influences contributing to a response are assumed. If the items of a scale share a large percentage of these influences, a high degree of homogeneity is likely to be observed, which is obvious in close relationships among these items.

Alternative theories can also explain situations in which the relationships among a set of items are fractured. A subset of items may show an especially high degree of homogeneity (HSH), while the degree of homogeneity among the remaining items of the scale is smaller. We refer to the latter case as normal remainder homogeneity (NRH); it corresponds to homogeneity as reflected by McDonald’s (1999) omega coefficient. According to the alternative theories, both HSH and NRH can characterize the relationships of the items of a scale. The different degrees of homogeneity shape the systematic variation of data that is investigated in CFA when comparing the covariances or correlations of items with the model-implied covariance matrix by a fitting function (Deng et al., 2018; Jöreskog, 1970). The combination of HSH and NRH means a low degree of overall homogeneity that can be expected to be reflected by the outcome of the fit investigation. As a consequence, the systematic variation of data cannot be captured sufficiently by one factor: a single factor can only account for part of the systematic variation, and a supplementary factor may be necessary in order to account for the remainder.
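The incompatibility between HSH and a single-factor account can be made concrete with a small numerical sketch (the figures are hypothetical, chosen only to mirror the pattern discussed later):

```python
import math

# Hypothetical numbers, not the article's data: five standardized
# items whose pairwise correlations are all .18 except one HSH pair
# at .80. A one-factor model with equal loadings implies the SAME
# correlation lambda**2 for every pair, so it cannot reproduce both
# values at once -- the misfit shows up as a large residual for the
# HSH pair.

pair_rs = [0.18] * 9 + [0.80]         # the 10 pairs among 5 items
lam_sq = sum(pair_rs) / len(pair_rs)  # least-squares fit: the mean r
lam = math.sqrt(lam_sq)               # common loading of all items

residuals = [r - lam_sq for r in pair_rs]
print(round(lam_sq, 3))               # implied correlation for every pair
print(round(max(residuals), 3))       # unexplained part of the HSH pair
```

Whatever single loading is chosen, the model either overestimates the bulk of the correlations or leaves most of the HSH pair's correlation unexplained.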

Customary CFA measurement models include one latent variable and many manifest variables representing the items (Brown, 2015; Graham, 2006). Since the BIS subscales comprise only a few items each, subsets showing HSH are unlikely to include more than two or three items. Therefore, we propose a CFA model for a subscale that, for example, includes five items, two of which are assumed to show HSH. Because of the HSH items, the model comprises a supplementary factor with two factor loadings besides the main factor. This means that the research problem at hand requires a measurement model showing the structural features of a bifactor model (Gibbons & Hedeker, 1992; Reise, 2012; Rijmen, 2010). While in exploratory factor analysis (EFA) a factor should show at least three, or better four, substantial factor loadings (Bandalos & Gerstner, 2016), CFA models in which a factor has only two indicator variables are also possible. Figure 1 illustrates this model.

Figure 1 Illustration of confirmatory factor model that includes a high subset homogeneity (HSH) factor for a subset of two items (4* and 5*) showing HSH.

The main factor is identified as the construct factor and the supplementary factor as the HSH factor. Items 4* and 5* are assumed to show HSH. Arrows link these factors to the manifest variables (= items). Solid arrows signify that the factor loading has to be estimated. Dashed arrows indicate that the factor loading is fixed. The value of .707 is assigned to the factor loading to obtain a scaled variance estimate (Schweizer & Troche, 2019). The scaling is expected to lead to variance estimates similar to eigenvalues for standardized data. Although this model does not include a relationship between the main and supplementary factors, there is the possibility of including such a relationship in order to determine whether the additional systematic variation is associated with the construct under investigation, or whether there is a correlation with any other variable of interest.
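The effect of fixing the loadings to .707 can be illustrated with a short calculation (the extra correlation used below is an assumed value, not an estimate from the article's data):

```python
# Why fix both HSH loadings to .707 (approximately 1/sqrt(2))? For
# standardized items, the covariance the HSH factor adds to its item
# pair is lambda * lambda * phi = 0.5 * phi. Estimating phi therefore
# yields twice the extra correlation the pair shares, which puts the
# variance estimate on a scale similar to eigenvalues of standardized
# data (Schweizer & Troche, 2019). The input value is illustrative.

LAMBDA = 0.707

def hsh_variance(extra_r):
    """Variance phi implied by the extra correlation of the HSH pair."""
    return extra_r / (LAMBDA * LAMBDA)

print(round(hsh_variance(0.56), 2))  # extra correlation .56 -> phi
```

A larger estimated phi thus directly signals more systematic variation captured by the HSH factor.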

Objectives

One aim of the statistical investigation was to demonstrate that the presence of a subset of BIS items showing HSH included in the complete set of BIS items was likely to amount to model misfit. Another aim was to demonstrate that an HSH factor included in the CFA measurement model for BIS could account for the additional systematic variation due to HSH so that the investigation of model fit would yield good results. A further aim was to demonstrate structural validity according to the original BIS structure in a dataset that showed model misfit without considering HSH.

Method

Participants

The scale was completed by 287 undergraduate students at Goethe University Frankfurt, Germany. The mean age of the sample was 22.83 years (SD = 4.21). About twice as many females as males participated. Students received a financial reward or course credit for participation. They were recruited through advertising on university campuses. Each participant was informed of the study protocol and gave his or her written informed consent. Since the number of manifest variables of the CFA model was 15, 287 participants were sufficient (the ratio is almost 1:20).

Scale

The above-described German short version of BIS (Krampen et al., 2016) was used in this study. Items were answered on a 4-point scale (1 = rarely/never, 2 = occasionally, 3 = often, 4 = almost always/always). Items were administered by a computerized BIS version during a lab session. A secure online questionnaire system was used, which prevented missing data and implausible answers as far as possible.

Statistical Investigation

Data were investigated using four CFA models: (1) a model with one BIS factor (BIS(1F)); (2) a model with three BIS subscale factors (BIS(3F)); (3) a model with three BIS subscale factors and one HSH factor for the highest correlation of a pair of similar items (BIS(3F)+1HSH); and (4) a model with three BIS subscale factors and two HSH factors for the two highest correlations of pairs of similar items (BIS(3F)+2HSH). The factor loadings of the BIS factors were freely estimated, whereas the loadings of the HSH factors were fixed to .707 (see above) while their variance parameters were freely estimated so that the variances would reflect the amount of captured systematic variation.
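For readers who wish to reproduce this setup, the BIS(3F)+2HSH model could be specified along the following lines in lavaan-style syntax (as used, e.g., by the R package lavaan or the Python package semopy). The item-to-subscale assignments other than items 6/14 (attentional) and 8/15 (non-planning) are placeholders, not the published scoring key:

```python
# Sketch of a lavaan-style specification of model BIS(3F)+2HSH.
# "0.707*item" fixes that loading, matching the scaling described
# above; the HSH factors' variances remain free. Item assignments
# apart from bis6/bis14 and bis8/bis15 are hypothetical placeholders.

MODEL_BIS3F_2HSH = """
attentional =~ bis1 + bis2 + bis3 + bis6 + bis14
motor       =~ bis4 + bis5 + bis7 + bis9 + bis10
nonplanning =~ bis8 + bis11 + bis12 + bis13 + bis15
HSH1 =~ 0.707*bis6 + 0.707*bis14
HSH2 =~ 0.707*bis8 + 0.707*bis15
"""
```

In such packages, the specification string would then be passed to the model constructor together with the polychoric correlation input.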

Statistical modeling was conducted by means of LISREL (Jöreskog & Sörbom, 2006). The robust scaled maximum likelihood estimation method was used for parameter estimation. Because of the ordered-categorical scale of data, polychoric correlations served as input to CFA. In evaluating the outcome regarding model fit, the following fit indices were considered: chi-square (χ2), root-mean-square error of approximation (RMSEA) (≤ .06), standardized root-mean-square residual (SRMR) (≤ .08), comparative fit index (CFI) (≥ .95), Tucker–Lewis index (TLI) (≥ .95), and Akaike information criterion (AIC). The cutoffs provided in parentheses were employed for this purpose (see DiStefano, 2016; Hu & Bentler, 1999). Models were compared by means of the CFI difference: differences larger than .01 were regarded as indicating a meaningful difference in fit between models (Cheung & Rensvold, 2002).
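The fit indices themselves can be computed from the chi-square statistics of the target and independence (baseline) models; a minimal sketch with illustrative numbers (not the article's results):

```python
import math

# Minimal stdlib implementations of three of the fit indices listed
# above, using their standard formulas (cf. Hu & Bentler, 1999).
# chi2_b and df_b belong to the independence (baseline) model.

def rmsea(chi2, df, n):
    """Root-mean-square error of approximation for sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_b, df_b):
    """Comparative fit index relative to the baseline model."""
    d_m = max(chi2 - df, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 - d_m / d_b

def tli(chi2, df, chi2_b, df_b):
    """Tucker-Lewis index relative to the baseline model."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2 / df) / (ratio_b - 1.0)

# Illustrative values: target model chi2 = 120 on df = 84, baseline
# chi2 = 1500 on df = 105, N = 287.
print(round(rmsea(120, 84, 287), 3))
print(round(cfi(120, 84, 1500, 105), 3))
print(round(tli(120, 84, 1500, 105), 3))
```

The CFI difference used for model comparison is then simply the difference between two such CFI values.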

Results

The correlations between the items suspected to show HSH are included in Table 1. Furthermore, means and standard deviations of the corresponding items are provided.

Table 1 Correlations between the two pairs of items showing preconditions for high subset homogeneity (HSH) (left-hand side) and means and standard deviations of these items (right-hand side) (N = 287)

While the mean correlation among all other pairs of items is .18 (SD = 0.02), the two crucial correlations are .80 (items 6 and 14) and .78 (items 8 and 15). All item means are close to 2, with standard deviations between 0.8 and 1.0. The full inter-item correlation matrix for the measure can be found in the Appendix.
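A pattern like this, a bulk of modest correlations plus a few outlying pairs, can also be screened for automatically; the following sketch applies a robust median/MAD criterion to a made-up correlation set that mimics the reported values:

```python
import statistics

# Heuristic screen for candidate HSH pairs: flag correlations lying
# far above the bulk using a robust (median/MAD-based) z-score, so
# that the outliers themselves do not inflate the scale estimate.
# The correlations below are invented to mimic the reported pattern.

def flag_hsh_pairs(corrs, cutoff=3.5):
    """corrs maps item pairs to correlations; returns flagged pairs."""
    values = list(corrs.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad  # rescales MAD to an SD-consistent unit
    return sorted(p for p, r in corrs.items()
                  if scale > 0 and (r - med) / scale > cutoff)

corrs = {("i6", "i14"): 0.80, ("i8", "i15"): 0.78,
         ("i1", "i2"): 0.16, ("i1", "i3"): 0.17, ("i2", "i3"): 0.18,
         ("i2", "i4"): 0.19, ("i3", "i4"): 0.18, ("i1", "i4"): 0.20}
print(flag_hsh_pairs(corrs))
```

Such a screen only nominates candidates; whether a flagged pair reflects HSH still has to be judged from item content and a CFA model.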

Table 2 provides the estimates of the fit indices observed in investigating model fit in the four CFA models.

Table 2 Fit results for Barratt Impulsiveness Scale (BIS-15, German translation) with and without considering high subset homogeneity (HSH) (N = 287)

Regarding model BIS(1F), all fit indices indicated poor model fit. Regarding model BIS(3F), none of the fit indices met the requirements according to the cutoffs. The same applied to model BIS(3F)+1HSH, although all fit indices displayed improvement. The consideration of a second HSH factor (BIS(3F)+2HSH) led to the indication of good model fit according to CFI. Furthermore, the RMSEA result could be considered marginally good, whereas the SRMR and TLI results indicated an acceptable model fit. These outcomes underlined the necessity of taking both HSH item pairs into account.

Table 3 provides the standardized Phi estimates that reflect the correlations between the factors. Note: In structural equation modeling (SEM), relationships between latent variables are represented as parameters of the phi matrix. After standardization, the estimates are interpreted as correlations between latent variables.

Table 3 Phi estimates of relationships (correlations) among the Barratt Impulsiveness Scale (BIS) subscale factors and the high subset homogeneity (HSH) factors (N = 287)

The results suggest very close relationships between the non-planning factor and the other factors and a strong correlation between the other BIS factors. Note: Standardized Phi estimates quantify the relationships between random variables. They are estimated during the search for the best fit between the empirical covariances or correlations and the model-implied covariance matrix. Although standardized Phi estimates can be interpreted as a kind of correlation, they are not restricted to the exact range between −1 and 1. Therefore, the correlation between motor and non-planning impulsivity that exceeds 1 is to be considered a valid result. It suggests that in the investigated sample one factor might be sufficient for describing the relationships between the items loading on the motor and non-planning factors.
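The standardization described in the note can be sketched as follows; the latent variances and covariance are illustrative and chosen to show how a standardized estimate can exceed 1:

```python
import math

# Standardizing phi estimates: divide each latent covariance by the
# product of the two factors' standard deviations. With freely
# estimated latent variances (here 1.2 and 0.8; illustrative numbers,
# not the article's estimates), the standardized estimate can
# legitimately exceed 1, as discussed in the text.

def standardize_phi(phi):
    """phi: square latent covariance matrix as a list of lists."""
    sd = [math.sqrt(phi[i][i]) for i in range(len(phi))]
    return [[phi[i][j] / (sd[i] * sd[j]) for j in range(len(phi))]
            for i in range(len(phi))]

phi = [[1.2, 1.0],
       [1.0, 0.8]]
std = standardize_phi(phi)
print(round(std[0][1], 3))  # "correlation" slightly above 1
```

A value near or above 1 suggests, as noted above, that the two factors may be statistically indistinguishable in the sample at hand.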

The HSH factors did not correlate with the subscale factors, with one exception: the HSH factor for items 8 and 15 showed a substantial correlation with the non-planning subscale factor. However, this correlation was much smaller than the correlations among the subscales and did not even account for five percent of common variance.

Discussion

The structural validity of BIS has been questioned (e.g., Krampen et al., 2016; Preuss et al., 2008; Stanford et al., 2009; Vasconcelos et al., 2012). However, a convincing proposal for an alternative structure of BIS items is also missing. This situation suggests that there may be a disturbance, such as a method effect, that is more effective in some datasets than in others, so that the results of structural investigations using different datasets are inconsistent.

Method effects refer to characteristics of measurement that do not reflect the construct to be measured but nevertheless contribute to the measurement result (Maul, 2013; Schweizer, 2020). These are characteristics such as the amount of time available to complete the items, special features of the items (e.g., item wording), or the degree of similarity among the items. For example, BIS items 6 and 14 show a high degree of similarity because they refer to being at “theater/plays or lectures”. Using BIS-11 in an undergraduate student sample, Steinberg and colleagues (2013) found that items 6 and 14 show local dependence as “a result of item similarity” (p. 218). Local dependence violates one assumption of item response theory (IRT) and “occurs when items are more strongly correlated than can be accounted for by the underlying construct” (ibid.). The similarity of these items is additionally moderated by participants’ experiences with the situations they describe. In our study, participants were university students who could be assumed to be familiar with being in a lecture and also with attending a theater/play because of their educational backgrounds (visiting the theater is quite common in schools in Germany). This may have caused the very high correlation between items 6 and 14. In contrast, participants from other populations may have rarer experiences with such situations, which may not give rise to response patterns as consistent as in our study. As a consequence of this moderation, inconsistent results of investigations of data from different populations (BIS is traditionally used in a variety of clinical and non-clinical samples; cf. Stanford et al., 2009) are possible.

In the present study, we hypothesized that HSH might serve as a disturbance to the expected factor structure of the BIS scale, and found evidence in favor of this hypothesis: two pairs of items were characterized by very high similarity of wording and meaning, and the correlations among the items of each pair were very high, while the average correlation was much smaller. Adding HSH factors linked to the items with the highest correlations to the measurement model, which was thereby turned into a bifactor model (Reise, 2012), led to a change from poor to good or acceptable model fit. Good model fit means that the factors of the model provide a decent account of the systematic variation of the data.

Therefore, we conclude that BIS basically shows structural validity, but also that there is possible impairment due to HSH. Since the influence of HSH because of two items can differ from one sample to another, the outcomes of structural investigations are likely to vary. Furthermore, the validity of BIS may depend on what is reflected by the HSH factor. The investigation of the relationships between the HSH factor and the BIS factors revealed that the additional systematic variation captured by this factor does not represent impulsivity. Although one of the correlations reached significance, its size was too small to be of importance. This means that what is captured by the HSH factor should be prevented from contributing to a BIS score.

It is crucial to understand that any item can become involved in HSH. Each item shows variation that is partly common variation and partly unique variation. However, what is common and what is unique variation depends to some degree on the environment constituted by the other items of a scale. In the ideal case, all items show common variation due to the same specific underlying source, while the remaining variation of each item is unique. In the less ideal case, there may be another source that turns some of the remaining variation in a subset of items into additional common variation, that is, into HSH. This means that part of the variation of an item can switch between being unique variation and additional common variation, depending on the environment constituted by the other items. The consequence is that removing one item of a pair showing HSH restores structural validity so that it is no longer necessary to consider a method factor. Starting from a situation like the one characterizing BIS-15, it does not matter which item of a pair showing HSH is removed.

Conclusion

The reported research was guided by the following aims: demonstrating that HSH in BIS items leads to model misfit in CFA, and demonstrating that modeling HSH by means of HSH factors restores the expected good model fit. The methodology of the investigation included CFA models comprising additional specific factors to account for the common variation of pairs of HSH items. The results were in line with these aims. Therefore, our recommendation for dealing with HSH is the following: control the influence of HSH in investigations of structural and construct validity by using a CFA model with HSH factor(s). Alternatively, eliminate one of a pair of items that causes HSH. In the latter case, an additional investigation of structural validity should not be required, since the nuisance variation of the remaining item of such a pair becomes unique variation that no longer disturbs the underlying structure.

References

  • Bandalos, D. L., & Gerstner, J. J. (2016). Using factor analysis in test construction. In K. Schweizer & C. DiStefano (Eds.), Principles and methods of test construction (pp. 26–51). Hogrefe Publishing.

  • Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd ed.). The Guilford Press.

  • Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233–255. https://doi.org/10.1207/S15328007SEM0902_5

  • Deng, L., Yang, M., & Marcoulides, K. M. (2018). Structural equation modeling with many variables: A systematic review of issues and developments. Frontiers in Psychology, 9, Article 580. https://doi.org/10.3389/fpsyg.2018.00580

  • DiStefano, C. (2016). Examining fit with structural equation models. In K. Schweizer & C. DiStefano (Eds.), Principles and methods of test construction (pp. 166–193). Hogrefe Publishing.

  • Gibbons, R. D., & Hedeker, D. R. (1992). Full-information item bi-factor analysis. Psychometrika, 57(3), 423–436. https://doi.org/10.1007/bf02295430

  • Graham, J. M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are and how to use them. Educational and Psychological Measurement, 66(6), 930–944. https://doi.org/10.1177/0013164406288165

  • Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

  • Jöreskog, K. G. (1970). A general method for analysis of covariance structure. Biometrika, 57(2), 239–251. https://doi.org/10.2307/2334833

  • Jöreskog, K. G., & Sörbom, D. (2006). LISREL 8.80. Scientific Software International Inc.

  • Kovacs, K., & Conway, A. R. A. (2016). Process overlap theory: A unified account of the general factor of intelligence. Psychological Inquiry, 27(3), 151–177. https://doi.org/10.1080/1047840x.2016.1153946

  • Krampen, D., Schweizer, K., Reiß, S., & Gold, A. (2016). Erprobung einer Kurzskala zur Erfassung von Impulsivität [Pilot run of a short scale to measure impulsivity]. In M. Krämer, S. Preiser, & K. Brusdeylins (Eds.), Psychologiedidaktik und Evaluation XI (pp. 339–347). Shaker.

  • Maul, A. (2013). Method effects and the meaning of measurement. Frontiers in Psychology, 4, Article 169. https://doi.org/10.3389/fpsyg.2013.00169

  • McDonald, R. P. (1999). Test theory: A unified treatment. Erlbaum.

  • Patton, J. H., Stanford, M. S., & Barratt, E. S. (1995). Factor structure of the Barratt Impulsiveness Scale. Journal of Clinical Psychology, 51(6), 768–774. https://doi.org/10.1002/1097-4679(199511)51:6%3C768::AID-JCLP2270510607%3E3.0.CO;2-1

  • Preuss, U. W., Rujescu, D., Giegling, I., Watzke, S., Koller, G., Zetzsche, T., Meisenzahl, E. M., Soyka, M., & Möller, H. J. (2008). Psychometrische Evaluation der deutschsprachigen Version der Barratt-Impulsiveness-Skala [Psychometric evaluation of the German version of the Barratt Impulsiveness Scale]. Der Nervenarzt, 79, 305–319. https://doi.org/10.1007/s00115-007-2360-7

  • Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47(5), 667–696. https://doi.org/10.1080/00273171.2012.715555

  • Rijmen, F. (2010). Formal relations and an empirical comparison among the bifactor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47(3), 361–372. https://doi.org/10.1111/j.1745-3984.2010.00118.x

  • Schweizer, K. (2020). Method effects in psychological assessment. Psychological Test and Assessment Modeling, 64, 337–343.

  • Schweizer, K., & Troche, S. (2019). The EV scaling method for variances of latent variables. Methodology, 15(4), 175–184. https://doi.org/10.1027/1614-2241/a000179

  • Spinella, M. (2007). Normative data and a short form of the Barratt Impulsiveness Scale. International Journal of Neuroscience, 117(3), 359–368. https://doi.org/10.1080/00207450600588881

  • Stanford, M. S., Mathias, C. W., Dougherty, D. M., Lake, S. L., Anderson, N. E., & Patton, J. H. (2009). Fifty years of the Barratt Impulsiveness Scale: An update and review. Personality and Individual Differences, 47(5), 385–395. https://doi.org/10.1016/j.paid.2009.04.008

  • Steinberg, L., Sharp, C., Stanford, M. S., & Tharp, A. T. (2013). New tricks for an old measure: The development of the Barratt Impulsiveness Scale–Brief (BIS-Brief). Psychological Assessment, 25(1), 216–226. https://doi.org/10.1037/a0030550

  • Thomson, G. H. (1916). A hierarchy without a general factor. British Journal of Psychology, 8(3), 271–281. https://doi.org/10.1111/j.2044-8295.1916.tb00133.x

  • Vasconcelos, A. G., Malloy-Diniz, L., & Correa, H. (2012). Systematic review of psychometric proprieties of Barratt Impulsiveness Scale Version 11 (BIS-11). Clinical Neuropsychiatry: Journal of Treatment Evaluation, 9(2), 61–74.

Appendix

Table A1 Full inter-item correlation matrix