Open Access

Measurement Invariance Widely Holds for the Yale-Brown Obsessive Compulsive Scale

Published Online: https://doi.org/10.1027/1015-5759/a000788

Abstract

The clinician-rated Yale-Brown Obsessive Compulsive Scale (Y-BOCS) is a widely used assessment tool for obsessive-compulsive disorder (OCD). However, the measurement invariance (MI) properties of the Y-BOCS, a prerequisite for group or time point comparisons in clinical research, have received little attention in previous studies. In this study, we aim to comprehensively investigate the factor structure and MI of the Y-BOCS severity rating and its symptom checklist, utilizing a large sample of OCD patients (N = 1,066). Our analysis considers various MI covariates, including time (pre- and post-therapy), severity, comorbidity, previous treatments, and demographics. Overall, the majority of tests conducted on the Y-BOCS severity rating and its symptom checklist revealed no substantial issues with MI, reinforcing the validity of the Y-BOCS for comparative clinical research. Specifically, we discuss a three-factor model for the severity rating, contrasting with a two-factor model for obsessions and compulsions when excluding the resistance items. Notably, our findings underscore the advantages and validity of employing latent factors rather than sum scores to model OCD severity and symptoms.

For the assessment of obsessive-compulsive disorder (OCD), the Yale-Brown Obsessive Compulsive Scale (Y-BOCS; Goodman et al., 1989) represents the gold standard as the dominant clinician-rated instrument (Deacon & Abramowitz, 2005; Fatori et al., 2020).

The Y-BOCS comprises two distinct parts: (1) an assessment of the severity of OCD symptoms (Y-BOCS-SR), usually viewed as its core section, and (2) an extensive symptom checklist (Y-BOCS-SCL) to gauge the occurrence of specific OCD symptoms both in the present and the past. Due to its widespread use, an impressive body of research dealing with psychometric aspects of the Y-BOCS has accumulated in the last 30 years, even allowing meta-analysis of factor analytical studies (Bloch et al., 2008). Despite this research effort, studies vary considerably in the reported factor structures, hampering replication in subsequent samples. Thus, the factor structure of the Y-BOCS is still a matter of debate and proposed models need validation (for recent overviews regarding severity and symptom scales see e.g., Fatori et al., 2020; Schulze et al., 2018). On top of the issue of factorial structure, another psychometric aspect has broadly been neglected in the literature: measurement invariance (MI). MI refers to the stability of measurement properties over time or in group comparisons. In a clinical setting, MI is especially important when assessing the outcome of a treatment, as valid conclusions on the assessed construct can only be drawn if the measurement properties themselves remain unchanged.

Empirical studies often show surprising violations of MI even in highly standardized and broadly used assessment tools (Fried et al., 2016; Wicherts, 2016). In general, MI properties of the Y-BOCS were rarely reported in the past, even in studies focusing on psychometrics. We, therefore, lack information about MI in OCD assessment.

The present study thus evaluates the baseline factorial structure of the Y-BOCS and provides an extensive investigation into MI for many clinically relevant variables, like comparing before and after therapy, overall impairment, and comorbidity among others.

Measurement Invariance

The conceptual definition of MI as the stability of a measurement instrument translates to the mathematical properties of a measurement model. In factor analysis, such models comprise three item characteristics: loadings, intercepts, and residuals. MI holds if these model parameters are unaffected by a covariate, e.g., multiple time points, a grouping variable like gender, or a continuous variable like age. Global tests for MI thus consist of comparing a model with fixed parameters to another one where parameters depend on the covariate. The three item parameters yield several stages of invariance tests (Meredith, 1993): Initially, weak MI implies a similar factor structure (configural invariance) and equal factor loadings. These conditions are viewed as prerequisites for comparisons of regression slopes between groups (Chen, 2007) and are therefore sufficient in correlational studies. The second stage, called strong invariance, adds equal item intercepts to the model. Strong MI is a prerequisite if latent mean scores are to be compared between groups or time points (Chen, 2008). Finally, as a third and last step, strict invariance represents the additional equality of the item residuals. This stage assures equality in reliability.
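As a sketch of these stages, the linear factor model for item i in group (or at time point) g can be written as below. The symbols (x for the item response, ν for the intercept, λ for the loading, η for the latent factor, ε for the residual with variance θ) are conventional notation assumed here for illustration and are not taken from the original text. For ordinal or binary items, thresholds take the place of intercepts.

```latex
\begin{aligned}
x_{ig} &= \nu_{ig} + \lambda_{ig}\,\eta_{g} + \varepsilon_{ig},
  \qquad \operatorname{Var}(\varepsilon_{ig}) = \theta_{ig} \\[4pt]
\text{weak MI:}   \quad & \lambda_{ig} = \lambda_{i} \quad \text{for all } g \\
\text{strong MI:} \quad & \lambda_{ig} = \lambda_{i},\; \nu_{ig} = \nu_{i} \quad \text{for all } g \\
\text{strict MI:} \quad & \lambda_{ig} = \lambda_{i},\; \nu_{ig} = \nu_{i},\; \theta_{ig} = \theta_{i} \quad \text{for all } g
\end{aligned}
```

Under strong MI, the expected difference between two groups on item i reduces to the loading times the difference in latent means, which is why this stage licenses latent mean comparisons.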

In this paper, we will focus on strong invariance, as comparisons of latent means make up the major portion of clinical research questions, like comparing patients before and after therapy. If strong invariance is violated in such a comparison, we might draw biased conclusions on the treatment effect, that is, finding a falsely inflated or diminished effect. The validity of the measurement instrument thus influences the validity of the treatment study as well as subsequent meta-analyses. For example, in pre-post therapy study designs, lack of longitudinal MI could result from the instrument having unequal sensitivity across the severity range. Assuming that therapy actually changes severity on average, the instrument would measure at different sensitivity levels in the pre-post comparison. It would thus yield a distorted image of change.
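The following minimal sketch illustrates how a violation of strong invariance can distort a pre-post comparison. It uses a hypothetical four-item scale with made-up loadings, intercepts, and latent means (none of these numbers come from the study) and a continuous-item simplification rather than the ordinal models used in the paper.

```python
# Hypothetical four-item scale; all numbers are invented for illustration only.
loadings = [1.0, 1.0, 1.0, 1.0]          # equal at both time points (weak MI holds)
intercepts_pre = [2.0, 2.0, 2.0, 2.0]
intercepts_post = [2.0, 2.0, 2.0, 1.5]   # one intercept drifts: strong MI is violated
latent_mean_pre = 0.0
latent_mean_post = -1.0                  # true latent change of -1


def expected_item_means(intercepts, loadings, latent_mean):
    """Model-implied item means: intercept + loading * latent mean."""
    return [nu + lam * latent_mean for nu, lam in zip(intercepts, loadings)]


def apparent_latent_change(means_pre, means_post, loadings):
    """Latent change implied by the item means if intercepts are (wrongly) assumed equal."""
    raw_change = sum(means_post) - sum(means_pre)
    return raw_change / sum(loadings)


means_pre = expected_item_means(intercepts_pre, loadings, latent_mean_pre)
means_post = expected_item_means(intercepts_post, loadings, latent_mean_post)

true_change = latent_mean_post - latent_mean_pre
apparent_change = apparent_latent_change(means_pre, means_post, loadings)
bias = apparent_change - true_change

print(true_change, apparent_change, bias)
```

In this toy setting the intercept drift of a single item is absorbed into the estimated change, so the treatment effect appears larger than the true latent change.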

Surprisingly little is known about the MI properties of the Y-BOCS as a gold standard tool in OCD assessment. Only two studies to our knowledge explicitly investigated MI, both in the Y-BOCS-SR. Vanhille and colleagues (2018) found no violations of MI for gender, while Garnaat and Norton (2010) assessed ethnicity, for which weak MI was supported but strong invariance was not. In summary, there is a lack of evidence that MI holds true for the Y-BOCS scales with respect to multiple clinically relevant variables.

Nevertheless, such studies build the foundation for the validity of score comparisons across Y-BOCS assessments.

Aim of This Study

In this paper, we will first investigate the factor structure of both the Y-BOCS-SR and the Y-BOCS-SCL in a sample of OCD patients. Measurement models for both parts of the Y-BOCS were selected from the literature and will be tested comparatively (Deacon & Abramowitz, 2005; Fatori et al., 2020; Kim et al., 1994; McKay et al., 1995; Schulze et al., 2018). After establishing sound measurement models, we will move on to test MI. For selecting variables of interest for MI testing, we considered characteristics that are recurrent research questions in OCD research: demographics, clinical features, and time course over therapy (Osland et al., 2018). In detail, we will thus address gender, age, migration background, and SES for demographics. Clinical characteristics will comprise age of symptom onset, years since onset, previous treatment, motivation for change, general mental health, and the presence of common comorbidities such as anxiety disorders, depression, and obsessive-compulsive personality disorder.

Methods

Sample

We analyzed data collected from 2007 until February 2020 at an academic outpatient clinic of a psychology department. The clinic is specialized in obsessive-compulsive disorder and provides comprehensive diagnostics and cognitive-behavioral therapy (CBT). During the study period, adult patients (18+ years) were typically self-referred or referred to the clinic by psychiatrists or psychotherapists. If an initial clinical interview by a trained clinical psychologist indicated obsessive-compulsive symptoms, the Structured Clinical Interview for DSM-IV (SKID I & SKID II; First et al., 1995; Spitzer et al., 1992) was conducted to review the criteria for OCD as well as possible other diagnoses. Individuals with one of the following features were generally excluded after initial diagnostics: predominant hoarding symptoms, comorbid neurological, psychotic or borderline personality disorder, substance dependence, suicidal ideation, and a verbal IQ below 80. All other persons, including those with other comorbid mental disorders, were offered individual CBT. In the current analysis, we included only patients with a primary diagnosis of OCD as defined by DSM-IV or, depending on the time of inclusion in the sample, DSM-5 criteria of OCD (American Psychiatric Association, 2000, 2013). This resulted in a total sample of N = 1,066 patients who were offered individual CBT.

Individual CBT was conducted in accordance with German psychotherapy regulations and paid for by health insurance companies, thus being comparable with routine care provided by private practices. Therapists are paid for up to 80 therapy sessions with a duration of 50 min, depending on the conditions of the individual case. In the outpatient clinic, therapists were strongly encouraged to follow treatment guidelines (Hohagen et al., 2014) and to make use of CBT interventions for OCD, including exposure and response prevention as well as cognitive techniques as core interventions (see Kathmann et al., 2022, for further details). As customary in routine outpatient care, therapies were terminated on the basis of a consensual decision between patient and therapist, resulting in variable total numbers of therapy sessions (Mdn = 45, IQR [31, 60] for the subset of completed therapies).

While the general treatment conditions resembled naturalistic routine care, evaluation followed scientific standards and was conducted on a regular basis, that is, at initial diagnostics, start of therapy, every 20 therapy sessions thereafter, and a final assessment after the end of therapy. All time points included the application of the German Y-BOCS (interview and symptom checklist; Hand & Büttner-Westphal, 1991).

The local ethics committee at Humboldt-Universität zu Berlin approved the study (protocol number 2016-33). We did not preregister for the study. Summary statistics (e.g., frequencies and correlations) will be made available upon request from the authors.

Yale-Brown Obsessive Compulsive Scale (Y-BOCS)

The 10 items of the Y-BOCS-SR (Goodman et al., 1989) assess time spent, interference, distress, resistance, and control for obsessions and compulsions. These items were rated on item-specific five-point scales by a trained clinician, with higher values indicating higher severity. The inter-rater reliability of the Y-BOCS-SR is very high (.90; Jacobsen et al., 2003). In our sample, the mean sum score of the Y-BOCS-SR totaled 22.4 (SD = 6.2), which indicates average severity in our sample compared to the world-wide mean (24.0; Hunt, 2020).

We included 60 closed-response items of a German translation of the Y-BOCS-SCL (Ertle, 2012). These items cover a broad range of OCD symptoms, from characteristic symptoms like checking locks and stoves (75% prevalence in our sample) to less common symptoms like sexual obsessions involving children or incest (10% prevalence). In our sample, the checklist was self-rated for current symptoms (present/not present). In diagnostics, the Y-BOCS-SCL is used for identifying symptoms of interest for the subsequent severity rating with the Y-BOCS-SR. However, it can also be used to extract the prevalence of distinct symptom clusters in a sample (Bloch et al., 2008). With this in mind, we applied the 10-factor model by Schulze and colleagues (2018) with factors like Aggressive Impulses and Pure Repetitions. The factors and items are reproduced in the Electronic Supplementary Materials, ESM 1, Table E4.

Measurement Invariance Covariates

The time of assessment for all covariates was the initial diagnostic session Tdiag (except for time itself). We thus used this time point’s Y-BOCS data for cross-sectional MI analysis. Detailed information on the sample characteristics in the Y-BOCS and the covariates is given in Table 1 and in the following.

Table 1 Summary of MI covariates as sample characteristics at Tdiag

Demographics

Besides gender and age, we measured socioeconomic status (SES). Three seven-point rating scales on education, income, and occupational prestige were summed up to compute SES (Winkler & Stolzenberg, 1999). When considering ethnicity, we rated all patients as having a migration background who themselves or whose parents were not born in Germany (first and second generation immigrants). The five most prevalent countries of origin in the treatment center’s city are Turkey, Poland, Russia, Italy, and Bulgaria.

Obsessive-Compulsive Disorder Characteristics

Patients were asked at what age the first obsessive-compulsive symptom occurred. Additionally, we calculated the latency between the age of symptom onset and first contact with our institution. We collected data on previous treatments due to OCD, which could comprise psychotherapy (n = 754), pharmacological treatment (n = 695), or both. Based on an initial interview the clinician rated the patient’s motivation to change behaviors causing psychological stress on a 5-point scale from not present to very high. As responses were strongly skewed toward the upper end, we dummy-coded this variable with 1 representing high and very high.

General Mental Health

We applied the general severity index of the Brief Symptom Inventory (BSI; Derogatis & Melisaratos, 1983; Franke, 2000) as a self-report measure of a broad range of psychiatric symptoms. Additionally, we used the diagnostician-rated Global Assessment of Functioning (GAF) as defined by the DSM-IV (American Psychiatric Association, 2000).

Comorbidities

Comorbid anxiety disorders comprised the following: Panic disorder with and without agoraphobia, social phobia, specific phobias, generalized anxiety disorder, and post-traumatic stress disorder (N = 276, 25.9%). For comorbid depressive disorder, we combined single major depression episodes, recurring depressive episodes, and dysthymia. N = 319 (29.9%) patients suffered from a concurrent depressive disorder at the time of initial assessment, while the total count increased to n = 608 (57.0%) when past remitted episodes were included. Additionally, we evaluated MI for two self-report questionnaires on depression: the Montgomery-Åsberg Depression Rating Scale (MADRS; Montgomery & Åsberg, 1979) and the revised Beck’s Depression Inventory (BDI-II; Beck et al., 1996; Hautzinger et al., 2006). Sum scores were calculated in both measures.

Obsessive-compulsive personality disorder (OCPD) was the most frequent personality disorder in our sample and was diagnosed in n = 58 (5.4%) of our patients.

Time

While all N = 1,066 patients were assessed with the Y-BOCS at initial diagnostics Tdiag, there was attrition at subsequent measurement points due to ongoing therapy or dropout. Only n = 423 patients provided complete data for a comparison of Tpre and Tpost, at the beginning and end of therapy respectively. The time differences between Tpre and Tpost varied across patients, with a median duration of 12 months (IQR [6, 18] months).

Analysis

All models besides the longitudinal models were tested with data from the initial diagnostics session Tdiag. The Y-BOCS items were five-point ordinal (Y-BOCS-SR) or dichotomous (Y-BOCS-SCL) which was why we applied factor analysis models for non-continuous data (B. Muthén, 1984). These analyses were mostly conducted in R (R Core Team, 2020) using the package lavaan (Rosseel, 2012) with the WLSMV estimator.

For MI models with continuous covariates, we alternatively used MPlus 8.4 (L. K. Muthén & Muthén, 1998–2017) with the MLR estimator assisted by the R package MplusAutomation (Hallquist & Wiley, 2018).

Baseline Models

Initially, we tested measurement models proposed in the literature (Y-BOCS-SR: see Figure 1; Y-BOCS-SCL: Schulze et al., 2018) without any MI covariate in lavaan. Factors were tested one by one with the exception of the Y-BOCS severity models (due to cross-loadings) and the symptom factors of Sexual Obsessions, Somatic Obsessions, and Mental Exactness (for items see ESM 1, Table E4). The latter consisted of only three items each, which prohibits factor model tests of the single factors. We thus opted for a joint three-factor model for these symptom factors.

Figure 1 Assessed factor models for the Y-BOCS severity rating. For item description see Goodman et al. (1989). Obs = Obsessions. Comp = Compulsions. Items in grey are dropped in the model of Fatori et al. (2020).

Modeling factors in a joint model poses an even stricter test of model fit compared to unidimensional models as potential cross-loadings decrease fit. Model fit was assessed by the chi-square test complemented by fit indices as proposed by Hu and Bentler (1999). For good fit they recommend the combination of an SRMR ≤ 0.08 with either an RMSEA ≤ 0.06 (or the lower bound of its 90% confidence interval ≤ 0.06, respectively), or a CFI ≥ 0.95.
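The combination rule above can be written as a small check. This is only a sketch of the stated cutoffs; the function name `meets_hu_bentler` and its argument names are hypothetical and not part of lavaan or any other package.

```python
from typing import Optional


def meets_hu_bentler(srmr: float, rmsea: float, cfi: float,
                     rmsea_ci_lower: Optional[float] = None) -> bool:
    """Return True if the fit indices satisfy the combination rule described in the text.

    Rule: SRMR <= 0.08 together with either RMSEA <= 0.06 (or the lower bound
    of its 90% confidence interval <= 0.06) or CFI >= 0.95.
    """
    srmr_ok = srmr <= 0.08
    rmsea_ok = rmsea <= 0.06 or (rmsea_ci_lower is not None and rmsea_ci_lower <= 0.06)
    cfi_ok = cfi >= 0.95
    return srmr_ok and (rmsea_ok or cfi_ok)


# Values reported in the Results for the Keeping Order factor with hoarding items:
# CFI = 0.926, RMSEA = 0.094, SRMR = 0.093.
print(meets_hu_bentler(srmr=0.093, rmsea=0.094, cfi=0.926))
```

Applied to the Keeping Order factor with the hoarding items retained, the check fails on all three indices, which is in line with the decision reported in the Results to drop those items.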

In cases where the proposed baseline models failed to provide an adequate fit to the data, we turned to modification indices in order to identify possible improvements. Modification indices capture the decrease in chi-square should a single previously constrained model parameter be released (e.g., a correlated error set to zero). We aimed to change the original models as little as possible and incorporated theoretical considerations as well as findings from competing models to justify alterations.

Testing Measurement Invariance

We estimated multigroup and longitudinal models in lavaan and evaluated model fit using the same standards described by Hu and Bentler (1999), as mentioned above. In all longitudinal models, we included pairwise correlated error terms for repeatedly measured items (Vandenberg & Lance, 2000).

For testing strong MI we compared the configural factor model with a model where strong MI was imposed. In the present case of binary data, strong MI means that item loadings and thresholds were independent of the respective covariate (see Footnote 1). We tested the change in the chi-square statistic for statistical significance. Chi-square tests are susceptible to sample size in the sense that marginal deviations from MI easily reach statistical significance with typical sample sizes (Brannick, 1995). We thus, secondly, looked at the change of fit indices. Following Chen’s (2007) recommendations, we retained the hypothesis of strong MI compared to configural MI when a decrease in CFI < .01 was accompanied by an increase in RMSEA < .015 or an increase in SRMR < .01.
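The change-in-fit rule can be sketched as follows. The function name `retains_strong_mi` and the example fit values are hypothetical; only the three cutoffs are taken from the text.

```python
def retains_strong_mi(cfi_configural: float, cfi_strong: float,
                      rmsea_configural: float, rmsea_strong: float,
                      srmr_configural: float, srmr_strong: float) -> bool:
    """Apply the change-in-fit rule described in the text.

    Strong MI is retained when the decrease in CFI is below .01 and is
    accompanied by an increase in RMSEA below .015 or an increase in SRMR below .01.
    """
    cfi_drop = cfi_configural - cfi_strong
    rmsea_rise = rmsea_strong - rmsea_configural
    srmr_rise = srmr_strong - srmr_configural
    return cfi_drop < 0.01 and (rmsea_rise < 0.015 or srmr_rise < 0.01)


# Hypothetical fit values for a configural and a strong MI model.
print(retains_strong_mi(0.970, 0.965, 0.050, 0.055, 0.060, 0.064))
```

Because the rule works on differences between nested models, it is less sensitive to sample size than the chi-square difference test alone.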

In the case of the metric covariates BDI-II, MADRS, BSI, GAF, age, age of symptom onset, years since onset, and SES we used moderated non-linear factor analysis (Bauer, 2017) in order to treat the covariates as continuous. The items of the Y-BOCS-SCL were treated as dichotomous, whereas the ordinal response format of the Y-BOCS-SR had to be treated as metric in this model class. Currently, MPlus provides only means for model comparison but no common model fit assessments for this novel MI model type. Configural MI was thus not tested explicitly for these covariates. For strong MI we relied on information criteria and chi-square difference tests as modified by Satorra and Bentler (2010). We considered decreases in model fit if the chi-square difference test yielded p < .05, accompanied by an increased Akaike information criterion (AIC), Bayesian information criterion (BIC), and sample size adjusted BIC (Liang & Luo, 2020).
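A minimal sketch of this decision rule is given below. It assumes that the p value of the scaled chi-square difference test and the information criteria of both models have already been obtained elsewhere (for example, from MPlus output); the function name, dictionary keys, and numbers are hypothetical.

```python
def flags_strong_mi_violation(p_value: float, ic_free: dict, ic_strong: dict,
                              alpha: float = 0.05) -> bool:
    """Flag a decrease in fit under strong MI.

    A violation is flagged only if the chi-square difference test is significant
    AND all three information criteria are higher (worse) for the strong MI
    model than for the model with covariate-dependent parameters.
    """
    significant = p_value < alpha
    all_ic_worse = all(ic_strong[key] > ic_free[key] for key in ("AIC", "BIC", "saBIC"))
    return significant and all_ic_worse


# Hypothetical example: significant test and higher AIC, but lower BIC and saBIC.
free_model = {"AIC": 10250.0, "BIC": 10400.0, "saBIC": 10310.0}
strong_model = {"AIC": 10262.0, "BIC": 10380.0, "saBIC": 10305.0}
print(flags_strong_mi_violation(0.01, free_model, strong_model))
```

In the example the criteria disagree, so no violation is flagged under the joint rule even though the difference test is significant.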

Investigating Non-MI Structure

In those cases where configural or strong MI did not hold, we investigated further to portray possible causes. To address issues with configural MI, we carefully examined modification indices (Saris et al., 2009). Failing strong MI initiated the search for item subsets where strong MI held individually. We applied the approach proposed by Pohl and Schulze (2020) and Schulze and Pohl (2020) in combination with a significance test as described by Bechger and Maris (2015) at a p level of .05. Identification of invariant item subsets is an important step in partial MI modeling, where the assumption that MI holds for all items is relaxed. Please note that the above-mentioned approach to partial MI does not identify single items that violate MI, but instead breaks the item list into subsets for which MI holds. The researcher then has to decide which subset to use when anchoring the scale. For further details on the rationale, we would like to direct the reader to the sources cited in this paragraph.

Results

Baseline Models

For the Y-BOCS-SR, we tested several measurement models taken from the literature (see Table 2 and Figure 1). Of the previously published models, a good fit was only achieved with a recently proposed two-factor model by Fatori and colleagues (2020) which excludes the item pair measuring resistance to symptoms (see Figure 1). The factors correlated moderately; r = 0.48. In order to provide an alternative model that keeps all items, we modified the measurement model via the use of modification indices in confirmatory factor analysis. Good model fit was only reached when splitting Obsessions and Compulsions, introducing a third Resistance/Control factor as suggested by Deacon and Abramowitz (2005), and allowing cross-loadings on the latter factor (see Figure 1). As this model is related to a model proposed by Kim and colleagues (1994), we labeled it “modified Kim model”. This model fitted the data best. On the other hand, the advantage in model fit is only small compared to the model by Fatori and colleagues (2020), and from a theoretical perspective, the modified Kim model is substantially more complex.

Table 2 Fit of baseline models

Schulze and colleagues (2018) differentiated the Y-BOCS-SCL into ten factors. When using their proposed model, four scales needed small changes (see Table 2). First, we deleted two items concerning hoarding disorder in the Keeping Order factor, as hoarding disorder has been separated from OCD (American Psychiatric Association, 2013). Model fit improved considerably after deletion (see Table 2; fit with hoarding items: χ²(9) = 93.90, p < .001, CFI = 0.926, RMSEA = 0.094, SRMR = 0.093). Secondly, we introduced a few correlated residuals after inspection of modification indices and theoretical considerations for three scales: (1) In the Cleanliness factor the residuals of closely related items “Concerned will get ill because of contaminant” and “Concerned will get others ill by spreading contaminant” were allowed to correlate. (2) The Mental Urges factor was slightly modified by introducing covarying errors for the related items “need to know and remember” and “need to tell, ask, or confess”. (3) For the Responsibility factor, correlated residuals for “Checking locks, stoves, appliances, etc.” with “Checking that did not make mistake” and “Fear will be responsible for something else terrible happening” were allowed due to large modification indices.

Measurement Invariance

Overall, there was little evidence for violations of MI (summary in Table 3, detailed results in the Supplements). The Y-BOCS-SR did not fail strong MI for any covariate but GAF, regardless of using the model by Fatori and colleagues (2020) or the modified Kim model. A search for homogeneous item subsets (Schulze & Pohl, 2020) for GAF by using the Fatori model split the Obsessions factor into two sets with the time spent, distress, and control item versus the interference item. For Compulsions, the pattern differed with the time spent and interference items versus distress and control items. We did not look into the modified Kim model as the approach by Schulze and Pohl (2020) cannot handle cross-loadings.

Table 3 Global tests on measurement invariance of Y-BOCS

MI held for most cases in the Y-BOCS-SCL, with only four exceptions. Configural invariance did not hold in the case of (1) the scale Responsibility with the covariate comorbid concurrent depression, and (2) the factor Cleanliness with the covariate time. Both symptom scales had a mediocre fit in the baseline tests to begin with and required slight modifications. Adding configural MI restrictions pushed the model fit below commonly accepted thresholds in those two cases. For Responsibility the main discrepancy was caused by the correlation of “Fear will harm others because not careful enough” with “Checking that nothing terrible did/will happen” in the non-depressed group. In the longitudinal model for Cleanliness the items “Excessive or ritualized hand washing” and “Excessive or ritualized showering, bathing, tooth brushing, or grooming” had substantial residual correlation, especially at Tpost, thus explaining the lack of configural invariance with the covariate time.

Strong MI was violated for the Keeping Order symptom factor with the covariates comorbid anxiety disorder and BSI as a measure for overall impairment. In both cases, two item subsets were identified for which strong MI held individually. For comorbid anxiety disorder, items “Obsessions with the need for symmetry or exactness, not with magical thinking” and “Ordering/arranging compulsions” formed one subset, while “Fear of losing things” and “Excessive listmaking” made up the other. For BSI, the single item “Excessive listmaking” is separated from the other three.

For the continuous covariates BDI-II, MADRS, and BSI, there were several further symptom factors with significant chi-square difference tests and increasing AICs which indicated possible issues with strong invariance. The BIC on the other hand was more conservative in those cases and did not indicate violation of MI. These disparities are not unusual (Marsh et al., 2009) and as BIC weights model parsimony higher than other fit measures, the strong MI model is more likely endorsed by BIC. Although we decided to portray the results of the more conservative indicator (BIC) in Table 3, these results have to be interpreted with greater caution than those of models with categorical MI covariates.

Discussion

Our findings on MI of the Y-BOCS put its widespread use in a good light as we found little evidence for violations of strong MI. Also, we could generally reproduce basic factor structures reported in previous studies. Regarding the Y-BOCS severity scale, we would like to emphasize that our findings do not support the common practice of summing Y-BOCS ratings to a single total score, as the single-factor model displayed a bad fit (see Table 2). The model by Fatori and colleagues (2020) reflects the practice of summing separate obsessions and compulsions scores, but a good fit can only be attained when deleting both resistance items.

These items have been under discussion since early studies on the Y-BOCS (McKay et al., 1995). This is clinically comprehensible as the role of resistance is ambiguous (Kim et al., 1990). On the one hand, resistance is considered salutogenic as it reflects the patients’ insight and willingness to stop obsessions and compulsions. On the other hand, some forms of resistance are held to contribute to the maintenance of symptoms in the sense of avoidance behavior and thus constitute a pathological factor (Rachman, 1997). This ambiguity led to a removal of active resistance from the diagnostic criteria of OCD over the decades (Rasmussen & Parnas, 2022). Alternatively, in an exploratory fashion, we defined a more complex model with three factors, which retained all items and was close to a model proposed by Kim and colleagues (1994) and supported by Moritz and colleagues (2002). When comparing the model by Fatori and colleagues (2020) and the altered version of Kim and colleagues (1994), we see a tradeoff between the better overall fit of the modified Kim model and its higher complexity.

With regard to MI, there was pleasingly little to report. For those cases where configural MI did not hold, we provided further evidence indicating some heterogeneity in the broad scales of Cleanliness and Responsibility. When configural MI held, but issues with strong MI were found, we applied the item cluster approach (Pohl & Schulze, 2020). It aims at establishing partial MI, that is, if MI does not hold for a full item set, it might still hold for item subsets. Using such a subset as an anchor allows for modeling as if full MI held. We report the item subsets found. In the light of a specific substantive research question (e.g., finding group differences) the researcher has to decide which item subset could serve as an anchor in the case at hand.

Limitations

We applied psychometric tests of the Y-BOCS in a specific and confined setting: German(-speaking) OCD patients treated with CBT in an outpatient setting. Our sample can thus not provide evidence for the stability of the Y-BOCS in cross-cultural research.

OCD phenomena in an urban European context differ to some extent from other populations, for example, with respect to religious obsessions (Okasha et al., 1994). Furthermore, we cannot give insight into time courses other than treatment with CBT, like no treatment or pharmacological treatments.

In a few instances, the symptom scales of the Y-BOCS-SCL had to be adjusted in comparison to the findings of Schulze and colleagues (2018). First, we deleted both hoarding items from the Keeping Order factor. Including these items resulted in considerable misfit which is understandable as, after the publication of the Y-BOCS, hoarding symptoms have been separated from OCD into a self-contained disorder (American Psychiatric Association, 2013). Accordingly, our outpatient center declines patients with predominant hoarding symptoms. Secondly, three other scales made it necessary to add correlated error terms, which can be seen as indicators of underlying multidimensionality. Further research with other samples is needed, especially with the longer and thus more complex symptom dimensions of Cleanliness and Responsibility.

We provide MI tests for continuous covariates in our study, applying a rather new feature in MI analysis. We chose to use moderated non-linear factor analysis (Bauer, 2017) as this technique provides full capabilities for fixing model parameters for MI and partial MI models. However, some alternatives present themselves in terms of local SEM (Hildebrandt et al., 2016) and penalized likelihood estimators (Robitzsch & Lüdtke, 2018).

While these approaches in part have the advantage of semi-parametric MI modeling, they cannot provide an in-depth analysis of partial MI (Schulze & Pohl, 2020). A limitation of the approach (Schulze & Pohl, 2020) is its inability to deal with cross-loadings. We thus could not look into partial MI modeling for the modified Kim model of OCD severity and the covariate GAF.

Conclusions

In sum, the Y-BOCS is largely measurement invariant, but only if analyzed in ways guided by latent variable models. If scores are calculated by simple sums, as often done in applied research, only a rough approximation to factor scores results, even if the same item sets are used. This detriment results from the assumptions made by sum scores (Rasch, 1960): (1) a single underlying latent variable, (2) equal loadings for all items, and (3) the neglect of measurement error. The first assumption is roughly obeyed in the OCD literature, for example, by calculating separate Obsessions and Compulsions scores for the Y-BOCS-SR. The second and third assumptions, however, are not checked and thus neglected in most clinical research with sum scores. This leads to bias in effect measures and, because measurement error is ignored, to generally diminished effect sizes.
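The effect of ignoring measurement error on effect sizes can be illustrated with the classical attenuation relation from classical test theory: a standardized mean difference computed on error-laden scores shrinks by the square root of the score reliability. The sketch below covers only the third assumption, is not an analysis from this study, and uses hypothetical reliability values and a hypothetical function name.

```python
import math


def attenuated_effect_size(true_d: float, reliability: float) -> float:
    """Standardized mean difference expected on observed (error-laden) scores.

    Classical test theory: the mean difference is unchanged by random error,
    but the observed standard deviation is inflated by 1 / sqrt(reliability),
    so the observed effect equals true_d * sqrt(reliability).
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return true_d * math.sqrt(reliability)


# Hypothetical example: a true effect of d = 1.0 at three reliability levels.
for rel in (1.0, 0.81, 0.64):
    print(rel, round(attenuated_effect_size(1.0, rel), 3))
```

Latent variable models estimate the effect on the error-free factor and therefore avoid this shrinkage, provided the measurement model holds.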

In our study, we could provide evidence for clinically and statistically sound measurement models for the Y-BOCS. We portrayed two competing models with differing clinical appeal for describing the Y-BOCS severity scale. Considering the symptom scale we located the need for further refinements with the scales Cleanliness and Responsibility. We hope our favorable findings on MI properties of latent models for the Y-BOCS’ severity and symptom dimensions foster the use of latent variable modeling with the most prominent clinician-rated instrument in OCD research.

The study described in this manuscript is part of a doctoral thesis (https://edoc.hu-berlin.de/handle/18452/24811). The manuscript is under no embargo.

1 Note that we skipped the common step of weak MI. We did so for two reasons. First, in MI analysis of binary data, invariance of loadings is preceded by threshold/intercept invariance, contrary to the analysis of continuous items (Wu & Estabrook, 2016). Second, this restriction of binary data MI analysis nevertheless allowed us to proceed directly to the important goal of strong MI in order to check the legitimacy of comparing latent means (Chen, 2008).

References

  • American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text revision). APA.

  • American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). APA.

  • Bauer, D. J. (2017). A more general model for testing measurement invariance and differential item functioning. Psychological Methods, 22(3), 507–526. https://doi.org/10.1037/met0000077

  • Bechger, T. M., & Maris, G. (2015). A statistical test for differential item pair functioning. Psychometrika, 80(2), 317–340. https://doi.org/10.1007/s11336-014-9408-y

  • Beck, A., Steer, R., & Brown, G. (1996). Manual for Beck Depression Inventory-II. The Psychological Corporation.

  • Bloch, M. H., Landeros-Weisenberger, A., Rosario, M. C., Pittenger, C., & Leckman, J. F. (2008). Meta-analysis of the symptom structure of obsessive-compulsive disorder. American Journal of Psychiatry, 165(12), 1532–1542. https://doi.org/10.1176/appi.ajp.2008.08020320

  • Brannick, M. T. (1995). Critical comments on applying covariance structure modeling. Journal of Organizational Behavior, 16(3), 201–213. https://doi.org/10.1002/job.4030160303

  • Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 14(3), 464–504. https://doi.org/10.1080/10705510701301834

  • Chen, F. F. (2008). What happens if we compare chopsticks with forks? The impact of making inappropriate comparisons in cross-cultural research. Journal of Personality and Social Psychology, 95(5), 1005–1018. https://doi.org/10.1037/a0013193

  • Deacon, B. J., & Abramowitz, J. S. (2005). The Yale-Brown Obsessive Compulsive Scale: Factor analysis, construct validity, and suggestions for refinement. Journal of Anxiety Disorders, 19(5), 573–585. https://doi.org/10.1016/j.janxdis.2004.04.009

  • Derogatis, L. R., & Melisaratos, N. (1983). The Brief Symptom Inventory: An introductory report. Psychological Medicine, 13(3), 595–605. https://doi.org/10.1017/S0033291700048017

  • Ertle, A. (2012). Zwangsstörung [Obsessive-compulsive disorder]. In G. Meinlschmidt, S. Schneider, & J. Margraf (Eds.), Lehrbuch der Verhaltenstherapie. Band 4: Materialien für die Psychotherapie [Cognitive behavioral therapy, Volume 4: Materials for conducting psychotherapy] (pp. 514–532). Springer.

  • Fatori, D., Costa, D. L., Asbahr, F. R., Ferrão, Y. A., Rosário, M. C., Miguel, E. C., Shavitt, R. G., & Batistuzzo, M. C. (2020). Is it time to change the gold standard of obsessive-compulsive disorder severity assessment? Factor structure of the Yale-Brown Obsessive-Compulsive Scale. Australian & New Zealand Journal of Psychiatry, 54(7), 732–742. https://doi.org/10.1177/0004867420924113

  • First, M. B., Spitzer, R. L., Gibbon, M., & Williams, J. B. W. (1995). The Structured Clinical Interview for DSM-III-R Personality Disorders (SCID-II): Part I. Description. Journal of Personality Disorders, 9(2), 83–91. https://doi.org/10.1521/pedi.1995.9.2.83

  • Franke, G. (2000). Brief Symptom Inventory (BSI) von L. R. Derogatis (Kurzform der SCL-90-R). Deutsche Version [Brief Symptom Inventory (BSI) by L. R. Derogatis (Short form of SCL-90-R). German version]. Beltz.

  • Fried, E. I., van Borkulo, C. D., Epskamp, S., Schoevers, R. A., Tuerlinckx, F., & Borsboom, D. (2016). Measuring depression over time or not? Lack of unidimensionality and longitudinal measurement invariance in four common rating scales of depression. Psychological Assessment, 28(11), 1354–1367. https://doi.org/10.1037/pas0000275

  • Garnaat, S. L., & Norton, P. J. (2010). Factor structure and measurement invariance of the Yale-Brown Obsessive Compulsive Scale across four racial/ethnic groups. Journal of Anxiety Disorders, 24(7), 723–728. https://doi.org/10.1016/j.janxdis.2010.05.004

  • Goodman, W. K., Price, L. H., Rasmussen, S. A., Mazure, C., Fleischmann, R. L., Hill, C. L., Heninger, G. R., & Charney, D. S. (1989). The Yale-Brown Obsessive Compulsive Scale. I. Development, use, and reliability. Archives of General Psychiatry, 46(11), 1006–1011. https://doi.org/10.1001/archpsyc.1989.01810110048007

  • Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 25(4), 621–638. https://doi.org/10.1080/10705511.2017.1402334

  • Hand, I., & Büttner-Westphal, H. (1991). Die Yale-Brown Obsessive Compulsive Scale (Y-BOCS): Ein halbstrukturiertes Interview zur Beurteilung des Schweregrades von Denk- und Handlungszwängen [The Yale-Brown Obsessive-Compulsive Scale (Y-BOCS): A semistructured interview for assessing severity of compulsive cognitions and behavior]. Verhaltenstherapie, 1(3), 223–225. https://doi.org/10.1159/000257972

  • Hautzinger, M., Keller, F., & Kühner, C. (2006). Das Beck Depressionsinventar II. Deutsche Bearbeitung und Handbuch zum BDI II [Beck Depression Inventory II. German translation and manual for BDI II]. Harcourt Test Services.

  • Hildebrandt, A., Lüdtke, O., Robitzsch, A., Sommer, C., & Wilhelm, O. (2016). Exploring factor model parameters across continuous variables with local structural equation models. Multivariate Behavioral Research, 51(2–3), 257–258. https://doi.org/10.1080/00273171.2016.1142856

  • Hohagen, F., Wahl-Kordon, A., Lotz-Rambaldi, W., & Muche-Borowski, C. (2014). S3-Leitlinie Zwangsstörungen [S3 – Guideline obsessive compulsive disorders]. Springer.

  • Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

  • Hunt, C. (2020). Differences in OCD symptom presentations across age, culture, and gender: A quantitative review of studies using the Y-BOCS Symptom Checklist. Journal of Obsessive-Compulsive and Related Disorders, 26, Article 100533. https://doi.org/10.1016/j.jocrd.2020.100533

  • Jacobsen, D., Kloss, M., Fricke, S., Hand, I., & Moritz, S. (2003). Reliabilität der Deutschen Version der Yale-Brown Obsessive Compulsive Scale [Reliability of the German version of the Yale-Brown Obsessive Compulsive Scale]. Verhaltenstherapie, 13(2), 111–113.

  • Kathmann, N., Jacobi, T., Elsner, B., & Reuter, B. (2022). Effectiveness of individual cognitive-behavioral therapy and predictors of outcome in adult patients with obsessive-compulsive disorder. Psychotherapy and Psychosomatics, 91(2), 123–135. https://doi.org/10.1159/000520454

  • Kim, S. W., Dysken, M. W., & Kuskowski, M. (1990). The Yale-Brown Obsessive-Compulsive Scale: A reliability and validity study. Psychiatry Research, 34(1), 99–106. https://doi.org/10.1016/0165-1781(90)90061-9

  • Kim, S. W., Dysken, M. W., Pheley, A. M., & Hoover, K. M. (1994). The Yale-Brown Obsessive-Compulsive Scale: Measures of internal consistency. Psychiatry Research, 51(2), 203–211. https://doi.org/10.1016/0165-1781(94)90039-6

  • Liang, X., & Luo, Y. (2020). A comprehensive comparison of model selection methods for testing factorial invariance. Structural Equation Modeling: A Multidisciplinary Journal, 27(3), 380–395. https://doi.org/10.1080/10705511.2019.1649983

  • Marsh, H. W., Muthén, B., Asparouhov, T., Lüdtke, O., Robitzsch, A., Morin, A. J., & Trautwein, U. (2009). Exploratory structural equation modeling, integrating CFA and EFA: Application to students’ evaluations of university teaching. Structural Equation Modeling: A Multidisciplinary Journal, 16(3), 439–476. https://doi.org/10.1080/10705510903008220

  • McKay, D., Danyko, S., Neziroglu, F., & Yaryura-Tobias, J. A. (1995). Factor structure of the Yale-Brown Obsessive-Compulsive Scale: A two dimensional measure. Behaviour Research and Therapy, 33(7), 865–869. https://doi.org/10.1016/0005-7967(95)00014-O

  • Meredith, W. (1993). Measurement invariance, factor analysis, and factorial invariance. Psychometrika, 58(4), 525–543. https://doi.org/10.1007/BF02294825

  • Montgomery, S. A., & Åsberg, M. (1979). A new depression scale designed to be sensitive to change. The British Journal of Psychiatry, 134(4), 382–389. https://doi.org/10.1192/bjp.134.4.382

  • Moritz, S., Meier, B., Kloss, M., Jacobsen, D., Wein, C., Fricke, S., & Hand, I. (2002). Dimensional structure of the Yale–Brown Obsessive-Compulsive Scale (Y-BOCS). Psychiatry Research, 109(2), 193–199. https://doi.org/10.1016/S0165-1781(02)00012-4

  • Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49(1), 115–132. https://doi.org/10.1007/BF02294210

  • Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus user’s guide (8th ed.). Muthén & Muthén.

  • Okasha, A., Saad, A., Khalil, A. H., El Dawla, A. S., & Yehia, N. (1994). Phenomenology of obsessive-compulsive disorder: A transcultural study. Comprehensive Psychiatry, 35(3), 191–197. https://doi.org/10.1016/0010-440x(94)90191-0

  • Osland, S., Arnold, P. D., & Pringsheim, T. (2018). The prevalence of diagnosed obsessive compulsive disorder and associated comorbidities: A population-based Canadian study. Psychiatry Research, 268, 137–142. https://doi.org/10.1016/j.psychres.2018.07.018

  • Pohl, S., & Schulze, D. (2020). Assessing group comparisons or change over time under measurement non-invariance: The cluster approach for nonuniform DIF. Psychological Test Assessment and Modelling, 62(2), 281–303. https://www.psychologie-aktuell.com/fileadmin/%20Redaktion/Journale/ptam-2020-2/04%5C_Pohl.pdf

  • R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

  • Rachman, S. (1997). A cognitive theory of obsessions. Behaviour Research and Therapy, 35(9), 793–802. https://doi.org/10.1016/S0005-7967(97)00040-5

  • Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Institute of Educational Research.

  • Rasmussen, A. R., & Parnas, J. (2022). What is obsession? Differentiating obsessive-compulsive disorder and the schizophrenia spectrum. Schizophrenia Research, 243, 1–8. https://doi.org/10.1016/j.schres.2022.02.014

  • Robitzsch, A., & Lüdtke, O. (2018, July). A regularized moderated item response model for assessing differential item functioning. Talk given at the VIII. European Congress of Methodology, Jena, Germany.

  • Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. http://www.jstatsoft.org/v48/i02/

  • Saris, W. E., Satorra, A., & Van der Veld, W. M. (2009). Testing structural equation models or detection of misspecifications? Structural Equation Modeling: A Multidisciplinary Journal, 16(4), 561–582. https://doi.org/10.1080/10705510903203433

  • Satorra, A., & Bentler, P. M. (2010). Ensuring positiveness of the scaled difference chi-square test statistic. Psychometrika, 75(2), 243–248. https://doi.org/10.1007/s11336-009-9135-y

  • Schulze, D., Kathmann, N., & Reuter, B. (2018). Getting it just right: A reevaluation of OCD symptom dimensions integrating traditional and Bayesian approaches. Journal of Anxiety Disorders, 56, 63–73. https://doi.org/10.1016/j.janxdis.2018.04.003

  • Schulze, D., & Pohl, S. (2020). Finding clusters of measurement invariant items for continuous covariates. Structural Equation Modeling: A Multidisciplinary Journal, 28(2), 219–228. https://doi.org/10.1080/10705511.2020.1771186

  • Spitzer, R. L., Williams, J. B., Gibbon, M., & First, M. B. (1992). The Structured Clinical Interview for DSM-III-R (SCID): I: History, rationale, and description. Archives of General Psychiatry, 49(8), 624–629. https://doi.org/10.1001/archpsyc.1992.01820080032005

  • Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4–70. https://doi.org/10.1177/109442810031002

  • Vanhille, S., Baldwin, S., Larson, M., & Storch, E. (2018, November). Y-BOCS factor structure analysis and calculation of measurement and structural invariance between genders. Poster presented at the Association for Behavioral and Cognitive Therapies 52nd Annual Convention, Washington, DC.

  • Wicherts, J. M. (2016). The importance of measurement invariance in neurocognitive ability testing. The Clinical Neuropsychologist, 30(7), 1006–1016. https://doi.org/10.1080/13854046.2016.1205136

  • Winkler, J., & Stolzenberg, H. (1999). Der Sozialschichtindex im Bundes-Gesundheitssurvey [Index of social class in the German Federal Health Survey]. Gesundheitswesen, 61(2), 178–183.

  • Wu, H., & Estabrook, R. (2016). Identification of confirmatory factor analysis models of different levels of invariance for ordered categorical outcomes. Psychometrika, 81(4), 1014–1045. https://doi.org/10.1007/s11336-016-9506-0