Multistudy Report

A Multilevel Multidimensional Finite Mixture Item Response Model to Cluster Respondents and Countries

The Forms of Self-Criticising/Attacking and Self-Reassuring Scale

Published Online: https://doi.org/10.1027/1015-5759/a000631

Abstract

The aim of this study was to test the multilevel multidimensional finite mixture item response model of the Forms of Self-Criticising/Attacking and Self-Reassuring Scale (FSCRS) to cluster respondents and countries from 13 samples (N = 7,714) and from 12 countries. The practical goal was to learn how many discrete classes there are on the level of individuals (i.e., how many cut-offs are to be used) and countries (i.e., the magnitude of similarities and dissimilarities among them). We employed the multilevel multidimensional finite mixture approach, which is based on an extended class of multidimensional latent class Item Response Theory (IRT) models. Individuals and countries are partitioned into discrete latent classes with different levels of self-criticism and self-reassurance, taking into account at the same time the multidimensional structure of the construct. This approach was applied to the analysis of the relationships between observed characteristics and latent trait at different levels (individuals and countries), and across different dimensions using the three-dimensional measure of the FSCRS. Results showed that respondents’ scores depended on unobserved (latent class) individual and country membership and on the multidimensional structure of the instrument, which justified the use of a multilevel multidimensional finite mixture item response model in the comparative psychological assessment of individuals and countries. Latent class analysis of the FSCRS showed that individual participants and countries could be divided into discrete classes. Together with previous findings that the FSCRS is psychometrically robust, these results lead us to recommend the FSCRS for measuring self-criticism.

Ever since Freud (1917) identified anger at the self as a crucial factor in depression, the link between psychopathology and self-criticism has been subject to much theorising and research. In a major systematic review, Werner et al. (2019) outlined a number of different models for self-criticism and its measurement that include both self-report and qualitative studies. They highlight that self-criticism is a transdiagnostic vulnerability factor for mental health problems such as eating disorders, depression, suicidality, anxiety, psychotic symptoms, and interpersonal problems (Werner et al., 2019). Self-criticism also affects susceptibility to and persistence of psychopathology (Bergner, 1995; Blatt & Zuroff, 1992; Falconer et al., 2015) and stress (Kupeli et al., 2017) and influences the response to medical and psychological interventions and treatments (Blatt & Zuroff, 2005; Bulmash et al., 2009; Shahar et al., 2015). Löw et al. (2020) offered a systematic review of self-criticism and psychotherapy outcomes showing that the intensity of self-criticism is linked to poor outcomes, highlighting the importance of improving psychotherapy for these individuals. For these and other reasons, it is important to explore the psychometric properties of measures of self-criticism.

There exist a number of self-report measures assessing self-criticism including: the Depressive Experiences Questionnaire (DEQ; Blatt et al., 1976), the Levels of Self-Criticism Scale (LOSC; Thompson & Zuroff, 2004), the Forms of Self-Criticising/Attacking and Self-Reassuring Scale (FSCRS; Gilbert et al., 2004), the Self-Critical Rumination Scale (Smart et al., 2016), and a scale assessing situational state self-criticism, the Self-Compassion and Self-Criticism Scales (SCCS; Falconer et al., 2015). In a systematic review of measures of five scales of self-criticism, Rose and Rimes (2018) noted that many studies lacked methodological rigour but suggested that two scales, the Self-Critical Rumination Scale (Smart et al., 2016) and the FSCRS (Gilbert et al., 2004) were the most robust in terms of psychometric properties.

The FSCRS is unique in distinguishing between forms and functions of self-criticism. This is important because many psychological processes, for example, anger, anxiety, or caring behaviour, can have different forms and functions and will relate to social difficulties and mental health problems in different ways. Some individuals are self-critical because they experience failure and feel they should and could do better, which may be linked to perfectionism (Curran & Hill, 2019; Shahar, 2015), whereas others are not so much trying to improve as holding a self-hating attitude towards the self and wanting to get rid of aspects of the self (Gilbert et al., 2004). In a study of psychiatric patients, Castilho et al. (2017) found that concerns about being inadequate and self-hating forms of self-criticism were linked to psychopathology in different ways, with self-hating being particularly linked to shame.

Given the importance of these core processes underpinning vulnerability to mental health and other difficulties, a number of authors have argued there is a need for closer inspection of self-criticism across cultures (Lau et al., 2010; Luyten & Blatt, 2013). Indeed, little is known about cross-cultural aspects of self-criticism. The main obstacle to performing cross-cultural research on self-criticism is the lack of invariant measurement tools with a stable factor structure. Therefore, any findings achieved without previous testing of scales’ measurement invariance must be considered with caution. A measure of self-criticism which has been evaluated across cultures is the FSCRS which, as noted above, has been subjected to meta-analytic reviews. A number of studies have reported on the invariance (Halamová et al., 2019) and factor structure (Halamová et al., 2018) of the FSCRS using data from 13 international samples. Thus, it is necessary to examine this scale further so more cross-cultural research on self-criticism is possible.

In order to advance knowledge of the performance of the FSCRS across cultures, we applied a multilevel mixture model to a multinational sample. Evidence of good fit for such a model would yield a valuable practical outcome: the knowledge of how many discrete classes there are on the level of individuals and countries. For individuals, we will know how to discretize sum scores (i.e., how many cut-offs are to be used), and for countries, we can obtain important information on the magnitude of similarities and dissimilarities among them. An alternative modelling approach, the widely used multi-group structural equation model, faces well-known problems (Kim et al., 2017) of how to establish scalar measurement invariance, which is necessary for a meaningful comparison of latent means (or more precisely, to display acceptable fit of such a model with the data). On the other hand, a multilevel finite mixture model is very useful when the violation of exact invariance and potential clustering of groups is assumed due to measurement non-invariance, factor mean heterogeneity, or both (Kim et al., 2017). Because a multilevel finite mixture model estimates far fewer parameters than a multi-group structural equation model, it is also more parsimonious (Kim et al., 2017). One of the main advantages of the latent class analysis model is that countries and persons are clustered in classes, and that units within a class are more similar to each other than to units in other classes: this method is able to resolve the problems of measurement non-invariance and factor mean heterogeneity in one unified approach.

All previous studies have assumed that self-criticism is a continuous latent ability (characteristic), and to our knowledge, there is no study investigating potential differences and heterogeneities among groups of respondents. Measurement invariance studies focus on the comparison of latent means across countries, but they cannot reveal in principle how many respondents in each country contribute to these differences. Ignoring this fact could lead to failing to notice that a small number of highly self-critical respondents could have approximately the same impact on the latent factor mean as a large number of moderately self-critical respondents. The assumption of a homogenous population with a continuous distribution of the latent ability that differs from another homogenous population with a continuous distribution of the latent ability with a higher (or lower) level might be untenable in practice. The latent class analysis could provide a more fine-grained picture of how self-criticism is distributed within and across countries taking into account potential heterogeneity.

The Aim of the Research Study

The aim of this study was, therefore, to test the multilevel multidimensional finite mixture item response model of the FSCRS to cluster respondents and countries from 13 samples (N = 7,714) and from 12 countries.

Our research hypotheses were as follows:

Hypothesis 1 (H1):

We hypothesized that there would be at least five latent classes on the individual level. Our hypothesis was that they would approximately cover respondents with very low, moderately low, neutral, moderately high, and very high levels of self-criticism. On the other hand, we did not expect more than nine latent classes on the individual level; having more than nine latent classes would erase the main advantage of latent class analysis, and a standard multilevel Item Response Theory (IRT) model with the continuous distribution of latent ability would be more relevant.

Hypothesis 2 (H2):

We hypothesized that there would be at least three latent classes on the country level. Our hypothesis was that they would approximately cover countries with low, medium, and high overall levels of self-criticism. On the other hand, we did not expect more than five latent classes on the country level, as having more than five latent classes for 13 units would provide no meaningful comparison.

Hypothesis 3 (H3):

We hypothesized that the IRT model with different discriminations and different difficulties (graded response model) would better fit the data than more constrained one-parameter and rating scale models.

Materials and Methods

Research Sampling Procedure

Various methods were used to identify data on the FSCRS. Firstly, we used Google Scholar to identify published studies by searching terms such as “the forms of self-criticising/attacking & self-reassuring scale” or “FSCRS”. We contacted the authors of studies that reported data on the FSCRS and comprised a non-clinical sample of 215 or more participants. In addition, we searched the Compassionate Mind website (https://compassionatemind.co.uk/uploads/files/research-register-for-website.pdf) for unpublished research studies. Out of approximately 40 emails of invitations for collaboration, we obtained data for 13 non-clinical samples (Halamová et al., 2018). Because self-criticism is a clinically important issue, it is crucial to distinguish between clinical and non-clinical samples. This article is the first study clustering respondents and countries from non-clinical samples. In the subsequent second article, we will compare clinical samples from different countries.

Procedures for Different Samples

To date, the FSCRS has been translated into 11 languages. The current study presents data for eight versions of the FSCRS including five distinct English language samples from four different countries: Australia (N = 319; Kirby et al., 2017), Canada (N = 383; Hermanto & Zuroff, 2016, 2017; Zuroff et al., 2016), the United Kingdom 1 (N = 1,570; Kupeli et al., 2013), the United Kingdom 2 (N = 883; Baião et al., 2015; Gilbert, Baldwin, et al., 2006; Gilbert, Durrant, et al., 2006; Gilbert & Miles, 2000; Gilbert et al., 2002, 2004, 2005, 2012), and USA (N = 331; Gilbert et al., 2017). There were also data from studies using versions of the FSCRS translated to seven other languages including Chinese (N = 417; Yu, 2013), Dutch (N = 360; Sommers-Spijkerman et al., 2017), German (N = 230; Krieger et al., 2016; Krieger, personal communication), Hebrew (N = 476; Shahar et al., 2015; Shahar, personal communication), Italian (N = 389; Petrocchi & Couyoumdjian, 2016), Japanese (N = 264; Kenichi, personal communication), Portuguese (N = 764; Gilbert et al., 2017), and Slovak (N = 1,326; Halamová et al., 2017). Overall, data from 13 distinct non-clinical samples (N = 7,714) was used to test the FSCRS. All studies adhered to the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Research Instrument

The Forms of Self-Criticising/Attacking and Self-Reassuring Scale (FSCRS; Gilbert et al., 2004)

The FSCRS is a 22-item instrument, which was created to measure the forms and functions of self-criticism and contrast it with self-reassurance when things do not go as expected or hoped for. Participants use a 5-point Likert scale (1 = not at all like me; 5 = extremely like me). The first of the three factors, Inadequate Self (IS), consists of nine items that capture experiences of a personal sense of inadequacy and failure (e.g., “There is a part of me that feels I am not good enough”). The second factor, Hated Self (HS), has five items. It assesses destructive self-hatred, contempt, disgust, and desires to harm oneself (e.g., “I have a sense of disgust with myself”). The contrasting third factor, Reassured Self (RS), has seven items and captures the capacity for self-soothing, encouragement, support, and validation while experiencing negative events (e.g., “I can still feel lovable and acceptable”). In general, criticising oneself or being inadequate is linked to competitive drive and fear of inferiority (Gilbert et al., 2004). Self-hatred, on the other hand, is more pathogenic, typically linked to problems of early life trauma including abuse. Here the individual is not trying to improve themselves but to get rid of parts of themselves (Gilbert et al., 2004). Note that all items in the self-reassurance sub-domain were reverse-scored to ensure that the whole instrument measures self-criticism.

Statistical Models and Data Analyses

This section is organized as follows. First, we will define a latent class (mixture) model for polytomous item responses with the IRT parameterization (Bacci et al., 2014; Bartolucci, 2007). Second, we will present the extension and specify the model taking into account the multidimensional structure of the data (3 dimensions), and subsequently, we will present the extension and specify the multidimensional model taking into account the multilevel structure of the data (11 countries). Then we will provide criteria for model selection. Finally, we will briefly describe the steps in our statistical analysis. Since our aim is to provide a practical guide rather than a very detailed mathematical specification, we will limit our explanation only to basic formulations leaving the exact mathematical expressions to the Electronic Supplementary Material (ESM 2).

Unidimensional Latent Class IRT Models

Following Bacci et al. (2014), let Xj denote the observed response variable for the jth item of the analysed questionnaire, with j = 1, …, r. This variable has lj levels, indexed from 0 to lj − 1. In our case, we have 22 items of the FSCRS; each item is scored from 0 to 4 and therefore there are 5 levels, indexed from 0 to 4. In the unidimensional case, we define the probability that a respondent with latent trait (ability) level θ responds by category x to this particular item as:

$$\lambda_{jx}(\theta) = p(X_j = x \mid \Theta = \theta), \qquad x = 0, \ldots, l_j - 1 \tag{1}$$

Moreover, we define λj(θ) as the probability vector (λj0(θ), …, λj,lj−1(θ))′. Now, we can specify the IRT model for polytomous responses:

$$g_x[\lambda_j(\theta)] = \gamma_j(\theta - \beta_{jx}), \qquad j = 1, \ldots, r, \quad x = 1, \ldots, l_j - 1 \tag{2}$$

where gx(·) is a specific link function for the category x, and γj and βjx are IRT item parameters, usually called discrimination indices and difficulty levels. Properly defined constraints on these parameters result in differently parameterized IRT models.

We can have three different kinds of IRT specifications: (a) the specific link function; (b) constraints on discrimination parameters; (c) constraints on difficulty parameters.

Type of Link Function

As far as the link function is concerned, the most widely used IRT models are the graded response model with global logits (Bacci et al., 2014; Samejima, 1969), the generalized partial credit model with local logits (Bacci et al., 2014; Muraki, 1992), and the continuation ratio model with sequential logits (Bacci et al., 2014; Fienberg, 1980). See ESM 2 for their mathematical formulations. Our instrument (FSCRS) is assumed to measure a continuous latent ability (self-criticism); therefore we will adopt the graded response (cumulative) model. It is of little interest to estimate the probability of answering adjacent categories, let alone the probability of a sequential process. Consequently, only graded response models will be analyzed in this paper.
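To make the global (cumulative) logit link concrete, the following minimal Python sketch computes the five category probabilities of a single item from one discrimination index and four difficulty thresholds. The function name and all numerical values are illustrative assumptions, not estimates from this study, and the code is not taken from ESM 3.

```python
import math

def grm_category_probs(theta, gamma, betas):
    """Category probabilities of one item under global (cumulative) logits.

    theta : latent trait level of the respondent
    gamma : discrimination index of the item
    betas : increasing difficulty thresholds, one per cut-off x = 1, ..., l - 1
    """
    # P(X >= x | theta) for x = 1, ..., l - 1 via the logistic function
    cumulative = [1.0 / (1.0 + math.exp(-gamma * (theta - b))) for b in betas]
    # Pad with P(X >= 0) = 1 and P(X >= l) = 0, then take successive differences
    bounds = [1.0] + cumulative + [0.0]
    return [bounds[x] - bounds[x + 1] for x in range(len(bounds) - 1)]

# Hypothetical item with five response categories (scored 0 to 4)
probs = grm_category_probs(theta=0.5, gamma=1.2, betas=[-1.0, 0.0, 1.0, 2.0])
```

The thresholds must be increasing for every category probability to be positive, which is the usual ordering requirement of cumulative models.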

Constraints on Discrimination Parameters

We can specify two different types of IRT models based on constrained or unconstrained discrimination parameters. First, each item might discriminate differently from the other items, and the γj parameters are freely estimated. Second, all the items discriminate in the same way, that is, γj = 1 for all items (the one-parameter graded response model).

Constraints on Item Difficulty Parameters

Again, we can specify two different types of IRT models based on constrained or unconstrained versions. First, the difficulty parameters βjx are unconstrained and freely estimated. Second, the difficulty parameters βjx are constrained so that the distance between difficulty levels from category to category is the same for all items: τx is the (same) difficulty of response category x for all j (the rating scale graded response model).

We can combine different constraints to obtain IRT models with very different parameterizations. For clarity, we reproduce in ESM 1a (slightly modified) Table E1 from Bacci et al. (2014).
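The four combinations of constraints can be illustrated with a short Python sketch. It assumes that the rating scale constraint takes the additive form βjx = βj + τx (an item location plus a shared category threshold), which is our reading of the description above; the function name and the numbers are hypothetical.

```python
def item_parameters(gammas, betas, tau=None, equal_discrimination=False):
    """Return one (gamma_j, [beta_j1, ..., beta_j,l-1]) pair per item.

    gammas : free discrimination indices, one per item
    betas  : per-item threshold lists (free difficulties) when tau is None,
             otherwise a single location beta_j per item
    tau    : shared category thresholds tau_x (rating scale constraint) or None
    equal_discrimination : if True, every gamma_j is fixed to 1
    """
    params = []
    for j, gamma in enumerate(gammas):
        g = 1.0 if equal_discrimination else gamma
        if tau is None:
            thresholds = list(betas[j])               # free difficulties
        else:
            thresholds = [betas[j] + t for t in tau]  # assumed additive form
        params.append((g, thresholds))
    return params

# Most constrained model: equal discriminations and rating scale difficulties
rating_scale_1p = item_parameters(
    gammas=[1.3, 0.8], betas=[-0.5, 0.4],
    tau=[-1.0, 0.0, 1.0, 2.0], equal_discrimination=True)

# Least constrained model: free discriminations and free difficulties
graded_2p = item_parameters(
    gammas=[1.3, 0.8],
    betas=[[-1.2, -0.3, 0.6, 1.9], [-0.8, 0.1, 1.0, 2.4]])
```

Under the rating scale constraint the spacing between consecutive thresholds is identical across items, whereas the unconstrained model lets each item have its own spacing.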

Multidimensional Latent Class IRT Models

This unidimensional IRT model for polytomous responses can be easily extended to the multidimensional setting. Following Bacci et al. (2014, chap. 3.1), let us define s as the number of different latent traits measured by the items, Θ = (Θ1, …, Θs)′ as a vector of latent variables corresponding to these latent traits, and θ = (θ1, …, θs)′ as one of its possible realizations. This random vector Θ is assumed to have a discrete distribution with k support points, denoted by ξ1, …, ξk, and probabilities π1, …, πk, with πc = p(Θ = ξc). In addition, let us define δjd to be a dummy variable equal to 1 if item j measures a latent trait of type d and to zero otherwise, with j = 1, …, r and d = 1, …, s. Now we can redefine equation 2 above as follows:

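$$g_x[\lambda_j(\boldsymbol{\theta})] = \gamma_j\Bigl(\sum_{d=1}^{s} \delta_{jd}\,\theta_d - \beta_{jx}\Bigr), \qquad j = 1, \ldots, r, \quad x = 1, \ldots, l_j - 1 \tag{3}$$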

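Note that from the discreteness of the distribution of the random vector Θ, it immediately follows that the manifest distribution of X = (X1, …, Xr)′ is a finite mixture of the conditional distributions that hold for all subjects in the cth latent class:

$$p(\boldsymbol{x}) = p(\boldsymbol{X} = \boldsymbol{x}) = \sum_{c=1}^{k} \pi_c\, p_c(\boldsymbol{x}), \qquad p_c(\boldsymbol{x}) = p(\boldsymbol{X} = \boldsymbol{x} \mid \boldsymbol{\Theta} = \boldsymbol{\xi}_c) = \prod_{j=1}^{r} \lambda_{j x_j}(\boldsymbol{\xi}_c) \tag{4}$$

where the product form of p_c(x) rests on the assumption of local independence of the item responses given the latent class.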

Of course, all possibilities of different IRT parameterizations (Table 1) apply for multidimensional IRT models as well, including the option to select a different link function. For the exact formulation of the multidimensional IRT model with global logits (the multidimensional graded response model), see ESM 2a. We must add that some constraints on the parameters must be placed to ensure the identifiability of models. It is required that one discriminant index is equal to 1 for each latent trait, and one difficulty parameter is equal to zero for each latent trait.
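As an illustration of how a discrete latent distribution produces the manifest distribution of a response pattern, the following Python sketch sums class-specific pattern probabilities weighted by the class weights. It assumes global logits and local independence of items within a class; the two items, two dimensions, support points, and weights are hypothetical.

```python
import math

def category_prob(x, theta, gamma, betas):
    """P(X_j = x | theta) under global logits, for x = 0, ..., len(betas)."""
    def at_least(level):
        if level <= 0:
            return 1.0
        if level > len(betas):
            return 0.0
        return 1.0 / (1.0 + math.exp(-gamma * (theta - betas[level - 1])))
    return at_least(x) - at_least(x + 1)

def manifest_prob(pattern, items, support_points, weights):
    """p(X = pattern) as a weighted sum over latent classes.

    items          : list of (dimension index d, gamma_j, betas_j)
    support_points : one vector xi_c per latent class
    weights        : class probabilities pi_c (summing to 1)
    """
    total = 0.0
    for xi, pi in zip(support_points, weights):
        p_c = 1.0
        for x, (d, gamma, betas) in zip(pattern, items):
            # Each item loads on exactly one dimension d of the class vector
            p_c *= category_prob(x, xi[d], gamma, betas)
        total += pi * p_c
    return total

# Two hypothetical three-category items, each measuring a different dimension
items = [(0, 1.0, [-0.5, 0.5]), (1, 1.4, [0.0, 1.0])]
support_points = [(-1.0, -0.8), (0.9, 1.1)]
weights = [0.6, 0.4]
p = manifest_prob((2, 1), items, support_points, weights)
```

Summing `manifest_prob` over all possible response patterns returns 1, which is a convenient sanity check on any implementation of the mixture.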

Table 1 Standard LC models for detecting the optimal number (k) of latent classes on the level of respondents

Multidimensional Multilevel Latent Class IRT Models

The multilevel generalization assumes that responses are a multivariate dependent variable and their hierarchical structure is expressed by respondents (units on first level) who are nested in countries (units on second level). This second level is in itself a discrete latent variable. To comply with previous work in this field (Bacci & Gnaldi, 2015) and to avoid any confusion concerning various indices, we rename and redefine all indices as follows (Bacci & Gnaldi, 2015, p. 931): j is the generic item (j = 1, …, J), i is a respondent or unit on first level of the latent class k (i = 1, …, I; k = 1, …, K), h is a country (or any group) or unit on second level of the latent class c (h = 1, …, H; c = 1, …, C), and Yji is the response to item j of respondent i which might assume L ordered categories (l = 0, …, L − 1). We will also define the D-dimensional vector of latent variables $\Theta^{(a)} = (\Theta^{(a)}_1, \ldots, \Theta^{(a)}_D)'$, one of its possible realizations being $\theta^{(a)} = (\theta^{(a)}_1, \ldots, \theta^{(a)}_D)'$, with a = 1 for first-level units and a = 2 for second-level units. Θ(1) has support points $\xi^{(1)}_1, \ldots, \xi^{(1)}_K$ and corresponding weights $\pi^{(1)}_k$, with k = 1, …, K ($\pi^{(1)}_k > 0$, $\sum_{k} \pi^{(1)}_k = 1$), and Θ(2) has support points $\xi^{(2)}_1, \ldots, \xi^{(2)}_C$ and corresponding weights $\pi^{(2)}_c$, with c = 1, …, C ($\pi^{(2)}_c > 0$, $\sum_{c} \pi^{(2)}_c = 1$). Finally, λj(θ) = (λj0(θ), …, λj,L−1(θ)), with θ = (θ(1), θ(2)) and λjl(θ) = p(Yji = l|θ). Now, the multidimensional multilevel IRT model is framed as follows:

$$g_l[\lambda_j(\theta)] = \gamma_j\Bigl(\sum_{d=1}^{D} \delta_{jd}\bigl(\theta^{(1)}_{dk} + \theta^{(2)}_{dc}\bigr) - \beta_{jl}\Bigr) \tag{5}$$

where j = 1, …, J; k = 1, …, K; c = 1, …, C, γj is the discrimination index for item j, βjl is the difficulty of item j and cut-off point l (l = 1, …, L − 1), and δjd is an indicator variable assuming value 1 if item j measures dimension (or latent trait) d (d = 1, …, D), and zero otherwise. Additionally, in equation 5, $\theta^{(1)}_{dk}$ and $\theta^{(2)}_{dc}$ are the random effects of first and second level, or more specifically $\theta^{(1)}_{dk}$ represents the value of the latent variable corresponding to latent trait d for a respondent who belongs to first-level latent class k, whereas $\theta^{(2)}_{dc}$ is the value of the latent variable corresponding to latent trait d for a country (group) belonging to second-level latent class c.

Each latent variable has a discrete distribution, characterised by a finite number of support points (or mixture components) ξ and related mass probabilities π. Each support point ξ identifies a latent class of individuals; individuals belonging to the same latent class share the same level of the latent variable. Standardized support points are directly comparable with particular locations in a continuous latent ability from a standard IRT model.

We should add that, in this formula, the weights $\pi^{(1)}_k$ and $\pi^{(2)}_c$ do not depend on observable covariates, and therefore $\pi^{(1)}_k = p(\Theta^{(1)} = \xi^{(1)}_k)$ and $\pi^{(2)}_c = p(\Theta^{(2)} = \xi^{(2)}_c)$ for every respondent and every country. This means that respondents (or countries) come from K (or C) latent classes which are homogeneous in terms of the characteristics measured by the questionnaire (self-criticism). This model could be extended to include covariates, but we will not pursue this issue further here (see Bacci & Gnaldi, 2015 for such an extension).
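The way the two levels combine can be sketched in Python as follows. The sketch assumes an additive combination of the respondent-class and country-class values on the dimension measured by the item, global logits, and weights on the two levels that are independent of each other and of covariates; these are working assumptions for illustration, and all names and numbers are hypothetical.

```python
import math

def category_probs(theta, gamma, betas):
    """Category probabilities under global logits for a scalar trait value."""
    cumulative = [1.0 / (1.0 + math.exp(-gamma * (theta - b))) for b in betas]
    bounds = [1.0] + cumulative + [0.0]
    return [bounds[l] - bounds[l + 1] for l in range(len(bounds) - 1)]

def marginal_category_probs(item, first_level, second_level):
    """Marginal response distribution of one item over both latent levels.

    item         : (dimension index d, gamma_j, betas_j)
    first_level  : (support points of respondent classes, their weights)
    second_level : (support points of country classes, their weights)
    """
    d, gamma, betas = item
    points1, weights1 = first_level
    points2, weights2 = second_level
    marginal = [0.0] * (len(betas) + 1)
    for xi1, pi1 in zip(points1, weights1):
        for xi2, pi2 in zip(points2, weights2):
            # Assumed additive first- and second-level effects on dimension d
            theta = xi1[d] + xi2[d]
            for l, p in enumerate(category_probs(theta, gamma, betas)):
                marginal[l] += pi1 * pi2 * p
    return marginal

# Hypothetical item on dimension 0, three respondent classes, two country classes
item = (0, 1.1, [-1.0, 0.0, 1.0, 2.0])
first_level = ([(-1.2, -0.9), (0.0, 0.1), (1.3, 1.0)], [0.3, 0.5, 0.2])
second_level = ([(-0.4, -0.2), (0.5, 0.3)], [0.6, 0.4])
marginal = marginal_category_probs(item, first_level, second_level)
```

Each pair of a respondent class and a country class contributes its own category distribution, so the country-level classes shift the whole response distribution of the respondents nested within them.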

For the sake of brevity, we will provide here neither a matrix formulation nor a mathematical description of the class of estimation procedures employed to fit the models, namely expectation-maximisation (EM) algorithms (see Bacci et al., 2014; Bartolucci, 2007; Vermunt, 2003, 2008 for details).

Criteria of Model Selection

The key issue in any latent class analysis is to select the best fitting model, especially the number of latent classes on both levels. Various methods have been proposed, all of them based on the log-likelihood function (for a review, see McLachlan & Peel, 2000). They can be broadly divided into two categories: parsimony-based (information criteria) or testing-based (likelihood-ratio tests). We will not discuss all their advantages and disadvantages, but rather focus on summarising the most frequently used approach to justify our procedure of model selection.

Likelihood-ratio tests are widely used, but their main problem for finite mixture models is that the parameter vector lies on the boundary of the parameter space under the null hypothesis. The usual asymptotic null distribution of the likelihood-ratio test statistics is therefore not valid (Melnykov & Maitra, 2010). A recently proposed procedure (Maitra & Melnykov, 2010) to overcome this issue is not yet implemented in any available statistical software; therefore we refrain from using the likelihood-ratio testing. On the other hand, of the two most frequently used information criteria, Akaike Information Criterion (AIC) typically substantially overestimates the number of classes (Melnykov & Maitra, 2010, p. 88), while Bayes-Schwarz Information Criterion (BIC) has been shown to demonstrate good performance (Keribin, 2000). However, these come with two caveats: first, the BIC tends to underestimate the number of components of a model when sample sizes are small (Melnykov & Maitra, 2010, p. 88). Fortunately, the sample size used in this article is large enough to minimise this issue. Second, to obtain a meaningful comparison of model fit from one model structure to another, improvements in BIC should be substantive to provide strong evidence; Kass and Raftery (1995) recommend differences greater than 10. To summarise, we will use the BIC for model selection, but only differences greater than 10 will be regarded as constituting strong evidence in favour of the respective model.
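The selection rule can be written down compactly. The Python sketch below computes the BIC in its usual form, −2 log L + p log n, and walks through candidate models of increasing complexity, retaining a more complex model only when it lowers the BIC by more than 10. The treatment of a difference of exactly 10 (the simpler model is kept), the function names, and the log-likelihood values are assumptions made for illustration only.

```python
import math

def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian (Schwarz) information criterion; smaller is better."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)

def select_by_bic(candidates, n_observations, threshold=10.0):
    """Return the label of the retained model.

    candidates : (label, log_likelihood, n_parameters) tuples ordered by
                 increasing number of latent classes
    threshold  : minimum BIC improvement regarded as strong evidence
    """
    retained_label, retained_bic = None, None
    for label, loglik, n_par in candidates:
        current = bic(loglik, n_par, n_observations)
        # Stop when the BIC rises or improves by no more than the threshold
        if retained_bic is not None and retained_bic - current <= threshold:
            break
        retained_label, retained_bic = label, current
    return retained_label

# Made-up log-likelihoods and parameter counts, not results from this study
candidates = [("k=1", -5000.0, 10), ("k=2", -4900.0, 15),
              ("k=3", -4890.0, 20), ("k=4", -4700.0, 25)]
best = select_by_bic(candidates, n_observations=1000)
```

Because the search stops at the first insufficient improvement, a later model with a much better BIC (here the hypothetical k = 4) is never reached; this is a property of the sequential rule, not of the BIC itself.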

Statistical Procedure

We used the R package multiLCIRT (Bartolucci et al., 2014) for the statistical analysis; see ESM 3 for all R code.

  • (a)
    Descriptive statistics will be provided, namely distributions of the FSCRS item responses (raw percentage frequencies for each item category) and descriptives of raw scores of all three sub-dimensions (Inadequate Self, Reassured Self, Hated Self) of the FSCRS for each country.
  • (b)
    Selection of the optimal number k of latent classes on the level of respondents: Following recommendations in Bacci et al. (2014, chap. 4.2.2), we will adopt the standard latent class model, characterized by one dimension for each item; therefore, no choice on the item parameterization is required. Two restrictions apply. First, we will not test the dimensionality of the FSCRS; it is known that this instrument is three-dimensional, and we will rely on this previous knowledge (Gilbert et al., 2004). Second, we will not test the fit of models with different logit link functions: we have justified theoretically the choice of the global logits (Graded Response Model). Our procedure will consist of fitting models with increasing numbers of latent classes and stopping this procedure when the value of BIC of the tested model is greater (and not lower) than the value of the previously tested model. We will also stop this procedure when the value of BIC of the tested model is less than 10 lower than the value of the previously tested model. In both cases, the previously tested model will be retained.

A crucial problem with standard latent class models is the multimodality of the likelihood function, that is, their tendency to find a local, rather than global, maximum point. In order to avoid such an outcome, we will repeat the estimation process by randomly varying the starting values of the model parameters, in other words, repeating the procedure of testing as a whole with random starting values. Values of log-likelihood, number of parameters, and BIC will be reported, and the final model with the optimal number k of latent classes (the smallest BIC value) will be specified.

  • (c)
    Selection of the item discriminating and difficulty parameterization: This step consists of the choice of the possible constraints on the discriminating and difficulty parameters (see above, Table 1). Four different types of models (with k latent classes specified in the previous step) will be fitted by combining free or constrained γj parameters with free or constrained βjx parameters. Again, the final model with the optimal IRT parameterization (the smallest BIC value) will be specified.
  • (d)
    Selection of the optimal number u of latent classes on the level of countries: Again, our procedure will consist of fitting three-dimensional models (parameterized in accordance with the best fitting model from the previous step) with increasing numbers of latent classes on the level of countries (starting with the model with an appropriate number of latent classes based on the level of respondents), and stopping this procedure when the value of BIC of the tested model is greater (and not lower) than the value of the previously tested model. We will also stop this procedure when the value of BIC of the tested model is less than 10 lower than the value of the previously tested model. In both cases, the previously tested model will be retained.
  • (e)
    Interpretation and visualization of estimated parameters of the final model: We will report support points and weights (membership in latent classes) for each latent class, both on the level of respondents and countries. Note that to ensure the identifiability of models, one discriminant index was set to 1 for each latent trait, and one difficulty parameter was set to 0 for each latent trait. However, it is possible to express estimates of the model parameters under the alternative identifiability constraint; the latent abilities on the level of respondents could be standardized (with mean 0 and variance 1). As a consequence, all parameters are more easily and directly interpretable in the IRT context; for example, it would be possible to compare standardized support points of finite mixture models with parameters of classical IRT models with continuous latent abilities. Moreover, the standardized latent abilities on the level of respondents will enable us to obtain the distribution of the estimated average abilities for all latent classes on the level of countries; having the standardized latent abilities on the level of respondents, we can easily compute the mean values for each class of countries, taking into account the weights of each respondents’ latent class. We will therefore report the standardized support points on the respondents’ level, the estimated average abilities for all latent classes on the level of countries, together with their weights (membership in latent classes), and their visualization. All IRT parameters for each item (standardized discrimination and difficulty indices) will also be reported.
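The standardization described in step (e) can be sketched for a single dimension as follows. The Python code rescales the support points so that the weighted mean is 0 and the weighted variance is 1, and then computes the average ability of one class of countries as a weighted mean of the standardized support points. The within-country-class weights of the respondent classes are a hypothetical input, and the sketch leaves out the corresponding rescaling of the item parameters.

```python
import math

def standardize_support_points(points, weights):
    """Rescale support points to weighted mean 0 and weighted variance 1."""
    mean = sum(w * p for p, w in zip(points, weights))
    variance = sum(w * (p - mean) ** 2 for p, w in zip(points, weights))
    sd = math.sqrt(variance)
    return [(p - mean) / sd for p in points]

def class_average_ability(standardized_points, conditional_weights):
    """Mean ability of one country class.

    conditional_weights : assumed shares of each respondent class among the
                          respondents of that country class (summing to 1)
    """
    return sum(w * p for p, w in zip(standardized_points, conditional_weights))

# Hypothetical support points and weights of three respondent classes
points = [-1.2, 0.1, 1.5]
weights = [0.5, 0.3, 0.2]
z = standardize_support_points(points, weights)

# Two hypothetical country classes with different respondent-class mixes
low_class_mean = class_average_ability(z, [0.7, 0.2, 0.1])
high_class_mean = class_average_ability(z, [0.2, 0.3, 0.5])
```

On the standardized scale a support point of, say, 1 lies one standard deviation above the average respondent, which is what makes the values comparable with a continuous latent ability from a classical IRT model.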

Results

  1. (1)
    Descriptives: After inspecting the distribution of the FSCRS item responses (i.e., raw percentage frequencies, see Table E1b in ESM 1) and raw scores of sub-dimensions of the FSCRS pooled across countries (see Table E1c in ESM 1), we can clearly see that responses are strongly skewed to the right (positive skewness), especially for the Hated Self sub-dimension. We note again that responses for the sub-dimension Reassured Self were reverse-scored. This result justifies the use of the logistic (non-linear item-response theory) method because the observed distribution cannot be approximated by a multivariate normal distribution.
  2. (2)
    Selection of the optimal number k of latent classes on the level of respondents: After fitting a series of standard LC models for detecting the optimal number (k) of latent classes on the level of respondents, we can conclude that models with deterministic starting values suggest k = 8 (Table 1). However, after re-fitting these models with random starting values to rule out the possibility that the likelihood function has reached a local rather than global maximum, we had to modify this conclusion; models with random starting values suggest k = 7. This outcome underlines the importance of careful checking of fitted models to avoid relying on biased results. We will therefore retain the model with 7 latent classes on the level of respondents.
  3. (3)
    Selection of the item discriminating and difficulty parameterization: After fitting four three-dimensional models with different IRT parameterizations, it is clear that the most complex model (the Graded Response Model [GRM] with freely estimated discriminations and unconstrained difficulties) fits the data better than its alternatives (Table 2). We will continue with this model in our subsequent testing.
  4. (4)
    Selection of the optimal number u of latent classes on the level of countries: After fitting a series of three-dimensional GRM models with an increasing number of latent classes on the level of countries, we can observe that the model with 5 latent classes is the best option (Table 3).
  5. (5)
    Parameters of the final model: In Table 4, the values of standardized support points and weights (membership in latent classes) for each latent class on the level of respondents are reported (see previous section for the justification of the standardization procedure). These results reflect the positive skewness in responses; the membership in the first three latent classes is 45%, whereas the membership in the last 3 latent classes is only 37% (see Figure 1). This figure displays the position of support points for individual respondents on the scale of the latent ability (x-axis), and the percentage of respondents in each latent class (y-axis). Furthermore, in the sub-dimension Hated Self, only 3 support points (and corresponding latent classes) have negative values of latent ability (self-criticism) in comparison to 4 support points in the sub-dimension Inadequate Self. This agrees with the theoretical assumption that the Inadequate Self is relatively sensitive to low levels of latent Inadequate Self, whereas the Hated Self is relatively more sensitive to high levels of Hated Self (with potential clinical implications). Support points (and latent classes) of the sub-dimension Hated Self include more highly self-critical respondents than support points of the sub-dimension Inadequate Self. This is clearly reflected in IRT parameters as well (see Table E1d in ESM 1): difficulty indices (thresholds) for the sub-dimension Hated Self are clearly shifted to higher values of the latent ability (self-criticism) which means that endorsing the higher levels of observed responses requires higher levels of the latent ability. Note that responses in the sub-dimension Reassured Self were reverse-scored, so support points in this sub-dimension follow the same scaling.

Table 5 summarizes the estimated average abilities, weights, and membership for all latent classes on the level of countries. On this level, too, the distribution is clearly positively skewed (see Figure 2). This figure displays the position of support points for countries on the scale of the latent ability (x-axis) and the percentage of respondents in each latent class (y-axis) on the level of countries. Regarding the membership of particular countries, the least self-critical class consists of participants from Portugal and Israel (class 1), and the most self-critical classes are composed of participants from Taiwan (class 4) and Japan (class 5).
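The check against local maxima reported in step 2 can be sketched as follows. This is a minimal illustration, not the estimation routine used in the study: it fits a simple latent class model for binary items (the FSCRS items are ordinal) by EM from several random starting values, keeps the best log-likelihood for each number of classes, and compares the candidates by BIC. The use of BIC, the simulated data, and all function names are assumptions made for this sketch.

```python
import numpy as np

def log_joint(Y, w, p):
    # log P(response pattern, class) for every respondent (rows) and class (columns)
    return np.log(w) + Y @ np.log(p).T + (1.0 - Y) @ np.log(1.0 - p).T

def fit_lc(Y, k, rng, n_iter=200):
    """EM for a latent class model with binary items; returns the final log-likelihood."""
    n, J = Y.shape
    w = np.full(k, 1.0 / k)                        # class weights
    p = rng.uniform(0.2, 0.8, size=(k, J))         # random start for P(item = 1 | class)
    for _ in range(n_iter):
        lj = log_joint(Y, w, p)
        post = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))  # E-step
        # M-step; tiny pseudo-counts keep weights and probabilities away from 0 and 1
        w = (post.sum(axis=0) + 1e-6) / (n + k * 1e-6)
        p = (post.T @ Y + 1e-6) / (post.sum(axis=0)[:, None] + 2e-6)
    return float(np.logaddexp.reduce(log_joint(Y, w, p), axis=1).sum())

def best_of_starts(Y, k, n_starts=5, seed=0):
    """Run EM from several random starts and keep the highest log-likelihood."""
    lls = [fit_lc(Y, k, np.random.default_rng(seed + s)) for s in range(n_starts)]
    return max(lls), lls

def bic(loglik, k, n, J):
    n_par = (k - 1) + k * J                        # free weights + item probabilities
    return -2.0 * loglik + n_par * np.log(n)

# Simulated toy data (hypothetical): two well-separated classes, six binary items.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=500)
true_p = np.array([[0.9] * 6, [0.1] * 6])
Y = (rng.random((500, 6)) < true_p[z]).astype(float)

bics = {}
for k in (1, 2, 3):
    best_ll, all_ll = best_of_starts(Y, k)
    bics[k] = bic(best_ll, k, *Y.shape)
best_k = min(bics, key=bics.get)
print(bics, best_k)
```

Comparing the log-likelihoods in `all_ll` across starts is the practical diagnostic: if different starts end at clearly different values, a single deterministic start may have stopped at a local maximum, and the preferred number of classes can change once the best start is used.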

Figure 1 Standardized support points and weights for sub-dimensions of the FSCRS on the level of respondents. (A) Inadequate Self; (B) Reassured Self; (C) Hated Self.
Table 2 IRT item parameterization selection for the three-dimensional model with 7 latent classes
Table 3 Multilevel graded response finite mixture models with 7 latent classes for detecting the optimal number (u) of latent classes on the level of countries
Table 4 Standardized support points and membership for 7 latent classes (respondents)
Figure 2 Average values of latent ability and weights on the level of countries.
Table 5 Estimated average abilities, weights, and membership for all latent classes on the level of countries

Discussion

In this study, we tested the multilevel multidimensional finite mixture item response model of the Forms of Self-Criticising/Attacking and Self-Reassuring Scale to cluster respondents and countries. We analysed the relationships between observed characteristics and the latent trait at different levels (individuals and countries) and across the different dimensions of the FSCRS.

Results showed that respondents’ scores depended on unobserved (latent class) individual and country membership and on the multidimensional structure of the instrument, which justifies the use of a multilevel multidimensional finite mixture item response model in comparative psychological assessment. However, a very important question is whether the latent classes on the individual level and on the country level are meaningfully different classes or merely an arbitrary discretization of a continuum that happens to fit the data. This question is very difficult to answer without including covariates that could predict membership in latent classes (both on the individual and on the country level), for example, gender and age differences, social status, or childhood experiences. In fact, we cannot speculate about the sources and causes of the heterogeneity without such data and models, but we do think that these latent classes are not arbitrary; the patterns of responses that assign individuals to latent classes are systematic and are probably driven by as yet unknown sets of social and psychological sources. Further research is required to resolve this problem.

According to the findings, some items of the FSCRS discriminate better than others. Within the Hated Self sub-scale, items 10 (“I have a sense of disgust with myself”) and 22 (“I do not like being me”) have the best discriminations, while item 15 (“I call myself names”) discriminates less well. However, in terms of difficulties, all items from the Hated Self sub-scale are clearly more informative about highly self-critical participants than items from the Inadequate Self sub-scale. This may be partly because self-hatred represents a more pathogenic form of self-criticism and is linked to shame (Castilho et al., 2017).

Items from the Inadequate Self sub-scale provide balanced information across the whole range of the latent ability (self-criticism). Therefore, they are informative about participants with high, low, and average levels of self-criticism. The best discriminating items from the Inadequate Self sub-scale are items 7 (“I feel beaten down by my own self-critical thoughts”) and 6 (“There is a part of me that feels I am not good enough”), and the worst is item 18 (“I think I deserve my self-criticism”).
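The roles of discrimination and difficulty discussed above can be made concrete with a small sketch of category probabilities under a graded response model. In the usual formulation, the probability of responding in category j or higher is a logistic function of a(θ − b_j), and category probabilities are differences of adjacent cumulative probabilities. The two items below are hypothetical, with five response categories assumed: one has thresholds spread around zero, the other has thresholds shifted toward high values of the latent trait. Their parameter values are invented for illustration and are not the estimates in Table E1d.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities under a graded response model.

    theta: latent ability; a: discrimination; b: increasing thresholds (m - 1 values).
    Returns an array of m probabilities, one per response category.
    """
    b = np.asarray(b, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # P(Y >= j) for j = 1..m-1
    upper = np.concatenate(([1.0], cum))           # P(Y >= 0) = 1
    lower = np.concatenate((cum, [0.0]))           # P(Y >= m) = 0
    return upper - lower

# Hypothetical items (values are NOT estimates from the study).
item_balanced = dict(a=1.5, b=[-1.5, -0.5, 0.5, 1.5])   # thresholds centred on the scale
item_shifted = dict(a=2.0, b=[0.5, 1.2, 2.0, 2.8])      # thresholds shifted upward

for theta in (-1.0, 0.0, 2.0):
    p_bal = grm_category_probs(theta, **item_balanced)
    p_shi = grm_category_probs(theta, **item_shifted)
    print(theta, np.round(p_bal, 3), np.round(p_shi, 3))
```

For the item with upward-shifted thresholds, respondents at or below the mean of the latent trait fall almost entirely into the lowest category, so the item separates respondents mainly at the high end of the scale. This is the pattern the text describes for the Hated Self items.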

All items from the Reassured Self sub-scale have low discriminations in comparison with items from the Inadequate Self and Hated Self sub-scales; that is, the Reassured Self sub-scale discriminates poorly on the latent trait of (reversed) self-reassurance. This may be because the items in this sub-scale were reverse-scored and the content of the items is positively phrased. It may also mean that self-reassurance is not the opposite of self-criticism on a unipolar continuum. Although self-reassurance can be an antidote for self-criticism, it operates through completely different neurophysiological systems to those of self-criticism (Longe et al., 2010).

High self-criticism is strongly linked to psychopathology (Werner et al., 2019), and, like various forms of psychopathology, the distribution of self-criticism is positively skewed. Consequently, the most self-critical people (classes 6 and 7 combined: 18.9%) are less common in the general population than the least self-critical people (classes 1 and 2 combined: 33.7%).

Similar to the findings of Halamová et al. (2019), participants from Japan are the most self-critical, followed by Taiwan, while the least self-critical participants are from Israel and Portugal. In addition to their findings, in this article, we found that there are 5 classes of countries according to their level of self-criticism and 7 classes of participants according to their level of self-criticism. In the future these could be used for diagnostic purposes; at the level of individuals, each sub-score for the particular dimension could be discretized into 7 classes, and norms for individuals should be created and used taking into account this latent class structure. For example, it would be inadvisable to distinguish sub-scores from any sub-dimension into “highly self-critical”, “average”, or “low self-critical” individuals without taking into account weighting (see Table 4); for example, “lowest self-critical group” (class 1) should contain only around 11% of individuals, and “highest self-critical” group (class 7) only around 5% of respondents. Secondly, we should be careful when comparing the sum scores from different countries. If countries do not belong to the same latent class, such comparisons are not warranted and could be biased, because measurement invariance is not guaranteed.
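One possible way to take the class weights into account when building norms is to place cut-offs at the cumulative class weights, so that each score band contains roughly the share of respondents implied by its latent class. The sketch below assumes this percentile-based approach, which is a simplification of model-based (posterior) class assignment. The weights are illustrative placeholders, the scores are simulated, and the function names are hypothetical; the actual weights are those reported in Table 4.

```python
import numpy as np

def class_cutoffs(scores, weights):
    """Score cut-offs placed at the cumulative class weights (interior boundaries only)."""
    weights = np.asarray(weights, dtype=float)
    cum = np.cumsum(weights / weights.sum())[:-1]
    return np.quantile(scores, cum)

def assign_band(scores, cutoffs):
    """Band 1 = lowest scores; a score equal to a cut-off goes to the lower band."""
    return np.searchsorted(cutoffs, scores, side="left") + 1

# Illustrative placeholder weights for 7 classes (NOT the estimates in Table 4).
weights = [0.11, 0.23, 0.11, 0.18, 0.18, 0.14, 0.05]

# Simulated, positively skewed sub-scores (hypothetical data).
rng = np.random.default_rng(1)
scores = rng.gamma(shape=2.0, scale=3.0, size=10_000)

cutoffs = class_cutoffs(scores, weights)
bands = assign_band(scores, cutoffs)
shares = np.bincount(bands, minlength=8)[1:] / scores.size
print(np.round(cutoffs, 2))
print(np.round(shares, 3))
```

With real FSCRS sub-scores, which take a limited number of integer values, ties would prevent the band shares from matching the weights exactly, so the resulting cut-offs would only approximate the latent class structure.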

Limitations

The main limitation of this research study is our sampling. Most of the samples were convenience samples, largely consisting of college students. Therefore, the results are not representative of the involved nationalities.

In addition, our model is unable to explain why individuals belong to their respective latent classes. This important limitation is a consequence of the fact that our model did not include any covariates predicting membership in latent classes, for example, gender and age differences, social and economic status, childhood maltreatment, or experience with meditation. Therefore, further systematic international research with a unified set of covariates is required to answer questions concerning the sources of the cross-cultural heterogeneity of self-criticism. This would also be a meaningful future research direction with useful implications.

Conclusion

In conclusion, the FSCRS is a well-designed self-report measure with good discrimination of highly self-critical participants. Therefore, we can recommend using the FSCRS for measuring self-criticism. From a practical point of view, Inadequate Self and Hated Self are functionally different and have different distributions. Our results also contradict the suggestion that Inadequate Self and Hated Self should or may be combined into a global self-criticism score (Halamová et al., 2018).

We would like to thank Marion Sommers-Spijkerman from the Centre for eHealth and Wellbeing Research, University of Twente, Enschede, The Netherlands, for providing us with the Dutch sample.

References

  • Bacci, S., Bartolucci, F., & Gnaldi, M. (2014). A class of Multidimensional Latent Class IRT models for ordinal polytomous item responses. Communications in Statistics-Theory and Methods, 43(4), 787–800. https://doi.org/10.1080/03610926.2013.827718

  • Bacci, S., & Gnaldi, M. (2015). A classification of university courses based on students’ satisfaction: An application of a two-level mixture item response model. Quality & Quantity, 49(3), 927–940. https://doi.org/10.1007/s11135-014-0101-0

  • Baião, R., Gilbert, P., McEwan, K., & Carvalho, S. (2015). Forms of Self-Criticising/Attacking & Self-Reassuring Scale: Psychometric properties and normative study. Psychology and Psychotherapy, 88(4), 438–452. https://doi.org/10.1111/papt.12049

  • Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141–157. https://doi.org/10.1007/s11336-005-1376-9

  • Bartolucci, F., Bacci, S., & Gnaldi, M. (2014). MultiLCIRT: An R package for multidimensional latent class item response models. Computational Statistics & Data Analysis, 71, 971–985. https://doi.org/10.1080/03610926.2013.827718

  • Bergner, R. M. (1995). Pathological self-criticism: Assessment and treatment. Springer.

  • Blatt, S. J., D’Afflitti, J. P., & Quinlan, D. M. (1976). The Depressive Experiences Questionnaire. Unpublished manuscript, Yale University, New Haven, CT.

  • Blatt, S. J., & Zuroff, D. C. (1992). Interpersonal relatedness and self-definition: Two prototypes for depression. Clinical Psychology Review, 12, 527–562. https://doi.org/10.1016/0272-7358(92)90070-O

  • Blatt, S. J., & Zuroff, D. C. (2005). Empirical evaluation of the assumptions in identifying evidence based treatments in mental health. Clinical Psychology Review, 25, 459–486. https://doi.org/10.1016/j.cpr.2005.03.001

  • Bulmash, E., Harkness, K. L., Stewart, J. G., & Bagby, R. M. (2009). Personality, stressful life events, and treatment response in major depression. Journal of Consulting and Clinical Psychology, 77, 1067–1077. https://doi.org/10.1037/a0017149

  • Castilho, P., Pinto-Gouveia, J., & Duarte, J. (2017). Two forms of self-criticism mediate differently the shame–psychopathological symptoms link. Psychology and Psychotherapy: Theory, Research and Practice, 90(1), 44–54. https://doi.org/10.1111/papt.12094

  • Curran, T., & Hill, A. P. (2019). Perfectionism is increasing over time: A meta-analysis of birth cohort differences from 1989 to 2016. Psychological Bulletin, 145(4), 410–429. https://doi.org/10.1037/bul0000138

  • Falconer, C. J., King, J. A., & Brewin, C. R. (2015). Demonstrating mood repair with a situation-based measure of self-compassion and self-criticism. Psychology and Psychotherapy, 88(4), 351–365. https://doi.org/10.1111/papt.12056

  • Fienberg, S. E. (1980). The analysis of cross-classified categorical data. The MIT Press.

  • Freud, S. (1917). Mourning and melancholia. In J. Strachey (Ed.), The standard edition of the complete psychological works of Sigmund Freud (Vol. 14, pp. 239–258). Hogarth Press.

  • Gilbert, P., Allan, S., Brough, S., Melley, S., & Miles, J. N. V. (2002). Relationship of anhedonia and anxiety to social rank, defeat and entrapment. Journal of Affective Disorders, 71, 141–151. https://doi.org/10.1016/S0165-0327(01)00392-5

  • Gilbert, P., Baldwin, M. W., Irons, C., Baccus, J. R., & Palmer, M. (2006). Self-criticism and self-warmth: An imagery study exploring their relation to depression. Journal of Cognitive Psychotherapy, 20, 183–200.

  • Gilbert, P., Catarino, F., Duarte, C., Matos, M., Kolts, R., Stubbs, J., Ceresatto, L., Duarte, J., Pinto-Gouveia, J., & Basran, J. (2017). The development of compassionate engagement and action scales for self and others. Journal of Compassionate Health Care, 4, 311–323. https://doi.org/10.1186/s40639-017-0033-3

  • Gilbert, P., Cheung, M., Irons, C., & McEwan, K. (2005). An exploration into depression-focused and anger-focused rumination in relation to depression in a student population. Behavioural and Cognitive Psychotherapy, 33, 273–283. https://doi.org/10.1017/S1352465804002048

  • Gilbert, P., Clark, M., Hempel, S., Miles, J. N. V., & Irons, C. (2004). Criticising and reassuring oneself: An exploration of forms, styles and reasons in female students. British Journal of Clinical Psychology, 43, 31–50.

  • Gilbert, P., Durrant, R., & McEwan, K. (2006). Investigating relationships between perfectionism, forms and functions of self-criticism, and sensitivity to put-down. Personality and Individual Differences, 41, 1299–1308. https://doi.org/10.1016/j.paid.2006.05.004

  • Gilbert, P., McEwan, K., Gibbons, L., Chotai, S., Duarte, J., & Matos, M. (2012). Fears of compassion and happiness in relation to alexithymia, mindfulness and self-criticism. Psychology and Psychotherapy: Theory, Research and Practice, 8, 374–390. https://doi.org/10.1111/j.2044-8341.2011.02046.x

  • Gilbert, P., & Miles, J. N. V. (2000). Sensitivity to social put-down: Its relationship to perceptions of social rank, shame, social anxiety, depression, anger and self-other blame. Personality and Individual Differences, 29, 757–774. https://doi.org/10.1016/S0191-8869(99)00230-5

  • Halamová, J., Kanovský, M., Gilbert, P., Kupeli, N., Troop, N., Zuroff, D., Hermanto, N., Petrocchi, N., Sommers-Spijkerman, M., Kirby, J., Shahar, B., Krieger, T., Matos, M., Asano, K., Yu, F., & Basran, J. (2018). The factor structure of the Forms of Self-Criticising/Attacking & Self-Reassuring Scale in thirteen populations. Journal of Psychopathology and Behavioral Assessment, 40(4), 736–751. https://doi.org/10.1007/s10862-018-9686-2

  • Halamová, J., Kanovský, M., Gilbert, P., Kupeli, N., Troop, N., Zuroff, D., Petrocchi, N., Hermanto, N., Sommers-Spijkerman, M., Kirby, J., Shahar, B., Krieger, T., Matos, M., Asano, K., Yu, F., & Basran, J. (2019). Multiple Group IRT Measurement Invariance Analysis of the Forms of Self-Criticising/Attacking and Self-Reassuring Scale in Thirteen International Samples. Journal of Rational-Emotive & Cognitive-Behavior Therapy. Advance online publication. https://doi.org/10.1007/s10942-019-00319-1

  • Halamová, J., Kanovský, M., & Pacúchová, M. (2017). Robust psychometric analysis and factor structure of the Forms of Self-criticizing/Attacking and Self-Reassuring Scale. Československá Psychologie, 4, 456–471.

  • Hermanto, N., & Zuroff, D. Ch. (2016). The social mentality theory of self-compassion and self-reassurance: The interactive effect of care-seeking and caregiving. The Journal of Social Psychology, 156(5), 523–535. https://doi.org/10.1080/00224545.2015.1135779

  • Hermanto, N., & Zuroff, D. (2017). Experimentally enhancing self-compassion: Moderating effects of trait care-seeking and perceived stress. The Journal of Positive Psychology, 8, 1–10. https://doi.org/10.1080/17439760.2017.1365162

  • Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795. https://doi.org/10.1080/01621459.1995.10476572

  • Keribin, C. (2000). Consistent estimation of the order of finite mixture models. Sankhyā: The Indian Journal of Statistics, Series A (1961–2002), 62(1), 49–66. https://www.jstor.org/stable/25051289

  • Kim, E. S., Cao, C., Wang, Y., & Nguyen, D. T. (2017). Measurement invariance testing with many groups: A comparison of five approaches. Structural Equation Modeling: A Multidisciplinary Journal, 24(4), 524–544. https://doi.org/10.1080/10705511.2017.1304822

  • Kirby, J. N., Steindl, S., Filus, A., Seppala, E., & Doty, J. R. (2017). The development and validation of the Compassionate Motivation and Action Scale. Manuscript under review.

  • Krieger, T., Martig, D. S., van den Brink, E., & Berger, T. (2016). Working on self-compassion online: A proof of concept and feasibility study. Internet Interventions, 6, 64–70.

  • Kupeli, N., Chilcot, J., Schmidt, U. H., Campbell, I. C., & Troop, N. A. (2013). A confirmatory factor analysis and validation of the Forms of Self-Criticism/Reassurance Scale. British Journal of Clinical Psychology, 52(1), 12–25.

  • Kupeli, N., Norton, S., Chilcot, J., Campbell, I. C., Schmidt, U. H., & Troop, N. A. (2017). Affect systems, changes in body mass index, disordered eating and stress: An 18-month longitudinal study in women. Health Psychology and Behavioral Medicine, 5(1), 214–228. https://doi.org/10.1080/21642850.2017.1316667

  • Lau, A. S., Chang, D. F., & Okazaki, S. (2010). Methodological challenges in treatment outcome research with ethnic minorities. Cultural Diversity and Ethnic Minority Psychology, 16, 573–580. https://doi.org/10.1037/a0021371

  • Longe, O., Maratos, F. A., Gilbert, P., Evans, G., Volker, F., Rockliffe, H., & Rippon, G. (2010). Having a word with yourself: Neural correlates of self-criticism and self-reassurance. NeuroImage, 49, 1849–1856. https://doi.org/10.1016/j.neuroimage.2009.09.019

  • Löw, C. A., Schauenburg, H., & Dinger, U. (2020). Self-criticism and psychotherapy outcome: A systematic review and meta-analysis. Clinical Psychology Review, 75, 101808. https://doi.org/10.1016/j.cpr.2019.101808

  • Luyten, P., & Blatt, S. J. (2013). Interpersonal relatedness and self definition in normal and disrupted personality development: Retrospect and prospect. American Psychologist, 68, 172–183. https://doi.org/10.1037/a0032243

  • Maitra, R., & Melnykov, V. (2010). Assessing significance in finite mixture models. Statistics Publications (Technical Report, 10-01). Department of Statistics, Iowa State University. https://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=1069&context=stat_las_pubs

  • McLachlan, G. J., & Peel, D. (2000). Finite mixture models. Wiley.

  • Melnykov, V., & Maitra, R. (2010). Finite mixture models and model-based clustering. Statistics Surveys, 4, 80–116. https://doi.org/10.1214/09-SS053

  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176. https://doi.org/10.1002/j.2333-8504.1992.tb01436.x

  • Petrocchi, N., & Couyoumdjian, A. (2016). The impact of gratitude on depression and anxiety: The mediating role of criticizing, attacking, and reassuring the self. Self and Identity, 15(2), 191–205.

  • Rose, A. V., & Rimes, K. A. (2018). Self-criticism self-report measures: Systematic review. Psychology and Psychotherapy, 91(4), 450–489. https://doi.org/10.1111/papt.1217

  • Samejima, F. (1969). Estimation of ability using a response pattern of graded scores. Psychometrika, 34(Suppl 1, Monograph 17), 1–97. https://doi.org/10.1007/BF03372160

  • Shahar, B., Doron, G., & Szepsenwol, O. (2015). Childhood maltreatment, shame-proneness, and self-criticism in social anxiety disorder: A sequential mediational model. Clinical Psychology and Psychotherapy, 22, 570–579.

  • Shahar, G. (2015). Erosion: The psychopathology of self-criticism. Oxford University Press.

  • Smart, L. M., Peters, J. R., & Baer, R. A. (2016). Development and validation of a measure of self-critical rumination. Assessment, 23(3), 321–332. https://doi.org/10.1177/1073191115573300

  • Sommers-Spijkerman, M. P. J., Trompetter, H. R., ten Klooster, P. M., Schreurs, K. M. G., Gilbert, P., & Bohlmeijer, E. T. (2017). Development and Validation of the Forms of Self-Criticising/Attacking and Self-Reassuring Scale – Short Form. Psychological Assessment, 30(6), 729–743.

  • Thompson, R., & Zuroff, C. (2004). The Levels of Self-Criticism Scale: Comparative self-criticism and internalized self-criticism. Personality and Individual Differences, 36(2), 419–430.

  • Vermunt, J. K. (2003). Multilevel latent class models. Sociological Methodology, 33, 213–239. https://doi.org/10.1111/j.0081-1750.2003.t01-1-00131.x

  • Vermunt, J. K. (2008). Multilevel latent variable modeling: An application in education testing. Austrian Journal of Statistics, 37(3), 285–299. https://doi.org/10.17713/ajs.v37i3&4.309

  • Werner, A. M., Tibubos, A. N., Rohrmann, S., & Reiss, N. (2019). The clinical trait self-criticism and its relation to psychopathology: A systematic review–update. Journal of Affective Disorders, 246, 530–547. https://doi.org/10.1016/j.jad.2018.12.069

  • Yu, F. Y. (2013). The relationship among self-criticism, self-compassion, rumination response style, and depression (Unpublished MA thesis). Chung Yuan Christian University.

  • Zuroff, D. C., Sadikaj, G., Kelly, A. C., & Leybman, M. J. (2016). Conceptualizing and measuring self-criticism as both a personality trait and a personality state. Journal of Personality Assessment, 98(1), 14–21. https://doi.org/10.1080/00223891.2015.1044604