Distinctions Without Differences?
Effects of Instruction Sets for Situation Characteristic Ratings
Abstract
The interplay between persons and situations is central to psychology, and there has been a recent increase in research on psychological situation characteristics. One key issue in situation research concerns the distinction between consensual situation perceptions and subjective situation construal. We examined for the first time whether different instructions can be used to shift the degree to which situation characteristic ratings reflect consensual (i.e., shared) versus subjective perceptions. N = 631 participants were randomly assigned to one of three instructions: standard (unspecified), personal (how participants personally perceive a situation), and consensual (how participants think most other people perceive a situation). Each participant rated 31 of 62 standardized situation stimuli (pictures of everyday life situations) on the DIAMONDS situation characteristics. Using Bayesian multilevel models, we found that (1) instructions did not affect inter-rater agreement, although residual variation was somewhat higher in the personal instruction; (2) averages of ratings did not differ across instructions; and (3) Big Five traits had significant but small effects on situation characteristic ratings, which did not differ across instructions. Our findings highlight that situation characteristic ratings behave very similarly across instructions, and we discuss conceptual and practical implications.
Examining the interplay between persons and situations is an essential task for psychology. Accompanied by a growth of research on personality dynamics (Kuper et al., 2021a), work on the conceptualization and assessment of psychological situations has recently increased (Rauthmann & Sherman, 2020). Situational information includes cues (physically present stimuli), characteristics (dimensions of situation perceptions), and classes (defined by cue/characteristic compositions; Rauthmann, Sherman, & Funder, 2015). Here, we examine situation characteristics, whose taxonomization and assessment have recently received considerable attention.
A key issue in situation research concerns the differentiation of properties of the situation (i.e., situation variables) from subjective situation perceptions (i.e., person variables; Rauthmann, Sherman, & Funder, 2015). This differentiation is closely related to the distinction between situation contact (i.e., people encounter actually different situations) and situation construal (i.e., people subjectively construe identical situations differently; Rauthmann, Sherman, Nave, et al., 2015). Differentiating these aspects empirically is difficult but central for various research topics, such as the influence of personality traits on situation contact versus construal, process accounts of situational reactivities, or effects of persons on situations and vice versa. Prior work has focused on specific study designs that allow differentiating consensual perceptions and construal. For instance, ex-situ ratings of situation descriptions from daily life were contrasted with in-situ ratings to examine the effects of personality traits on situation contact and construal – with effects on construal being more pronounced (Hong et al., 2020; Rauthmann, Sherman, Nave, et al., 2015). Abrahams et al. (2021) used juxta situm ratings (a physically present rater observing the situation), also finding descriptively larger trait correlations for construal. Here, we use standardized situation stimuli (pictures of situations; Kuper, von Garrel, et al., 2024). Since different raters respond to identical situations, this design allows the study of situation construal in particular by differentiating it from normative perceptions.
In addition to the study design, an underexplored factor is the instructions participants are given when rating situations. Typically, participants are simply asked to indicate how much an item applies to or describes a situation (Rauthmann & Sherman, 2016). Instead, participants could be asked to indicate how they personally perceive a situation (e.g., to enhance construal) or how they think that most other people would perceive a situation (e.g., to increase consensual variance). Rauthmann and Sherman (2021) proposed different instructions, three of which are similar to those applied here: (1) global situational judgment (unspecific; termed here standard instruction), (2) self-referential perspective (how one would perceive the situation if one was actually in it – to which we added an emphasis on personal perceptions; termed here personal instruction), and (3) normative perspective-taking (inferring how people in general would perceive the situation; termed here consensual instruction). These instructions have not been empirically contrasted yet. Examining such instructions is important from a conceptual and applied perspective. Conceptually, it can elucidate whether people have insight into normative situation perceptions above and beyond their own perceptions. Practically, both for research and applied settings (e.g., in the work context), it is valuable to know whether instructions could be used to shift the degree to which ratings reflect subjective construal versus consensual perceptions.
Some work in other areas has examined the effects of instructions. For example, work on person perception has asked raters to differentiate how they perceive a target, how they think the target sees themselves, and how they think others see the target (Solomon & Vazire, 2016). Work on trait ratings has examined reference group effects through instructions and item wording (Lenhausen et al., 2024). Moreover, research on situational judgment tests differentiated what people would do (behavioral tendency instructions) and what people think one should do (knowledge instructions; McDaniel et al., 2007). While some of this work has observed differences between instructions, the effects of instructions are still generally poorly understood. Importantly, for situation perception, it is unexplored whether instructions affect key properties of situation characteristic ratings (e.g., agreement, average ratings, nomological correlates), which we addressed here.
To this end, we examined the effects of three instructions (standard, personal, consensual) for situation characteristic ratings in a pre-registered between-subject design. We focused on DIAMONDS situation characteristic ratings (Duty, Intellect, Adversity, Mating, pOsitivity, Negativity, Deception, and Sociality; Rauthmann et al., 2014) of standardized situation stimuli (pictures) and examined three research questions (RQs).
RQ1: How do instructions affect inter-rater agreement concerning situation characteristic ratings? Relatedly, how do instructions affect the magnitude of the following variance components: (a) situation: differences between situations in how they are normatively perceived, (b) person: individual differences in the tendency to rate situations highly on a given situation characteristic, and (c) residual: measurement error plus person × situation variance reflecting situation-specific idiosyncratic construal?
RQ2: How do instructions affect average situation characteristic ratings (i.e., are there mean-level differences between instructions)?
RQ3: How does the effect of self-reported personality traits on situation characteristic ratings (i.e., trait-dependent situation construal) differ across instructions?
For RQ1, we hypothesized inter-rater agreement (ICCs) to be smallest for the personal instruction, largest for the consensual instruction, and in-between for the standard instruction. Specifically, if people can accurately approximate normative perceptions (beyond their own perceptions), this should lead to higher agreement. In turn, if the personal instruction enhances subjective construal, this should be reflected in lower agreement. Corresponding differences in variance components were examined without hypotheses. Given the scarcity of prior work, effects on average ratings (RQ2) were also examined without hypotheses. However, work from other areas suggests that such effects are plausible. For example, people may underestimate the degree to which other people experience negative emotions (Jordan et al., 2011). Similar phenomena could be evident in people’s beliefs about others’ situation perceptions (consensual instruction) as compared to their own perceptions, especially for situation characteristics related to valence. RQ3 allowed us to examine whether nomological correlates of situation characteristic ratings at the person level differed across instructions. We distinguished theoretically expected and unexpected trait–situation characteristic combinations (see Kuper et al., 2021b): We expected Extraversion to be linked to Mating, pOsitivity, Negativity (–), and Sociality; Agreeableness to Adversity (–), Deception (–), and Sociality; Conscientiousness to Duty; Neuroticism to Adversity, pOsitivity (–), Negativity, and Deception; and Openness to Intellect. Further, we expected the effects of traits on ratings to be strongest for the personal, weakest for the consensual, and in-between for the standard instruction. Specifically, if people accurately approximate normative perceptions in the consensual instruction, this should reduce trait-dependent construal, which should in turn be enhanced by the personal instruction.
Method
Sample and Procedure
Participants from the US aged 18 years or older were recruited using Prolific (≥98% approval rate) for an online study and compensated £6.40. We included two attention checks and two language comprehension checks, and participants were automatically excluded if they responded incorrectly to two of them. After the baseline survey, participants completed the main task, which consisted of rating 31 stimuli on the DIAMONDS situation characteristics. We excluded participants who did not finish the study. To complete cases (N = 658), we applied the following exclusion criteria to counteract careless responding: SD = 0 across all BFI-2 items; SD = 0 for any DIAMONDS dimension across all stimuli; SD = 0 across all 16 DIAMONDS items within five or more stimuli; and unrealistic completion times of 15 min or less. We retained N = 631 participants aged 18–79 years (M = 40.75, SD = 13.25), 48.34% of whom were female. Participants were randomly allocated to instructions (standard: N = 203, personal: N = 214, consensual: N = 214) and to one of two stimuli sets.
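The careless-responding criteria listed above can be illustrated with a short sketch. This is not the authors' code: the array layout, the function name `careless_mask`, the item-to-dimension mapping (two items per dimension), and the use of item means as dimension scores are all assumptions made for illustration, and the data are simulated.

```python
import numpy as np

def careless_mask(bfi, diamonds, item_dim, minutes,
                  flat_stimuli_cutoff=5, max_minutes=15.0):
    """Return True for each participant who meets any exclusion criterion.

    bfi:      (participants, BFI-2 items)
    diamonds: (participants, stimuli, DIAMONDS items)
    item_dim: (DIAMONDS items,) dimension index of each item (assumed mapping)
    minutes:  (participants,) completion time in minutes
    """
    # SD = 0 is equivalent to max == min; the range avoids rounding issues.
    flat_bfi = np.ptp(bfi, axis=1) == 0

    # Dimension scores per stimulus, assumed here to be item means.
    dims = np.unique(item_dim)
    scores = np.stack(
        [diamonds[:, :, item_dim == d].mean(axis=2) for d in dims], axis=2
    )  # shape: (participants, stimuli, dimensions)
    flat_dimension = (np.ptp(scores, axis=1) == 0).any(axis=1)

    # Number of stimuli on which all items received the same response.
    flat_stimuli = (np.ptp(diamonds, axis=2) == 0).sum(axis=1)

    too_fast = minutes <= max_minutes
    return flat_bfi | flat_dimension | (flat_stimuli >= flat_stimuli_cutoff) | too_fast

# Simulated data for four hypothetical participants.
rng = np.random.default_rng(0)
n = 4
bfi = rng.integers(1, 6, size=(n, 60)).astype(float)            # 5-point scale
diamonds = rng.integers(1, 8, size=(n, 31, 16)).astype(float)   # 7-point scale
item_dim = np.repeat(np.arange(8), 2)                            # assumed: 2 items per dimension
minutes = np.array([40.0, 35.0, 12.0, 50.0])

bfi[1, :] = 3             # participant 1 gives the same answer to every BFI-2 item
diamonds[3, :5, :] = 4    # participant 3 gives uniform answers on five stimuli

excluded = careless_mask(bfi, diamonds, item_dim, minutes)
print(excluded)
```

Combining the criteria with a logical OR mirrors the text, where meeting any single criterion leads to exclusion.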
Materials and Measures
Stimuli
Stimuli were 62 pictures depicting everyday life situations which were selected for relevance to the Big Five traits and variance on the DIAMONDS. For a detailed description of the stimulus pool generation, see Kuper, von Garrel, et al. (2024). We created two sets of 31 pictures each. Sets were created to be relatively parallel in terms of means and standard deviations in Big Five personality states, DIAMONDS ratings, and an indicator of whether the situation was social (0/1), using data from Kuper, von Garrel, et al. (2024).
Instructions
We used three instructions for situation characteristic ratings:
- Standard: “View the depicted situation from the perspective from which the photograph was taken. Take a moment to think about the situation […]. Please indicate to what degree these statements apply to the depicted situation.”
- Personal: “Imagine that you are in this situation, viewing it from the perspective from which the photograph was taken. Take a moment to imagine personally being in this situation […]. Imagining yourself in this situation, please indicate to what degree you personally think that these statements apply to the depicted situation.”
- Consensual: “Put yourself in the shoes of other people viewing this situation from the perspective from which the photograph was taken. Take a moment to imagine how most other people would probably perceive the situation […]. Please indicate to what degree most other people would think that these statements apply to the depicted situation.”
These instructions (cursive text in bold) were repeated for each stimulus. Further, a more detailed instruction was given for an example picture in the beginning, which was repeated once in case participants failed a comprehension check.
Situation Characteristics
Participants rated each situation on the DIAMONDS situation characteristics using a 16-item measure shortened from the S8* (Rauthmann & Sherman, 2016) and applied previously by Kuper, von Garrel, et al. (2024). Items were presented on a 7-point Likert-type scale (1 = not at all to 7 = totally). Example items are “A job needs to be done” (Duty) and “The situation could elicit stress” (Negativity). For full item texts and descriptives, see Table S6 and Tables S1–S3. Inter-item correlations within the same dimension were high, ranging from .64 (Sociality) to .84 (Negativity).
Big Five Traits
Participants completed the Big Five Inventory 2 (BFI-2; Soto & John, 2017). Items were presented on a 5-point Likert-type scale. Reliabilities ranged from ωtotal = .88 (Agreeableness) to ωtotal = .94 (Neuroticism). For details, see Table S2.
Statistical Analyses
Analyses were conducted using Bayesian multivariate cross-classified multi-level models with the R package brms (Bürkner, 2017). Situation characteristic ratings were dependent variables. All variables were z-standardized (situation characteristics across persons and situations, traits on the person level). For RQ1 and RQ2, we simultaneously fitted three multi-level models for the three instructions with random intercepts for persons and situations. This allowed the isolation of three variance components: situation (normative perceptions), person (construal generalizing across situations), and residual (measurement error plus person × situation variance reflecting situation-specific idiosyncratic construal). Moreover, we computed intra-class correlations (ICCs: situation variance/total variance). We calculated variance components and ICCs within each instruction as well as instruction differences. For RQ2, we compared averages of situation characteristic ratings across conditions. For RQ3, we further included the effects of a given personality trait in the prediction of a given situation characteristic (including random slopes). These effects purely represent trait-dependent situation construal since traits only predict the person variance in ratings, which is by design independent of and statistically isolated from situation variance. We computed an overall trait effect, instruction-specific trait effects, and instruction differences in trait effects. Across analyses, relevant metrics were computed using posterior draws. For instance, instruction differences in variance components were computed by subtracting each posterior draw for a given variance (e.g., situation random intercept variance) in a given instruction (e.g., personal) from each posterior draw of that variance in another instruction (e.g., consensual). 
This allowed us to obtain credible intervals and Bayesian p-values (twice the proportion of the posterior lying on the opposite side of zero from the estimate) for newly calculated metrics. Further details are available in Table S7 and the openly shared R code. We fit separate models for all DIAMONDS dimensions (8; RQ1 and RQ2) and for combinations of DIAMONDS and Big Five traits (40; RQ3). For RQ3, we compared effect sizes for expected and unexpected variable combinations (see above). We used a significance threshold of α = .01.
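The posterior-draw computations described above (ICCs as situation variance over total variance, instruction differences, and Bayesian p-values) can be sketched as follows. This is a minimal illustration, not the shared R code: the function names, the draw-wise pairing of posterior samples, the 99% interval level (chosen here to match α = .01), and the simulated draws are all assumptions, and the numbers are not the study's estimates.

```python
import numpy as np

def icc(var_situation, var_person, var_residual):
    """Situation variance divided by total variance, computed per posterior draw."""
    return var_situation / (var_situation + var_person + var_residual)

def summarize(draws, level=0.99):
    """Posterior median, equal-tailed credible interval, and Bayesian p-value."""
    draws = np.asarray(draws, dtype=float)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    estimate = np.median(draws)
    # Twice the share of draws on the opposite side of zero from the estimate.
    opposite = np.mean(draws < 0) if estimate >= 0 else np.mean(draws > 0)
    return {"estimate": estimate, "ci": (lower, upper), "p": min(1.0, 2.0 * opposite)}

# Hypothetical posterior draws of variance components for two instructions.
rng = np.random.default_rng(1)
n_draws = 4000
centers = {
    "personal": {"situation": 0.41, "person": 0.20, "residual": 0.40},
    "consensual": {"situation": 0.42, "person": 0.20, "residual": 0.37},
}
draws = {
    instruction: {name: rng.normal(center, 0.02, n_draws)
                  for name, center in components.items()}
    for instruction, components in centers.items()
}

# ICC per draw within each instruction, then the draw-wise difference.
icc_by_instruction = {
    instruction: icc(d["situation"], d["person"], d["residual"])
    for instruction, d in draws.items()
}
icc_diff = icc_by_instruction["consensual"] - icc_by_instruction["personal"]
result = summarize(icc_diff)
print(result)
```

Because every derived quantity is computed per draw, its uncertainty carries through automatically, which is what makes credible intervals and p-values available for metrics that are not model parameters themselves.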
Results
Descriptives and detailed results can be found in Tables S1–S3 and Tables S4–S5, respectively. For RQ1, we examined inter-rater agreement (see Table 1). Average ICCs were highly similar across instructions (standard = .398, personal = .410, consensual = .419). Differences across instructions were only significant in three cases: Higher agreement for Deception in the personal compared to the standard instruction (unexpected), and higher agreement for pOsitivity in the consensual compared to the standard and personal instructions (expected). Overall, differences in ICCs across instructions were small and rarely significant, not supporting our hypothesis. This pattern of results was similar for the situation and person variance. Notably, stimulus differences in ratings (normative perceptions) were nearly perfectly correlated across instructions (latent, model-based: r ≥ .99, see Table S4; manifest using situation means: r ≥ .97, see Table S1). Regarding residual variation, several instruction differences were significant, suggesting higher variance in the personal compared to the standard instruction (average difference: .021; 5.56% higher residual variance) and the consensual instruction (average difference: .034; 9.32% higher residual variance).
For RQ2, we examined averages in situation characteristic ratings across instructions (see Table 2). No differences across instructions were significant. For instance, pOsitivity and Negativity ratings were not significantly different in the personal compared to the consensual instruction.
For RQ3, we first examined the effects of personality traits on situation characteristic ratings across all conditions (see Table 3). Several associations were significant (19/40), although effect sizes were relatively small, with an average of 0.046 (see also Table S2 for person-level correlations). Effect sizes were only slightly higher for expected variable combinations, with averages of 0.052 (expected) and 0.043 (unexpected). For instance, Extraversion was linked to pOsitivity and Sociality; and Agreeableness was linked to Adversity (–) and Sociality (expected). In addition, Extraversion was associated with Duty and Intellect; and Agreeableness with Mating (–) and pOsitivity (unexpected). Overall, traits were associated with situation construal, although effects were often small, and the pattern of results was more complex than hypothesized. Importantly, average effect sizes were highly similar across instructions, at 0.046 (personal), 0.049 (standard), and 0.050 (consensual), which contradicts our expectations (i.e., largest and smallest effects in the personal and consensual instruction, respectively). Differences were very small, and no interaction effects between traits and instructions were significant (see Table 3). Overall, instructions did not significantly moderate relations between traits and situation characteristic ratings.
Discussion
Using a large sample of participants and standardized situation stimuli, we examined different instructions for situation characteristic ratings. For RQ1, contrary to our predictions, inter-rater agreement was highly similar across instructions, with differences being small and rarely significant. Thus, the extent to which situation characteristic ratings reflect consensual perceptions versus subjective construal could not be strongly shifted by simply asking people to adopt these different perspectives. Hence, people may not be able to approximate normative situation perceptions beyond their own perceptions. This is in line with work by Solomon and Vazire (2016) on person perception suggesting that people may not differentiate between their own views of a given person and how they think others view this person. Practically, this implies for research and applied purposes that differentiating aspects of situation perception (normative perceptions vs. construal) cannot be achieved through instructions but requires suitable study designs (e.g., with multiple raters; Rauthmann, Sherman, & Funder, 2015; Rauthmann, Sherman, Nave, et al., 2015). Interestingly, residual variance in situation characteristic ratings was highest in the personal instruction. This variance comprises measurement error and person × situation interactions (i.e., idiosyncratic perceptions of a situation beyond how it is perceived in general and beyond one’s own general tendencies). This provides tentative evidence that situation-specific idiosyncratic construal may be enhanced through personal instructions.
For RQ2, no differences in average situation characteristic ratings were observed across instructions. Such differences might be considered plausible given prior work in other areas comparing different perspectives. For instance, people may systematically under- and overestimate the degree to which others experience negative and positive emotions, respectively (Jordan et al., 2011). To name a different example, people view their own personality traits more positively than they are viewed by strangers (Kim et al., 2019). We did not find such perspective differences in how people view situations personally and how they think most others would view situations. Practically, this again implies that differences in instructions may have few consequences.
Concerning RQ3, we observed several relations between personality traits and situation characteristic ratings. Given our use of standardized situation stimuli and modeling approach, these effects purely reflect relations between traits and situation construal. Compared to other designs (e.g., Abrahams et al., 2021; Rauthmann, Sherman, Nave, et al., 2015), our approach has the advantage of a very large number of raters for identical situations. Our results support the general finding that traits and construal are linked (e.g., Hong et al., 2020; Rauthmann, Sherman, Nave, et al., 2015), although effect sizes were often small. Moreover, effects were not restricted to hypothesized relations between conceptually close trait and situation characteristic dimensions, suggesting more complex patterns of personality-driven situation construal. Notably, the effects of traits on ratings did not differ between instructions. Given the small average effects of traits on situation perceptions, interaction effects with instructions may be difficult to detect (see below). Nevertheless, average trait–situation characteristic associations were highly similar across conditions. Thus, the findings do not support that trait-dependent situation construal can be enhanced through instruction. Notably, results for RQ1 highlighted that the personal instruction enhanced residual variance (including person × situation interactions) rather than person variance (that could in principle be explained by traits).
Strengths, Limitations, and Future Directions
Our study had several strengths, including the large sample size; controlled design; use of appropriate models to compare variance components, means, and trait effects across instructions; and implementation of open science practices, including pre-registration. Nevertheless, some limitations and directions for future research should be noted. First, as in many psychological studies, we cannot rule out potential influences of response styles or careless responding. Careless responding may increase the person and residual variance while decreasing the situation variance. Moreover, it is possible that individual participants did not pay strong attention to the instructions, despite them being highlighted in-text for each stimulus. However, we took several steps to counteract undue effects of careless responses (see exclusion criteria). Moreover, some instances of careless responding should attenuate but not erase the effects of instructions if they existed.
Second, based on our findings, it appears that the power of instructions to affect situation characteristic ratings is quite limited. While some instruction differences were observed for the residual variance, we generally found few differences (i.e., regarding ICCs, person and situation variance, average ratings, correlations with traits; generalization to other properties would need to be examined). Importantly, however, our findings only pertain to the specific instructions and their wording applied here, and examining other instructions might be required to draw more general conclusions about the (lack of the) power of instructions. Relatedly, it could be examined whether results are similar when explicitly incentivizing accurate approximations of normative perceptions in the consensual instruction (e.g., people could be able to form accurate approximations if they are particularly motivated to do so). Finally, results may not generalize to comprehensive trainings for raters. Work on trainings to improve person perception accuracy suggests that especially practice and feedback can be effective (Blanch-Hartigan et al., 2012), which could be examined for situation characteristics in future work.
Third, our findings suggest that people’s beliefs about unspecified, unknown others’ (normative) situation perceptions may not be accurate. Nevertheless, such beliefs may still exist and represent relevant phenomena. Other designs are required to further examine such beliefs. For example, within-subject designs could be employed to estimate (a) to what extent participants’ subjective perceptions and their beliefs about others’ (consensual) perceptions of the same situations differ, (b) to what extent this differentiation varies between participants, and (c) whether subjective perceptions and beliefs about others’ perceptions are differentially related to momentary psychological states.
Fourth, while our sample size was large, especially the estimation of interaction effects between traits and instructions may not have had optimal precision since trait main effects (and hence also realistic interaction effects) tended to be small. This is reinforced by the credible interval widths for interactions (see Table S5), suggesting that these results should be viewed as preliminary. Nevertheless, even descriptively, average trait effects were not larger in the personal condition, and the results are congruent with the general pattern that instructions had few effects. Future work using even larger samples to examine potential interactions between instructions and traits would be required. In addition, within-subject manipulations of instructions could increase precision and are conceptually interesting (see above) but could also evoke artifactual contrast effects.
Overall, future work could examine the generalizability of our findings to different instructions, study designs (e.g., within-subject designs, designs in daily life, designs incentivizing accurate perceptions), situational information measures (e.g., other characteristic taxonomies; cues), nomological correlates of situation characteristics at different levels (e.g., person-level, person × situation level), and different samples of persons and situations.
Conclusion
To conclude, our findings suggest high similarity across three different instructions for situation characteristic ratings. Practically, if our findings generalize, each of the three instructions could be used in future work – ideally conceptually aligned with the research question or applied purpose at hand. However, the specific choice of instruction will likely have little consequence.
References
Abrahams et al. (2021). Person-situation dynamics in educational contexts: A self- and other-rated experience sampling study of teachers’ states, traits, and situations. European Journal of Personality, 35(4), 598–622. https://doi.org/10.1177/08902070211005621
Blanch-Hartigan et al. (2012). The effectiveness of training to improve person perception accuracy: A meta-analysis. Basic and Applied Social Psychology, 34(6), 483–498. https://doi.org/10.1080/01973533.2012.728122
Bürkner (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80, 1–28. https://doi.org/10.18637/jss.v080.i01
Hong et al. (2020). Pathological personality traits and the experience of daily situations. Clinical Psychological Science, 8(2), 333–342. https://doi.org/10.1177/2167702619894902
Jordan et al. (2011). Misery has more company than people think: Underestimating the prevalence of others’ negative emotions. Personality and Social Psychology Bulletin, 37(1), 120–135. https://doi.org/10.1177/0146167210390822
Kim et al. (2019). Self–other agreement in personality reports: A meta-analytic comparison of self- and informant-report means. Psychological Science, 30(1), 129–138. https://doi.org/10.1177/0956797618810000
(2024, August 18). Instructional sets for situation characteristic ratings [Data, Output]. https://osf.io/pt2a7
Kuper et al. (2021a). The dynamics, processes, mechanisms, and functioning of personality: An overview of the field. British Journal of Psychology, 112(1), 1–51. https://doi.org/10.1111/bjop.12486
Kuper et al. (2021b). The situation during the COVID-19 pandemic: A snapshot in Germany. PLoS One, 16(2), Article e0245719. https://doi.org/10.1371/journal.pone.0245719
Kuper, von Garrel, et al. (2024). Distinguishing four types of Person × Situation interactions: An integrative framework and empirical examination. Journal of Personality and Social Psychology, 126(2), 282–311. https://doi.org/10.1037/pspp0000473
Lenhausen et al. (2024). Effects of reference group instructions on Big Five trait scores. Assessment, 31(3), 669–677. https://doi.org/10.1177/10731911231175850
McDaniel et al. (2007). Situational judgment tests, response instructions, and validity: A meta‐analysis. Personnel Psychology, 60(1), 63–91. https://doi.org/10.1111/j.1744-6570.2007.00065.x
Rauthmann & Sherman (2016). Measuring the situational eight DIAMONDS characteristics of situations. European Journal of Psychological Assessment, 32(2), 155–164. https://doi.org/10.1027/1015-5759/a000246
Rauthmann & Sherman (2020). The situation of situation research: Knowns and unknowns. Current Directions in Psychological Science, 29(5), 473–480. https://doi.org/10.1177/0963721420925546
Rauthmann & Sherman (2021). Conceptualizing and measuring the psychological situation. In D. Wood, S. J. Read, P. D. Harms, & A. Slaughter (Eds.), Measuring and modeling persons and situations (pp. 427–463). Academic Press. https://doi.org/10.1016/B978-0-12-819200-9.00009-0
Rauthmann et al. (2014). The Situational Eight DIAMONDS: A taxonomy of major dimensions of situation characteristics. Journal of Personality and Social Psychology, 107(4), 677–718. https://doi.org/10.1037/a0037250
Rauthmann, Sherman, & Funder (2015). Principles of situation research: Towards a better understanding of psychological situations. European Journal of Personality, 29(3), 363–381. https://doi.org/10.1002/per.1994
Rauthmann, Sherman, Nave, et al. (2015). Personality-driven situation experience, contact, and construal: How people’s personality traits predict characteristics of their situations in daily life. Journal of Research in Personality, 55, 98–111. https://doi.org/10.1016/j.jrp.2015.02.003
Solomon & Vazire (2016). Knowledge of identity and reputation: Do people have knowledge of others’ perceptions? Journal of Personality and Social Psychology, 111(3), 341–366. https://doi.org/10.1037/pspi0000061
Soto & John (2017). The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. Journal of Personality and Social Psychology, 113(1), 117–143. https://doi.org/10.1037/pspp0000096