Open Access, Replication

What Does It Take to Activate Stereotypes? Simple Primes Don’t Seem Enough

A Replication of Stereotype Activation (Banaji & Hardin, 1996; Blair & Banaji, 1996)



According to social cognition textbooks, stereotypes are activated automatically if appropriate categorical cues are processed. Although many studies have tested effects of activated stereotypes on behavior, few have tested the process of stereotype activation. Blair and Banaji (1996) demonstrated that subjects were faster to categorize first names as male or female if those were preceded by gender congruent attribute primes. Banaji and Hardin (1996) obtained the same, albeit smaller, effects in a semantic priming design that rules out response priming. We sought to replicate these important effects. Mirroring Blair and Banaji (1996), we found strong priming effects as long as response priming was possible. However, unlike Banaji and Hardin (1996), we did not find any evidence for automatic stereotype activation when response priming was ruled out. Our findings suggest that automatic stereotype activation is not a reliable and global phenomenon but is restricted to more specific conditions.

The idea that stereotypes are activated automatically upon encountering a member of a social category is taken for granted by many researchers and has also found its way into standard Social Psychology textbooks (Fiske, 1998; Schneider, 2004). This notion is by no means new, as Gilbert and Hixon (1991) point out:

Many theorists have assumed that the activation of a stereotype is an automatic and inevitable consequence of encountering the object of that stereotype. For instance, Allport (1954, p. 21) argued that “Every event has certain marks that serve as a cue to bring the category of prejudgment into action…. A person with dark brown skin will activate whatever concept of Negro is dominant in our mind.” (p. 509)

More specifically, activation of stereotypes and their subsequent impact on judgment and behavior can be conceptualized as a three-step process (Fiske, 1998; Moskowitz, Li, & Kirk, 2004; Schneider, 2004). First, a person is categorized as a member of a social group. Second, traits associated with this category are activated. And, third, judgment of and behavior toward the target person are influenced by these activated traits. While not immune to moderating factors (Blair, 2002) this sequence is thought to proceed in an automatic fashion largely beyond control (Bargh, Chen, & Burrows, 1996; Devine, 1989).

Indeed a considerable number of studies have shown effects of stereotype activation on attitudes and evaluations (Banaji & Greenwald, 1995; Banaji, Hardin, & Rothman, 1993; Devine, 1989; Dovidio, Evans, & Tyler, 1986; Lepore & Brown, 1997), thus addressing the final phase of this three-step model.

However, only a few studies have investigated the second step, the very process of stereotype activation itself. Gilbert and Hixon (1991) found an increase in participants’ stereotype congruent word completions after perceiving a member of a stereotyped category. However, the considerable delay between category activation and word completion does not rule out controlled processes (Wentura & Rothermund, in press). Perdue and Gurtman (1990) found that participants were faster to evaluate negative trait words if those were preceded by an old prime (vs. a young prime), whereas they were faster to evaluate positive trait words if those were preceded by a young prime (vs. an old prime). While this might be taken as evidence for an automatic activation of stereotypic traits, the results can also be explained by response priming based on the valence dimension inherent in both primes and targets (Wentura & Degner, 2010).

Finally, in frequently cited studies, Blair and Banaji (1996) and Banaji and Hardin (1996) investigated activation of stereotypes. Blair and Banaji (1996) found that first name targets were categorized faster as being male or female if those were preceded by gender congruent attribute primes (Exps. 1 & 2). In a similar vein, Banaji and Hardin (1996) showed that participants were faster to categorize pronouns as male or female if they were preceded by gender congruent primes (Exp. 1). This effect held even if response priming was ruled out (Exp. 2).

While these influential studies are among the few that actually investigated the very process of stereotype activation, a few issues deserve mentioning. First, priming effects in Blair and Banaji (1996) were strongest for “non-trait” words; that is, attributes related to gender by designating typically male or female activities, roles, objects, or professions (e.g., ballet, master, flowers, mechanic). Priming effects were not significant for stereotypic trait attributes (e.g., courageous, logical, sentimental, warm).

Second, the priming effects in Blair and Banaji (1996) are prone to an alternative explanation in terms of response priming (Wentura & Degner, 2010). Thus, congruency effects in a gender categorization task can be explained by the fact that both primes and targets can be categorized as typically male or female, leading to response facilitation or interference in case of congruent or incongruent prime/target pairs, without having to assume an automatic spreading of activation from primes to targets. This issue was addressed in Banaji and Hardin (1996, Exp. 2). Participants were asked to categorize targets as pronouns or non-pronouns, which rules out response priming as an explanation. Only small effects were found, with priming effects present only for attribute primes related to gender by definition (e.g., mother, father), but not for attribute primes related to gender by normative base-rates (e.g., secretary, doctor). Considering the findings by Blair and Banaji (1996) it appears reasonable to assume that trait attributes are even less likely to produce these effects.

Given the implications and the importance of the discussed studies, it appears paramount to test (a) whether category activation via attribute priming can be replicated (Blair & Banaji, 1996), (b) whether those effects encompass attribute primes related to gender not only by definition, and (c) whether priming effects still hold if response priming is ruled out (Banaji & Hardin, 1996).

To accomplish these goals we conducted a study consisting of two experiments. The first experiment was a direct replication of the gender congruency effects reported by Blair and Banaji (1996). The second experiment tested whether these gender congruency effects persist when response priming is ruled out, replicating Banaji and Hardin (1996, Exp. 2). To this end, the gender categorization task was replaced by a name vs. town categorization task. This semantic priming design provides the crucial test of the assumed automatic gender stereotype activation effect in the absence of response priming. Both experiments were combined in a within-subjects design, with order of experiments counterbalanced across participants.


The goal of the current study was to replicate the finding that activation of stereotypically male or female attributes facilitates processing of targets denoting category membership (Banaji & Hardin, 1996; Blair & Banaji, 1996). To disentangle stereotype activation from response priming effects, each participant completed two tasks: a gender classification task (male vs. female names) corresponding to the Blair and Banaji (1996, Exps. 1 & 2) study, and a semantic classification task (name vs. town) that was orthogonal to gender, corresponding to the semantic priming design used by Banaji and Hardin (1996, Exp. 2). The variation in the target task constitutes the within-subject factor Task Type (gender categorization vs. semantic categorization). The complete study was implemented using the PsychoPy software package¹ (Peirce, 2007, 2009) and run on standard PC hardware (Microsoft Windows XP) connected to a 17″ CRT monitor displaying XGA resolution at 85 Hz.



In order to detect the Prime × Target interaction effects from Banaji and Hardin (1996) and Blair and Banaji (1996) with a power (1 − β) of .95, a sample of N = 135 was needed for each experiment.² To ensure a sufficient sample size in case order effects emerged (rendering only the experiment run first for each participant available for analyses) and to guard against dropout, a total of 300 participants were recruited. Six participants who did not finish the experiment were excluded from further analyses, resulting in a final sample size of N = 294, 51% female, age: M(SD) = 23.84 (5.9), range: 18–55.
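The sample size calculation starts from the smallest original effect, F(1, 56) = 4.63 (see Footnote 2). As a rough illustration of how an F statistic is usually turned into the effect size that power software expects, the sketch below applies the standard conversions to partial eta squared and Cohen's f. The function names are hypothetical, and the sketch does not attempt to reproduce the G*Power computation itself, whose exact settings are not stated here.

```python
import math


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_f(eta_p2: float) -> float:
    """Cohen's f corresponding to a given partial eta squared."""
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


# Smallest original effect: Banaji and Hardin (1996, Exp. 2), F(1, 56) = 4.63
eta_p2 = partial_eta_squared(4.63, 1, 56)
effect_f = cohens_f(eta_p2)
print(f"partial eta squared = {eta_p2:.3f}, Cohen's f = {effect_f:.3f}")
```

The required N for a given effect size still depends on the test family and design options chosen in the power software, so the printed values are only the input to that step.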


Upon arrival at the laboratory participants were seated at a computer in individual, soundproof cubicles. They were told to follow the instructions provided to them on the computer screen and to contact the experimenter if they had any questions. Participants learned that they were going to work on two different tasks, each requiring a binary categorization of words via button press. A detailed description was provided at the beginning of each experimental task, followed by a short practice block (eight trials) to familiarize participants with the upcoming task. To ensure fast and correct responses the practice block was repeated if participants’ mean reaction time exceeded 1,000 ms or if their accuracy was below 80%. To guard against order effects, sequence of tasks was counterbalanced, constituting the between-subject factor Experiment Order (gender categorization first vs. semantic categorization first). Additionally, key assignment in both tasks was counterbalanced. Both tasks differed only with respect to the employed target stimuli and the required response.
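The practice-block rule described above can be sketched as a small check. This is an illustration only: the function name is hypothetical, and it assumes the mean reaction time is taken over all eight practice trials (the text does not say whether error trials were excluded).

```python
from statistics import mean


def practice_must_repeat(rts_ms, correct, max_mean_rt_ms=1000.0, min_accuracy=0.80):
    """Return True if the practice block has to be repeated.

    rts_ms  -- reaction times of the practice trials in milliseconds
    correct -- booleans, one per trial, True for a correct response
    """
    mean_rt = mean(rts_ms)                   # assumption: all trials enter the mean
    accuracy = sum(correct) / len(correct)   # proportion of correct responses
    return mean_rt > max_mean_rt_ms or accuracy < min_accuracy


# Eight fast trials with one error (87.5% correct): no repetition needed.
print(practice_must_repeat([600] * 8, [True] * 7 + [False]))
# Eight fast trials with two errors (75% correct): block is repeated.
print(practice_must_repeat([600] * 8, [True] * 6 + [False] * 2))
```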

Gender Categorization Task (GCT)

After a fixation cross (500 ms) participants were presented with a stereotypically male or female attribute prime (200 ms). After an Inter-Stimulus-Interval (ISI) of 100 ms (blank screen; SOA = 300 ms), a male or female first name was presented as target stimulus until a response was registered. Participants had to indicate as quickly and accurately as possible whether the name was female or male by pressing the assigned button on the keyboard, with button assignment counterbalanced across participants.
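The trial timeline above can be written down as a short data structure, which makes the relation SOA = prime duration + ISI explicit. The sketch below uses plain Python rather than the PsychoPy API, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phase:
    name: str
    duration_ms: Optional[int]  # None means "until a response is registered"


# Phase durations as reported for the GCT
TRIAL = [
    Phase("fixation", 500),
    Phase("prime", 200),
    Phase("blank_isi", 100),
    Phase("target", None),
]


def onsets(phases):
    """Return {phase name: onset in ms, measured from trial start}."""
    t, result = 0, {}
    for phase in phases:
        result[phase.name] = t
        if phase.duration_ms is not None:
            t += phase.duration_ms
    return result


trial_onsets = onsets(TRIAL)
# Stimulus onset asynchrony: prime onset to target onset
soa_ms = trial_onsets["target"] - trial_onsets["prime"]
print(trial_onsets, soa_ms)
```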

Semantic Categorization Task (SCT)

The procedure was identical to the GCT, with the following exception: After the ISI, either a first name (25% male, 25% female) or the name of a well-known city (50%) was presented as a target stimulus until a response was registered. Participants had to indicate as quickly and accurately as possible whether the target was the name of a person (regardless of gender) or the name of a town by pressing the assigned button on the keyboard. Again, button assignment was counterbalanced across participants.



A total of 62 prime words were used, half of them stereotypically male and half stereotypically female. Of those, 54 comprised male or female stereotypes counterbalanced on valence (positive, neutral, negative) and word type (noun, verb, adjective), with three exemplars representing each combination. The remaining eight primes were related to gender by definition and were adapted from Banaji and Hardin (1996). Their inclusion offers an additional test of whether our procedure is in general suited to detect priming effects (see Table 1).

Table 1. Overview of gender-related words used as primes


In the GCT, a total of 62 first names were used, 50% male and 50% female. Care was taken to select only names that are easily and unambiguously recognized as male or female. In the SCT, an additional 62 city names were employed (see Table 2).

Table 2. Overview of first names and city names used as targets


In the GCT, each prime was randomly paired with a male and a female target name, yielding a total of 62 primes × 2 names = 124 trials. In the SCT, each prime was additionally paired with two city names, yielding a total of 248 trials.
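The trial counts above follow directly from the stimulus design, as the following sketch shows. It uses placeholder labels instead of the actual stimuli, and two details are assumptions rather than reported facts: the eight definitional primes are split 4/4 by gender (which follows from the overall 31/31 split), and targets are recycled and shuffled when a pool is smaller than the number of primes. All function names are hypothetical.

```python
import itertools
import random

GENDERS = ("male", "female")
VALENCES = ("positive", "neutral", "negative")
WORD_TYPES = ("noun", "verb", "adjective")


def build_primes():
    """62 placeholder primes: 54 stereotypic (2 x 3 x 3 cells, 3 each) + 8 definitional."""
    primes = []
    for gender, valence, word_type in itertools.product(GENDERS, VALENCES, WORD_TYPES):
        for k in range(3):
            primes.append({"label": f"{gender}_{valence}_{word_type}_{k}", "gender": gender})
    for gender in GENDERS:
        for k in range(4):  # assumption: definitional primes split 4/4 by gender
            primes.append({"label": f"{gender}_definitional_{k}", "gender": gender})
    return primes


def build_trials(primes, male_names, female_names, city_names=None, seed=0):
    """Pair every prime with one male and one female name (plus two cities in the SCT)."""
    rng = random.Random(seed)

    def spread(pool):
        # Assumption: recycle the pool until every prime gets one item, then shuffle.
        reps = -(-len(primes) // len(pool))  # ceiling division
        items = (list(pool) * reps)[: len(primes)]
        rng.shuffle(items)
        return items

    trials = []
    for prime, m_name, f_name in zip(primes, spread(male_names), spread(female_names)):
        trials.append({"prime": prime["label"], "target": m_name, "target_type": "male_name"})
        trials.append({"prime": prime["label"], "target": f_name, "target_type": "female_name"})
    if city_names is not None:  # SCT only: two city targets per prime
        for prime, city_a, city_b in zip(primes, spread(city_names), spread(city_names)):
            trials.append({"prime": prime["label"], "target": city_a, "target_type": "city"})
            trials.append({"prime": prime["label"], "target": city_b, "target_type": "city"})
    rng.shuffle(trials)  # randomize presentation order
    return trials


primes = build_primes()
male_names = [f"male_name_{i}" for i in range(31)]
female_names = [f"female_name_{i}" for i in range(31)]
city_names = [f"city_{i}" for i in range(62)]

gct_trials = build_trials(primes, male_names, female_names)
sct_trials = build_trials(primes, male_names, female_names, city_names)
print(len(primes), len(gct_trials), len(sct_trials))  # 62 124 248
```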

Known Differences From Original Studies

In the current experiments we opted to exclude control or nonword primes, for two reasons: First, those primes are not essential for the focal research question; in fact, those trials are not part of any analyses of interest. Second, being able to use all completed trials for the analysis increases the power to detect even small effects.

We also employed a new set of prime and target stimuli. This was mainly because the current experiments were conducted in Germany and the original studies are now 17 years old. Thus it was questionable whether our subjects would endorse the original stereotypic stimuli to the same extent as the original subjects. As in Blair and Banaji (1996), attributes were balanced with regard to valence, with an additional set of attribute stimuli of neutral valence. We also systematically varied the word type of the attributes to estimate whether effects vary depending on the type of attribute (adjectives = personality traits; nouns and verbs refer to behavior and appearance).

Finally, we opted to employ the aforementioned name vs. town classification task instead of the original pronoun versus non-pronoun classification (Banaji & Hardin, 1996). This is because the latter allows only a limited set of different target stimuli (he, she, it, me vs. is, do, as, all in Banaji & Hardin, 1996) resulting in a huge number of target repetitions during the task. Employing the name versus town classification allowed us to eliminate target repetitions completely.


While all trials from the GCT entered into the analyses, only trials featuring a first name target in the SCT could be analyzed with regard to stereotype congruency. Thus, each experiment yielded a total of 124 trials for the analyses. RTs from trials with incorrect responses (4.8%) or exceeding the third quartile of the respective intraindividual distribution by more than 1.5 interquartile ranges (4.3%; outlier values according to Tukey, 1977) were removed from the analyses.
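The exclusion rule above can be sketched per participant as follows. This is a minimal illustration, not the authors' analysis script: it assumes the quartiles are computed on correct trials only, and it uses the "inclusive" quartile definition from Python's `statistics` module, which may differ slightly from Tukey's hinges. The function name is hypothetical.

```python
from statistics import quantiles


def trim_rts(rts_ms, correct):
    """Drop error trials, then drop RTs above Q3 + 1.5 * IQR of one participant."""
    valid = [rt for rt, ok in zip(rts_ms, correct) if ok]  # remove incorrect responses
    q1, _, q3 = quantiles(valid, n=4, method="inclusive")  # assumption: quartile method
    upper_fence = q3 + 1.5 * (q3 - q1)
    return [rt for rt in valid if rt <= upper_fence]


# Made-up reaction times for one participant; the first trial is an error.
rts = [500, 520, 540, 560, 580, 600, 620, 2000]
correct = [False, True, True, True, True, True, True, True]
print(trim_rts(rts, correct))
```

Only the upper fence is applied here because the text describes removal of values exceeding the third quartile; very fast responses are left untouched.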

Including the control factors Experiment Order and Participant Gender did not reveal any higher order interactions with the priming factors (all ps > .20). We thus report findings in which we aggregate across conditions for these balancing factors. Average RTs within conditions were subjected to a 2 (Prime Gender: male vs. female) × 2 (Target Gender: male vs. female) × 2 (Task Type: GCT vs. SCT) repeated measures ANOVA, yielding a significant Prime Gender × Target Gender interaction, F(1, 293) = 39.68, p = 1.09 × 10⁻⁹, ηp² = 0.12, that was further qualified by the three-way interaction of Prime Gender × Target Gender × Task Type, F(1, 293) = 25.74, p = 6.95 × 10⁻⁷, ηp² = 0.08. Inspection of Figure 1 reveals that responses to targets are facilitated by gender congruent primes only in the GCT but not in the SCT. Conducting separate ANOVAs for each experimental task confirmed a significant Prime Gender × Target Gender interaction in the GCT, F(1, 293) = 75.54, p = 2.59 × 10⁻¹⁶, ηp² = 0.20, indicating that responses to male and female names were faster after stereotypically congruent primes (M = 551 ms) than after stereotypically incongruent primes (M = 564 ms). No such congruency effect was obtained for the SCT, F(1, 293) = 1.03, p = .31, ηp² = 0.003, indicating that prime stereotypicality did not affect responding to male and female names in a semantic categorization task unrelated to gender.
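The Prime Gender × Target Gender interaction in a 2 × 2 design amounts to comparing congruent with incongruent prime/target pairs. The sketch below computes this congruency effect from four cell means. The function name is hypothetical and the cell means are made-up numbers for illustration, not values from this study.

```python
def congruency_effect(cell_means):
    """Incongruent minus congruent mean RT (ms).

    cell_means maps (prime_gender, target_gender) -> mean RT in ms.
    Positive values indicate faster responses after congruent primes.
    """
    congruent = (cell_means[("male", "male")] + cell_means[("female", "female")]) / 2
    incongruent = (cell_means[("male", "female")] + cell_means[("female", "male")]) / 2
    return incongruent - congruent


# Made-up cell means for a single participant
example = {
    ("male", "male"): 540.0,
    ("female", "female"): 560.0,
    ("male", "female"): 570.0,
    ("female", "male"): 560.0,
}
print(congruency_effect(example))  # 15.0
```

With one such score per participant, testing the scores against zero corresponds to the 1-df interaction test in the repeated measures ANOVA (F = t²).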

Figure 1. Reaction Times depending on Prime Gender and Target Gender. Priming effects were only found in the response priming condition (GCT).

To ensure the robustness of our results, analyses were repeated using log-transformed values (after removal of outliers as mentioned above) and using values that were standardized on participants’ individual standard deviation of response latencies (i.e., using the D measure, see Greenwald, Nosek, & Banaji, 2003, p. 201). The same pattern of findings emerged for these analyses as well (for details consult supplementary materials).
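The two robustness transformations can be sketched as follows. This is an assumption-laden illustration: the natural logarithm is used for the log transform, and the standardization divides each latency by the standard deviation of all retained latencies of that participant, which is one plausible reading of the description above. Function names are hypothetical.

```python
import math
from statistics import stdev


def log_rts(rts_ms):
    """Natural-log transform of reaction times (assumption: natural log)."""
    return [math.log(rt) for rt in rts_ms]


def sd_standardized(rts_ms):
    """Express one participant's RTs in units of that participant's own SD."""
    sd = stdev(rts_ms)  # assumption: SD over all retained trials of the participant
    return [rt / sd for rt in rts_ms]


# Made-up latencies for one participant, after outlier removal
rts = [500.0, 550.0, 600.0, 650.0]
scaled = sd_standardized(rts)
print(log_rts(rts))
print(scaled, stdev(scaled))  # the rescaled values have an SD of 1
```

Condition means computed on either transformed scale can then be submitted to the same ANOVA as the raw latencies.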

Effects of Prime Valence

Including the factor Valence (negative, neutral, positive) revealed no interactions of Valence with stereotype priming, all ps > .13 (Greenhouse-Geisser corrected).

Effects of Word Type

Including the factor Word Type (noun, verb, adjective) revealed a significant four-way interaction of all factors, F(2, 586) = 3.65, p = .03, ηp² = 0.01. Separate ANOVAs for each experimental task confirmed a significant Prime Gender × Target Gender × Word Type interaction in the GCT, F(2, 586) = 5.21, p = .006, ηp² = 0.02, but not in the SCT, F(2, 586) = 0.36, p = .69 (Greenhouse-Geisser corrected). ANOVAs conducted for each value of Word Type in the GCT yielded a more pronounced Prime Gender × Target Gender interaction for nouns, F(1, 293) = 48.25, p = 2.43 × 10⁻¹¹, ηp² = .14, versus verbs, F(1, 293) = 9.49, p = .002, ηp² = .03, and adjectives, F(1, 293) = 15.64, p = 9.61 × 10⁻⁵, ηp² = .05.

Effects of Stereotype Relation

Including the factor Stereotype Relation (stereotypically related primes vs. gender-defining primes) revealed a significant four-way interaction of all factors, F(1, 293) = 15.2, p = 1.19 × 10⁻⁴, ηp² = 0.05. Separate ANOVAs for each experimental task confirmed a significant Prime Gender × Target Gender × Stereotype Relation interaction in the GCT, F(1, 293) = 24.97, p = 1 × 10⁻⁶, ηp² = 0.08, but not in the SCT, F(1, 293) = 0.59, p = .44. Finally, ANOVAs conducted for each value of Stereotype Relation within the GCT revealed a more pronounced Prime Gender × Target Gender interaction for primes related to gender by definition, F(1, 293) = 67.0, p = 8.42 × 10⁻¹⁵, ηp² = 0.19, compared to primes that were part of a gender stereotype, F(1, 293) = 50.41, p = 9.43 × 10⁻¹², ηp² = 0.15.


The current study sought to replicate the finding that activation of stereotypically male or female attributes facilitates processing of targets denoting category membership (Banaji & Hardin, 1996; Blair & Banaji, 1996). The response priming paradigm in a gender classification task successfully replicated the effects reported by Blair and Banaji (1996). However, these effects are readily explained by a response priming mechanism and thus do not provide strong support for claims of automatic stereotype activation. If automatic stereotype activation does take place, effects should also be obtained with a semantic priming paradigm. In contrast to the findings reported by Banaji and Hardin (1996), our semantic priming paradigm (semantic categorization task: names vs. cities) revealed no trace of stereotype congruency effects (see Figure 2 for a comparison of the observed effects; see also Table 3).

Figure 2. Priming Effects compared across studies depending on type of priming.
Table 3. Mean (SD) of reaction time (ms) for each condition separately for each experiment (N = 294)

Because our experiment had sufficient power (1 – β > .95) to detect effects of the magnitude reported by Banaji and Hardin (1996), this failure to replicate the original results cannot be attributed easily to insufficient sample size. These results might be less surprising though if one considers that Banaji and Hardin (1996) also found only small priming effects and these effects were limited to those words that relate to gender by definition (e.g., mother, king, sister). This is reminiscent of similar findings from the domain of affective priming: Affective congruency effects typically do not emerge in semantic priming designs if effects of response priming are controlled (e.g., De Houwer, Hermans, Rothermund, & Wentura, 2002; Eder, Leuthold, Rothermund, & Schweinberger, 2012; Klauer & Musch, 2002; Klinger, Burton, & Pitts, 2000; Voss, Rothermund, Gast, & Wentura, 2013; Werner & Rothermund, 2013).

However, the failure of finding global semantic activation effects for stereotypic primes does not rule out the possibility that stereotypes can be activated automatically under more specific conditions. Recent experiments using semantic priming paradigms support the notion that the activation of specific subsets of stereotypic attributes does take place if category and context information are combined in a compound prime (Casper, Rothermund, & Wentura, 2010, 2011). This context-dependent activation of stereotypes also was confirmed in studies on self-stereotyping (Casper & Rothermund, 2012) and on the activation of stereotype-related behaviors (Müller & Rothermund, 2012).

¹ Version 1.75.01,

² Calculation of the necessary sample size was based on the smallest effect that was reported in the original studies, which was found for the Prime Gender × Target Gender interaction in Banaji and Hardin (1996, Exp. 2) with F(1, 56) = 4.63. The sample size that is necessary to detect an effect of this size with α = β = .05 was computed with G*Power (Faul, Erdfelder, Lang, & Buchner, 2007).


We report all data exclusions, manipulations, and measures, and how we determined our sample sizes. The research reported in this article was supported by Grant No. 2013-011 from the Center for Open Science. The authors declare no conflict-of-interest with the content of this article. Designed research: F. M., K. R.; Performed research: F. M.; Analyzed data: F. M., K. R.; Wrote paper: F. M., K. R. All materials, data, video of the procedure, and the preregistered design are available at

Florian Müller, Friedrich-Schiller-Universität Jena, Institut für Psychologie, Am Steiger 3/Haus 1, 07743 Jena, Germany, +49 3641 945123,