
Does Object Size Matter With Regard to the Mental Simulation of Object Orientation?



Abstract. Language comprehenders have been argued to mentally represent the implied orientation of objects. However, compared to the effects of shape, size, and color, the effect of orientation is rather small. We examined a potential explanation for the relatively low magnitude of the orientation effect: Object size moderates the orientation effect. Theoretical considerations led us to predict a smaller orientation effect for small objects than for large objects in a sentence–picture verification task. We furthermore investigated whether this pattern generalizes across languages (Chinese, Dutch, and English) and tasks (picture-naming task). The results of the verification task show an orientation effect overall, which is moderated neither by object size (contrary to our hypothesis) nor by language (consistent with our hypothesis). Meanwhile, the preregistered picture–picture verification task showed the predicted interaction between object size and orientation effect. We conducted exploratory analyses to address additional questions.

Theories of mental simulation propose that language comprehension involves the reactivation of perceptual experiences (e.g., Barsalou, 1999). In the past few decades, these theories have received empirical support from a number of tasks, including the sentence–picture verification task. Studies using this task have found that reading a probe sentence that implies a particular perceptual feature facilitates verification responses for pictures representing the target objects with that particular feature (Connell, 2007; de Koning, Wassenburg, Bos, & van der Schoot, 2017a; Stanfield & Zwaan, 2001; Zwaan & Madden, 2005; Zwaan & Pecher, 2012; Zwaan, Stanfield, & Yaxley, 2002).

Consider object shape. The probe sentence “He saw the eagle in the sky” implies the shape of a flying eagle, whereas the sentence “He saw the eagle in the nest” implies the shape of a perched eagle. Importantly, these sentences do not explicitly state that the eagle was flying or perched, respectively. The shape can be inferred from the eagle’s location.

The task was to indicate whether the depicted entity had been mentioned in the sentence. A sentence could be followed by a picture that matched the shape implied by the sentence (e.g., a picture of a flying eagle after the sky sentence) or by a picture that mismatched it (e.g., a picture of a flying eagle after the nest sentence); in both cases, a yes response was required. Nevertheless, responses were faster to shape-matching pictures than to mismatching ones (Zwaan et al., 2002). This match advantage is consistent with the idea of mental simulation, according to which the sentence reactivates the relevant perceptual experiences during language comprehension.

Comparisons of Match Advantages

Besides shape, three other perceptual features (orientation, color, and size) have been investigated in the literature. Of these features, orientation has shown the weakest effects. For object color, the initial findings suggested a mismatch advantage (Connell, 2005, 2007). However, using the same materials, Zwaan and Pecher (2012) obtained a match advantage of object color in an experiment with a larger number of participants: Whereas Connell tested 40–60 participants, Zwaan and Pecher recruited 152 participants in their online study. Recently, the match advantage of color was confirmed in two separate laboratory studies using different sets of materials (de Koning et al., 2017a; Hoeben Mannaert, Dijkstra, & Zwaan, 2017).

Two lines of studies have investigated mental simulation of object size. The first line of research used probe sentences that described the distance between a reference point and the object (Vukovic & Williams, 2015; Winter & Bergen, 2012). These probe sentences implied either a far or a near distance, and the target pictures were correspondingly presented small or large. Winter and Bergen used probe sentences that implied the absolute distance between the observer and the object (e.g., “... the milk bottle in the fridge” vs. “... the milk bottle on the end of the counter”) and found a match advantage, whereas Vukovic and Williams used probe sentences that implied the relative distance between the observer and the object (e.g., “In front of you, ...” vs. “In the distance, ...”) and found a mismatch advantage. The second line of research manipulated the physical appearance of the object (de Koning, Wassenburg, Bos, & van der Schoot, 2017b).

The target pictures of a given object had the same appearance but differed in size. The probe sentence implied either a large object, as in “...the bone of a dinosaur,” or a small object, as in “...the bone of a rabbit.” Consistent with the study by Winter and Bergen, de Koning et al. found a match advantage of object size. Thus, studies of object size show a robust match advantage for sentences that imply absolute object size, conveyed either through physical appearance or through absolute distance.

The match advantages for orientation were smaller than those for the other perceptual features, and the findings were less consistent across studies, although it is unclear how these differences have come about (de Koning et al., 2017a). Stanfield and Zwaan (2001) reported the first finding about the match advantage of object orientation. Their study used sentences that implied horizontal and vertical orientations of objects. For example, the sentence “Frank placed the iron onto the shelf, hoping he wouldn’t be late” implies a vertically oriented iron, whereas the sentence “Frank pressed the iron onto his pants, hoping he wouldn’t be late” implies a horizontal iron. Using materials such as these, Stanfield and Zwaan found a 44-ms match advantage of orientation in a laboratory study with 40 participants.

Subsequent studies have investigated object orientation with slightly modified designs and obtained an inconsistent pattern of results. Using the same materials as Stanfield and Zwaan (2001) and a larger number of participants recruited on the internet (n = 336), Zwaan and Pecher (2012) obtained a roughly equal match advantage (35 ms). Using a memory task and Dutch materials, Pecher, van Dantzig, Zwaan, and Zeelenberg (2009) found match advantages for object shapes and orientations. However, other studies failed to obtain significant match advantages for object orientation. With the same Dutch materials as in Pecher et al. (2009), Rommers, Meyer, and Huettig (2013) found a nonsignificant 1-ms match advantage of object orientation in a sentence–picture verification task.

Recently, de Koning et al. (2017a) also used Dutch sentences and, with a different set of materials, likewise failed to find a significant orientation match advantage: Participants were only 7 ms faster on matching items than on mismatching items.

Studies involving primary school children (8–12 years old) have also yielded inconsistent findings for object shape and orientation. Like Rommers et al. (2013) and de Koning et al. (2017a), Engelen, Bouwmeester, de Bruin, and Zwaan (2011) investigated the match advantages of object orientation and shape by intermixing orientation trials and shape trials. Engelen et al. found an average match advantage of 74 ms, but the intermixed-trial design made it difficult to infer whether the match advantage was driven by the orientation trials or by the shape trials.

Two things set the above-mentioned studies (e.g., Engelen et al., 2011; de Koning et al., 2017a) apart from the original study and its replications. First, the former did not employ a task that kept participants focused on the meaning of the sentences. By contrast, Stanfield and Zwaan (2001) had participants recall the target sentence after a certain number of trials, and Zwaan and Pecher (2012) used a sentence comprehension test. It is thus possible that the small effect sizes in the other studies are due to the lack of a task focusing on sentence comprehension. This is important because the effect is assumed to occur as a result of sentence comprehension. If participants are not prompted to comprehend the sentences, a reduction of effect size is plausible (Zwaan, 2014).

The second difference between these studies, on the one hand, and the studies by Stanfield and Zwaan (2001) and Zwaan and Pecher (2012), on the other hand, is that the former studies presented the sentences in Dutch, whereas the original study and its direct replications presented the stimuli in English.

These differences among orientation studies notwithstanding, the question remains why orientation yields a smaller effect than shape, size, or color in the same task.

According to the summary by Zwaan and Pecher (2012), Cohen’s d for orientation (0.13) is less than half that for shape (0.31). de Koning et al. (2017a) likewise reported that, in a direct within-subjects comparison, the smallest effect sizes were those for object orientation (0.07) and size (0.07), compared to color (0.48) and shape (0.27).
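To make concrete what effect sizes of this magnitude imply, the following sketch computes a within-participant effect size and the approximate sample size needed to detect it. It assumes that the reported values are d_z (the mean of the per-participant match advantages divided by their standard deviation), which the cited summaries may not use, and it relies on a normal approximation rather than the exact noncentral t computation that G*Power performs; the function names are our own.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_dz(diffs):
    """Within-participant effect size (assumed d_z): mean of the
    per-participant differences (mismatch RT minus match RT) divided
    by the standard deviation of those differences."""
    return mean(diffs) / stdev(diffs)


def n_for_paired_test(d, alpha=0.05, power=0.80):
    """Approximate N for a one-tailed paired t test via the normal
    approximation n = ((z_(1 - alpha) + z_power) / d) ** 2, rounded up.
    This slightly underestimates the exact noncentral-t answer."""
    z = NormalDist()
    return math.ceil(((z.inv_cdf(1 - alpha) + z.inv_cdf(power)) / d) ** 2)


for feature, d in [("orientation", 0.13), ("shape", 0.31)]:
    print(f"{feature}: d = {d}, approx. N = {n_for_paired_test(d)}")
```

Under these assumptions, an effect of d = 0.13 calls for several hundred participants at 80% power, whereas d = 0.31 needs well under a hundred, which helps explain why orientation effects are easily missed in laboratory-sized samples.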

We hypothesize that the relatively small effect of orientation is due to the nature of the objects used in the orientation experiments. All of these studies used the visual stimuli from the original study (Stanfield & Zwaan, 2001) or stimuli based on or translated into Dutch from the original ones. A common feature of the objects described and depicted in these stimuli is that they can be manipulated with a single hand. Most (but not all) of these stimuli represent easily manipulable objects such as a hairbrush and a pencil. Thus, for these items, the critical visual feature (orientation) can be changed by a simple manipulation during real-world action. This stands in contrast with the objects used in the shape (e.g., perched vs. flying eagle), color (e.g., red vs. green stoplight), and size (e.g., dinosaur vs. rabbit bone) experiments, where such a featural change by manipulation is impossible.

That past orientation studies employed small objects is relevant for at least two reasons. First, in the real world, such small objects are frequently seen changing orientation. Take, for instance, a pen, which is often seen lying flat on a desk, standing upright in a pen holder, being held by someone writing a note, or even being twiddled idly in someone’s hand. This means that we have a great deal of experience seeing small objects rapidly change from one orientation to the next and back. Nonmanipulable objects, or those that can only be manipulated with two hands, are usually not observed changing orientations in close temporal succession (e.g., a street lantern is usually seen standing upright; even when it is transported on a truck to a new location, it does not rapidly change orientation). As a result, our visual experience with small manipulable objects differs from that with larger objects that are more difficult or impossible to manipulate.

Second, for objects that are manipulable with one hand, it is relatively easy to obtain a different orientation by physically or mentally transforming the object. There is behavioral and neuroimaging evidence that visual mental rotation and manual rotation rely on overlapping neural substrates (e.g., Parsons et al., 1995; Wexler, Kosslyn, & Berthoz, 1998; Windischberger, Lamm, Bauer, & Moser, 2003; Wohlschläger & Wohlschläger, 1998). It might, therefore, be hypothesized that visual mental rotation is facilitated by motor mental rotation. As a result, participants should be able to recover quickly from seeing a mismatching picture, which would reduce the advantage of a matching picture. For example, a pen can easily be turned from a vertical to a horizontal position by hand, or one can imagine doing so. These lines of thinking lead us to predict a shorter simulation time for the orientation of small objects than for that of large objects.

Exploration of a Mental Rotation Account

A second aim of the current study is to explore to what extent mental rotation can account for the absence of a match advantage for object orientation. de Koning et al. (2017a) suggested that mental rotation, as an alternative process to mental simulation, could quickly erase the mismatched orientation and replace it with the orientation that matches the one described in the sentence (Cohen & Kubovy, 1993; Yaxley & Zwaan, 2007). However, this suggestion is so far speculative at best: de Koning et al. did not directly test it, and their findings provide little support for it because their materials included only small, manipulable objects. To put the suggestion of de Koning et al. to the test, we draw upon mental rotation research.

Among the various mental rotation paradigms (Cohen & Kubovy, 1993; Shepard & Metzler, 1971; Zwaan & Taylor, 2006), the time needed to verify whether two figures presented in different orientations are the same or different has typically been used to measure mental rotation. In Study 2, we used a version of this standard mental rotation paradigm (Shepard & Metzler, 1971), with the real-world objects from the sentence–picture verification task as stimuli. The task was thus a picture–picture verification task in which participants verified whether two pictures presented simultaneously side by side were the same or different; the pictures appeared either in the same orientation (horizontal–horizontal; vertical–vertical) or in different orientations (horizontal–vertical; vertical–horizontal). If the mental rotation account is plausible, we expect that large, nonmanipulable objects would require more rotation time than small, manipulable objects. Empirical support for this hypothesis would provide converging evidence for the hypothesis that mental simulation effects will be smaller for small items than for large items.

The Present Study

The present study aims to extend prior research on the mental simulation of object orientation during language comprehension. In three experiments, we tested the preregistered predictions outlined above using the sentence–picture verification task and the picture–picture verification task. We tested these predictions in three languages (English, Dutch, and Chinese), two of which, English (Stanfield & Zwaan, 2001; Zwaan & Pecher, 2012) and Dutch (de Koning et al., 2017a, 2017b; Rommers et al., 2013), have been studied before.

Although the shape effect has been replicated and extended in Chinese (Gao & Jiang, 2018), we know of no research on the orientation effect in Chinese. Thus, including Chinese allows us to further investigate the generalizability of the orientation effect across languages.

In Study 1, we investigated whether the match advantage in verifying pictures of horizontally and vertically oriented objects in the sentence–picture verification task was larger for large objects than for small objects and whether this pattern was consistent across the three languages. In Study 2, we investigated the mental rotation account using a picture–picture verification task and tested whether verifying if two pictures match in orientation produces a larger match advantage for large than for small objects (again with speakers of the three languages). In Study 3, in response to a suggestion by an anonymous reviewer of the preregistration, we used a sentence–picture naming task that was similar to Study 1 except that participants made a vocal response. A picture-naming task provides a stronger test of the mental simulation hypothesis than the verification task because it does not call for a comparison between the sentence and the picture.

General Method

To test the plausibility of our hypothesis, we constructed a stimulus set including small manipulable objects (e.g., a pen) as well as large nonmanipulable objects (e.g., a boat or a missile) and tested it in an initial study. In this initial study, we examined the match advantage for orientation across large and small objects and across three languages: English, Dutch, and Chinese (see the data and the summary in Electronic Supplementary Material, ESM 1 and 2). Orientation effects were found in English but not in Dutch. As predicted, a meta-analysis of this initial study showed a significant match advantage for large objects but no match advantage for small objects. In the present study, we built on this initial study regarding the design, materials, and experimental procedure.

Experimental Procedures

Three groups of native speakers (English, Dutch, and Chinese) participated in the sentence–picture verification task (Study 1) and the picture–picture verification task (Study 2) in a single experimental session. The sentence–picture naming task (Study 3) took place in a separate experimental session and involved two groups of native speakers (English and Chinese). We used the same design and materials as in the sentence–picture verification task and the picture–picture verification task of our initial study, except that we revised the probe sentences to eliminate the orientation implications of Dutch verbs.

Specifically, the initial study yielded two unforeseen issues. First, the Dutch participants unexpectedly showed a marginally significant match advantage for small objects. Closer inspection of the items revealed that some of the Dutch sentences contained verbs implying a particular object orientation. Orientational verbs such as liggen (to lie) and staan (to stand) provide an additional, and more explicit, clue about the object’s orientation than does the object’s location. Although these verbs are natural in Dutch (in fact, more natural than the less specific “is”), their use in the present experiment undermines the original goal of letting the orientation of the target object be determined by its location (described in a prepositional phrase). It stands to reason that providing an additional orientation cue would increase the size of the match advantage. We addressed this issue in the present study by replacing the orientation verbs with orientation-neutral verbs such as “is.”

In the sentence–picture verification task, each experimental session began with six practice trials. A trial started with a left-justified, vertically centered fixation point for 1,000 ms, which was immediately followed by the probe sentence, presented in the same location. Participants pressed the space bar when they had read the sentence. Immediately thereafter, a horizontally and vertically centered fixation point appeared for 500 ms, after which the target picture was presented. Participants pressed the j key if they thought the depicted object had been mentioned in the preceding sentence and the f key if they thought it had not. They were instructed to verify the target picture as quickly and accurately as possible.

The picture–picture verification task was based on the sentence–picture verification task but did not include the probe sentences. It differed in two further respects. First, only one horizontally and vertically centered fixation point appeared before the target pictures. Second, the two target pictures appeared on either side of the fixation point and remained on screen until a response was made or 2 s had elapsed.

The picture-naming task was identical to the sentence–picture verification task except for the mode of responding. Instead of pressing a key on the keyboard, participants named the object aloud as quickly as possible within 3 s of picture onset. Upon completion of the recording, an evaluation screen presented the object picture and asked participants to evaluate their response using one of four options: right, wrong, no response, and recording failed. We implemented all tasks in Gorilla (Anwyl-Irvine, Massonnié, Flitton, Kirkham, & Evershed, 2019). All the materials are available in Gorilla open materials (see the public link in Appendix A).

Material and Design

A total of 128 pairs of sentences and gray-colored pictures (scaled to 240 × 240 pixels) were used. Sixty-four pairs were critical items, and the other 64 sentence–picture pairs were fillers. Sixteen pictures of large objects were obtained from various sources, and the pictures of 16 small objects and 64 fillers were obtained from standardized stimulus pools (Bonin, Peereman, Malardier, Méot, & Chalard, 2003; Brodeur, Guérard, & Bouras, 2014) and from the internet. Each filler consisted of a sentence accompanied by an unrelated picture. The critical items were created by crossing three within-participant variables: size (large vs. small), orientation (horizontal vs. vertical), and match (probe sentence and target picture matching vs. mismatching). Of the 64 critical items, 32 described large objects and the other 32 described small objects. Object size was determined by whether the object can be manipulated with a single hand (small) or can be moved only by heavy machinery or forces of nature (large). The object terms were the grammatical subject of every critical sentence. The critical sentences implied the object orientations by means of a prepositional phrase at the end of the sentence. For example, “The pen is on the table” implies a horizontal orientation of the pen, and “The pen is in the container” implies a vertical orientation; “The missile was flying over the sea” implies a horizontal orientation, and “The missile was launched from the submarine” implies a vertical orientation. The sentences were written in Chinese, Dutch, and English. For each critical item, a target picture showing the horizontal or vertical orientation matched one of the two critical sentences but mismatched the other. For counterbalancing, we created two stimulus lists, each of which contained two of the four sentence–picture combinations. Each participant was randomly assigned to one list and was thus exposed to either the horizontal or the vertical picture of a given target object, but not both.
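The counterbalancing can be sketched as follows. This is one reading of the design, not the authors’ implementation: we assume that within a list each object appears with a single picture orientation and follows both of its sentences (one match, one mismatch), that picture orientation alternates across objects within each size class, and that list B reverses list A. The object names and the dictionary layout are hypothetical.

```python
import random

ORIENTATIONS = ("horizontal", "vertical")


def build_lists(objects):
    """Build two counterbalanced lists from {object name: size}.

    Assumed design: within a list, each object is shown with ONE picture
    orientation, preceded once by each of its two sentences, so a list holds
    two of the four sentence-picture combinations (one match, one mismatch).
    """
    lists = {"A": [], "B": []}
    count_by_size = {}
    for obj, size in objects.items():
        k = count_by_size.get(size, 0)
        count_by_size[size] = k + 1
        # Alternate picture orientation within each size class; list B flips list A.
        pictures = {"A": ORIENTATIONS[k % 2], "B": ORIENTATIONS[(k + 1) % 2]}
        for list_id, picture in pictures.items():
            for sentence in ORIENTATIONS:
                lists[list_id].append({
                    "object": obj,
                    "size": size,
                    "sentence": sentence,  # orientation implied by the sentence
                    "picture": picture,    # orientation shown in the picture
                    "match": sentence == picture,
                })
    return lists


def trials_for_participant(lists, rng):
    """Randomly assign one list and return its trials in shuffled order."""
    trials = list(lists[rng.choice(sorted(lists))])
    rng.shuffle(trials)
    return trials


# Hypothetical objects, for illustration only.
demo = build_lists({"pen": "small", "missile": "large",
                    "hairbrush": "small", "boat": "large"})
print(len(demo["A"]), sum(t["match"] for t in demo["A"]))
```

With the full stimulus set (32 objects), each list would contain 64 critical trials, half of them matching, which agrees with the item counts given above.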

The picture–picture verification task included two within-participant variables: object size (large vs. small) and match (identical vs. different orientation). Each target object was presented in its default orientation and had two companions: one was the picture presented in the identical orientation, and the other was the picture presented in a different orientation. Each target object was presented twice with its companions in this task. To balance yes and no responses, 64 filler items were created by pairing different objects of the same size from the stimulus set. In total, 256 trials were presented in a random order for each participant.
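A sketch of how the picture–picture trials add up to the stated total. Two points are assumptions rather than statements from the text: that the 64 filler pairs were repeated as often as the target pairs, and that the required judgment was whether the two pictures showed the same object. Under those assumptions, 32 targets × 2 companions × 2 repetitions plus 64 fillers × 2 repetitions gives 256 trials with equal numbers of yes and no responses. The function name and trial labels are hypothetical.

```python
import random


def build_picture_picture_trials(targets, filler_pairs, repetitions=2, seed=None):
    """targets: {object: size}; filler_pairs: [(object_1, object_2, size)].

    Each target, in its default orientation, is paired with a companion in the
    identical or in the other orientation ("same object" trials). Filler pairs
    show two different objects of the same size ("different object" trials).
    Assumption: fillers are repeated as often as the target pairs.
    """
    trials = []
    for _ in range(repetitions):
        for obj, size in targets.items():
            for companion in ("identical", "different"):
                trials.append({"objects": (obj, obj), "size": size,
                               "companion": companion, "same_object": True})
        for first, second, size in filler_pairs:
            trials.append({"objects": (first, second), "size": size,
                           "companion": None, "same_object": False})
    random.Random(seed).shuffle(trials)  # fresh random order per participant
    return trials


# Hypothetical labels: 16 large + 16 small targets, 64 size-matched filler pairs.
targets = {f"target_{i}": ("large" if i < 16 else "small") for i in range(32)}
fillers = [(f"filler_{i}a", f"filler_{i}b", "large" if i < 32 else "small")
           for i in range(64)]
trials = build_picture_picture_trials(targets, fillers, seed=1)
print(len(trials), sum(t["same_object"] for t in trials))
```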

Study 1: Sentence–Picture Verification

The first aim of this study was to test the hypothesis that the match advantage for orientation items is larger for large objects than for small objects. The second aim was to test whether the results were similar across Chinese, Dutch, and English. Earlier studies had found effects for English but not for Dutch stimuli, and, to our knowledge, Chinese had not been tested. We modified some items for our Chinese participants.

The whisk was replaced by chopsticks: Although a whisk is a common kitchen utensil for English and Dutch participants, it is unfamiliar to Chinese participants. Four large objects, namely drawbridge, wine barrel, steel barrel, and bottle, were also deemed unfamiliar to Chinese participants. We revised the Chinese sentences describing these objects to make them comprehensible for Chinese participants.

To foreshadow, with regard to the second hypothesis, the results of the Dutch participants deviated from those of the English and Chinese participants. We suspected that language-specific knowledge might have affected the Dutch participants’ performance. Therefore, we collected additional data from a group of Dutch native speakers who were fluent in English and who were presented with the English version of the sentence–picture verification task. This additional data collection followed the preregistered data collection plan but was not itself preregistered.


Sampling Plan

We used G*Power (Faul, Erdfelder, Buchner, & Lang, 2009) to estimate the expected effect size and the required sample size for a one-tailed t test with a .05 significance level and 80–90% power. Following the preregistered plan, we conducted a sequential analysis based on Bayes factors computed during data collection (Morey, Rouder, Love, & Marwick, 2015; Schönbrodt, Wagenmakers, Zehetleitner, & Perugini, 2017). We chose BF10 = 10 as the stopping criterion for two reasons: (1) the collective evidence (de Koning et al., 2017a; Zwaan & Pecher, 2012) moderately supports the mental simulation of object orientation, BF10 = 20.944; and (2) a BF10 between 10 and 30 indicates strong evidence (Jeffreys, 1961). For each participant group (i.e., language), we computed a Bayesian t test on the sentence–picture verification data after each batch of 40 participants. If the resulting BF10 was smaller than 10, we continued data collection with the next 40 participants. Once the BF10 for one of the match advantages exceeded 10, data collection for that group ended. Otherwise, data collection continued until the predetermined maximum of 160 participants was reached. The permanent link for access to the preregistered plan is
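The stopping rule can be sketched in Python. The Bayes factor here is the JZS Bayes factor for a paired t test as formulated by Rouder and colleagues, with a two-sided Cauchy prior of scale √2/2 on the standardized effect, a common default; the preregistration’s exact prior and sidedness are not stated in this section, so treat both as assumptions. The function names and the input layout (one mismatch-minus-match difference per participant for each object size) are hypothetical.

```python
import math
from statistics import mean, stdev


def log_jzs_bf10(t, n, r=math.sqrt(2) / 2, lo=-30.0, hi=30.0, steps=6000):
    """Natural log of the JZS Bayes factor BF10 for a one-sample/paired t test.

    Assumed prior: Cauchy(0, r) on the standardized effect, written as a normal
    mixture whose variance g has an inverse-gamma(1/2, r^2/2) prior. The
    integral over g is taken on u = log(g) with Simpson's rule, entirely in log
    space to avoid overflow for large t.
    """
    nu = n - 1
    log_null = -(nu + 1) / 2 * math.log1p(t * t / nu)
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    log_terms = []
    for i in range(steps + 1):
        u = lo + i * h
        g = math.exp(u)
        log_like = (-0.5 * math.log1p(n * g)
                    - (nu + 1) / 2 * math.log1p(t * t / ((1 + n * g) * nu)))
        # log inverse-gamma density of g, plus log Jacobian (dg/du = g)
        log_prior = (math.log(r) - 0.5 * math.log(2 * math.pi)
                     - 1.5 * u - r * r / (2 * g) + u)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        log_terms.append(math.log(weight * h / 3) + log_like + log_prior)
    peak = max(log_terms)
    log_alt = peak + math.log(sum(math.exp(x - peak) for x in log_terms))
    return log_alt - log_null


def paired_t(diffs):
    """t statistic and n for per-participant match advantages."""
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n


def stopping_n(adv_large, adv_small, batch=40, max_n=160, threshold=10.0):
    """Sample size at which collection stops: after each batch of participants,
    stop once either match advantage has BF10 > threshold, or at max_n."""
    n = batch
    while True:
        log_bfs = [log_jzs_bf10(*paired_t(adv[:n])) for adv in (adv_large, adv_small)]
        if max(log_bfs) > math.log(threshold) or n >= max_n:
            return n
        n += batch
```

Comparing log Bayes factors keeps the rule numerically stable when evidence is overwhelming; with strong simulated evidence the rule stops after the first batch of 40, and with null-like data it runs to the 160-participant cap.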


This study recruited native speakers of English, Chinese, and Dutch. English-speaking participants were recruited via Prolific Academic, Chinese participants via Bounty Workers, and Dutch participants from the Erasmus University Rotterdam psychology participant pool. English and Chinese participants were 18–40 years old, and Dutch participants were 18–30 years old. In January 2019, Prolific Academic had 136 eligible Dutch participants and 344 eligible Chinese participants. The Chinese participants on Prolific came from various Asian countries where Traditional and Simplified Chinese are commonly intermixed, whereas people in Taiwan are used to reading Traditional Chinese. Therefore, we used Bounty Workers to recruit Chinese participants, because many of its registered participants are native Chinese speakers from Taiwan.

Preanalysis Processing

After data collection, we first removed from the dataset the data of participants who did not complete the task or who had no correct responses in at least one condition. This resulted in the removal of 22 (of 104) participants from the English sample, 24 (of 117) from the Chinese sample, 8 (of 121) from the Dutch sample, and 22 (of 121) from the sample of Dutch participants doing the English task.
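The retained sample sizes follow directly from these counts, and they can be cross-checked against the error degrees of freedom reported in the ANOVAs: with four between-participants groups, the error df is N − 4 (and, for a two-level within-participants factor in the mixed ANOVA, (N − 4) × 1). The short check below, with group labels of our own choosing, shows that 459 is consistent with all 463 recruited participants and 383 with the 387 retained ones.

```python
recruited = {"English": 104, "Chinese": 117,
             "Dutch (Dutch task)": 121, "Dutch (English task)": 121}
removed = {"English": 22, "Chinese": 24,
           "Dutch (Dutch task)": 8, "Dutch (English task)": 22}

retained = {group: recruited[group] - removed[group] for group in recruited}
k = len(recruited)  # number of between-participants groups

# Error df of a between-groups ANOVA with k groups is N - k.
df_error_recruited = sum(recruited.values()) - k
df_error_retained = sum(retained.values()) - k

print(retained)
print(df_error_recruited, df_error_retained)
```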


Overall Accuracy of Verification and Comprehension

Table 1 summarizes the average accuracy on the sentence–picture verification task and the intertrial comprehension task for each language group. A between-participants analysis of variance showed significant differences among the groups in the verification task, F (3, 459) = 14.04, MSE = 148.36, p < .001, and in the comprehension task, F (3, 459) = 8.44, MSE = 185.90, p < .001.

Table 1 Average accuracy (in %) on the sentence–picture verification task and comprehension task (standard errors in parentheses)

Specifically, the Dutch participants were more accurate on the sentence–picture verification task than the Chinese and English participants, and their comprehension responses were also more accurate than those of the English and Chinese participants. On this basis, we excluded participants in the English and Chinese groups whose accuracy was below 75% and 70%, respectively, and participants in the Dutch group whose accuracy was below 80%. The remaining dataset contained data from 82 English participants, 93 Chinese participants, 113 Dutch participants, and 99 Dutch participants doing the English task.

Sequential Analysis of Verification Response Times

We conducted the preregistered sequential analyses on the sentence–picture verification times separately for each language group. The results showed that the English and Chinese samples met the preregistered criterion of BF10 > 10 at half the preregistered maximum sample size. Figure 1 illustrates the sequential analysis of each group separately for large and small objects.

Figure 1 Bayesian sequential analysis of sentence–picture verification response times. Chi = Chinese group; E = excluded low-accuracy data; Eng = English group; I = included low-accuracy data; L = large objects; NL-Dut = Dutch group, Dutch study; NL-Eng = Dutch group, English study; S = small objects.

Verification Responses by Conditions

Table 2 summarizes the mean reaction times and error percentages as a function of language group, object size, and match. Unlike the Dutch participants, the English and Chinese participants showed match advantages in mean reaction times. The Chinese participants performed worse than the other participants.

Table 2 Averaged reaction times and error percentages (in parentheses) of the sentence–picture verification task

We conducted three sets of statistical analyses as per the preregistered plans: (1) a three-way mixed analysis of variance with language group as a between-participants factor and object size and match as within-participants factors; (2) a meta-analysis of the match advantage across language groups and object size; and (3) linear mixed-effects models with random intercepts for participants and items. Because the English version of the task was completed by two groups (native English speakers and Dutch native speakers), each analysis included four language groups.

Three-Way Mixed ANOVA

The preregistered mixed ANOVA on the verification times (α = .05) showed main effects of language group, F (3, 383) = 12.14, MSE = 84,138.96, p < .001; object size, F (1, 383) = 216.50, MSE = 6,053.32, p < .001; and match, F (1, 383) = 13.40, MSE = 6,166.75, p < .001. The main effects of object size and match were consistent with our preregistered prediction. The main effect of language group was also significant but requires further exploration in light of the interactions. The interaction between object size and language group indicated that the Chinese participants required more time to verify the large objects than did the other groups, F (3, 383) = 4.24, MSE = 6,053.32, p = .006. There was also a significant interaction between language group and match, F (3, 383) = 2.90, MSE = 6,166.75, p = .035. The interaction between object size and match was not significant, F < 1, nor was the three-way interaction of language group, match, and object size, F (3, 383) = 0.89, MSE = 5,892.41, p = .448.

A three-way mixed ANOVA on the accuracy scores showed significant main effects of language group, F (3, 383) = 20.76, MSE = 3,621.59, p < .001, and match, F (1, 383) = 63.55, MSE = 951.25, p < .001. Unlike the response time data, the accuracy scores showed no effect of object size, F (1, 383) = 0.29, MSE = 629.80, p = .592. Consistent with the verification times, the interaction of object size and language group indicated that the Chinese participants made more errors on the large objects than did the other language groups, F (3, 383) = 8.08, MSE = 629.80, p < .001. The interaction of language group and match also indicated that the Chinese participants made more errors for the mismatching object orientation, F (3, 383) = 27.63, MSE = 951.25, p < .001. This analysis showed neither an interaction of object size and match nor a three-way interaction, Fs < 1.

Meta-Analysis on the Match Advantage

Following the preregistered analysis plan, we compared the match advantages across language groups. The English group showed the largest effect size for large objects, M = 29.13, 95% CI [10.48, 47.78], whereas the Chinese group showed the largest effect size for small objects, M = 37.09, 95% CI [11.18, 62.99]. As planned, we also tested whether the effect size for large objects was larger than that for small objects. The meta-analysis showed a match advantage for large objects, M = 16.54, 95% CI [3.1, 29.97], but no reliable effect for small objects, M = 8.53, 95% CI [−4.2, 21.27]. Figure 2 shows the results of this meta-analysis in a forest plot.
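To make the pooling step concrete, the sketch below implements a DerSimonian–Laird random-effects meta-analysis of per-group match advantages. The estimator is an assumption (it is not named in this section), as are the function name and inputs: each effect is taken to be one group's mismatch-minus-match mean verification time, with its squared standard error as the variance.

```python
import math
from typing import Sequence, Tuple


def random_effects_pool(
    effects: Sequence[float], variances: Sequence[float], z_crit: float = 1.959964
) -> Tuple[float, float, Tuple[float, float], float]:
    """DerSimonian-Laird random-effects meta-analysis (illustrative sketch).

    effects   -- per-group match advantages (mismatch minus match RT)
    variances -- squared standard errors of those match advantages
    Returns (pooled mean, standard error, 95% CI, tau^2).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted dispersion of group effects around the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-group variance, truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sum_w_re = sum(w_re)
    mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re
    se = math.sqrt(1.0 / sum_w_re)
    return mean, se, (mean - z_crit * se, mean + z_crit * se), tau2
```

When the group effects vary more than their sampling errors allow, tau² becomes positive and the weights move toward equality across groups, which widens the pooled confidence interval.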

Figure 2 Meta-analysis on the mean differences of match advantages.

The preregistered hypothesis for this study predicted that object size should moderate the orientation match advantage. To test this hypothesis, we conducted a moderator analysis with the effect size of the match advantage as the dependent measure, language group as the independent variable, and object size as the moderator. We ran this analysis without and with heterogeneous residuals. In the analysis without heterogeneous residuals, the coefficient for object size was estimated to be b = −8.09 (SE = 9.42) and was not significant at the preregistered level, p = .39. The analysis with heterogeneous residuals returned a similar result: b = −8.01 (SE = 9.45), p = .40.
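A moderator analysis of this kind is a weighted regression of the match advantages on object size. The Python sketch below fits the simplest fixed-effect version with a single binary moderator; it omits the language-group terms and the residual-variance structure of the reported models, and the coding (small = 1, large = 0) is an assumption. Under that coding, b1 estimates the small-minus-large difference in match advantage; for comparison, the pooled small-object and large-object effects reported above differ by 8.53 − 16.54 = −8.01.

```python
import math
from typing import Sequence, Tuple


def meta_regression_binary(
    effects: Sequence[float], variances: Sequence[float], moderator: Sequence[int]
) -> Tuple[float, float, float]:
    """Inverse-variance weighted (fixed-effect) meta-regression y_i = b0 + b1 * x_i.

    effects   -- match advantage of each language-group-by-size cell
    variances -- squared standard errors of those effects
    moderator -- hypothetical coding: 1 = small object, 0 = large object
    Returns (b0, b1, SE of b1).
    """
    w = [1.0 / v for v in variances]
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, moderator))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, moderator))
    t0 = sum(wi * yi for wi, yi in zip(w, effects))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, moderator, effects))
    det = s0 * s2 - s1 * s1  # determinant of X'WX
    if det <= 0:
        raise ValueError("moderator must take both values")
    b0 = (s2 * t0 - s1 * t1) / det  # expected effect for large objects (x = 0)
    b1 = (s0 * t1 - s1 * t0) / det  # small-minus-large difference
    se_b1 = math.sqrt(s0 / det)     # from the inverse of X'WX
    return b0, b1, se_b1
```

Significance is then judged with a Wald test, z = b1 / SE.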

Linear Mixed-Effect Model on the Verification Times

The preregistered analysis plan aimed to explore the interactions of the three fixed effects (language group, object size, and matching of sentence and object). We selected the model including the trial sequence and the correlation of trial sequence and items because it had the lowest Akaike information criterion (AIC; Akaike, 1974; Burnham & Anderson, 2010). Appendix B summarizes the selection of models.

Table 3 summarizes the fixed effects in the linear mixed-effect model. The mixed-effect model showed main effects of match and language group at the preregistered significance level of p < .05, but neither the main effect of object size nor any of the interactions was significant. In particular, the mixed-effect model showed no interaction of match advantage with object size (consistent with the ANOVA and meta-analysis) or with language group. Table 4 summarizes the coefficients of the random effects in the linear model, following the suggestion of Barr, Levy, Scheepers, and Tily (2013). Modeling the variances of the random effects is critical for controlling the Type I error rate and maintaining statistical power.

Table 3 Fixed effects of critical variables: sentence–picture verification task and picture–picture verification task
Table 4 Random effects in sentence–picture verification task and picture–picture verification task

Study 2: Picture–Picture Verification

The aim of Study 2 was to examine the mental rotation hypothesis. We assumed that large, nonmanipulable objects require longer rotation times than small, manipulable objects.

Therefore, we predicted that participants would take more time to verify larger objects that are presented in different orientations.



Same as in Study 1.


The analysis includes data from participants who had high accuracy (English: >75%, Chinese: >70%, Dutch: >80%) in the sentence–picture verification task. Table 5 summarizes the descriptive statistics as a function of language group, object size, and match. Each language group showed a longer mean reaction time and a lower accuracy for large objects. Participants made faster and more accurate responses for object pairs presented in the same orientation, especially for large objects.
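The accuracy-based inclusion rule amounts to a per-group threshold filter. The Python sketch below assumes a hypothetical record format with "language" and "accuracy" (proportion correct) fields and applies the Dutch cutoff to every Dutch participant regardless of stimulus language; both are assumptions rather than details given in the text.

```python
from typing import Dict, List

# Accuracy cutoffs (proportion correct) per language, as stated in the text.
ACCURACY_CUTOFFS: Dict[str, float] = {"English": 0.75, "Chinese": 0.70, "Dutch": 0.80}


def retain_participants(records: List[dict]) -> List[dict]:
    """Keep participants whose accuracy is strictly above their language's cutoff.

    Each record is a hypothetical dict with "language" and "accuracy" keys,
    where accuracy is the proportion correct in the sentence-picture task.
    """
    return [r for r in records if r["accuracy"] > ACCURACY_CUTOFFS[r["language"]]]
```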

Table 5 Averaged reaction times and error percentages (in parentheses) of the picture–picture verification task

Three-Way ANOVA

Consistent with the preregistered plan, we conducted a mixed ANOVA on the correct reaction times as a function of language group, object size, and match. This analysis showed significant main effects of language group, F (3, 457) = 7.32, MSE = 28,383.87, p < .001; object size, F (1, 457) = 411.90, MSE = 870.22, p < .001; and match, F (1, 457) = 1,208.09, MSE = 854.53, p < .001. Of more importance to our hypotheses, the interaction between object size and match was significant, F (1, 457) = 46.16, MSE = 673.50, p < .001: the match advantage was larger for large objects (55 ms) than for small objects (39 ms). The interaction between language group and object size was not significant, F (3, 457) = 1.37, MSE = 870.22, p = .252, whereas the interaction between language group and match was significant, F (3, 457) = 5.77, MSE = 854.53, p = .001. The nonsignificant interaction between language group and object size suggests that the effect of object size on picture verification generalizes across the three languages.

The mixed ANOVA on the accuracy scores showed significant main effects of object size, F (1, 457) = 729.85, MSE = 26.26, p < .001, and match, F (1, 457) = 646.19, MSE = 45.65, p < .001. Unlike in the analysis of response times, the main effect of language group was not significant, F (3, 457) = 1.26, MSE = 117.05, p = .289. Consistent with the reaction time analysis, the accuracy analysis showed a significant interaction of object size and match, F (1, 457) = 929.43, MSE = 22.55, p < .001: participants made more errors for large objects presented in different orientations. The other interactions were not significant at the predefined level of .05.

Meta-Analysis on the Verification Times

In line with the preregistered hypothesis for the first study, this study investigated whether object size moderates the mental rotation of target objects. We conducted a moderator analysis with the effect size of match as the dependent measure, language group as the independent variable, and object size as the moderator, and we ran this analysis without and with heterogeneous residuals. In the analysis without heterogeneous residuals, the coefficient for object size was estimated to be b = −16.38 (SE = 4.89) and was significant at the preregistered level, z = −3.35, p = .001. The analysis with heterogeneous residuals returned a similar result: b = −16.42 (SE = 4.89), z = −3.36, p = .001.
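The reported z and p values follow directly from each moderator coefficient and its standard error under a normal reference distribution. The Python sketch below recomputes them for the coefficients reported in Studies 1 and 2; the function name is illustrative.

```python
import math
from typing import Tuple


def wald_test(estimate: float, std_error: float) -> Tuple[float, float]:
    """Two-sided Wald z test: z = b / SE, p = 2 * (1 - Phi(|z|))."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # erfc(|z| / sqrt 2) equals 2 * (1 - Phi(|z|))
    return z, p


reported = [
    ("Study 1, without heterogeneous residuals", -8.09, 9.42),
    ("Study 1, with heterogeneous residuals", -8.01, 9.45),
    ("Study 2, without heterogeneous residuals", -16.38, 4.89),
    ("Study 2, with heterogeneous residuals", -16.42, 4.89),
]
for label, b, se in reported:
    z, p = wald_test(b, se)
    print(f"{label}: z = {z:.2f}, p = {p:.3f}")
```

Run as is, it reproduces the reported values to rounding: z = −3.35 and −3.36 with p ≈ .001 for Study 2, and p ≈ .39 and .40 for Study 1.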

Mixed-Effect Model

As in Study 1, we selected as the best-fitting model the one that included the trial sequence as the intercept and the correlation of trial sequence and participants. Table 3 summarizes the coefficients of the fixed effects. In addition to the main effects of match, object size, and language group, this mixed-effect model confirmed the larger effect of match for the large objects, β = 93.14, t = 3.67, p < .01. The model also showed an interaction between match and language group, with a larger match effect for Dutch participants than for Chinese participants, β = 59.65, t = 2.65, p < .01.

Study 3: Naming Study

Similar to Study 1, the aims of Study 3 were to test whether there was a larger match advantage for large than for small objects and whether this pattern was similar across languages. Study 3 focused on the language groups that showed orientation effects in Study 1: English and Chinese. If the match advantage obtained in Study 1 were also found in this study, this would provide corroborating evidence that mental simulations are performed when processing large objects.



The procedure was similar to that in Study 1 except that participants named the target picture aloud instead of pressing a response key. Before the experiment began, participants tested their audio recording on a calibration page. In each practice and experimental trial, participants named the object aloud within 3 s. The Gorilla website, which was used for presenting the stimuli, stored each vocal response in mp3 format. Participants then evaluated each voice response by choosing one of four options: right, wrong, no response, or recording failed. After half of the filler trials, participants answered comprehension questions about the probe sentences, which were used to check whether they were reading these sentences for meaning. As in the sentence–picture verification task, participants completed the postsurvey about this study at the end.

Naming Latency Coding

The sound files were archived in mp3 format (mono, 128 kb/s). We extracted the naming latency of each voice response in two phases. In the first phase, we used a Praat script (Boersma, 2001) to determine the naming latencies of sound files without noise. In the second phase, we checked the participants' evaluations of their own recordings. Fifteen English participants indicated that more than 50% of their recordings had failed, but on inspection these sound files appeared to contain accurate recordings of the participants' voices. Ten Chinese participants' sound files contained background noise, which caused the Praat script to calculate incorrect naming latencies.
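Automatic latency extraction of this kind typically looks for the first stretch of a recording whose energy rises above a threshold. The Python sketch below is a simplified stand-in for that idea, not the Praat script itself: it assumes the mp3 files have already been decoded to mono samples in [−1, 1] and that recording starts at picture onset, and its frame length and threshold are hypothetical. It also illustrates why background noise breaks such a procedure: noisy frames cross the threshold before speech begins, yielding latencies that are too short.

```python
import math
from typing import Optional, Sequence


def voice_onset_ms(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: float = 10.0,
    threshold: float = 0.1,
) -> Optional[float]:
    """Return the onset (ms from recording start) of the first frame whose RMS
    amplitude exceeds `threshold`, or None if no frame does.

    samples   -- mono amplitudes scaled to [-1, 1] (already decoded from mp3)
    threshold -- hypothetical value; it must be tuned to the recording level
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    # Step through non-overlapping frames and test each one's RMS energy.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms > threshold:
            return 1000.0 * start / sample_rate
    return None
```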

Prior to the data analysis, we reset the 15 English participants’ evaluations, and we manually coded the 10 Chinese participants’ naming latencies.


English and Chinese participants from the same participant pools as in Study 1 (English: ProA; Chinese: BW) participated. Thirty-two participants whose responses contained more than 50% unrecognizable sounds or who were not native speakers were removed from the dataset. The data analysis included the responses of 76 English participants and 87 Chinese participants.


Table 6 summarizes the participants' mean reaction times and accuracy. Both English and Chinese participants showed a weak match disadvantage in the picture-naming task, and they required more time to name the large objects than the small objects. The preregistered plan designated the analysis of picture-naming responses as exploratory. A mixed ANOVA with language, object size, and match as factors showed a significant difference between large and small objects. We also compared the responses to large and small objects using a linear mixed-effect model.

Table 6 Averaged reaction times and accuracy percentages of the sentence–picture naming task

Three-Way Mixed ANOVA on the Naming Latency

The analysis of naming latencies yielded a significant main effect of object size, F (1, 138) = 166.84, MSE = 8,399.33, p < .001. All other main effects and interactions had p > .1.

Linear Mixed-Effect Model

The fixed effect of object size was the only coefficient that reached the preregistered significance level of p < .05. The confidence intervals of the naming times indicated that large objects required more time to name, M = 973.91, 95% CI [904.45, 1,042.78], than small objects, M = 833.28, 95% CI [780.43, 886.14]. Tables 3 and 4 summarize the coefficients of the fixed effects and random effects, respectively.


We set out to investigate to what extent inconsistent findings previously reported for the mental simulation of object orientation are related to the size of objects. In addition, we examined whether our findings would generalize across languages (Chinese, Dutch, and English). We conducted three preregistered studies in which we manipulated the size of objects to be either small or large. Participants made verification judgments (Studies 1 and 2) or named the depicted objects (Study 3). We hypothesized that a larger match advantage would be obtained for large objects than for small objects. We performed a confirmatory test on the verification task (based on earlier research using this task) and explored the effects in the picture-naming task. Additionally, we hypothesized that the match advantage should be similar across languages. Moreover, we tested a mental rotation account predicting that large and nonmanipulable objects should take longer to mentally rotate than small and manipulable objects, which would result in a smaller match advantage for small objects than for large ones. In the following discussion, we separate conclusions about our confirmatory analyses from exploratory analyses and comments.

Confirmatory Analyses

Our first preregistered hypothesis was that the orientation effect should be larger for large than for small objects. Contrary to this hypothesis, the preregistered mixed ANOVA showed no significant interaction of object size and object orientation in the sentence–picture verification task. The meta-analysis on the sentence–picture verification times (Study 1) showed that although the match advantage was larger for large than for small objects, the difference between these meta-analytic effects was not significant (see the Study 1 results and the Meta-Analysis on the Match Advantage section).

The meta-analysis suggests that the pattern is more complex than we assumed; we return to this in the exploratory section of this discussion. In contrast to the sentence–picture verification experiment (Study 1), the picture–picture verification experiment (Study 2) did show the predicted interaction between object size and match.

Thus, without a sentence context, pictures of large objects do yield a larger orientation effect than do pictures of small objects.

To summarize, the evidence regarding our first preregistered hypothesis is mixed. Only the most purely visual task (Study 2) showed the predicted interaction between size and orientation. The lack of such an interaction in the language-based task (Study 1) suggests that object size cannot explain why the orientation effect is relatively small compared with the effects of other perceptual features such as color, actual object size (per se), or shape.

Our second hypothesis pertained to the generalizability of the orientation effect across languages, specifically in the sentence–picture verification task, as this task was used in virtually all previous studies. Although the meta-analysis of the Study 1 results shows a main effect of match, the pattern of orientation effects differed across languages in a rather complex manner. The language × match interaction itself is not very strong (p = .035), and object size was not a moderator in the meta-analysis.

As a supplement to our first hypothesis, we investigated in Study 2 whether picture verification times differed for large and small objects, in an attempt to examine the role of mental rotation in the orientation effect. As predicted, the mixed ANOVA showed longer verification times for larger objects. The meta-analysis also indicated that object size moderated the verification times to target pictures. This suggests that participants engage in mental rotation during a picture–picture verification task.

Exploratory Analyses

As shown in the meta-analysis of Study 1, there was an effect of orientation across languages and object size. This extends the original finding by Stanfield and Zwaan (2001).

Nevertheless, the meta-analytic effect is small (12 ms) and considerably smaller than the original effect of 44 ms or the 35 ms effect in the Zwaan and Pecher (2012) replication. Importantly, the present study is not a direct replication. First, different materials were used. Second, three language groups were included. However, even if one considers only the native English speakers, the effect (across large and small objects) is 15 ms. The match advantage for the Chinese participants is 35 ms. The Dutch participants showed no orientation effect when the stimuli were presented in Dutch (2 ms across large and small objects). This is consistent with earlier findings of Rommers et al. (2013) and de Koning et al. (2017a), who also tested the orientation effect using Dutch stimuli.

Combined with earlier findings, a pattern seems to be emerging. In English, a small but reliable match advantage is found. In Dutch, however, no effect is found, whereas in Chinese the effect is larger than in either English or Dutch (in the present study). As alluded to earlier, the lack of an effect in Dutch might be due to the nature of the stimuli. In Dutch, it is common, at least according to the second and third authors' intuitions as native speakers of Dutch, to use a verb that describes the orientation of an object when describing the location of that object. Thus, Het boek staat op de plank (The book stands on the shelf) is more common than Het boek is op de plank (The book is on the shelf). It is possible that the lack of an orientation verb in our stimulus sentences threw off our Dutch speakers by violating their expectations. Dutch speakers may be used to treating the orientation verb as the primary source of information about an object's orientation, rather than inferring that orientation from the prepositional phrase, as our stimuli required. Perhaps the lack of an orientation effect in Dutch is due to the relative unusualness of our Dutch stimulus sentences.

There are two aspects of our data that are consistent with this tentative explanation.

First, as noted before, in our pilot study we obtained an effect for small objects in a Dutch–Dutch sample. When we checked the stimuli, we noticed that many stimulus sentences contained an orientation verb. These sentences had been constructed by a Dutch research assistant, who translated them from English examples. The research assistant thus automatically used orientation verbs when translating sentences without an orientation verb. Apparently, the assistant considered this the best translation rather than a more literal one with an orientation-neutral verb, which was deliberately used in the original study by Stanfield and Zwaan (2001) precisely because of this characteristic. Moreover, we obtained an orientation effect with these sentences containing orientation verbs. Thus, the lack of an orientation effect in Dutch sentences might be attributed to the absence of orientation verbs.

The meta-analytic data provide a second hint that is consistent with this idea. When tested in English, the Dutch participants showed a pattern somewhere between the English and Dutch data, at least with regard to the large objects. The Dutch–Dutch sample shows an orientation effect of 1 ms, while the English sample shows 29 ms. The Dutch–English sample shows a difference of 13 ms. We stress that this analysis is highly speculative (hence its presence in a section titled exploratory analyses), but it suggests avenues for further research.

The Chinese sample shows the largest effects of the three language groups and is, in fact, the only sample showing an effect for small objects. We are not sure why this is the case. We note, however, that, as mentioned earlier, this sample shows by far the largest variability in response times. One plausible factor is that many Taiwanese participants were likely less experienced with online psychological experiments than the English and Dutch participants. Nevertheless, even though the Chinese data showed the largest variability in this study, our best-fitting mixed-effect model did not show a significant interaction between language group and match advantage.

A further observation concerns the putative role of mental rotation in the various tasks.

The meta-analysis of Study 1 suggests that object size did not moderate the match advantage of object orientation, although we found a significant match advantage for large objects in English and a match advantage for small objects in Chinese. In contrast, the meta-analysis of Study 2 indicates that object size did moderate the speed of mental rotation. These results suggest that the mental rotation hypothesis provides a better account of the perception task (i.e., picture–picture verification) than of the reading task (i.e., sentence–picture verification). The preregistered hypothesis was supported only by the results from the picture–picture verification task, which suggests that mental rotation might play a role in the picture–picture verification task, but not in the sentence–picture verification task. Hence, the present study does not confirm the suggestion of de Koning et al. (2017a) that the failure to find an orientation effect in a sentence–picture verification task might be due to participants' engagement in mental rotation, whereby a mismatched orientation is quickly transformed into an orientation that matches the object as described in the sentence.

According to our preregistered plan, the findings of the sentence–picture naming task (Study 3) serve to explore theoretical and methodological issues beyond the confirmatory analysis. Our picture-naming task showed a main effect of object size but no effect of object orientation. This result speaks to the theoretical distinction between extrinsic and intrinsic properties (Scorolli, 2014). Embodied cognition researchers have classified object orientation as an extrinsic property and object size as an intrinsic property (e.g., de Koning et al., 2017a). This classification rests on the assumption that simulating an intrinsic property requires only the visual system, whereas simulating an extrinsic property requires both the visual and motor systems. In support of this distinction, English readers in feature-generation tasks produced more intrinsic than extrinsic properties of target objects (McRae, Cree, Seidenberg, & McNorgan, 2005; Wu & Barsalou, 2009). One account to be examined is that the object name would sufficiently initiate the simulation of the extrinsic property. Simulating the intrinsic property, on the other hand, would require understanding the sentence context.

It is premature to claim that our findings support the theoretical distinction between extrinsic and intrinsic properties, but our analysis provides clues for further research. One potential topic is to isolate the contributions of extrinsic and intrinsic properties. In a study that measures both kinds of object properties, an orthogonal manipulation of the properties would be the appropriate method. Take object size and orientation as an example.

Researchers could test the target picture of a rocket in four scenarios: large object and vertical, “The rocket that will carry the satellite has been placed on the launching platform”; large object and horizontal, “The rocket that is being transferred to the base will carry the satellite”; small object and vertical, “In that diorama, a rocket has been placed on the launching platform”; and small object and horizontal, “The rocket on that table will be placed in the diorama on the launching platform.” With a number of sentence–picture sets like this example, researchers could examine whether size and orientation interact and evaluate to what extent the sentence context influences the retrieval of object names.
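One way to present such a 2 × 2 item set without showing any participant the same target object twice is a Latin-square assignment of items to conditions across four presentation lists. The Python sketch below is one possible counterbalancing scheme, offered as an assumption rather than part of the proposed design; each participant would receive one list, and with a number of items divisible by four, every list contains each size-by-orientation condition equally often.

```python
from itertools import product
from typing import List, Tuple

Condition = Tuple[str, str]

# 2 x 2 crossing of implied object size and implied orientation.
CONDITIONS: List[Condition] = list(product(["large", "small"], ["vertical", "horizontal"]))


def latin_square_lists(n_items: int) -> List[List[Tuple[int, Condition]]]:
    """Build one presentation list per condition.

    Item i is shown in condition (i + list_index) mod 4 of list `list_index`, so each
    list shows every item once and, across lists, every item appears in every condition.
    """
    k = len(CONDITIONS)
    return [
        [(item, CONDITIONS[(item + list_index) % k]) for item in range(n_items)]
        for list_index in range(k)
    ]
```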


In this preregistered report, we tested two hypotheses about the orientation effect in the sentence–picture verification task. The first hypothesis was that the match advantage should be larger for large nonmanipulable objects than for small manipulable objects. The second hypothesis was that this effect should generalize across the languages we tested: Chinese, Dutch, and English. Neither hypothesis was supported by the data. Although the match effect was numerically larger for large objects than for small objects, there was no significant interaction between object size and match. Therefore, we cannot conclude that object manipulability is a factor in the orientation effect, nor that manipulability explains why the orientation effect is typically smaller than the effects for shape, color, and size. Contrary to our second hypothesis, the orientation effect did vary across languages: it was largest in Chinese, smaller in English, and absent in Dutch.

In exploratory analyses, we attempted to account for our findings and suggested a design to explore further theoretical questions. We tentatively explain the lack of an orientation effect in Dutch by noting that the experimental sentences are somewhat unusual in Dutch as compared to English. A future (preregistered) experiment could examine this idea further. These results suggest that generalizing the orientation effect from one language to another, and from one task to another, may not be as straightforward as researchers assume.

We should note that there are several limitations to our approach. First, we explored the cognitive aspects of simulating object orientation and size with specific tasks, namely, the sentence–picture verification task (our primary experiment), the picture–picture verification task, and the sentence–picture naming task. These tasks did not show similar patterns of results, suggesting that the results are, to some extent, task-specific. The sentence–picture verification task, our primary task, showed the most complex pattern of results, which we have tried to explain above. The sentence–picture naming task showed mainly null results. We are not sure whether this is because we collected the naming data online or because the picture-naming task may not be sensitive enough to detect an orientation effect, given that the orientation effect is small to begin with and naming effects tend to be smaller than verification effects (Zwaan, 2014). The picture–picture verification task showed the clearest results. This is perhaps not surprising given that it is the only language-independent task and does not rely on inferential (or knowledge-activation) processes. This task showed an interaction between object size and the match advantage, which might have to do with the size of the depicted objects but could also have a more mundane explanation; for example, the pictures depicting the larger objects may have had more visual features to be verified than the pictures depicting the small objects.

The current study set out to investigate key issues that have remained unexplored in embodied cognition research (see Ostarek & Huettig, 2019). Although not fully conclusive, our results point to two aspects that could help advance research and theorizing regarding the mental simulation of object properties. First, we attempted to investigate whether and how variation in the characteristics of objects (i.e., size) within one object property (i.e., orientation) affects the mental simulation of objects. Second, we investigated whether the mental simulation of object orientation is language independent. Our findings suggest that differences exist in mental simulation across languages, which is important to take into account when predicting, comparing, or interpreting mental simulation findings from various languages. Together, the findings of this study provide a first step in furthering our understanding of the mental simulation of object properties and will hopefully inspire other researchers to contribute to further developments. Along these lines, Chen et al. (2018) assessed the replicability of the match advantage of object orientation in more than 14 languages. These and other initiatives can build on our findings to explore novel questions and further refine theories of mental simulation in language comprehension.

Electronic Supplementary Materials

The electronic supplementary material is available with the online version of the article at

We thank David Feinberg of McMaster University for providing assistance and suggestions for extracting the naming latencies (see the contact of Feinberg’s laboratory at


Appendix A

Guideline for Replication and Reproduction

Researchers who are planning to replicate the studies can download the experimental scripts, sentence sheets, and picture files from the open materials repository (permanent link:

The data files and analytical scripts are accessible in this project repository ( Readers who want to reproduce the data analysis can first download the files in this repository. In the guideline file (packaged in, we summarize the files needed to reproduce the data analysis of this project.

Given the large number and volume of voice files, we packaged the recorded voices of the English and Chinese participants, together with the Praat script, in Study3_voice.7z. This file is available in this project repository (direct access link:

Appendix B

Selections of Mixed-Effect Models

The mixed-effect models in each study differed in whether they included the trial sequence and in the constituents of their random effects. All four types of models included the three fixed effects (language group, object size, and match) and the two random effects (participants and target objects). In each study, we evaluated the fit of four types of models: (a) the model does not include the trial sequence as the intercept, (b) the model includes the trial sequence as the intercept, (c) the model includes the trial sequence as the intercept and the correlation of trial sequence and participants, and (d) the model includes the trial sequence as the intercept and the correlation of trial sequence and items. Table B1 summarizes the statistical information of each model. Model (c) had the smallest AIC in each study.
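Selecting among models (a) to (d) by AIC can be sketched as follows. The log-likelihoods and parameter counts would come from the fitted models summarized in Table B1; they are not reproduced here, so the function takes them as inputs. The Akaike weights it also returns (Burnham & Anderson, 2010) are an addition for judging how clearly the best model wins, not something reported in the article.

```python
import math
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(fits: Dict[str, Tuple[float, int]]) -> Tuple[str, Dict[str, float]]:
    """Pick the lowest-AIC model and compute Akaike weights for all candidates.

    fits maps a model label, e.g. "(a)" to "(d)", to (log-likelihood, number of parameters).
    """
    scores = {label: aic(ll, k) for label, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    # Akaike weights: exp(-delta / 2) relative to the best model, normalised to sum to 1.
    rel = {label: math.exp(-0.5 * (s - scores[best])) for label, s in scores.items()}
    total = sum(rel.values())
    return best, {label: r / total for label, r in rel.items()}
```

All candidates must be fitted to the same observations for their AIC values to be comparable.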

Table B1 Statistical information of mixed-effect models
Sau-Chin Chen, Department of Human Development and Psychology, Tzu-Chi University, No. 67, Jie-Ren St., Hualien 97004, Taiwan,