Editorial

Lost in Translation: Thoughts Regarding the Translation of Existing Psychological Measures Into Other Languages

Published Online: https://doi.org/10.1027/1015-5759/a000167

Over the last decades, psychological science has become more and more international (Alonso-Arbiol & van de Vijver, 2010). In order to be able to compare research findings from different countries and in different languages, it is important to ensure the comparability of the assessment methods used. Thus, translations of standard measures have now appeared in a multitude of different languages. The increased availability of different language versions of the same measurement tools would seemingly create a need to internationally publish articles introducing these versions. An often-heard argument is that, in order for the translated measure to be useful, an English language publication has to test its psychometric properties. Let us scrutinize this argument more closely.

The usual justification for requiring an English-language publication is that the international research community would not accept research findings based on translated instruments whose psychometric properties have not also been published in English. This is a dangerous thought because it implies a certain mistrust: without that mistrust, a peer-reviewed publication in any language would suffice. Beyond this matter of principle, we address some more practical issues below.

When a new measurement tool is constructed, among the first questions to be answered are the following:

  1. For what measurement purpose is the instrument designed (e.g., personnel selection, clinical assessment)?
  2. What is the target population (e.g., adolescents, adults, patients)?
  3. Who will employ the instrument (e.g., researchers, practitioners)?

When translating an existing measurement tool into another language, clearly one must answer these questions as well. Obviously, the original authors have already defined the measurement purpose of the instrument. The same usually holds true for the target population and the instrument user. However, during translation of a measurement tool these aspects should be reconsidered and expressed specifically. Sometimes a translation goes hand in hand with a changed (or changing) target population or measurement purpose. Thus, measures are usually translated with a specific goal.

Most translations probably aim for one of the following three goals:

  1. to make a particular instrument available in a different language,
  2. to provide a means for cross-cultural research,
  3. to conduct research on the specific instrument itself.

Not all of these translation goals have the same readership in mind. Consequently, depending on the translation goal, an English-language publication of the instrument’s psychometric properties may be more or less useful.

When a standard instrument is made available to a different language community, practitioners usually represent the targeted readership. This in itself is a valuable goal because practitioners probably make up the majority of test users. However, it can be assumed that most practitioners have better access to journals published in their own language. Moreover, practitioners most likely find it easier to comprehend the ever-increasing complexity of data analyses if they can read about them in their native language. Consequently, publishing findings in national journals should in fact positively affect the acceptance of the translated instrument. For this reason, a national journal should be the preferred outlet if the translation goal is to make the instrument available to local practitioners. This is especially true if the test of the psychometric properties closely mirrors the original publication, so that the translated instrument and its inherent qualities are introduced to the practitioner at the same time.

There are several examples where such a publication strategy was successful. Rammstedt and John (2005) as well as Lang, Lüdtke, and Asendorpf (2001) published German versions of the Big Five Inventory (BFI), albeit differing in length. Both papers have been cited 8–9 times per year on average since their appearance. Moreover, there are German- as well as English-language papers that apply these translated BFI versions. Thus, the publications reached the intended audience and sparked new research that was accepted internationally.

The translation goals of cross-cultural research and research on the instrument itself have a stronger focus on a research-minded audience. Consequently, an English-language publication might address exactly the intended readership. Obviously, a journal such as the European Journal of Psychological Assessment would be interested in both kinds of translation efforts.

However, all three goals (but especially the latter two) also imply that the translation guarantees measurement invariance. Otherwise, practitioners cannot be assured that they are truly assessing the same construct they intended to measure. Likewise, researchers cannot safely conduct cross-cultural comparisons. Nor is it reasonable to critique and develop an instrument without having shown that the same underlying construct is being measured. Chen (2008) clearly pointed out the effects of lacking measurement invariance. Her findings show that mean comparisons as well as comparisons of correlation coefficients are distorted if measurement invariance does not hold. Distorted correlation coefficients in particular have the potential to harm comparisons of psychometric properties derived from different language versions of the same instrument. After all, many of the methods we use to estimate the psychometric properties of an instrument rely on correlations, e.g., factor analyses, test-retest reliability, criterion and construct validity, internal consistency, and so on.
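The kind of distortion described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not a reproduction of Chen's (2008) analyses: the one-factor model, the four items, the loadings, the intercepts, and the criterion are all hypothetical values chosen for illustration. Both groups have identical latent trait distributions, so any difference in observed scale means or in criterion correlations stems solely from the non-invariant items of the simulated translation.

```python
import numpy as np

def simulate_group(rng, n, loadings, intercepts):
    """Simulate one language group under a one-factor model.

    Item response = intercept + loading * latent trait + unique noise.
    The latent trait has mean 0 and variance 1 in every group, so any
    observed group difference is an artifact of the item parameters.
    """
    eta = rng.normal(0.0, 1.0, size=n)                     # latent trait
    noise = rng.normal(0.0, 1.0, size=(n, len(loadings)))  # unique item noise
    items = intercepts + np.outer(eta, loadings) + noise   # shape (n, items)
    # Hypothetical external criterion, related to the trait in the same
    # way in both groups (total variance 0.25 + 0.75 = 1).
    criterion = 0.5 * eta + rng.normal(0.0, np.sqrt(0.75), size=n)
    scale_score = items.mean(axis=1)                       # unit-weighted score
    return scale_score, criterion

rng = np.random.default_rng(seed=0)
n = 20_000

# Original version: all four items behave alike.
orig_score, orig_crit = simulate_group(
    rng, n,
    loadings=np.array([0.8, 0.8, 0.8, 0.8]),
    intercepts=np.zeros(4),
)

# Simulated translation: items 3 and 4 have shifted intercepts and
# weaker loadings (assumed values), i.e., invariance is violated.
trans_score, trans_crit = simulate_group(
    rng, n,
    loadings=np.array([0.8, 0.8, 0.2, 0.2]),
    intercepts=np.array([0.0, 0.0, 0.5, 0.5]),
)

mean_gap = trans_score.mean() - orig_score.mean()
r_orig = np.corrcoef(orig_score, orig_crit)[0, 1]
r_trans = np.corrcoef(trans_score, trans_crit)[0, 1]

print(f"Observed mean difference (latent difference is 0): {mean_gap:.3f}")
print(f"Criterion correlation, original version:   {r_orig:.3f}")
print(f"Criterion correlation, translated version: {r_trans:.3f}")
```

Under these assumed parameters, the expected difference in scale means is 0.25 points (the average intercept shift) although the groups do not differ on the latent trait, and the criterion correlation of the translated version is lower although the trait relates to the criterion identically in both groups.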

Thus, any translation process should include some evidence for measurement invariance (e.g., Grygiel, Humenny, Rebisz, Kwitaj, & Sikorska, 2013); otherwise, empirical evidence for the psychometric soundness of a translation is hard to interpret. Of course, if a translated test version reaches comparable test score reliabilities and comparable validity findings, we often feel assured that the translation was successful. Nevertheless, Chen's findings should cast some doubt on this belief. Moreover, if the findings do not deviate from those of the original version, this is probably mostly of interest to the new language community. The international community can gain few new insights, besides the fact that another language version of the instrument now exists. Of course, if the translation offers some new insights, e.g., new criterion validity-related evidence, new norms, new age groups, etc., then international readers might become interested. International readers might be even more inclined to read and cite such translations if the new findings deviate from the original findings. But without some test of measurement invariance, both the deviations and the new findings are hard to interpret. Thus, testing measurement invariance becomes an interesting and indispensable issue within the translation process. Failing to achieve invariance can have many reasons, translation issues being only one of them (Sass, 2011). Exploring possible reasons for failed measurement invariance might be of greater interest to readers from all languages interested in the specific measurement tool or underlying construct.

Once measurement invariance has been demonstrated, cross-cultural comparisons can be made. Moreover, it then becomes possible to compare psychometric properties based on correlations. Providing evidence for measurement invariance sounds difficult because it requires gathering data from different language populations. However, a translation is usually based on an existing, empirically tested instrument. Thus, data for the original-language version already exist and should be made available by the original authors.
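To make concrete what demonstrating measurement invariance involves, and why comparisons become interpretable afterwards, the following sketch uses the common multi-group factor-analytic formulation. The notation is an assumption introduced here for illustration and does not appear in the editorial itself: $x_{ig}$ is the response to item $i$ in language group $g$, $\tau_{ig}$ the item intercept, $\lambda_{ig}$ the loading, $\eta_g$ the latent construct with mean $\kappa_g$, and $\varepsilon_{ig}$ a residual with mean zero.

```latex
% One-factor measurement model for item i in language group g
x_{ig} = \tau_{ig} + \lambda_{ig}\,\eta_{g} + \varepsilon_{ig}

% Metric (weak) invariance: equal loadings across groups
\lambda_{ig} = \lambda_{i} \quad \text{for all } g

% Scalar (strong) invariance: equal loadings and equal intercepts
\lambda_{ig} = \lambda_{i}, \qquad \tau_{ig} = \tau_{i} \quad \text{for all } g

% Under scalar invariance, expected item scores differ between
% groups only through the latent means \kappa_g:
\mathrm{E}[x_{ig}] = \tau_{i} + \lambda_{i}\,\kappa_{g}
\;\Longrightarrow\;
\mathrm{E}[x_{i1}] - \mathrm{E}[x_{i2}] = \lambda_{i}\,(\kappa_{1} - \kappa_{2})
```

In this formulation, equal loadings are what is usually required before relations such as correlations are compared across groups, and equal intercepts in addition before means are compared. If an intercept differs between language versions, the last line no longer holds, and an observed mean difference mixes a true latent difference with an item-level shift.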

Having said this, it should be stressed that a translated instrument without evidence for measurement invariance is by no means useless. On the contrary, such an adaptation to a new language offers many useful applications. The limiting factor is comparability with findings from other languages and, thus, the value for an international readership. Studies aimed at that readership should include aspects that broaden the knowledge for all users of the measure, both nationally and internationally.

Summing up, we have emphasized two issues. First, each translation process of an existing instrument should thoroughly consider the intended user of the translation and how to best address that user. There are many cases in which a publication within the new language is more appropriate because the intended readership has better access to it.

Second, translations directed toward a research-minded, international readership should provide some evidence for measurement invariance. This in itself is highly interesting, and even failed attempts potentially offer important insights for all readers.

References

  • Alonso-Arbiol, I., & van de Vijver, F. J. R. (2010). A historical analysis of the European Journal of Psychological Assessment. European Journal of Psychological Assessment, 26, 238–247.

  • Chen, F. F. (2008). What happens if we compare chopsticks with forks? The impact of making inappropriate comparisons in cross-cultural research. Journal of Personality and Social Psychology, 95, 1005–1018.

  • Grygiel, P., Humenny, G., Rebisz, S., Kwitaj, P., & Sikorska, J. (2013). Validating the Polish adaptation of the 11-item De Jong Gierveld Loneliness Scale. European Journal of Psychological Assessment, 29, 129–139.

  • Lang, F. R., Lüdtke, O., & Asendorpf, J. B. (2001). Testgüte und psychometrische Äquivalenz der deutschen Version des Big Five Inventory (BFI) bei jungen, mittelalten und alten Erwachsenen [Test quality and psychometric equivalence of the German Big Five Inventory (BFI) for young, middle-aged, and old adults]. Diagnostica, 47, 111–121.

  • Rammstedt, B., & John, O. P. (2005). Short version of the Big Five Inventory (BFI-K): Development and validation of an economic inventory for assessment of the five factors of personality. Diagnostica, 51, 195–206.

  • Sass, D. A. (2011). Testing measurement invariance and comparing latent factor means within a confirmatory factor analysis framework. Journal of Psychoeducational Assessment, 29, 347–363.

Matthias Ziegler, Institut für Psychologie, Humboldt University Berlin, Rudower Chaussee 18, 12489 Berlin, Germany, +49 30 2093-9447, +49 30 2093-9361
Doreen Bensch, Institut für Psychologie, Humboldt University Berlin, Rudower Chaussee 18, 12489 Berlin, Germany, +49 30 2093-9447, +49 30 2093-9361