Pre-registered Report

Effects of Subtitles, Complexity, and Language Proficiency on Learning From Online Education Videos

Published Online: https://doi.org/10.1027/1864-1105/a000208

Abstract

Open online education has become increasingly popular. In massive open online courses (MOOCs), videos are generally the most used method of teaching. While most MOOCs are offered in English, the global availability of these courses has attracted many non-native English speakers. To ensure not only the availability but also the accessibility of open online education, courses should be designed to minimize detrimental effects of a language barrier, for example by providing subtitles. However, with many conflicting research findings, it is unclear whether subtitles are beneficial or detrimental for learning from a video, and whether this depends on characteristics of the learner and the video. We hypothesized that the effect of second-language (L2) subtitles on learning outcomes depends on the language proficiency of the student, as well as on the visual-textual information complexity of the video. This three-way interaction was tested in an experimental study. No main effect of subtitles was found, nor any interaction. However, the student's language proficiency and the complexity of the video do have a substantial impact on learning outcomes.

Introduction

Open online education has rapidly become a highly popular method of education. The promise – global and free access to high-quality education – has often been applauded. With a reliable Internet connection comes free access to a large variety of massive open online courses (MOOCs) found on platforms such as Coursera and edX. MOOC participants indeed come from all over the world, although participants from Western countries are still overrepresented (Nesterko et al., 2013). In any case, many non-native English speakers enroll in English-language courses. This raises the question of to what extent non-native English speakers can benefit from these courses compared with native speakers. While open online education may be available to most, the content might not be as accessible for many owing to language barriers. It is important to design online education in such a way that it minimizes detrimental effects of potential language barriers, to increase its accessibility for a wider audience.

MOOCs typically feature a high number of videos that are central to the student learning experience (Guo, Kim, & Rubin, 2014; Liu et al., 2014). The central position of educational videos is reflected by students' behavior and their intentions: Most students plan to watch all videos in a MOOC, and also spend the majority of their time watching these videos (Campbell, Gibbs, Najafi, & Severinski, 2014; Seaton, Bergner, Chuang, Mitros, & Pritchard, 2014). In this study, we investigate the impact of subtitles on learning from educational videos in a second language. Providing subtitles is a common approach to catering to diverse audiences and supporting non-native English speakers. The Web Content Accessibility Guidelines 2.0 (WCAG, 2008) prescribe subtitles for any audio media to ensure a high level of accessibility. Intuitively, there seems to be nothing wrong with this advice, and many studies have indeed found a positive effect of subtitles on learning (e.g., Markham et al., 2001). However, a different set of studies provides evidence that subtitles can also hamper learning (e.g., Kalyuga, Chandler, & Sweller, 1999). In the current study, the effects of subtitles are further examined.

This paper is organized as follows: First, conflicting findings on the effects of subtitles on learning will be discussed. Second, a framework will be proposed that can explain these conflicting findings by considering the interaction between subtitles, language proficiency, and visual-textual information complexity (VTIC). Finally, an experimental study will be described that tests the main hypothesis of the framework.

Although this study is situated in an online educational setting, the results may also be of relevance for other media-oriented fields, such as film studies and video production.

Subtitles: Beneficial or Detrimental for Learning?

Research on the effects of subtitles typically differentiates between subtitles in someone's native language, called L1, and subtitles in one's second language, or L2. A meta-analysis of 18 studies showed positive effects of L2 subtitles for language learning (Perez, Noortgate, & Desmet, 2013). Specifically, enabling subtitles for language-learning videos substantially increases student performance on recognition tests, and to a lesser extent on production tests. Other studies have found similar positive effects of subtitles on learning from videos, and there appears to be a consensus that subtitles are beneficial for learning a second language (e.g., Baltova, 1999; Chung, 1999; Markham, 1999; Winke, Gass, & Sydorenko, 2013). However, these are all studies that focus on learning a language, not on learning about a non-linguistic topic in a second language. There are important differences between language learning and what we will call content learning. When learning a language, practicing with reading and understanding L2 subtitles is directly relevant for this goal. By contrast, when learning about a specific topic, apprehending L2 subtitles is not a goal in itself but only serves the purpose of better understanding the actual content. As such, we would argue that findings from studies focusing on language learning are by themselves not convincing enough to be directly applied to content learning, as subtitles have a different relationship with the content and the learning goals.

In contrast to studies on language learning, only a few studies have investigated the effects of subtitles for content learning. These studies have shown positive effects of subtitles for content learning in a second language. For example, when watching a short Spanish educational clip, English-speaking students benefited substantially from Spanish subtitles, but even more so from English subtitles (Markham et al., 2001). Another study, focused on different combinations of languages, similarly showed that students performed better on comprehension tests when watching an L2 video with subtitles enabled (Hayati & Mohmedi, 2011).

Although several studies did find positive effects of subtitles, a range of other studies yielded contradictory findings. For example, Kalyuga et al. (1999) found that narrated videos without subtitles are better for learning than videos with subtitles. In that study, subtitles were shown to lead to lower performance, an increased perceived cognitive load, and more reattempts during the learning phase (i.e., rewatching videos). This is in contrast to the earlier-discussed studies, which showed positive effects of subtitles on learning from videos. In a different study on learning from narrated videos, two experiments showed that enabling subtitles led to lower knowledge retention and transfer (Mayer, Heiser, & Lonn, 2001). With Cohen's d effect sizes ranging from 0.36 to 1.20, the detrimental effects of subtitles in these studies were quite substantial. A range of other studies found similar evidence that, for content and language learning alike, narrated explanations are typically better than showing only subtitles or narration combined with subtitles (Harskamp, Mayer, & Suhre, 2007; Mayer, Dow, & Mayer, 2003; Mayer & Moreno, 1998; Moreno & Mayer, 1999). Finally, some studies showed neither a positive nor a negative effect of subtitles on learning (e.g., Moreno & Mayer, 2002a).

Explaining Conflicting Findings on the Effects of Subtitles

The previously discussed literature presents a confusing paradox for instructional designers: Are subtitles beneficial, detrimental, or irrelevant for learning? Here we present an attempt to explain the conflicting findings using a framework built on theories of attention and information processing. In short, we propose that the conflicting findings can be integrated by considering the interaction between subtitles, language proficiency, and the level of visual-textual information complexity (VTIC) in the video.

Working Memory Limitations

An essential characteristic of the human cognitive architecture is that not every type of information is processed in an identical way. Working memory is characterized by having modality-specific channels, one for auditory and one for visual information (Baddeley, 2003). Both have a limited capacity and can only hold information chunks for a few moments before they decay (Baddeley, 2003). During learning tasks, working memory acts as a bottleneck for processing novel information; as more cognitive load is imposed on the learner, fewer cognitive resources are available for the integration of information into long-term memory, effectively impairing learning (Ginns, 2006; Sweller, Van Merrienboer, & Paas, 1998). For novel information, the cognitive resources required for processing appear to be primarily dictated by measurable attributes of the information-in-the-world, such as the number of words and their interactivity (Sweller, 2010). As each channel has its own capacity, it is generally more effective to distribute processing load between both channels instead of relying on only one modality (Mayer, 2003). When two sources of information are presented in the same modality, this can (more) easily overload our limited processing capacity (Kalyuga et al., 1999). This provides an explanation of why a range of studies found negative effects of subtitles when learning from videos, as both are sources of visual information.

Textual Versus Nontextual Visual Information

Up to now, we have not distinguished between textual and nontextual visual information. As previously discussed, auditory and visual information are initially processed in separate channels. However, after this initial processing, any language presented either visually or verbally will be processed in the same working memory subcomponent: the phonological loop (Baddeley, 2003). By contrast, nontextual visual information is processed in a different component, the visuospatial sketchpad. This notion can further clarify the earlier presented findings. Specifically, the presence of textual visual information, as compared with nontextual visual information, becomes an important variable to account for. A video that contains three different sources of language – narration, subtitles, and in-video text – is likely to induce cognitive overload. If the visual information in the video does not have a language component, we can expect a reduced or no detrimental effect. More precisely, subtitles are expected to be detrimental to learning when a video already has a high level of VTIC. However, when a video has a relatively low level of VTIC, adding subtitles will not necessarily lead to cognitive overload. Should the addition of subtitles be desired, it then becomes necessary to ensure that the VTIC of a video is low enough to prevent detrimental effects due to cognitive overload. We propose two ways in which the VTIC of an educational video can be manipulated while maintaining the educationally relevant content.

Amount of Visual–Textual Information

The first, and most straightforward, aspect of VTIC is the amount of visual–textual information shown in a video. That is, a video that shows much more text is arguably more complex to process than a video that shows much less text. While removing information that is vital to understanding the topic of the video might reduce the complexity, it would also harm the educational value of the video. However, removing or adding visual–textual information that is not strictly relevant for the learning goals can be used to, respectively, decrease or increase the VTIC of a video. Given that such information does not benefit the student in mastering the learning goal, the validity of the video as an educational tool is fully maintained. For example, take a complex image such as a schematic representation of the human eye, with many labels referring to each individual part of the eye. Labels that are not relevant to the learning goals can be removed, possibly greatly limiting the amount of visual–textual information presented to the student. Evidence for the beneficial effect of removing irrelevant information has been found in several studies and is typically referred to as the “coherence effect” (Butcher, 2006; Mayer et al., 2001; Moreno & Mayer, 2000).

Presentation Rate of Visual–Textual Information

The second proposed component of VTIC is the presentation rate of the visual–textual information. As discussed earlier, working memory is limited in how much information it can hold and process at any given time. Therefore, introducing many concepts simultaneously risks overloading a student with more information than (s)he can effectively handle. This can be prevented by spreading the information over time, while maintaining the same overall amount of information. For example, detrimental effects of subtitles disappear when verbal and written text explanations are presented before the visual information is shown (Moreno & Mayer, 2002b). With such a sequential presentation, the student does not need to process the spoken word, the written word, and the visual information simultaneously. Instead, first the narration and subtitles are processed, and only afterward is the visual information shown. This effectively removes split-attention effects and spreads cognitive load over time, thus reducing the risk of cognitive overload. However, while this form of information segmentation makes videos easier to process and understand, it also increases the video duration, which is often not a desired consequence. Visual–textual information can be segmented without affecting the video duration by showing new information only from the moment it is mentioned in the narration and becomes relevant. Using the previous example of a complex schematic image with many labels, at the start of a video segment the complex image can be shown without any labels, with labels becoming visible from the moment they are verbally discussed. In this format, the total duration as well as the narration remain unchanged, while the overall VTIC decreases through a segmented presentation style.

Split-Attention Effects

As discussed, subtitles add an additional source of information that needs to be processed, leaving fewer cognitive resources for learning processes. Additionally, subtitles draw visual attention, such that less attention is spent on other – possibly important – aspects of the video. Like other cognitive resources, attention is limited. That subtitles can cause a so-called split-attention effect has been made clear by several eye-tracking studies. In general, viewers spend a substantial amount of time paying attention to subtitles (Schmidt-Weigand, Kohnert, & Glowalla, 2010). In a video with a lecturer and subtitles, non-native speakers spent 43% of the time looking at the subtitles (Kruger, Hefer, & Matthew, 2014). The finding that subtitles draw so much attention further signifies their importance. Even if subtitles are beneficial for learning in certain circumstances, it should be taken into account that students will have less attention left for other visual information. In situations where subtitles do not substantially aid the learner, a considerable amount of attention will have been wasted. We propose two additional factors contributing to the VTIC of a video.

Attention Cuing

When presented with novel information, it can be difficult to immediately understand where to look. Profound differences in visual search and attention anticipation have been reported for expertise differences in many areas, such as chess, driving, and clinical reasoning (Chapman & Underwood, 1998; Krupinski et al., 2006; Reingold, Charness, Pomplun, & Stampe, 2001). Given the already high attentional load present in visually complex videos, the presence of subtitles can be expected to have detrimental effects. However, to lower the attentional load, attention can be guided by using attentional cues such as arrows pointing to the most relevant area in a video, or by underlining or highlighting these sections. Such attentional cues help novice learners to more effectively direct their attention when and where it is necessary (Boucheix & Lowe, 2010; Ozcelik, Arslan-Ari, & Cagiltay, 2010), possibly lowering detrimental effects of subtitles.

Physical Distances

The final proposed factor of VTIC relates to the physical organization of related information in a video, specifically the physical distance between a header (such as a label) and its referent. Nontrivial physical distances between headers and referents are detrimental for learning, as longer distances require more cognitive resources to hold and process information (Mayer, 2008). Additionally, longer distances can induce a split-attention effect, as the increased distances require more attention, which can thus not be spent on other, more relevant parts of the video (Mayer & Moreno, 1998). A split-attention effect can further explain the contradictory findings: Subtitles will cause a split-attention effect in the presence of other visual information, such as graphics, texts, annotated pictures, or diagrams with textual explanations. Furthermore, physical distances can be manipulated to increase or decrease the VTIC without affecting the educational content itself. Using the earlier example of the complex image with labels, the physical distances between the labels and their positions in the image can be changed to manipulate the VTIC of a video.

The Possible Role of Language Proficiency

It has been argued that subtitles (whether L1 or L2) are beneficial for the comprehension of L2 video content because they help students bridge the gap between their language proficiency and the target language (Chung, 1999; Vanderplank, 1988). More specifically, it is often easier to understand L2 written text than spoken language, as reading comprehension skills are typically more developed in students (Danan, 2004; Garza, 1991). Perez et al. (2013) report different learning gains based on L2 proficiency, although their study provides insufficient evidence to verify a moderating role of L2 proficiency. Furthermore, L2 subtitles typically draw more attention than subtitles in one's native language, presumably because L1 subtitles can be processed more automatically and require only peripheral vision (Kruger et al., 2014). A final reason to consider L2 proficiency as an influential factor is that information that is known to a person requires far fewer, or possibly no, cognitive resources to hold and process in working memory (Diana, Reder, Arndt, & Park, 2006; Sweller et al., 1998). As such, processing L2 subtitles can be expected to require fewer cognitive resources when a student has a higher L2 proficiency. At first sight, this appears to be in conflict with the argument that subtitles specifically help students with a lower L2 proficiency to bridge the language barrier. A possible integration of these findings would be that with a lower L2 proficiency subtitles do indeed require more effort to process, but can also aid learning, though only if no other visual information is present.

Putting the Pieces Together

Based on the discussed literature, we would argue that to better understand the effects of subtitles on learning, it is essential to consider both language proficiency and the complexity of visual–textual information. For example, consider videos showing a teacher explaining a topic simultaneously with a written summary, annotated pictures, or diagrams with textual explanations. The inclusion of subtitles in such videos can be detrimental for learning, especially for students with a low English proficiency. The number of different visual sources of information will put more strain on the limited capacity of the visual working memory channel. The subtitles may cause not only cognitive overload but also a split-attention effect.

Hypothesis

The main hypothesis of the proposed model is a three-way interaction effect, specifically:

  • There is a three-way interaction effect between English proficiency, subtitles, and VTIC on test performance. We additionally predict specific directions within this interaction:
    • For low-VTIC videos, lower English proficiency is related to a higher performance gain when subtitles are enabled; and
    • For high-VTIC videos, lower English proficiency is related to a higher performance loss when subtitles are enabled.

Note that the hypotheses concern relative differences in performance change. No claim is made about absolute differences between students with different levels of English proficiency, or between videos with different levels of visual–textual information complexity. The underlying reasoning is that the presented framework predicts different effects of subtitles depending on the amount of additional visual information and the level of English proficiency, but it does not necessarily predict absolute differences.

Method

Videos

Four types of videos were used: videos with high/low VTIC, and with/without subtitles. To ensure ecological validity, actual videos from MOOCs on the Coursera platform were used as base material; however, to make the videos usable for this experiment, they were extensively edited, as described below. Four videos were used as raw material and were manipulated to create the four versions of each video, resulting in 16 videos. To manipulate the complexity of the videos, the four proposed VTIC components were used as a guideline, as summarized in Table 1.

Table 1 Overview of the manipulations to create more and less complex versions

All other video characteristics were kept the same for each video. The duration of each video is approximately 7 min, with no differences between the versions of each video. None of the videos in any version show the narrator or teacher. Each video was narrated by the same person to exclude a narrator effect. In the versions with subtitles, the subtitles are shown in the bottom part of the screen where they do not overlap with any other content. The narration and subtitles are verbatim identical. The video topics are: “The Kidney,” “History of Genetics,” “The Visual System,” and “The Peripheral Nervous System.”

English Proficiency Test

To test the hypothesis of the proposed model, it was necessary to estimate the English proficiency of the participants. As the goal of this study is to generate results that can be easily implemented in online education, a short and easy-to-implement test was preferred. With this in mind, the English proficiency placement test made by TrackTest was used (TrackTest, 2016). TrackTest is a placement test that estimates the user's English proficiency on the widely used Common European Framework of Reference (CEFR) scales (Council of Europe, 2001). The CEFR identifies six levels, from A1 to C2, signifying beginner to advanced proficiency levels. The TrackTest is adaptive, meaning that subsequent questions are based on the performance on earlier questions. In total, each participant was presented with 15 multiple-choice questions, drawn from a pool of 90. The test takes less than 10 min to complete. In a pilot test with 800 users who took the test twice, the test–retest reliability was satisfactory, with a Spearman's ρ of .736.

Procedure

Upon registration, each participant was randomly allocated to one of four counterbalance lists, which are presented in Table 2.

Table 2 Counterbalance list

The annotations C− and C+ refer to the video versions with decreased and increased levels of VTIC, respectively. Likewise, S+ and S− refer to videos with and without English subtitles, respectively. As shown, each participant viewed one video in each condition. Before the study, the participants were asked to rate their prior knowledge about each of the four topics. For example, regarding the video on the organization of the human eye, the participants were asked how much they knew about the different parts and the organization of the human eye. For these questions, 5-point Likert scales were used; participants who scored a 3 or higher (i.e., who self-reported at least a moderate amount of prior knowledge) were excluded from the study, to eliminate a confounding influence of expertise. The participants were allowed to watch each video only once. After every video, the participants were asked to rate how much mental effort they had to invest to understand the video, on a 9-point Likert scale. The difference in average mental effort ratings for the videos high and low in VTIC serves as a measure of manipulation success. Subsequently, participants were presented with a knowledge test of 10 multiple-choice questions, containing factual questions about the content of the video. After completing the questions, participants continued with the next video, until all videos and tests were completed. Afterward, the participants were asked to take the short English proficiency placement test. Finally, they completed some questions about possible technical issues while watching the videos; these questions were asked for quality assurance. No relevant technical issues were reported.
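To make the allocation and exclusion rules concrete, the following minimal R sketch illustrates this part of the procedure. It is illustrative only: all variable and column names are hypothetical, and the simulated ratings merely stand in for the actual registration data.

```r
# Hypothetical illustration of the allocation and exclusion logic;
# column names and simulated ratings are not from the study materials.
set.seed(1)

n <- 200
registrants <- data.frame(
  id = seq_len(n),
  # Random allocation to one of the four counterbalance lists (Table 2)
  list = sample(1:4, n, replace = TRUE),
  # Self-reported prior knowledge per topic on 5-point Likert scales
  prior_kidney   = sample(1:5, n, replace = TRUE),
  prior_genetics = sample(1:5, n, replace = TRUE),
  prior_vision   = sample(1:5, n, replace = TRUE),
  prior_nervous  = sample(1:5, n, replace = TRUE)
)

# Exclusion rule: a rating of 3 or higher on any topic excludes the
# participant, to eliminate a confounding influence of expertise.
prior_cols <- grep("^prior_", names(registrants))
keep <- apply(registrants[, prior_cols] < 3, 1, all)
eligible <- registrants[keep, ]
```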

Participants

As this study focuses on online education, participants were recruited and tested online, using the Prolific platform (Prolific Academic, 2015). Participants were considered eligible if they were over 18 years of age and non-native English speakers. Upon completion of the study, the participants received €6.50 per hour as compensation. Instead of conducting an a priori power analysis to decide on a fixed sample size, the study started with an initial sample of 50 participants, and sampling continued in batches of 25 until there was sufficient evidence present in the data, as explained in the next section.

Preregistered Analysis Plan

To test the hypothesis, a Bayesian repeated measures ANOVA was performed on the mean test scores, with subtitles (yes/no) and complexity (high/low) as within-subject variables and English proficiency (1–5) as a between-subject variable. A Bayesian model comparison was used to decide on the model with the strongest evidence compared with the other models. This analysis was performed in JASP version 0.7.5 (Love et al., 2015), which uses a default Cauchy prior on effect sizes, centered on 0 with a scaling of 0.707, as argued for by Rouder, Morey, Speckman, and Province (2012). A Bayes factor of 3–10 of one model over another is interpreted as moderate evidence, 10–30 as strong, and above 30 as very strong. This analysis was performed after every batch of participants, and sampling continued until one model had a Bayes factor of at least 10 compared with every other model. This was the case after 125 participants. All the analyses described here were done using the data from all 125 participants.
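Although the analysis was run in JASP, an equivalent model comparison can be expressed with the BayesFactor package in R, on which JASP's Bayesian ANOVA is built. The sketch below is an assumed reconstruction, not the study's actual script; the data frame and its column names ("score", "subtitles", "complexity", "proficiency", "subject") are hypothetical.

```r
library(BayesFactor)

# Long-format data: one row per participant x video, with the test score,
# the two within-subject factors, proficiency (1-5), and a subject ID.
d <- read.csv("data.csv")
d$subject    <- factor(d$subject)
d$subtitles  <- factor(d$subtitles)   # yes / no
d$complexity <- factor(d$complexity)  # high / low

# Compare all models that can be built from the three predictors, with
# subject as a random (nuisance) factor; each Bayes factor is reported
# relative to the null model, as in Table 5.
bf <- generalTestBF(score ~ subtitles * complexity * proficiency + subject,
                    data = d, whichRandom = "subject")
sort(bf, decreasing = TRUE)

# Preregistered stopping rule: sample in batches until the best model has
# a Bayes factor of at least 10 over every competing model.
bfs <- extractBF(bf)$bf
best_over_runner_up <- max(bfs) / max(bfs[-which.max(bfs)])
continue_sampling <- best_over_runner_up < 10
```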

Results

First, the descriptive statistics will be presented, followed by the confirmatory analysis and several exploratory analyses. The data and the analysis scripts are available on the Open Science Framework at https://osf.io/n6zuf/.

Descriptive Statistics

A total of 125 participants successfully completed the entire study. As is shown in Table 3, the group of participants is well balanced in terms of gender and age.

Table 3 Descriptive statistics

The language proficiency distribution is skewed, with 43% of the participants having a high proficiency level, but all levels of language proficiency are sufficiently represented in the sample. Note that all participants are non-native English speakers, including the students at the highest proficiency level (C).

In the study, the participants watched four videos, each in a different condition. Table 4 shows the within-subject differences in test scores and self-reported mental effort ratings for each condition pair.

Table 4 Within-subject differences between conditions

These descriptive results give a mixed picture. The mean differences between the conditions with the same complexity but subtitles enabled or disabled are the smallest, both for the test scores and for the mental effort ratings. The differences between conditions with the same setting for subtitles but different levels of complexity are larger, suggesting a main effect of complexity. Furthermore, this difference appears larger when subtitles are disabled, which might indicate an interaction between complexity and subtitles. Note that Table 4 does not consider a possible main effect or interaction of language proficiency. The analysis of the full model with all the main effects and interactions is reported in the next section.

Confirmatory Analysis

In accordance with the preregistered analysis plan, a Bayesian repeated measures ANOVA was performed on the test scores, with the following predictors: video complexity (high/low), subtitles (yes/no), and the participant's language proficiency (1–5). These results take the form of a model comparison; all the possible combinations of main effects and interactions between the three predictors are compared in terms of how well they can explain the data. Note that, in contrast to frequentist ANOVAs, multiple comparisons between all models can be performed without the need for corrections. The results of the Bayesian repeated measures ANOVA are displayed in Table 5, which shows all the models in descending order of evidence.

Table 5 Bayes Factors of all models relative to the null model

The results show that Model 1 has the most evidence; it consists only of the main effects of complexity and language proficiency, with no main effect of subtitles and no interactions between any of the factors. This model has nearly 10⁸ times more evidence than the null model. Importantly, the evidence provided by this study favors the complexity + language proficiency model over the complexity + subtitles + language proficiency model (the second-best model) by a factor of 10.30:1. In other words, there is 10.3 times more evidence for the C + L model than for the C + S + L model. Furthermore, every model that does not contain a main effect of subtitles is stronger than its counterpart that includes an effect of subtitles.
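For readers less familiar with Bayesian model comparison, these relative factors follow directly from Table 5: because every Bayes factor there is expressed against the same null model, the evidence for one model over another is simply the ratio of the two. As a worked check using the reported value:

```latex
\mathrm{BF}_{M_1 M_2} = \frac{\mathrm{BF}_{M_1 0}}{\mathrm{BF}_{M_2 0}},
\qquad \text{e.g.,}\qquad
\mathrm{BF}_{\mathrm{C+L},\;\mathrm{C+S+L}}
  = \frac{\mathrm{BF}_{\mathrm{C+L},\,0}}{\mathrm{BF}_{\mathrm{C+S+L},\,0}}
  \approx 10.3 .
```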

The preregistered hypothesis was that the data would be best explained by the full three-way interaction model (Model 15). While there is more evidence for this model than for a null model, the data favor the simpler C + L model by a factor of 400,000:1.

In the last column of Table 5, the posterior probability of each model is shown. Assuming that one of these models is true, and having no preference for any model before the study, P(M|D) gives the probability of each model given the data and the priors.
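Under this assumption of equal prior model probabilities, the posterior model probabilities are, in sketch form, obtained by normalizing the Bayes factors against the null (with a Bayes factor of 1 for the null model itself):

```latex
P(M_i \mid D) = \frac{\mathrm{BF}_{i0}}{\sum_{j} \mathrm{BF}_{j0}} .
```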

Exploratory Analyses

While the Bayes factors quantify the amount of relative evidence provided for the models, they do not provide information about estimates of population parameters such as means and effect sizes. Using Markov chain Monte Carlo (MCMC) methods from the BayesFactor package in R, we estimated population parameters using all the available data with all the factors and their interactions (Morey & Rouder, 2015; R Core Team, 2016). Chains were constructed with 10⁶ iterations; visual inspection of the chains and auto-correlation plots revealed no quality issues. We put Cauchy priors on the effect size parameters with a scaling factor of 1/2, as further described by Rouder et al. (2012). A Cauchy distribution with a scaling factor of 1/2 has half its probability mass between −0.5 and 0.5, and the remaining half on the more extreme values. In other words, we expect effect sizes of around (–)0.5, but the prior is diffuse enough to be sensitive to more extreme effects. Using much wider or narrower scaling factors (from 1/6 to 4) does not affect the estimates in a consequential manner. As these priors cover all the effect sizes in the literature discussed, we consider the results to be insensitive to all plausible alternative priors. Complexity (high/low) and subtitles (yes/no) were entered as factors, while language proficiency was entered as a continuous variable (1–5). This analysis was done separately for effects on test scores (described in the next section) and on mental effort ratings (described in the subsequent section).
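Continuing the hypothetical sketch from above (the same assumed data frame d), this estimation step could look as follows with the BayesFactor package; the rscale arguments place the Cauchy prior with scaling factor 1/2 on the effect sizes.

```r
library(BayesFactor)  # also loads coda for MCMC diagnostics

# Full model with all main effects and interactions, with subject as a
# random (nuisance) factor; Cauchy priors on effect sizes, scaling 1/2.
full <- lmBF(score ~ subtitles * complexity * proficiency + subject,
             data = d, whichRandom = "subject",
             rscaleFixed = 0.5, rscaleCont = 0.5)

# Posterior sampling with 10^6 MCMC iterations, as in the paper.
chains <- posterior(full, iterations = 1e6)

summary(chains)      # posterior means and 95% credible intervals
plot(chains[, 2:4])  # trace and density plots for visual inspection
```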

Effects on Test Scores

A visualization of the posterior probability densities of the three main effects on test scores is shown in Figure 1. While Figure 1 shows only the main effects, the entire model with all factors and interactions was used to generate these posterior distributions.

Figure 1 Posterior probability density plots for the effects on test score (0–10).

The density plot shows the most likely values of each effect size parameter, such that any point in the plot that is twice as high as another point is twice as likely. Note how the effect of subtitles is centered around 0, with higher and lower values becoming increasingly unlikely. By contrast, the effect of complexity is much stronger and most likely to be around –0.62 (for the high-complexity compared with the low-complexity versions). Language proficiency has a positive effect, with an effect size slope of around 0.55. Note that these are unstandardized effect sizes measured in grade points, on a scale of 0 to 10. In other words, while the effect of subtitles is most likely to be (close to) zero, both complexity and language proficiency have a noticeable effect on test scores. Compared with complexity and subtitles, the effect of language proficiency can be estimated with relatively little uncertainty. The parameter estimates of the main effects and all interactions are shown in Table 6.

Table 6 Parameter estimations of intercept and factor effects

As can be seen in Table 6, the difference between two otherwise identical videos that differ only in complexity is 0.62 grade points. Dividing this by the standard deviation results in a Cohen's d effect size of 0.31. The slope of language proficiency (measured on a scale of 1–5) is 0.55 (Cohen's d of 0.27), with a 95% credible interval of 0.43–0.68. However, the effect of subtitles is 0.04 grade points (Cohen's d of 0.02), and we cannot even be certain about the direction of the effect, as the credible interval spans both negative and positive values. This means that it is very likely to be (close to) zero. All the interaction effects are similarly centered around 0, with credible intervals that span both negative and positive values. These findings are fully consistent with the confirmatory analysis, which suggested that the best model includes only the main effects of complexity and language proficiency, and no effect of subtitles or any of the interactions. Only complexity and language proficiency have 95% credible intervals that do not include zero, such that we can be confident about the direction of these effects, while the effects of the other factors are close to zero.
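As a worked check on these conversions (the underlying standard deviation is not reported directly, but it is implied by the reported values):

```latex
d = \frac{\Delta}{SD}
\quad\Rightarrow\quad
SD \approx \frac{0.62}{0.31} = 2.0 \text{ grade points},
\qquad
d_{\text{proficiency}} \approx \frac{0.55}{2.0} \approx 0.27 .
```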

Effects on Mental Effort Ratings

In addition to the effects on test scores, we analyzed the effects of complexity, language proficiency, and subtitles on the participants’ self-reported mental effort ratings of the videos. This analysis is identical to the previous analysis in every aspect other than the different outcome variable. As described earlier, the participants were asked how much mental effort they had to invest in watching and understanding each video on a 9-point Likert scale, higher scores meaning more invested effort.

A visualization of the posterior probability densities of the three main effects on mental effort ratings is shown in Figure 2.

Figure 2 Posterior probability density plots for the effects on mental effort ratings (1–9).

As can be seen in Figure 2, the effects of subtitles and language proficiency on the participants' mental effort ratings are both centered near 0. For subtitles, the effect is estimated at a 0.015 difference in mental effort ratings, 95% credible interval [−0.16, 0.19]. The effect of language proficiency is estimated at 0.017, 95% credible interval [−0.10, 0.14]. Of the three effects, complexity is the only one not centered around zero; it is estimated at 0.24, 95% credible interval [0.06, 0.41]. When transformed into standardized Cohen's d effect sizes, the effect of subtitles is 0.008, that of language proficiency is 0.010, and that of complexity is 0.120. The (unstandardized) effects of the interactions are all smaller than 0.02, which – on a 9-point scale – is so small that they will not be discussed further.

Discussion

Open online education plays an important role in the globalization and democratization of education. To ensure not only the availability but also the accessibility of open online education, it is vital to remove potential obstacles and biases that put certain students at a disadvantage, for example, students with lower levels of English proficiency. This is not yet a given, as MOOCs are still provided primarily in English. In this study, we investigated whether the presence of English subtitles has beneficial or possibly detrimental effects on students' understanding of the content of English videos. Specifically, we tested the hypothesis that the effect of subtitles on learning depends on the English proficiency of the students and on the VTIC of the video. Contrary to this hypothesis, we found strong evidence that there is no main effect of subtitles on learning, nor any interaction, but only main effects of complexity and language proficiency. We will discuss these findings in that order.

No Main Effect of Subtitles

Contrary to a range of previous studies, we found strong evidence that subtitles have neither a beneficial nor a detrimental effect on learning from educational videos. In addition, the presence or absence of subtitles appears to have no effect on self-reported mental effort ratings. This is surprising given the apparent consensus that enabling subtitles increases the general accessibility of online content, as stated in the Web Content Accessibility Guidelines 2.0 (WCAG, 2008). These null findings contradict two lines of research: one showing beneficial effects of subtitles, the other showing detrimental effects.

Earlier research showing beneficial effects of subtitles consists primarily of studies on second-language learning, which show that second-language subtitles help students with learning that language (e.g., Baltova, 1999; Chung, 1999; Markham, 1999; Winke et al., 2013). While this appears to conflict with the results of the present study, the important difference is that the current study did not use language-learning videos but content videos, and did not measure gains in second-language proficiency. Based on the current study, it seems that for content videos there is little to no benefit of enabling subtitles, even for students with a low language proficiency and for visually complex videos.

A different body of research has shown detrimental effects of subtitles. This is often labeled the redundancy effect, as the reasoning is that because the subtitles are verbatim identical to the narration, they are redundant and can only hinder the learning process (e.g., Mayer et al., 2001, 2003). This is in clear contrast to the findings of the current study, which estimates the effect of subtitles to be (close to) zero. Importantly, the English language proficiency of the students did not moderate the effect of subtitles, even though the study included participants with the full range of English proficiency levels. As noted before, it might be that the subtitles helped the students with lower proficiency levels to increase their understanding of English, but this did not affect their test performance. With the Bayesian analyses we showed that subtitles do not merely have an indistinguishable effect (e.g., a nonsignificant effect in frequentist statistics) but that there is strong evidence for the absence of a subtitle effect on learning and mental effort. While these conclusions are based only on the selection of videos used in the current study, they call the generalizability of the redundancy effect into question by showing that it does not hold for these specific videos, and arguably not for a wider range of similar videos either. More research is needed to further establish the potential (lack of) effects of subtitles on learning from videos, both in highly controlled settings and in real-life educational settings. Specifically, it is essential to study the generalizability of findings like the redundancy effect and to establish boundary conditions. Even though the current study used four different videos, each with four different versions, this is not sufficient to generalize to all kinds of educational videos. However, by manipulating the complexity of the videos, we were able to show that the null effect of subtitles cannot be explained by complexity or element interactivity (Paas, Renkl, & Sweller, 2003; Sweller, 1999). Furthermore, we compared the amount of evidence for a wide range of different models and found that every model that does not include a main effect of subtitles is stronger than its respective alternative model that does include subtitles. In addition, the within-subject design of the study severely reduces the plausibility of confounding participant characteristics. Finally, it is noteworthy that the current study only used second-language subtitles, meaning that providing subtitles in the native language of students may still have a positive effect on learning and accessibility (Hayati & Mohmedi, 2011; Markham et al., 2001).

Main Effect of Complexity

The effect of video complexity shows how video design can have a noticeable effect on test performance, either positively or negatively. In this study, the effect was estimated at 0.62 grade points (on a scale of 0–10), which translates to a Cohen's d of 0.31. In addition, the self-reported mental effort ratings were 0.24 higher for complex videos (on a 9-point scale), which is a Cohen's d of 0.12. This study did not use measures of engagement such as video dwelling time. However, a recent study showed that the textual complexity of videos in open online education explains over 20% of the variance in dwelling time (Van der Sluis, Ginn, & Van der Zee, 2016). As the quizzes took place immediately after each video, the current study only provides insight into how VTIC affects short-term performance on tests. Effects on long-term learning are unknown, but it is plausible that the performance gap remains stable, or even widens as the test delay increases, since initial (test) performance typically strongly predicts future (test) performance (e.g., Gow et al., 2011; Harackiewicz, Barron, Tauer, Carter, & Elliot, 2000; Karpicke & Roediger, 2007). Furthermore, the current study used individual videos, while most online courses have multiple related videos that build on each other. Whether such inter-video dependency strengthens or weakens the effect of VTIC is as yet unknown, but warrants further investigation.

In this study, the complexity of the videos was manipulated based on four principles extracted from the literature on multimedia learning: the segmentation effect, the signaling effect, the spatial contiguity effect, and the coherence effect, all of which are further explained and discussed in the Introduction, as well as by Mayer and Moreno (2003). This resulted in two different versions of each video that differ only in the (mainly visual) complexity of the presentation of information. While the mentioned manipulations have each been investigated independently, this is – to the best of our knowledge – the first study to combine all four to experimentally manipulate the complexity of videos. Surprisingly, while the individual manipulations had effect sizes ranging from Cohen's d values of 0.48 to 1.36, the combined effect is estimated at a Cohen's d of 0.31. We note several plausible interpretations for this discrepancy: The effect of the manipulations may vary with (a) video characteristics, (b) student characteristics, (c) implementation choices, and/or (d) study design. First, while the current study used multiple videos and different versions of each video, a moderating effect of video characteristics cannot be ruled out. For example, the size of the effect might partly depend on characteristics such as the video's length, educational content, or other aspects that were not manipulated in this study. Should this be the case, it would mean that the generalizability of the four effects is limited by these moderating variables. Second, characteristics of the students in the different studies might partly explain the discrepancy in effect sizes. While many of the cited studies used the relatively homogeneous subpopulation of psychology students, the current study used participants from various countries, with varying levels of education as well as levels of English proficiency. Given the wider and less selective range of participants, one would typically expect a more accurate estimate of the size and generalizability of the studied effects. Furthermore, given the within-subject design of the current study, it seems unlikely that potentially relevant participant characteristics confounded the results, which would be more likely in a between-subject design. Third, it is important to note that the current study necessarily employed a specific operationalization of the four effects. For example, there are many ways to operationalize the signaling effect using attentional cues of different kinds, such as underlining, highlighting, or different kinds of arrows or circles. Given the wide range of possible operationalizations, variation in the effects of these manipulations is to be expected. While this is likely to be of influence, it remains unclear whether it is a sufficient explanation. Finally, the fourth potential explanation of the difference in effect sizes is based on differences in study design and methodology. For example, the current study took place online, and not in a physical location such as a university. Another potential explanation lies in the way that Cohen's d is calculated, as well as in different estimates of the standard deviation, or other choices in statistical procedures that can differ between studies (Baguley, 2009).

In sum, while there are many plausible reasons for the differences in effect sizes, it remains unclear what the exact causes are, and whether these are systematic or due to random variation. This further emphasizes the need to study these instructional design guidelines for videos using a wide range of videos, in different educational contexts, and with representative samples of participants. While it is unrealistic to expect to be able to predict the effect size of such manipulations with great precision across many different situations, better understanding the moderating variables and boundary conditions is paramount for making better recommendations on how to create high-quality educational videos.

Main Effect of Language Proficiency

Students with a higher English language proficiency scored substantially higher than students with a lower proficiency. The slope of this effect was estimated at 0.55 grade points per proficiency level. Given the proficiency range of 1–5, the difference between students with the highest and lowest proficiency levels amounts to over 2 grade points. This further signifies the issue that open online courses such as MOOCs are not equally accessible to everyone, as the majority of the courses are provided in English. By extension, this calls for research investigating interventions or design strategies that might help close this performance gap. However, it is important to mention that the design of the current study does not directly translate to how non-native English speakers engage with online courses. For example, the participants in this study were not allowed to re-watch or pause videos, take notes, or use any other strategy that might be particularly helpful for non-native speakers in online courses. Students who experience trouble understanding a video might use such strategies to counteract their initial disadvantage. However, it is also plausible that non-native English speakers are put off by the predominantly English online courses and choose not to engage at all, or drop out early from such courses, which should be prevented.

Summary and Consequences for Practice

To summarize, the visual–textual information complexity of a video and especially the language proficiency of the student are both strong predictors of learning from content videos. By contrast, English subtitles neither increased nor decreased the students' ability to learn from the videos. However, this does not lead to the conclusion that English subtitles should not be made available, as they are vital for students with hearing disabilities. Furthermore, students might prefer watching videos with subtitles for other reasons, even though this might not directly affect their learning. The extent to which subtitles in the students' native language might help them cope with limited English proficiency is as yet unknown and remains to be investigated. Another possibility would be to provide dubbed versions of each video to cater to more languages, but this is a costly intervention. Overall, we have shown that both the student's language proficiency and the video's complexity can have a substantial effect on learning from educational videos, which deserves attention in order to increase the quality and accessibility of open online education.

These results have several consequences for educational practice. First, it is important to ensure that educational videos are designed in such a way that they do not hamper the learning process. Specifically, the visual-textual information complexity of educational videos should not be too high, for example through too much irrelevant information or a suboptimal physical organization of information. Second, educators of online courses should be aware of the possible detrimental effects of lower levels of English proficiency and aim to help these students as much as possible; merely providing English subtitles is not enough to guarantee accessibility.

Tim van der Zee is a PhD student at ICLON, Leiden University Graduate School of Teaching, The Netherlands. He studies how students learn from educational videos in online learning environments such as MOOCs. As an experimental psychologist, he strives to better understand learning processes and how we can enhance the quality of online education.

Wilfried Admiraal (Leiden University) is full professor of Educational Sciences and academic director of Leiden University Graduate School of Teaching. His research interest combines the use of technology in education with social psychology in secondary and higher education.

Nadira Saab (PhD) is an assistant professor at ICLON, Leiden University Graduate School of Teaching, The Netherlands. Her research interests involve the impact of powerful and innovative learning methods and approaches on learning processes and learning results, such as collaborative learning, technology-enhanced learning, (formative) assessment, and motivation.

Fred Paas is a Professor of Educational Psychology at Erasmus University Rotterdam in the Netherlands and a professorial fellow at the University of Wollongong in Australia. His research focuses on cognitive load theory and instructional design.

Bas Giesbers is a learning innovation consultant and researcher at the Learning Innovation Team of Rotterdam School of Management. His research interests involve synchronous and asynchronous online communication, motivation, collaborative learning, and the use of learning analytics to understand and improve learning.


Correspondence: Tim van der Zee, ICLON, Universiteit Leiden, Wassenaarseweg 62A, 2333 AL Leiden, The Netherlands,