The Science of Technology and Human Behavior
Standards, Old and New
Technology and Human Behavior – a title that could not be more generic for a Special Issue. On its face, the phenomena examined in this issue are not distinct from others published in the Journal of Media Psychology (JMP): Can immersive technology be used to promote pro-environmental behaviors? (Soliman, Peetz, & Davydenko, 2017); How do subtitles and complexity of MOOC-type videos impact learning outcomes? (van der Zee, Admiraal, Paas, Saab, & Gisbers, 2017); Does cooperative video game play foster prosocial behavior? (Breuer, Velez, Bowman, Wulf, & Bente, 2017); What is the role of video game use in the unique risk environment of college students? (Holz Ivory, Ivory, & Lanier, 2017); Do interactive narratives have the potential to advocate social change? (Steineman, Iten, Opwis, Forse, Frasseck, & Mekler, 2017).
What makes this issue special is not a thematic focus, but the nature of the scientific approach to hypothesis testing: It is explicitly confirmatory. All five studies are registered reports, which are reviewed in two phases: First, the theoretical background, hypotheses, methods, and analysis plans of a study are peer-reviewed before the data are collected. If they are evaluated as sound, the study receives an “in-principle” acceptance, and researchers proceed to conduct it (taking potential changes or additions suggested by the reviewers into consideration). Consequently, the data collected can be used as a true (dis-)confirmatory hypothesis test. In a second step, the soundness of the analyses and discussion section is reviewed, but the publication decision is not contingent on the outcome of the study (see our call for papers; Elson, Przybylski, & Krämer, 2015). All additional, nonpreregistered analyses conducted are clearly labelled as exploratory and serve to discover alternative explanations or generate new hypotheses.
Further, the authors were required to provide a sampling plan designed to achieve at least 80% statistical power (or comparable criterion for Bayesian analysis strategies) for all of their confirmatory hypothesis tests, and to make all materials, data, and analysis scripts freely available on the Open Science Framework (OSF) at https://osf.io/5cvkr/. We believe that making these materials available to anyone increases the value of the research as it allows others to reproduce analyses, replicate the studies, or build on and extend the empirical foundation. As such, the five studies represent the first in JMP that employ these new practices. It is our hope that these contributions will serve as an inspiration and model for other media researchers, and encourage scientists studying media to preregister designs and share their data and materials openly.
All research proposals were reviewed by content experts from within the field and additional outside experts in methodology and statistics. Their reviews, too, are available on the OSF, and we deeply appreciate their contributions to meliorate each individual research report and their commitment to open and reproducible science: Marko Bachl, Chris Chambers, Julia Erdmann, Pete Etchells, Alexander Etz, Karin Fikkers, Jesse Fox, Chris Hartgerink, Moritz Heene, Joe Hilgard, Markus Huff, Rey Junco, Daniël Lakens, Benny Liebold, Patrick Markey, Jörg Matthes, Candice Morey, Richard Morey, Michèle Nuijten, Elizabeth Page-Gould, Daniel Pietschmann, Michael Scharkow, Felix Schönbrodt, Cary Stothart, Morgan Tear, Netta Weinstein, and additional reviewers who would like to remain anonymous.
Finally, we would like to extend our sincerest gratitude to JMP’s Editor-in-Chief Nicole Krämer and editorial assistant German Neubaum for their support and guidance from the conception to the publication of this issue.
Concerns have been raised about the integrity of the empirical foundation of psychological science, such as the average statistical power and publication bias (Schimmack, 2012), availability of data (Wicherts, Borsboom, Kats, & Molenaar, 2006), and the rate of statistical reporting errors (Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts, 2015).
Currently, there is little information on the extent to which these issues also exist within the media psychology literature. Therefore, to provide a first prevalence estimate, and to illustrate how some of the practices adopted for this special issue can help reduce these problems, we surveyed research designs, availability of data, errors in the reporting of statistical analyses, and statistical power of studies published in the traditional format in JMP. We analyzed the research published in JMP between volume 20/1, when it became an English-language publication, and volume 28/2 (the most recent issue when this analysis was planned). The raw data, analysis code, and code book are freely available at https://osf.io/5cvkr/.
Sample of Publications
Publications in JMP represent a rich range of empirical approaches. Of the N = 146 original research articles1 identified, n_emp = 131 (89.7%) report data from at least one empirical study (147 studies in total2). Of those, more than half are experiments (54.4%) or quasi-experiments (8.8%), followed by cross-sectional surveys (23.8%), and longitudinal studies (7.5%). The rest are content analyses, observational studies, or interview studies (5.4%).
Availability of Data and Materials
Recently, a number of open science initiatives including the Transparency and Openness Promotion Guidelines, the Peer Reviewers’ Openness Initiative, and the Commitment to Research Transparency3 have been successful in raising awareness of the benefits of open science and increasing the rate of publicly shared datasets (Kidwell et al., 2016). Historically the availability of research data in psychology has been poor (Wicherts et al., 2006). Our sample of JMP publications suggests that media psychology is no exception to this, as we were not able to identify a single publication reporting a link to research data in a public repository or the journal’s supplementary materials.4
Statistical Reporting Errors
Most conclusions in empirical media psychology, and psychology overall, are based on Null Hypothesis Significance Tests (NHSTs). Therefore, it is important that all statistical parameters and NHST results be reported accurately. However, a recent study by Nuijten et al. (2015) indicates a high rate of reporting errors in psychological research reports. The consequences of such inconsistencies are potentially serious as the analyses reported and conclusions drawn may not be supported by the data. Similar concerns have been voiced for published empirical studies in communication research (Vermeulen et al., 2015).
To make sure such inconsistencies were avoided for our special issue, we validated all accepted research reports with statcheck (version 1.2.2; Epskamp & Nuijten, 2015), a package for the statistical programming language R (R Core Team, 2016) that works like a spellchecker for NHSTs by automatically extracting reported statistics from documents and recomputing p-values.5
For our own analyses, we downloaded all n_emp = 131 JMP publications as HTML files6 and scanned them with statcheck to obtain an estimate for the reporting error rate in JMP. Statcheck extracted a total of K = 1036 NHSTs7 reported in n_nhst = 98 articles. Initially, 134 tests were flagged as inconsistent (i.e., reported test statistics and degrees of freedom do not match reported p-values), of which 27 were grossly inconsistent (the reported p-value is < .05 while the recomputed p-value is > .05, or vice versa). For one paper, a correction had been published, now reporting a consistent p-value. A number of inconsistent tests were marked as being consistent with one-tailed testing. Therefore, we manually checked those papers for any indication that one-tailed tests instead of two-tailed tests were conducted. Four tests were explicitly one-tailed in the corresponding publications, reducing the number to 129 inconsistent NHSTs (12.5% of K), of which 23 (2.2% of K) were grossly inconsistent. Forty-one publications (41.8% of n_nhst) reported at least one inconsistent NHST (range 1–21), and 16 publications (16.3% of n_nhst) reported at least one grossly inconsistent NHST (range 1–4) (see Figure 1). Thus, a substantial proportion of publications in JMP seem to contain inaccurately reported statistical analyses, of which some might affect the conclusions drawn from them.
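The distinction between inconsistent and grossly inconsistent tests can be illustrated with a minimal Python sketch. It is not statcheck itself (an R package) and does not reproduce its extraction or rounding rules; the function names `two_tailed_p` and `classify`, the restriction to t-tests, and the comparison at two decimals are assumptions made for illustration only.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_value, df, steps=2000):
    """Two-tailed p-value for a t statistic, via Simpson's rule on [0, |t|]."""
    upper = abs(t_value)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * (h / 3) * total  # probability mass within [-|t|, |t|]
    return max(0.0, 1.0 - central_mass)


def classify(t_value, df, reported_p, decimals=2, alpha=0.05):
    """Compare a reported p-value with the one recomputed from t and df.

    Simplifying assumption: values are compared after rounding to `decimals`;
    possible rounding of the reported test statistic is ignored.
    """
    recomputed = two_tailed_p(t_value, df)
    if round(recomputed, decimals) == round(reported_p, decimals):
        return "consistent"
    if (reported_p < alpha) != (recomputed < alpha):
        return "grossly inconsistent"  # the significance decision flips
    return "inconsistent"


if __name__ == "__main__":
    # Hypothetical reported results, not taken from any JMP article.
    print(classify(2.5, 30, 0.02))
    print(classify(2.5, 30, 0.04))
    print(classify(2.0, 30, 0.04))
```

The third hypothetical case is the kind of error counted as grossly inconsistent: the reported p-value is below .05, whereas the value recomputed from the test statistic and degrees of freedom is above it.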
Types of Errors
Many of the inconsistencies are probably clerical errors that do not alter the inferences or conclusions in any way. For example, in 20 cases the authors reported p = .000, which is mathematically impossible (for each of these, the recomputed p was < .001). Other inconsistencies might be explained by authors not declaring that their tests were one-tailed (which is relevant for their interpretation). Of course, in many cases we could not determine the source of errors without being able to access the study data or analysis scripts.
Although nearly one in six of the papers with NHSTs contain gross inconsistencies potentially affecting reported conclusions, caution is advised when speculating about the causes. As with other inconsistencies, random human error certainly plays an important part. However, with some concern, we observe it is unlikely to be the only cause, as in 19 out of 23 cases, the reported p-values were equal to or smaller than .05 while the recomputed p-values were larger than .05, whereas the opposite pattern was observed in only four cases. Indeed, if incorrectly reported p-values resulted merely from clerical errors, we would expect inconsistencies in both directions to occur at approximately equal frequencies.
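The expectation of equal frequencies in both directions can be made concrete with an exact binomial calculation. The following Python sketch is an illustration added here, not an analysis reported by the authors; it assumes that each gross inconsistency is independent and, under purely clerical error, equally likely to fall in either direction. The function name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))


# 19 of the 23 gross inconsistencies favored a significant reported p-value.
one_sided = binomial_upper_tail(19, 23)
two_sided = min(1.0, 2 * one_sided)  # doubling is valid because p = 0.5 is symmetric

if __name__ == "__main__":
    print(f"one-sided: {one_sided:.4f}, two-sided: {two_sided:.4f}")
```

Under these assumptions, a split at least as lopsided as 19 to 4 would be rare, which is in line with the concern that clerical error is unlikely to be the only cause.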
We acknowledge that before the development of valuable tools like statcheck, there was little awareness of the high prevalence of reporting errors in psychology generally (including media psychology). All of these inconsistencies can easily be detected using the freely available R package statcheck or via www.statcheck.io for those who do not use R. JMP will adopt this practice for all forthcoming papers prior to publication, and we recommend researchers use statcheck for their own manuscripts and for the works of others in their role as reviewers.
Sample Sizes and Statistical Power
High statistical power is paramount to reliably detect true effects in a sample and, thus, to correctly reject the null hypothesis when it is false. Further, low power reduces the confidence that a statistically significant result actually reflects a true effect (Button et al., 2013; Schimmack, 2012). A generally low-powered field is more likely to yield unreliable estimates of effect sizes and low reproducibility of results.
We are not aware of any previous attempts to estimate average power in media psychology. One obvious strategy for estimating average statistical power is to examine the reported power analyses in empirical research articles. For publications in JMP, however, this is difficult, as searching all papers for the word “power” yielded only a single article reporting an a priori determined sample size. This is not to say media psychologists are generally unaware of the concept of power. In 19 further articles, power is indeed mentioned, in many cases to either demonstrate observed or post-hoc power (which is redundant with reported NHSTs, see e.g. Lakens, 2014), to suggest larger samples should be used in future research, or to explain why an observed nonsignificant “trend” would in fact be significant had the statistical power been higher.
Another strategy is to examine the power for different effect sizes, e.g. using Cohen’s (1988) rule of thumb8, given the average sample size (I) found in the literature. The median sample size in JMP is 139 with a considerable range across all experiments and surveys (see Table 1). As in other fields, surveys tend to have healthy sample sizes apt to reliably detect medium to large relationships between variables. The median sample size for survey studies is 327, allowing researchers to detect small bivariate correlations of r = .1 at 44% power (rs = .3 and .5 both > 99%).9 Longitudinal research exhibits similar characteristics, with a median sample size of 378.50, allowing researchers to detect r = .1 at 49% power (rs = .3 and .5 at > 99%).
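The power figures for bivariate correlations can be approximated with the Fisher z transformation. The Python sketch below is an approximation under stated assumptions (two-sided test, alpha = .05, normal approximation) and is not necessarily the procedure the authors used; the function name `power_correlation` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def power_correlation(r, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0.

    Uses the Fisher z approximation: atanh(r_hat) is roughly normal with
    standard error 1 / sqrt(n - 3).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n - 3)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Median sample sizes reported above: 327 (surveys), 378.5 (longitudinal).
    for n in (327, 378.5):
        for r in (0.1, 0.3, 0.5):
            print(f"n = {n}, r = {r}: power = {power_correlation(r, n):.2f}")
```

With these inputs the approximation yields roughly 44% and 49% power for r = .1 and more than 99% for r = .3 and r = .5, matching the values reported in the text.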
Notes. n = Number of published studies; MD_I = Median sample size; M_I = Mean sample size; SD_I = Standard deviation of M_I; Min_I/Max_I = Smallest/largest reported sample size.
For experiments (including quasi-experiments), the outlook is a bit different, with a median sample size of 107. To determine average power in experimental designs, two further parameters must be considered: a) the study design (between-subjects or within-subjects), and b) the number of cells (or conditions) realized. Across all types of designs (see Table 2), the median cell size is 30.67. Thus, the average power of experiments published in JMP to detect small differences between conditions (d = .20) is 12%, 49% for medium effects (d = .50), and 87% for large effects (d = .80).
| Design | n | MD_i/cell | M_i/cell | SD_i/cell | Min_i/cell | Max_i/cell | 1 − β (d = .2) | 1 − β (d = .5) | 1 − β (d = .8) |
Notes. n = Number of published studies; MD_i/cell = Median cell size; M_i/cell = Mean cell size; SD_i/cell = Standard deviation of M_i/cell; Min_i/cell/Max_i/cell = Smallest/largest reported cell size; 1 − β (d = .2)/1 − β (d = .5)/1 − β (d = .8) = Power to detect small/medium/large differences between cells. For between-subjects, mixed designs, and total we assumed independent t-tests. For within-subjects designs we assumed dependent t-tests.
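The power values for the median cell size can be approximated in a few lines of Python. This is a sketch under stated assumptions: an independent two-sample comparison with equal cell sizes, a two-sided test at alpha = .05, and a normal approximation in place of the noncentral t distribution. The function name `power_two_sample` is hypothetical, and the dependent t-tests assumed for within-subjects designs are not covered.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(d, n_per_cell, alpha=0.05):
    """Approximate power to detect a standardized mean difference d
    between two independent cells of equal size (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_cell / 2)  # expected value of the test statistic
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    median_cell_size = 30.67  # median cell size reported above
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: power = {power_two_sample(d, median_cell_size):.2f}")
```

Because the normal approximation ignores the extra uncertainty from estimating the variance, it runs about one percentage point above the exact t-test values of 49% and 87% reported for medium and large effects.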
Again, we currently do not have reliable estimates of the average true, expected, or even observed effect size in media psychology. But even when assuming the effects examined in the media psychological literature could be as large as those in social psychology (average d = .43 according to Richard, Bond, & Stokes-Zoota, 2003), our results indicate that the chance that an experiment published in JMP will detect them is worse (at 38%) than flipping a coin – an operation that would also be considerably less expensive. We do not think this is a sustainable way of accumulating scientific knowledge and spending (public) resources.
Psychological and communication scientists use a wide range of methodologies to enhance our understanding of the role of media in human behavior. Unfortunately, as in other fields of social science (Pashler & Harris, 2012), much of what we think we know may be based on a tenuous empirical foundation. As a first estimation in this field, our analysis of JMP publications indicates that materials and data of few, if any, media psychology reports are openly available, many lack the statistical power required to reliably detect the effects they set out to detect, and a substantial number contain statistical errors, some of which might alter the conclusions the research draws. Although these observations are deeply worrying, they provide some clear guiding points on how to improve our field.
Our observations could lead readers to believe that we are concerned about the quality of publications in JMP in particular. If anything, the opposite is true, as this journal recently committed itself to a number of changes in its publishing practices to promote open, reproducible, high-quality research. The space provided by the editor-in-chief for our analysis is simply another step in this phase of sincere self-reflection. Similar analyses in other fields suggest that the issues we discuss here go far beyond media psychology (or the JMP): About half of all articles in major psychology journals report at least one inconsistent NHST (Nuijten et al., 2015) and at least one mean value that is inconsistent with the sample size and integer data (Brown & Heathers, 2016). Estimates of average statistical power in social psychology are similar to those in JMP (Fraley & Vazire, 2014), but as low as 18% in neuroscience (Button et al., 2013).
Thus, we would like these findings, troubling as they are, to be taken not as a verdict, but as an opportunity for researchers, journals, and organizations to reflect similarly on their own practices and hence improve the field as a whole.
Construction and Testing of Theories
One key area that could be improved in response to these challenges is how researchers create, test, and refine psychological theories used to study media. Like other psychology subfields, media psychology is characterized by frequent emergence of new theories which purport to explain phenomena of interest (Anderson, 2016). This generativity may, in part, be a consequence of the fuzzy boundaries between exploratory and confirmatory modes of social science research (Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012).
Exploratory work, research meant to introduce new ideas and generate hypotheses, often involves taking a look at the results of a study and flexibly listening to “what the data have to say”. This mode is fundamental to social science work in general and also creatively informs media psychology. However, conclusions drawn from this mode are of limited strength, as they may reflect chance variations in the data. Confirmatory work, by contrast, involves theory testing, a process that requires research questions and hypotheses to be clearly stated in advance of data collection. This mode allows researchers and audiences to trust the results of a study rigorously testing a specific prediction. A problem confronting theories in media psychology is that the boundaries between exploratory and confirmatory work are often blurred. This means young scholars, parents, and policymakers cannot know which studies present tentative results and which ones are worthy of investing limited resources to build on or implement in the real world.
Articles in this special issue provide a clear example of how exploratory and confirmatory modes of research can coexist and how science can thrive as a result. Most present both exploratory and confirmatory elements, each clearly labeled. As a result, it is easier to see and understand what the researchers were expecting based on knowledge of the relevant literature, and what they eventually found in their studies. It allows readers to build a clearer idea of the research process and the elements of the studies that came as inspiration after the reviews and the data collection were completed.
Both modes of research – studying previously observed phenomena and exploring uncharted territory – benefit from preregistration. Drawing this distinction helps the reader determine which hypotheses carefully test ideas derived from theory and previous empirical research, and it liberates exploratory research from the pressure to present an artificial hypothesis-testing narrative. The Registered Reports model is also an effective countermeasure against psychology’s aversion to statistically nonsignificant or “null” results, as study protocols are reviewed and accepted before results are known. Further, it ensures that p-values can be meaningfully interpreted, given that these lose their meaning in data exploration, where the Type I error inflation is unknown (Wagenmakers et al., 2012). If adopted by media psychologists, this approach could allow us to rigorously test and extend promising theories, and to retire theories which do not reliably account for observed data.
Increasing the Value of Media Psychology With Open Science Tools
As technology experts, media psychology researchers are well positioned to use and study new tools which shape our science. A range of new Internet-based platforms have been built by scientists and engineers at the Center for Open Science, including their flagship, the OSF (http://www.osf.io), and preprint services like PsyArXiv (http://www.psyarxiv.com) and SocArXiv (http://www.socarxiv.com). Designed to work with scientists’ existing research flows, these tools can help prevent data loss due to hardware malfunctions, misplacement, or relocations of researchers, while enabling scientists to claim more credit by allowing others to use and cite their materials, protocols, and data.10
The High Stakes Facing Media Psychology
Like psychological science as a whole, media psychology faces a pressing credibility gap. Unlike some other areas of psychological inquiry, however, media research – whether concerning the Internet, video games, or film – speaks directly to everyday life in the modern world. It affects how the public forms their perceptions of media effects (Przybylski & Weinstein, 2016), and how professional groups and governmental bodies make policies and recommendations (Council on Communications and Media, 2016). In part because it is key to professional policy, empirical findings disseminated to caregivers, practitioners, and educators should be built on an empirical foundation with sufficient rigor. If policy makers and the public are to value our views as experts, we must take steps to demonstrate this trust is warranted. Such challenges and high stakes are by no means unique to media psychology.
Indeed, in medical and drug research, the study registration movement (Goldacre & Gray, 2016) has led the way with publicly accessible registries that include all the studies, published or not, conducted in a research area. To build good faith with the general public, industry collaborators, and policy makers, we propose the creation of a registry (https://osf.io/registries/) for confirmatory media psychology research. Creative exploratory research (i.e., theory building) would continue as it does now, but confirmatory work (i.e., theory testing) could be registered in a central repository so that its results – positive, null, or negative – would be available for scrutiny. This would also allow media psychologists to educate the general public and journalists about the distinction between exploratory and confirmatory research. Through such a public registry, researchers and policy makers could quickly determine which evidence is promising (though tentative), and which conclusions are suitable as the basis for interventions, policy decision making, caregiver guidance, or new products.
We are, on balance, optimistic that media psychologists can meet these challenges and lead the way for psychologists in other areas. This special issue and the registered reports submission track represent an important step in this direction, and we thank the JMP editorial board, our expert reviewers, and, of course, the dedicated researchers who devoted their limited resources to this effort. The promise of building an empirically based understanding of how we use, shape, and are shaped by technology is an alluring one. We firmly believe that incremental steps taken towards scientific transparency and empirical rigor will help us realize this potential.
We sincerely thank Charleen Brand and Hannah Borgmann for their invaluable assistance with the data preparation for this editorial.
Malte Elson (PhD) is a behavioral psychologist and postdoc in the Educational Psychology Research Group at Ruhr University Bochum. He studies human learning in various contexts, such as the contingencies of behaviors in academic research (meta science), human interaction with technology, and effects of entertainment media.
Andrew Przybylski (PhD) is a senior research fellow based at the Oxford Internet Institute and Department of Experimental Psychology at the University of Oxford. His research focuses on applying motivational theory to understand the universal aspects of video games and social media that draw people in, the role of game structure and content on human aggression, and the factors that lead to successful versus unsuccessful self-regulation of gaming contexts and social media use.
1 Editorials, calls for papers, volume information tables, meeting calendars, and other announcements were excluded.
2 Pilot studies were excluded.
4 It is, of course, entirely possible that some authors have made their data publicly available without clarifying this in the publication.
5 p-values are recomputed from the reported test statistics and degrees of freedom. Thus, for the purpose of recomputation, it is assumed that test statistics and degrees of freedom are correctly reported, and that any inconsistency is caused by errors in the reporting of p-values. The actual inconsistencies, however, can just as well be caused by errors in the reporting of test statistics and/or degrees of freedom.
7 Note that statcheck might not extract NHSTs from figures, tables, supplementary materials, or when their reporting style deviates from the APA guidelines. For further details on the extraction method see Nuijten et al. (2015).
9 Naturally, when anticipating more complex relationships between multiple variables, those numbers are dramatically different.
Anderson (2016). Communication descending. International Communication Gazette, 78(7), 612–620. doi: 10.1177/1748048516655708
Button et al. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376. doi: 10.1038/nrn3475
Breuer, Velez, Bowman, Wulf, & Bente (2017). “Drive the lane; together, hard!” An examination of the effects of supportive co-playing, and task difficulty on prosocial behavior. Journal of Media Psychology, 29, 31–41. doi: 10.1027/1864-1105/a000209
Brown & Heathers (2016). The GRIM Test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science. doi: 10.1177/1948550616673876
(2016). pwr: Basic functions for power analysis. Retrieved from http://cran.r-project.org/package=pwr
Cohen (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Council on Communications and Media (2016). Media and young minds. Pediatrics, 138(5), e20162591. doi: 10.1542/peds.2016-2591
Elson, Przybylski, & Krämer (2015). Technology and human behavior: A preregistered special issue of the Journal of Media Psychology. Journal of Media Psychology, 27(4), 203–204. doi: 10.1027/1864-1105/a000170
Epskamp & Nuijten (2015). statcheck: Extract statistics from articles and recompute p values. Retrieved from https://cran.r-project.org/package=statcheck
Fraley & Vazire (2014). The N-pact factor: Evaluating the quality of empirical journals with respect to sample size and statistical power. PLoS One, 9(10), e109019. doi: 10.1371/journal.pone.0109019
Goldacre & Gray (2016). OpenTrials: Towards a collaborative open database of all available information on all clinical trials. Trials, 17(1), 164. doi: 10.1186/s13063-016-1290-8
Holz Ivory, Ivory, & Lanier (2017). Video game use as risk exposure, protective incapacitation, or inconsequential activity among university students: Comparing approaches in a unique risk environment. Journal of Media Psychology, 29, 42–53. doi: 10.1027/1864-1105/a000210
Kidwell et al. (2016). Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biology, 14(5), e1002456. doi: 10.1371/journal.pbio.1002456
Lakens (2014). Observed power, and what to do if your editor asks for post-hoc power analyses. The 20% Statistician. Retrieved from http://daniellakens.blogspot.de/2014/12/observed-power-and-what-to-do-if-your.html
Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts (2015). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods. doi: 10.3758/s13428-015-0664-2
Pashler & Harris (2012). Is the replicability crisis overblown? Three arguments examined. Perspectives on Psychological Science, 7(6), 531–536. doi: 10.1177/1745691612463401
Przybylski & Weinstein (2016). How we see electronic games. PeerJ, 4, e1931. doi: 10.7717/peerj.1931
R Core Team (2016). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.r-project.org
Richard, Bond, & Stokes-Zoota (2003). One hundred years of social psychology quantitatively described. Review of General Psychology, 7(4), 331–363. doi: 10.1037/1089-2680.7.4.331
(2016). waffle: Create waffle chart visualizations in R.
Schimmack (2012). The ironic effect of significant results on the credibility of multiple-study articles. Psychological Methods, 17(4), 551–566. doi: 10.1037/a0029487
Soliman, Peetz, & Davydenko (2017). The impact of immersive technology on nature relatedness and pro-environmental behavior. Journal of Media Psychology, 29, 8–17. doi: 10.1027/1864-1105/a000213
Steineman, Iten, Opwis, Forse, Frasseck, & Mekler (2017). Interactive narratives affecting social change: A closer look at the relationship between interactivity and prosocial behavior. Journal of Media Psychology, 29, 54–66. doi: 10.1027/1864-1105/a000211
van der Zee, Admiraal, Paas, Saab, & Gisbers (2017). Effects of subtitles, complexity, and language proficiency on learning from online education videos. Journal of Media Psychology, 29, 18–30. doi: 10.1027/1864-1105/a000208
Vermeulen et al. (2015). Blinded by the light: How a focus on statistical “significance” may cause p-value misreporting and an excess of p-values just below .05 in communication science. Communication Methods and Measures, 9(4), 253–279. doi: 10.1080/19312458.2015.1096333
Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632–638. doi: 10.1177/1745691612463078
Wicherts, Borsboom, Kats, & Molenaar (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61(7), 726–728. doi: 10.1037/0003-066X.61.7.726