Open Access · Original Article

A Causal Replication Framework for Designing and Assessing Replication Efforts

Published Online: https://doi.org/10.1027/2151-2604/a000385

Abstract. Replication has long been a cornerstone of trustworthy scientific results, but there remains considerable disagreement about what constitutes a replication, how results from replication studies should be interpreted, and whether direct replication of results is even possible. This article addresses these concerns by presenting the methodological foundations for a replication science. It introduces the causal replication framework, which defines “replication” as a research design that tests whether two (or more) studies produce the same causal effect within the limits of sampling error. The framework formalizes the conditions under which replication success can be expected, and allows for the causal interpretation of replication failures. Through two applied examples, the article demonstrates how the causal replication framework can be used both to plan prospective replication designs and to interpret results from existing replication efforts.
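To make the framework's operational criterion concrete, the sketch below compares two studies' effect estimates with a simple two-sample z-test: under the null hypothesis that both studies estimate the same causal effect, their standardized difference should fall within sampling error. This is a minimal illustration under assumed inputs, not the analysis the article prescribes; the estimates, standard errors, and the function name `replication_z_test` are hypothetical.

```python
import math
from scipy import stats

def replication_z_test(est_a, se_a, est_b, se_b):
    """Two-sided z-test of H0: studies A and B estimate the same causal effect.

    Assumes independent studies reporting an effect estimate and its
    standard error; returns the z statistic and two-sided p-value.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # SE of the difference (independent estimates)
    z = (est_a - est_b) / se_diff
    p = 2 * stats.norm.sf(abs(z))  # two-sided p-value from the normal tail
    return z, p

# Hypothetical numbers: an original estimate and a replication estimate.
z, p = replication_z_test(est_a=0.42, se_a=0.10, est_b=0.31, se_b=0.12)
print(f"difference z = {z:.2f}, p = {p:.3f}")  # large p -> no detectable difference
```

Note that a nonsignificant difference is, on its own, weak evidence of replication success: with large standard errors almost any pair of estimates will "agree." Equivalence tests, which reverse the burden of proof by requiring the difference to be demonstrably small, are a common complement to a difference test like this one.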
