A Causal Replication Framework for Designing and Assessing Replication Efforts
Abstract
Replication has long been a cornerstone for establishing trustworthy scientific results, but there remains considerable disagreement about what constitutes a replication, how results from these studies should be interpreted, and whether direct replication of results is even possible. This article addresses these concerns by presenting the methodological foundations for a replication science. It provides an introduction to the causal replication framework, which defines “replication” as a research design that tests whether two (or more) studies produce the same causal effect within the limits of sampling error. The framework formalizes the conditions under which replication success can be expected and allows for the causal interpretation of replication failures. Through two applied examples, the article demonstrates how the causal replication framework may be used both to plan prospective replication designs and to interpret results from existing replication efforts.