Abstract
Over the past few decades, psychology and its cognate disciplines have undergone substantial scientific reform, ranging from advances in statistical methodology to significant changes in academic norms. One aspect of experimental design that has received comparatively little attention is incentivization, i.e., the way that participants are monetarily rewarded for their participation in experiments and surveys. While incentive-compatible designs are the norm in disciplines like economics, the majority of studies in psychology and experimental philosophy are constructed such that individuals’ incentives to maximize their payoffs in many cases stand opposed to their incentives to state their true preferences honestly. This is in part because the subject matter is often self-report data about subjective topics, and the sample is drawn from online platforms like Prolific or MTurk, where many participants are out to make a quick buck. One mechanism that allows for the introduction of an incentive-compatible design in such circumstances is the Bayesian Truth Serum (BTS; Prelec, 2004), which rewards participants based on how surprisingly common their answers are. Recently, Schoenegger (2021) applied this mechanism in the context of Likert-scale self-reports, finding that its introduction significantly altered response behavior. In this registered report, we further investigate this mechanism by (1) attempting to directly replicate the previous result and (2) analyzing whether the Bayesian Truth Serum’s effect is distinct from the effects of its constituent parts (an increase in expected earnings and the addition of prediction tasks). We fail to find significant differences in response behavior between participants who were simply paid for completing the study and participants who were incentivized with the BTS.
Per our pre-registration, we regard this as evidence in favor of a null effect of up to V = .1 and a failure to replicate, but we reserve judgment as to whether the BTS mechanism should be adopted in social science fields that rely heavily on Likert-scale items reporting subjective data, since smaller effect sizes might still be of practical interest and results may differ for items other than the ones we studied. Further, we provide weak evidence that the prediction task itself influences response distributions and that this task’s effect is distinct from an increase in expected earnings, suggesting a complex interaction between the BTS’s constituent parts and its truth-telling instructions.
References
(2017). Bayesian markets to elicit private information. Proceedings of the National Academy of Sciences, 114(30), 7958–7962. 10.1073/pnas.1703486114
(2019). Noncompliant responding: Comparing exclusion criteria in MTurk personality research to improve data quality. Personality and Individual Differences, 143(6), 84–89. 10.1016/j.paid.2019.02.015
(2013). Truth, correspondence, and gender. Review of Philosophy and Psychology, 4(4), 621–638. 10.1007/s13164-013-0155-2
(2020). Recruiting method and its impact on participant behavior. In K. E. Karim (Ed.), Advances in accounting behavioral research (pp. 1–19). Emerald Publishing. 10.1108/S1475-148820200000023001
(2011). Amazon's Mechanical Turk. Perspectives on Psychological Science, 6(1), 3–5. 10.1177/1745691610393980
(1999). The effects of financial incentives in experiments: A review and capital-labor-production framework. Journal of Risk and Uncertainty, 19(1), 7–48. 10.1023/A:1007850605129
(2019). Knowledge-how, understanding-why and epistemic luck: An experimental study. Review of Philosophy and Psychology, 10(4), 701–734
(2014). How well do we report on compensation systems in studies of return to work: A systematic review. Journal of Occupational Rehabilitation, 24(1), 111–124. 10.1007/s10926-013-9435-z
(2013). Evaluating Amazon's Mechanical Turk as a tool for experimental behavioral research. PloS One, 8(3), e57410. 10.1371/journal.pone.0057410
(2013). The effect of what we think may happen on our judgments of responsibility. Review of Philosophy and Psychology, 4(2), 259–269. 10.1007/s13164-013-0133-8
(2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149–1160. 10.3758/brm.41.4.1149
(2017). Validating Bayesian Truth Serum in large-scale online human experiments. PloS ONE, 12(5), e0177385. 10.1371/journal.pone.0177385
(2018). Asking about social circles improves election predictions. Nature Human Behaviour, 2(2), 187–193. 10.1038/s41562-018-0302-y
(2015). Public views on policies involving nudges. Review of Philosophy and Psychology, 6(3), 439–453. 10.1007/s13164-015-0263-2
(2019). Improving psychological science through transparency and openness: An overview. Perspectives on Behavior Science, 42(1), 13–31. 10.1007/s40614-018-00186-8
(2016). Attentive turkers: MTurk participants perform better on online attention checks than do subject pool participants. Behavior Research Methods, 48(1), 400–407. 10.3758/s13428-015-0578-z
(2001). Experimental practices in economics: A methodological challenge for psychologists? Behavioral and Brain Sciences, 24(3), 383–403. 10.1017/s0140525x01004149
(2009). The wisdom of many in one mind. Psychological Science, 20(2), 231–237. 10.1111/j.1467-9280.2009.02271.x
(2015, May). Incentivizing high quality crowdwork. In Proceedings of the 24th International Conference on World Wide Web (pp. 419–429). 10.1145/2736277.2741102
(1985). Counterfactual reasoning and accuracy in predicting personal events. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11(4), 719–731. 10.1037/0278-7393.11.1-4.719
(2011). Predicting new product adoption using Bayesian Truth Serum. Journal of Medical Marketing, 11(1), 6–16. 10.1057/jmm.2010.19
(2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532. 10.1177/0956797611430953
(2017). An analysis of data quality: Professional panels, student subject pools, and Amazon's Mechanical Turk. Journal of Advertising, 46(1), 141–155. 10.1080/00913367.2016.1269304
(2017). Systems perspective of Amazon Mechanical Turk for organizational research: Review and recommendations. Frontiers in Psychology, 8, Article 1359. 10.3389/fpsyg.2017.01359
(2007). The reporting of monetary compensation in research articles. Journal of Empirical Research on Human Research Ethics, 2(4), 61–67. 10.1525/jer.2007.2.4.61
(1980). Reasons for confidence. Journal of Experimental Psychology: Human Learning and Memory, 6(2), 107–118. 10.1037/0278-7393.6.2.107
(2014). The first cut is the deepest: Effects of social projection and dialectical bootstrapping on judgmental accuracy. Social Cognition, 32(4), 315–336. 10.1521/soco.2014.32.4.315
(2015). The relationship between motivation, monetary compensation, and data quality among US- and India-based workers on Mechanical Turk. Behavior Research Methods, 47(2), 519–528. 10.3758/s13428-014-0483-x
(1984). Considering the opposite: A corrective strategy for social judgment. Journal of Personality and Social Psychology, 47(6), 1231–1243. 10.1037//0022-3514.47.6.1231
(2011). How social influence can undermine the wisdom of crowd effect. Proceedings of the National Academy of Sciences, 108(22), 9020–9025. 10.1073/pnas.1008636108
(2014). Incentivizing responses to self-report questions in perceptual deterrence studies: An investigation of the validity of deterrence theory using Bayesian Truth Serum. Journal of Quantitative Criminology, 30(4), 677–707. 10.1007/s10940-014-9219-4
(1987). Ten years of research on the false-consensus effect: An empirical and theoretical review. Psychological Bulletin, 102(1), 72–90. 10.1037/0033-2909.102.1.72
(2009, June). Financial incentives and the performance of crowds. In Proceedings of the ACM SIGKDD Workshop on Human Computation (pp. 77–85). 10.1145/1600150.1600175
(2020). Folk intuitions and the conditional ability to do otherwise. Philosophical Psychology, 33(7), 968–996. 10.1080/09515089.2020.1817884
(2014). Registered reports. Social Psychology, 45(3), 137–141. 10.1027/1864-9335/a000192
(2018). Preregistration becoming the norm in psychological science. APS Observer, 31(3). https://www.psychologicalscience.org/observer/preregistration-becoming-the-norm-in-psychological-science/comment-page-1?pdf=true
(2009). A truth serum for non-Bayesians: Correcting proper scoring rules for risk attitudes. The Review of Economic Studies, 76(4), 1461–1489. 10.1111/j.1467-937x.2009.00557.x
(2018). Instructional manipulation checks: A longitudinal analysis with implications for MTurk. International Journal of Research in Marketing, 35(2), 258–269. 10.1016/j.ijresmar.2018.01.003
(2017). Beyond the Turk: Alternative platforms for crowdsourcing behavioral research. Journal of Experimental Social Psychology, 70(3), 153–163. 10.1016/j.jesp.2017.01.006
(2022). Data quality of platforms and panels for online behavioral research. Behavior Research Methods, 54(4), 1643–1662. 10.3758/s13428-021-01694-3
(2004). A Bayesian Truth Serum for subjective data. Science, 306(5695), 462–466. 10.1126/science.1102081
(2013). A robust Bayesian Truth Serum for non-binary signals. In Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI'13) (pp. 833–839). 10.1609/aaai.v27i1.8677
(2020). Crowdsourcing as a tool for research: Methodological, fair, and political considerations. Bulletin of Science, Technology & Society, 40(3-4), 40–53. 10.1177/02704676211003808
(1977). The "false consensus effect": An egocentric bias in social perception and attribution processes. Journal of Experimental Social Psychology, 13(3), 279–301. 10.1016/0022-1031(77)90049-X
(2015). A reliability analysis of Mechanical Turk data. Computers in Human Behavior, 43(2), 304–307. 10.1016/j.chb.2014.11.004
(2015). A penny for your thoughts: A survey of methods for eliciting beliefs. Experimental Economics, 18(3), 457–490. 10.1007/s10683-014-9416-x
(2021). Experimental philosophy and the incentivisation challenge: A proposed application of the Bayesian Truth Serum. Review of Philosophy and Psychology, 1–26. 10.1007/s13164-021-00571-4
(2022). Data and materials for “Taking A Closer Look At The Bayesian Truth Serum: A Registered Report”. https://osf.io/5gnzu/
(2006). The double-edged sword of rewards for participation in psychology experiments. Canadian Journal of Behavioural Science / Revue canadienne des sciences du comportement, 38(3), 269–277. 10.1037/cjbs2006014
(2014). The ticking time bomb: When the use of torture is and is not endorsed. Review of Philosophy and Psychology, 5(4), 543–563. 10.1007/s13164-014-0199-y
(2013). Creating truth-telling incentives with the Bayesian Truth Serum. Journal of Marketing Research, 50(3), 289–302. 10.1509/jmr.09.0039
(2017). It's what's on the inside that counts… or is it? Virtue and the psychological criteria of modesty. Review of Philosophy and Psychology, 8(3), 653–669. 10.1007/s13164-017-0333-8
(2012). A robust Bayesian Truth Serum for small populations. In Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI'12).
(2019). Long-term forecasts for energy commodities price: What the experts think. Energy Economics, 84, 104484. 10.1016/j.eneco.2019.104484