Registered Report

Taking a Closer Look at the Bayesian Truth Serum

A Registered Report

Published Online: https://doi.org/10.1027/1618-3169/a000558

Abstract. Over the past few decades, psychology and its cognate disciplines have undergone substantial scientific reform, ranging from advances in statistical methodology to significant changes in academic norms. One aspect of experimental design that has received comparatively little attention is incentivization, i.e., the way that participants are rewarded monetarily for their participation in experiments and surveys. While incentive-compatible designs are the norm in disciplines like economics, the majority of studies in psychology and experimental philosophy are constructed such that individuals' incentives to maximize their payoffs in many cases stand opposed to their incentives to state their true preferences honestly. This is in part because the subject matter is often self-report data about subjective topics, and in part because samples are drawn from online platforms like Prolific or MTurk, where many participants are out to make a quick buck. One mechanism that allows for the introduction of an incentive-compatible design in such circumstances is the Bayesian Truth Serum (BTS; Prelec, 2004), which rewards participants based on how surprisingly common their answers are. Recently, Schoenegger (2021) applied this mechanism in the context of Likert-scale self-reports, finding that its introduction significantly altered response behavior. In this registered report, we further investigate this mechanism by (1) attempting to directly replicate the previous result and (2) analyzing whether the Bayesian Truth Serum's effect is distinct from the effects of its constituent parts (increase in expected earnings and addition of prediction tasks). We fail to find significant differences in response behavior between participants who were simply paid for completing the study and participants who were incentivized with the BTS. Per our pre-registration, we regard this as evidence in favor of a null effect of up to V = .1 and as a failure to replicate. However, we reserve judgment as to whether the BTS mechanism should be adopted in social science fields that rely heavily on Likert-scale items reporting subjective data, since smaller effect sizes might still be of practical interest and results may differ for items other than the ones we studied. Further, we provide weak evidence that the prediction task itself influences response distributions and that this task's effect is distinct from an increase in expected earnings, suggesting a complex interaction between the BTS's constituent parts and its truth-telling instructions.
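To make "surprisingly common" concrete, the following is a minimal Python sketch of the scoring rule described in Prelec (2004), not the payment procedure used in this study. It assumes each respondent gives one answer (an option index) and one predicted distribution over the answer options. The function name `bts_scores`, the parameter `eps`, and the toy data are illustrative assumptions. As a simplification, the sketch uses the full-sample frequencies for every respondent and floors zero shares at `eps`.

```python
import math

def bts_scores(answers, predictions, n_options, alpha=1.0, eps=1e-9):
    """Illustrative BTS scores: information score + alpha * prediction score.

    answers:     list of chosen option indices, one per respondent
    predictions: list of predicted answer shares (one list per respondent)
    eps:         hypothetical floor that avoids log(0) in small samples
    """
    n = len(answers)
    # Actual share of each option, floored at eps.
    x_bar = [max(sum(1 for a in answers if a == k) / n, eps)
             for k in range(n_options)]
    # Log of the geometric mean of the predicted shares for each option.
    log_y_bar = [sum(math.log(max(p[k], eps)) for p in predictions) / n
                 for k in range(n_options)]
    scores = []
    for a, p in zip(answers, predictions):
        # Information score: positive when the chosen answer is more
        # common than the sample collectively predicted.
        info = math.log(x_bar[a]) - log_y_bar[a]
        # Prediction score: penalty for mispredicting the actual shares
        # (zero for a perfect prediction, negative otherwise).
        pred = alpha * sum(x_bar[k] * math.log(max(p[k], eps) / x_bar[k])
                           for k in range(n_options))
        scores.append(info + pred)
    return scores

# Toy data: three respondents pick option 0, one picks option 1,
# and everyone predicts an even split.
toy_answers = [0, 0, 0, 1]
toy_predictions = [[0.5, 0.5]] * 4
toy_scores = bts_scores(toy_answers, toy_predictions, n_options=2)
```

In the toy data, option 0 is chosen more often than respondents predicted, so the respondents who chose it receive a higher score than the respondent who chose option 1.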

References