Short Research Article

Collective Punishment and Cheating in the Die-Under-the-Cup Task

Published Online:https://doi.org/10.1027/1618-3169/a000543

Abstract

A popular tool in experimental research on dishonest behavior is the die-under-the-cup (DUTC) task, in which subjects roll a die in private and report the outcome to the experimenter after being promised a payoff that increases with the die's outcome. The present paper reports the results of incorporating collective punishment into the DUTC task. We ran two experiments, each involving two rounds of the task performed in a computer lab. Despite being asked not to cheat, participants produced a first-round average reported outcome that exceeded the statistical expectancy of 3.5. The second round of the first experiment involved the threat that, if this happened again, each subject would be fined by the difference between the average reported outcome and 3.5. Nevertheless, the average reported outcome in the second round significantly exceeded that of the first round. Running a second experiment, this time without the punishment threat, we ruled out the possibility that the increased cheating in the second round of the first experiment was due to a feedback effect, concluding that the threat of collective punishment acted to encourage cheating rather than deter it.

Collective punishment is sometimes imposed on a particular group when detecting the offender, known to be a member of this group (e.g., a class prankster), is difficult or impossible. While it has the benefit of punishing the offender with certainty, it has the drawback of wrongfully punishing the innocent.1 A few studies have explored experimentally the effects of collective punishment and what shapes support for it. Dickson (2007) conducted a lab experiment on a public good game with an outside authority that wishes to either maximize or minimize public good production by some group members, concluding that in the latter case collective punishment is counterproductive, provoking more public good provision rather than inhibiting it. Chapkovskii (2018), experimenting with a public good game, found counterintuitively that sanctions applied to an entire group for the free-riding of one of its members did not increase the level of cooperation within that group. Pereira et al. (2015) and Pereira and van Prooijen (2018) ran several experiments to show, respectively, that perceptions of collective responsibility shaped support for collective punishment to a greater extent for democratic groups than for nondemocratic ones, and that group entitativity facilitated collective punishment when a few group members committed an offense. A few studies have also addressed the effects of collective punishment in theoretical setups. Miceli and Segerson (2007) formalized an economic model to examine the conditions under which collective punishment is preferred over individual punishment when social welfare depends on either fairness of punishment or deterrence. Leshem and Wickelgren (2019) presented a simple model featuring a leader and a group of players to examine whether collective punishment can help deter group leaders from organizing and facilitating norm violations by group members. Gao et al. (2015) and Bolle (2021) designed games to show, respectively, that collective punishment may work against criminals but would never deter freedom fighters, and that it is more effective than collective reward for promoting cooperation.

The present paper reports the results of incorporating collective punishment into the die-under-the-cup (DUTC) task, a popular paradigm for detecting cheating in experimental studies of dishonest behavior. In this task, introduced by Fischbacher and Föllmi-Heusi (2013), participants roll a six-sided fair die in private (under a cup or in some other hidden place) and are promised a payoff according to the outcome of the roll (e.g., 1, 2, 3, 4, 5, or 6 US dollars for the corresponding die number rolled), which they report to the experimenter. While the DUTC task provides incentives for dishonest overreporting of the actual die outcome, it only allows us to elucidate the aggregate (not the individual) level of cheating among participants as a group, by comparing the average reported outcome to the statistical expectancy of 3.5 over multiple rolls of a fair die (e.g., Ruffle & Tobol, 2017; Schurr & Ritov, 2016; Tobol et al., 2020; Yaniv et al., 2019).2
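The aggregate detection logic can be illustrated with a short simulation; this is a sketch of the statistical idea only (function names are ours, not part of the original task protocol). Honest reports of a fair die average out near 3.5, so a group average well above that level signals overreporting:

```python
import random


def simulate_honest_reports(n_reports, seed=0):
    """Simulate honest DUTC reports: each is one roll of a fair six-sided die."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_reports)]


def mean(values):
    return sum(values) / len(values)


# Over many honest rolls the average converges to the expectancy 3.5;
# an observed group average far above 3.5 therefore indicates
# aggregate cheating, even though no individual cheater can be identified.
reports = simulate_honest_reports(100_000)
print(round(mean(reports), 2))  # close to 3.5
```

The comparison only works at the group level: any single report of 6 is entirely plausible, which is exactly why the task cannot identify individual cheaters.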

We report the results of two experiments, each involving two rounds of the DUTC task performed in a computer lab. In the first round, participants were instructed to click their die-rolling outcome on their computers and, contrary to the standard practice in DUTC tasks, were specifically asked not to cheat. At the end of this round, upon receiving the average reported outcome of the group on our computer, we announced that despite being requested not to cheat, we regrettably found that the group had cheated, and we explained that we figured this out by comparing the average reported outcome of the group to 3.5. The second round of the first experiment involved the threat that, if this happened again, we would fine each participant by the difference between the average reported outcome and 3.5. Nevertheless, the average reported outcome in the second round significantly exceeded that of the first round. Running a second experiment, this time without the punishment threat in the second round, we ruled out the possibility that the increase in cheating in the second round of the first experiment was due to the feedback we provided participants on the results of the first round. We thus conclude that the threat of collective punishment acted to encourage cheating rather than deter it.

Experiment 1

Method and Participants

The first experiment took place in May 2021 at the College of Management Academic Studies, the largest private college in Israel. Toward the end of the second semester, second-year economics students enrolled in an intermediate macroeconomics course were informed that the next class would take place in a newly built computer lab containing more than 70 seats with personal computers. Since by that time most of the adult population in Israel had already been vaccinated against COVID-19, there were no restrictions on filling up the lab.

About half an hour before the end of class, we invited students to participate in a 30-minute experiment with the promise of monetary payoffs. Three students preferred to leave the lab, for whatever reason, while 49 students remained (27 male, 22 female). Although the group size was relatively small, it was essential for ensuring that participants knew, or at least recognized, each other and might have second thoughts before making a decision that would hurt their fellow students.

Each participant was handed a standard die and a sticker with a running number (starting at 10) which was attached to their shirt. We then announced as follows (translated from Hebrew):

“Please roll your die in private (under your palm, for example) and report the outcome in the designated area on your computer. That is, please key in one of the numbers 1, 2, 3, 4, 5, or 6 in accordance with the outcome shown on your die. Then, type in parentheses the number appearing on your sticker and click the SEND button. Each dot shown on your die will reward you with 10 NIS.3 For example, if your die shows five dots, you will gain 50 NIS. Payment will take place anonymously, by matching the number you typed in parentheses to the number shown on your sticker. But pay attention: you are requested to report the true outcome of your die-rolling and not to cheat.”

Upon receiving the die-outcome reports from all participants, our computer immediately calculated the average reported outcome of the group, which reached 4.18, and confirmed that it was significantly higher than the statistical expectancy of 3.5. We then readdressed the group as follows:

“We have received your die-rolling reports. We regret to say that despite asking you to report the true outcome of the die and not to cheat — many of you have cheated. How do we know that? When you roll a fair die many times, the average outcome will be 3.5. However, the average outcome of your reports is 4.18, which is significantly higher than 3.5. This implies that many of you have cheated by overreporting the true outcome of their die rolling. We cannot determine who has cheated individually, but we are able to determine that as a group you have definitely cheated. We do not feel right to compensate cheaters. Therefore, we will view this round as a trial round with no payoffs and invite you now to roll your die once more, only this time, if the average outcome of your reports happens again to exceed 3.5, we will fine each one of you (with the exception of those who report an outcome of 1 and evidently have not cheated) by the difference between the average reported outcome and 3.5. For example, if the average outcome reported by the group is again 4.18, we will fine each one of you by 0.68 dots, the worth of which is 6.8 NIS.”
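The announced fine rule can be written out as a minimal sketch (the function name and structure are ours; only the rule itself is from the announcement above): each participant who reported more than 1 is fined the excess of the group average over 3.5, converted at 10 NIS per dot.

```python
def collective_fines(reports, dot_value=10.0, expectancy=3.5):
    """Per-participant fine (in NIS) under the announced rule.

    If the group's average report exceeds the expectancy of 3.5, every
    participant is fined the excess in dots, valued at 10 NIS per dot,
    except those who reported 1 (who evidently have not overreported).
    """
    avg = sum(reports) / len(reports)
    excess_dots = max(0.0, avg - expectancy)
    return [excess_dots * dot_value if r > 1 else 0.0 for r in reports]


# Illustrative toy group (not the experimental data): average = 4.25,
# so the excess is 0.75 dots, i.e., a 7.5 NIS fine for reports above 1.
print(collective_fines([6, 6, 1, 4]))  # [7.5, 7.5, 0.0, 7.5]
```

With the first-round average of 4.18, the rule yields the 0.68-dot (6.8 NIS) fine quoted in the announcement.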

After receiving participants’ reports of the second round, we announced as follows:

“We regret to say that despite the threat of paying a fine in case of cheating, the average outcome reported by the group is 5.02, which is even greater than the average outcome reported in the first round. Therefore, we will fine each one of you (with the exception of those reporting 1) by 1.52 dots, the worth of which is 15.2 NIS. Please line up to receive your payment.”

Results

As we announced in the experiment, the average die-rolling outcome reported by the group was 4.18 (SD = 1.67) and 5.02 (SD = 1.40) in the first and second rounds, respectively. The difference in the mean reported outcome between the second and first rounds is statistically significant [t(96) = 2.677; p = .008]. The difference between the mean reported outcome in the first round and the statistical expectancy of 3.5 is also statistically significant [t(48) = 2.850; p = .006], implying that participants as a group lied to us in both rounds. Figure 1 illustrates the frequencies of participants’ reported outcomes in the two rounds. As shown in this figure, while in the first round, the frequencies of reporting the outcomes 1–4 lie just a bit below the line of 0.167 (which represents the statistical expectancy of each outcome), in the second round they lie substantially below this line, compensated by a moderate increase in the frequency of overreporting the outcome of 5 (from 20% to 25%) and a substantial increase in the frequency of overreporting the outcome of 6 (from 30% to 55%).
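The reported tests can be reconstructed from the summary statistics alone; the sketch below (our own helper functions, not the authors' analysis code) computes a one-sample t against the expectancy of 3.5 and a pooled-variance independent-samples t between rounds:

```python
from math import sqrt


def one_sample_t(mean, sd, n, mu):
    """One-sample t statistic against a hypothesized mean mu."""
    return (mean - mu) / (sd / sqrt(n))


def two_sample_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic with pooled variance."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))


# First round vs. the expectancy of 3.5 (n = 49, M = 4.18, SD = 1.67):
print(round(one_sample_t(4.18, 1.67, 49, 3.5), 3))  # 2.85

# Round 2 vs. round 1 (M = 5.02, SD = 1.40 vs. M = 4.18, SD = 1.67):
# yields roughly 2.70; the small gap to the reported 2.677 presumably
# reflects rounding of the published means and SDs.
print(round(two_sample_t(5.02, 1.40, 49, 4.18, 1.67, 49), 2))
```

From the rounded summary statistics, the one-sample value reproduces the reported t(48) = 2.850 exactly.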

Figure 1 Frequency of reported die outcome.

Experiment 2

Had the threat of collective punishment acted to discourage cheating, our investigation would have ended there. But observing an increase in the level of cheating raised the concern that the feedback we provided participants on the results of the first round might bear the blame: being informed that a significant number of participants had cheated could have made cheating seem acceptable to some participants who were honest in the first round. Hence, we did not know whether the increase in cheating in the second round was due to the threat of punishment or simply to repetition with feedback. This could only be addressed by running an additional experiment with two rounds and feedback, but without the punishment threat.

Method and Participants

The second experiment was conducted in October 2021 in the same setting with a new group of intermediate macroeconomics students. After we invited students to participate in the experiment, 7 students left the lab, whereas 57 remained (27 male students, 30 female students). Upon receiving the die-outcome reports from all participants, our computer immediately calculated the average reported outcome of the group, which reached 4.21, and confirmed that it was significantly higher than the statistical expectancy of 3.5. After disclosing this result to participants and pointing out that many of them had cheated, we continued as follows:

“We do not feel right to compensate cheaters. Therefore, we will view this round as a trial round with no payoffs and invite you now to roll your die once more. We are asking you again not to cheat but promise to pay you in any case.”

Results

The average die-rolling outcome reported by the group was 4.21 (SD = 1.68) and 4.35 (SD = 1.74) in the first and second rounds, respectively. The difference between the mean reported outcome in the first round and the statistical expectancy of 3.5 is statistically significant [t(56) = 3.199; p = .002], implying that participants as a group lied to us in both rounds. However, the mean reported outcome in the second round did not differ significantly from that in the first round [t(112) = 0.438; p = .661]. Furthermore, the difference in the average level of cheating between the two rounds of the second experiment (0.14) is significantly lower [t(104) = 2.720; p = .008] than the corresponding difference in the first experiment (0.84). This implies that the threat of punishment, rather than the feedback, was the major factor causing the observed rise in cheating in the second round of the first experiment.

Discussion

Collective sanctions are an instrumental method of promoting deterrence when the offenders are unknown. The DUTC task, while incentivizing cheating, represents a case where the experimenter is unable to detect cheaters on an individual basis. We have reported the results of a study that incorporated collective punishment into the DUTC task. Specifically, we ran two DUTC experiments in a computer lab, each consisting of two rounds, where the first round involved a specific request (not practiced in standard DUTC tasks) not to cheat. After establishing that cheating had nevertheless taken place, the second round of the first experiment involved the threat that, if it happened again, each participant would be fined by the difference between the group's average reported outcome and the statistical expectancy of rolling a standard die (3.5). We deliberately chose a small group of second-year intermediate macroeconomics students who knew each other and could be expected to consider the harmful effect of dishonest overreporting on their classmates. However, the average reported outcome in the second round significantly exceeded that of the first round. Running a second experiment, this time without the punishment threat in the second round, we ruled out the possibility that the increased cheating in the second round of the first experiment was due to the feedback we provided participants on the results of the first round. We thus conclude that it was the threat of collective punishment that acted to significantly encourage cheating in the second round of the first experiment.

Wondering about the potential mechanisms that could have generated our result, we distinguish between participants who, in the first round of the first experiment, adhered to our request not to cheat and those who dishonestly ignored it. Consider first the former group. Some of its members, having received feedback about the magnitude of cheating in the first round, would have found it acceptable to cheat in the second round irrespective of the punishment threat. Other members of this group, whose moral restraints survived the feedback effect, nevertheless faced the threat of punishment even if they reported honestly. They could figure out that by overreporting their die-rolling outcome by one dot, they impose a punishment on others of just 1/N dots (N denoting the number of participants), whereas by adhering to honesty they subject themselves to a punishment of (N − 1)/N dots (assuming that all other participants cheat by one dot). The harm they impose on others if they cheat is negligible compared to the harm others impose on them if they do not cheat. Furthermore, if they cheat by one dot and others do the same, the punishment exactly offsets the extra payoff, and they simply maintain the payoff they would deserve if everybody reported honestly. Such a payoff-oriented calculation, which might have triggered cheating even among the most moral participants who were immune to the feedback effect, could also have been accompanied by a psychological effect: if they were going to be treated as cheaters anyway, why not enjoy the benefit that comes with it?
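The 1/N versus (N − 1)/N calculus above can be made concrete with a few lines of arithmetic (a sketch of the reasoning only; the helper names are ours). Because one extra overreported dot raises the group average, and hence everyone's fine, by 1/N dots:

```python
def fine_increase_per_extra_dot(n):
    """One participant overreporting by one dot raises the group average,
    and therefore every participant's fine, by 1/n dots."""
    return 1 / n


def honest_players_fine(n, n_cheaters):
    """Fine (in dots) borne by an honest reporter when n_cheaters others
    each overreport by one dot."""
    return n_cheaters / n


n = 49  # group size in the first experiment

# Harm a single cheater imposes on each other participant: 1/49 dots.
print(fine_increase_per_extra_dot(n))

# Fine an honest player faces if all 48 others cheat by one dot: 48/49 dots,
# nearly the full dot that cheating would have earned them.
print(honest_players_fine(n, n - 1))

# If everybody cheats by one dot, the fine is exactly 1 dot, which offsets
# the extra payoff and restores the all-honest payoff.
print(honest_players_fine(n, n) == 1.0)
```

The asymmetry is stark even in a small group: the marginal harm of cheating is about 0.02 dots per classmate, while the cost of staying honest against universal cheating approaches a full dot.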

Consider now the group of participants who cheated in the first round. They faced four possible ways of reacting to the threat of punishment in the second round: reporting honestly, cheating less (in terms of overreported dots), cheating more, or cheating just as much as before. Suspecting that other members of their group were no saints and expecting them to follow the same line of reasoning, they could figure out that, with the exception of cheating more, every possible choice would leave them worse off than in the first round. While not necessarily money maximizers (otherwise they would all have reported six dots in the first round), they apparently cared more about their monetary payoffs than about harming their honest classmates, and they consequently acted to offset the prospective punishment by inflating their die-outcome reports even more.

Every experimental study has limitations, and ours is no exception. First, the study was based on two very small samples of students who knew each other and were expected to have second thoughts before making a decision that would hurt their fellow students. It could be argued that the samples were too small to draw reliable conclusions. Enlarging the sample size is an obvious extension of our study, yet a larger, more anonymous group seems unlikely to reverse the antagonistic behavior observed within a more intimate one. Second, the punishment imposed collectively in case of cheating does not really hurt the pockets of honest participants but rather reduces their payoff. This is in line with tax evasion experiments where the fine in case of an audit is paid out of an initial endowment granted to participants; it is inconceivable that participants would end up losing money. Cheaters might therefore be less concerned about the harm they impose on others. Third, cheating opportunities were known and open to everybody. Noncheaters in the first round could thus bend their moral codes a bit to protect their payoff against the threat of punishment, and previous cheaters could adjust their level of cheating upward in anticipation of their classmates' dishonesty. Had cheating opportunities been somehow restricted to a select number of participants, their incentive to cheat more in the face of a collective punishment threat would probably have fallen considerably. All in all, our counterintuitive result has been obtained within the very specific setup of the DUTC paradigm. It is an open question to what extent it carries over to the real world. It can, at least, serve as a warning light to law enforcement agents that threatening to implement collective punishment might sometimes backfire.

Most studies on dishonesty suggest that men are more dishonest than women. For example, Ward and Beck (1990), Azar et al. (2013), and Bucciol et al. (2013) found, respectively, that male students were more likely to engage in academic cheating, that male customers were less likely to return excessive change in a restaurant, and that male bus passengers were more likely to travel without a ticket. Other studies have found no gender differences (e.g., Abeler et al., 2014; Arbel et al., 2014).4 An interesting extension for future research is to examine gender differences while distinguishing between the cheating gender and the punished gender. First, two groups of students who know each other, one of male students and one of female students, could be invited to perform the two-round DUTC task at the same time in separate computer labs, where cheating and punishment take place within each gender group. Second, in another session, cheating in one gender group could be threatened with punishment of the other gender group. Would threatening to punish the other gender make a difference as compared to punishing the cheating gender?

We wish to thank the Editor and an anonymous reviewer for helpful comments and suggestions.

Footnotes

1. Levinson (2003) provides numerous ancient and modern examples of collective punishment.

2. For detailed reviews of the experimental literature on dishonest behavior, see Rosenbaum et al. (2014), Jacobsen et al. (2018), Abeler et al. (2019), and Gerlach et al. (2019).

3. NIS denotes New Israeli Shekels, where 10 NIS equal about 3 US dollars.

4. See Capraro (2018) and Gerlach et al. (2019) for recent meta-analyses on gender differences in dishonesty.