Registered Report

In Search of the Preference Reversal Zone

Published Online: https://doi.org/10.1027/1618-3169/a000542

Abstract

A preference reversal is observed when a preference for a larger-later (LL) reward over a smaller-sooner (SS) reward reverses as both rewards come closer in time. Preference reversals are common in everyday life and in the laboratory and are often claimed to support hyperbolic delay-discounting models which, in their simplest form, can model reversals with only one free parameter. However, it is not clear if the temporal location of preference reversals can be predicted a priori. Studies testing model predictions have not found support for them, but they overlooked the well-documented effect of reinforcer magnitude on discounting rate. Therefore, we directly tested hyperbolic and exponential model predictions in a pre-registered study by assessing individual discount rates for two reinforcer magnitudes. We then made individualized predictions about pairs of choices between which preference reversals should occur. With 107 participants, we found (1) little evidence that hyperbolic and exponential models could predict the temporal location of preference reversals, (2) some evidence that hyperbolic models had better predictive performance than exponential models, and (3) in contrast to many previous studies, that exponential models generally fit the observed data better than hyperbolic models.

A win of £1,000 today has more value than a win of £1,000 to be delivered 1 year from now because of a process termed delay (or temporal) discounting (Mazur, 1987). As a result of delay-discounting, people and nonhuman animals sometimes forego a larger-later (LL) reward to receive a smaller-sooner (SS) reward (see supplementary Table 3 in Glautier et al., 2022, for a summary listing of symbols and abbreviations used in this paper). The relationship between delay and value is not linear, and one approach is to describe value, V, as a hyperbolic function of delay, δ, and reward magnitude, α, with the rate of decline given by the delay-discounting parameter κ, as shown in Equation 1 (Mazur, 1987).

$$V = \frac{\alpha}{1 + \kappa\delta} \tag{1}$$

As can be seen in Equation 1, larger values of κ indicate that reward value is lost more quickly as delay increases and larger values of κ identify a tendency to choose more immediate impulsive options. This hyperbolic model (Equation 1) has been used to summarize delay-discounting in many laboratory studies, including studies of socially important behaviors such as addiction (Amlung et al., 2017), failure to save for the future (Janssens et al., 2017), and failure to conserve energy (Macaskill et al., 2019).

The value of a reward at a particular delay is typically estimated by locating the indifference point – the amount of reward received immediately that has the same value as the delayed amount. Finding indifference points for a range of delays allows least-squares estimates of κ to be obtained using nonlinear regression. Equation 1 generally describes delay-discounting well, with regressions producing R² values over 0.9 (e.g., Kirby, 1997; Madden et al., 1999), but note that Equation 1 is a highly simplified model and does not take into account a wide range of variables that are relevant to delay-discounted decision making (e.g., see Cavagnaro et al., 2016). Estimates of κ are associated with important real-world behaviors such as smoking, drug use, and problem gambling (see MacKillop et al., 2011). The widespread use of the hyperbolic model as a tool for understanding, and possibly intervening in, applied problems makes a thorough test of its predictive accuracy essential.

A plot of V (δ) using Equation 1 is given in Figure 1 alongside a plot using Equation 2.

$$V = \alpha e^{-\kappa\delta} \tag{2}$$

Figure 1 Two delay-discounting curves – a hyperbolic curve (Equation 1) with the solid line and an exponential curve (Equation 2) with the dashed line. The curves show the value of a large reward (α = 1,000) as a function of delay.

Equation 2 is an exponential model of delay-discounting (Samuelson, 1937). These hyperbolic and exponential delay-discounting curves are visually similar in form: both produce curves with slopes approaching zero at a negatively accelerated rate. However, the exponential model typically produces lower R² values in model fitting than does the hyperbolic model for humans (e.g., Kirby, 1997; Madden et al., 1999; McKerchar et al., 2009; Myerson & Green, 1995) and nonhumans (e.g., Mazur & Biondi, 2009). More importantly for the current paper, although not apparent in Figure 1, the exponential model of Equation 2 does not predict preference reversals. A preference reversal occurs when people initially prefer an LL reward but then switch to preferring an SS reward as both come closer in time. This behavior can be modeled with Equation 1, as will be explained in the Method section, but according to Equation 2, an individual will prefer either the SS reward or the LL reward and that choice will be consistent no matter when in time they make the choice (Kirby & Herrnstein, 1995).
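The contrast between the two models can be illustrated with a minimal sketch. The amounts, the 40-unit gap between the two rewards, and the discount rate below are hypothetical values chosen only for illustration; they are not taken from the study.

```python
import math

def hyperbolic_value(alpha, kappa, delay):
    """Equation 1: value of a reward of size alpha received after `delay`."""
    return alpha / (1.0 + kappa * delay)

def exponential_value(alpha, kappa, delay):
    """Equation 2: exponential counterpart of Equation 1."""
    return alpha * math.exp(-kappa * delay)

def prefers_ll(value_fn, kappa, ss, ll, delay_ss, delay_ll):
    """True when the larger-later reward has the higher discounted value."""
    return value_fn(ll, kappa, delay_ll) > value_fn(ss, kappa, delay_ss)

# Hypothetical illustration values (not from the paper).
SS, LL = 500.0, 1000.0
GAP = 40.0      # LL always arrives 40 time units after SS
KAPPA = 0.05    # a single discount rate shared by both rewards
FRONT_END_DELAYS = [0, 10, 30, 50, 100]

hyperbolic_choices = [
    prefers_ll(hyperbolic_value, KAPPA, SS, LL, f, f + GAP)
    for f in FRONT_END_DELAYS
]
exponential_choices = [
    prefers_ll(exponential_value, KAPPA, SS, LL, f, f + GAP)
    for f in FRONT_END_DELAYS
]
```

With one shared rate, the hyperbolic choices switch from SS to LL as the front-end delay grows, whereas the exponential choices are the same at every front-end delay, because the ratio of the two exponential values depends only on the gap between the rewards.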

This capacity to model preference reversals (e.g., Ainslie & Herrnstein, 1981; Green et al., 1994) is a key difference between Equations 1 and 2. Therefore, well-documented preference reversal effects would be of critical theoretical importance, allowing a choice to be made between two models of delay-discounting. Furthermore, existing demonstrations of preference reversals with humans (Green et al., 1994; Kirby & Herrnstein, 1995; Pope et al., 2019) and pigeons (Ainslie & Herrnstein, 1981; Rachlin & Green, 1972) suggest cross-species generality, so empirical tests of these models are of general interest. However, although there seems to be clear evidence of preference reversal effects, it is less clear that these demonstrations are as theoretically decisive as may be apparent on first glance. This is because there has been no direct link between individual delay-discounting rates and the temporal location of preference reversals.

For example, Pope et al. (2019) found that individual delay-discounting rates derived from the hyperbolic model broadly predicted preference reversals when considered at the group level. Smokers, who had larger κ values, required longer delays to SS (longer “front-end” delays), holding constant the difference between SS and LL before their preferences changed from SS to LL (see also Yi et al., 2016). This suggests some level of correspondence between κ and the delays at which preference reversals occur, but Pope et al. did not examine whether preference reversals occurred at delays specifically predicted using each individual’s estimated κ value.

The few other studies that have tested specific predictions based on estimated κ values have not consistently found that preference reversals occur at the delays predicted by Equation 1. Kable and Glimcher (2010) compared discounting with and without a 60-day front-end delay. Choices were not customized to the individual, so for many choices the participants saw, their κ value did not actually predict that they would shift their preference when the front-end delay was added. When Kable and Glimcher examined only the subset of choices in which the hyperbolic model predicted preference reversal, most participants did not show any change in preference and shifts that did occur were just as likely to be in the SS to LL direction as vice versa (see also Janssens et al., 2017; Luhmann, 2013).

One caveat is that tests of the hyperbolic model’s predictions regarding preference reversal have generally neglected to consider the well-demonstrated effect of reward magnitude on discounting rate. Larger reinforcers have been discounted less steeply than smaller reinforcers in a number of human studies (e.g., Green et al., 1997) and in some nonhuman studies (e.g., Grace et al., 2012), but not others (e.g., Green et al., 2004). We reviewed human studies of magnitude effects on delay-discounting (see supplementary Figure 3 in Glautier et al., 2022, and section Choice of x and d: The “ab” optimization criterion) and found a systematic negative relationship between reward size and delay-discounting, so this is clearly important to take into account. The models of Equation 1 and Equation 2 can easily be developed to allow κ to vary for different reward magnitudes (e.g., Kirby, 1997), and therefore, fair tests of preference reversal predictions should not only be based on individualized κ values but should also allow different κ values for the large and small rewards under consideration. An interesting consequence of introducing two free parameters into the modeling is the fact that the exponential model is also able to predict preference reversals if a larger discounting rate is assumed for the smaller than for the larger reward (Kirby, 1997; Madden et al., 1999).

In summary, the facts (1) that the hyperbolic model predicts preference reversals and (2) that preference reversals occur have been taken as evidence in favor of that model, but researchers have rarely directly tested specific hyperbolic model predictions for preference reversals based upon individual participant estimates of κ. Researchers who have tested specific predictions of the hyperbolic model have not found support for them, but have also overlooked the magnitude effect. Therefore, the current study directly tested specific, quantitative, individual-level predictions of the hyperbolic discounting model for preference reversals, taking into account the possibility of different discount rates for small and large rewards. Modeling with two discount rate parameters means that both the hyperbolic and exponential discounting models could predict preference reversals, so we undertook a comparison of these two models.

In the first of two main analyses, our model comparison used the generalization criterion method (GCM) originally proposed by Mosier (1951) (see also Ahn et al., 2008; Busemeyer & Wang, 2000; Shiffrin et al., 2008). Key features of this method are that it provides a strong test of the predictive power of a model and that it avoids problems of comparing models of different complexities which can arise due to functional form and due to the number of parameters (Myung & Pitt, 1997). The GCM involves calibration of model parameters using a calibration data set and then, without further parameter adjustment, testing model predictions using a test data set. The GCM differs from cross-validation approaches, which also employ partition of data into calibration and test sets, in that the GCM uses different experimental design parameters for the calibration and test data. Thus, GCM tests a model’s capacity to extrapolate beyond the calibration design. One of the features of our investigation, described in detail below, is that when we tested the two discount rate parameter hyperbolic and exponential model predictions, we used experimental designs optimized for each model. It is not generally feasible to use the same design to compare models with respect to preference reversal predictions because a design parameterized by one set of delays and values may be predicted to produce a preference reversal by one model, but not by another. To overcome this, we compared model performance using the best design for each model. Our experimental design optimization found delay-discounting test questions tailored to maximize preference reversals for each model, so the number of correctly predicted preference reversals for each model was used for model evaluation alongside maximum likelihood statistics as indices of model fit to the test data.

In our second main analysis, we assessed the descriptive rather than predictive adequacy of the two discount rate parameter hyperbolic and exponential models alongside single discount rate parameter versions of each model and alongside a baseline guessing model. In this analysis, we used Akaike weight analysis of maximum likelihood statistics (Wagenmakers & Farrell, 2004), which takes into account the number of parameters in a model as a proxy for model complexity.

Method

A two-stage experiment was conducted. In Stage 1, we estimated hyperbolic (Equation 1) and exponential (Equation 2) discounting rates for a small (S) and for a large (L) hypothetical monetary reward for each participant. These estimates were used to design questions for Stage 2. The Stage 2 questions formed a preference reversal zone test in which we asked each participant question pairs of the form “Would you like S in d or L in D?” and “Would you like S now or L in D1?”. In these question pairs, d is the delay to S, the front-end delay referred to in the introduction, and D is the delay to L. D1, the delay to L in the second question, is computed as D − d. The selection of values for S, L, d, and D is described below. The question pairs were one of three types based on the values of d and D. Based on the discounting rates established in Stage 1, we generated the following model predictions. For the LLC pairs, we predicted that the participant would choose L for both questions (an LL choice pattern). For the LSC pairs, we predicted that the participant would choose L when S was delayed and would show a preference reversal choosing S when it was immediately available (an LS choice pattern). Finally, for the SSC pairs, we predicted that the participant would choose S for both questions (an SS choice pattern). This process was repeated in parallel for the hyperbolic and exponential models, and our model evaluation involved a comparison of the choice patterns found in the data and the choice patterns predicted by each model.

Participants

Two hundred eighty-six participants were recruited from https://www.prolific.co and tested online via https://www.soscisurvey.de. Four failed to complete Stage 1, and among the remaining 282 participants, there were 94 males and 181 females; 6 reported other gender (non-binary, etc.) and 1 declined to answer the gender question. The mean age was 32.9 (range 18–73). The procedure lasted approximately 15 minutes, and the participants were paid £3 for taking part. The experiment began after the participant read instructions and provided consent (see supplementary materials Appendices A and B, Glautier et al., 2022). Procedures were approved by the School of Psychology Human Ethics Committee at Victoria University of Wellington (application #27470) and by the Department of Psychology Ethics Committee at the University of Southampton (application #50835).

Online Experimental Methods

The task was presented to participants using the platform Soscisurvey. The instructions explained that the task involved making a series of choices, such as “Would you prefer 550 GBP in 12 weeks or 1,000 GBP in 24 weeks.” Participants were informed that although their choices were hypothetical they should try to make their choices in exactly the same way as they would if the delays and amounts were real. The transition between Stage 1 and Stage 2 (see below) was not signaled to the participants. See supplementary materials for further details (Glautier et al., 2022).

Stage 1: Estimation of Delay-Discounting Rates

In Stage 1, we obtained maximum likelihood discount rates for two monetary amounts S = £820 and L = £1,000. The value of S was obtained from xL where the specific value of x was selected as part of the Stage 2 design optimization procedure described below. Discount rates were obtained for each model (using the same questions) and used to select the delays used in Stage 2, as described below. For each of these two amounts, participants had a series of questions of the form “Would you like a now or A in D?” where A was fixed and corresponded to either S or L.

The delays at which the values of S and L were estimated were 1 week, 1 month, 6 months, 1 year, and 2 years. Eight values of a were used for the estimation of L at each delay with a ∈ {£5, £100, £250, £400, £550, £700, £850, £1,000}. The corresponding eight values of a for the estimation of the value of S at each delay were obtained by multiplying each element of a by x, just as S was obtained from xL, to give {£4, £82, £205, £328, £451, £574, £697, £820}. Thus, there were 80 different questions – eight values of a to assess discounting at each of five delays for each amount S and L. Each question was presented twice (160 total), and the order of questions was randomized independently for each participant within each repeat.
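The construction of the Stage 1 question set can be sketched as follows. This is an illustrative reconstruction, not the authors' survey code; rounding the scaled amounts to whole pounds and the function names are assumptions.

```python
import random
from itertools import product

X = 0.82
L_AMOUNT = 1000
S_AMOUNT = round(X * L_AMOUNT)
DELAYS = ["1 week", "1 month", "6 months", "1 year", "2 years"]
IMMEDIATE_FOR_L = [5, 100, 250, 400, 550, 700, 850, 1000]
# Assumption: scaled amounts are rounded to whole pounds.
IMMEDIATE_FOR_S = [round(X * a) for a in IMMEDIATE_FOR_L]

def stage1_questions():
    """All (immediate amount, delayed amount, delay) combinations."""
    questions = []
    for delayed, immediates in ((L_AMOUNT, IMMEDIATE_FOR_L),
                                (S_AMOUNT, IMMEDIATE_FOR_S)):
        for immediate, delay in product(immediates, DELAYS):
            questions.append((immediate, delayed, delay))
    return questions

def stage1_schedule(rng):
    """Two repeats of the question set, each shuffled independently."""
    schedule = []
    for _ in range(2):
        block = stage1_questions()
        rng.shuffle(block)
        schedule.extend(block)
    return schedule

schedule = stage1_schedule(random.Random(1))
```

Shuffling each repeat separately keeps the two presentations of a question in different halves of the schedule, which matches the description of randomization within each repeat.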

Maximum likelihood estimates of κ were obtained for each participant for the hyperbolic and exponential models by modeling the choices made in each of the i trials of Stage 1 (excluding fillers, see below) to find model parameters which minimized Equation 3.

$$-\sum_{i} \ln P(R_i) \tag{3}$$

In Equation 3, P(R) is the probability of the observed response computed using Equation 4 as used by Cavagnaro et al. (2016).

$$P(A) = \frac{1}{1 + e^{-g\,[V(A) - V(a)]}} \tag{4}$$

In Equation 4, the function V is used to compute the value of each choice alternative using Equations 1 and 2, and g > 0 is an additional parameter which determines the sensitivity of the choice to the difference between V (A) and V (a).
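The estimation step can be sketched in a few lines. The sketch assumes a logistic choice rule driven by g times the value difference and a plain negative log-likelihood as the quantity to minimize; both are readings of the description above rather than confirmed forms of Equations 3 and 4. The coarse grid search, the use of days as the delay unit, and the simulated participant are also illustrative assumptions.

```python
import math

def hyperbolic_value(amount, kappa, delay):
    """Equation 1."""
    return amount / (1.0 + kappa * delay)

def p_choose_delayed(v_delayed, v_immediate, g):
    """Assumed logistic choice rule; the exponent is clipped to avoid overflow."""
    z = max(-700.0, min(700.0, g * (v_delayed - v_immediate)))
    return 1.0 / (1.0 + math.exp(-z))

def negative_log_likelihood(trials, kappa, g):
    """Sum of -ln P(observed response) over trials."""
    total = 0.0
    for immediate, delayed, delay, chose_delayed in trials:
        p = p_choose_delayed(hyperbolic_value(delayed, kappa, delay), immediate, g)
        p_observed = p if chose_delayed else 1.0 - p
        total -= math.log(max(p_observed, 1e-12))
    return total

def fit_by_grid(trials, kappas, gs):
    """Return (nll, kappa, g) for the grid point with the smallest nll."""
    return min((negative_log_likelihood(trials, k, g), k, g)
               for k in kappas for g in gs)

# Simulated participant who always picks the option with the higher value.
DELAYS_DAYS = [7, 30, 180, 365, 730]
IMMEDIATE = [5, 100, 250, 400, 550, 700, 850, 1000]
TRUE_KAPPA = 0.01
trials = [(imm, 1000, dly, hyperbolic_value(1000, TRUE_KAPPA, dly) > imm)
          for dly in DELAYS_DAYS for imm in IMMEDIATE]

nll, kappa_hat, g_hat = fit_by_grid(
    trials, kappas=[0.001, 0.003, 0.01, 0.03, 0.1], gs=[0.005, 0.02, 0.05])
```

A continuous optimizer would normally replace the grid; the grid is used here only to keep the example dependency-free.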

Stage 2: Preference Reversal Zone Test

Stage 1 yielded, for each participant, four maximum likelihood estimates of κ, one for S (k) and one for L (K) for each model, Equations 1 and 2. These κ values were used to compute the values of D required in the Stage 2 questions. Calculation of κ and D values was done “on-the-fly” as soon as Stage 1 was complete as described in section Online Experimental Methods. In Stage 2, we identified three pairs of choices for each of the two models, giving six pairs in total. Each pair corresponded to one of three types – an LLC pair, an LSC pair, and an SSC pair, and each type was represented for each model. The types were based on values of D computed from the models. Each pair of questions was repeated twice giving 24 questions in total, and the order of questions was randomized for each participant. The question types corresponded to different patterns of choice expected across the members of the pair, as described below. For example, for the LSC type, we expected, based on model predictions, that participants would choose the large reward L for the first question and the small reward S for the second question.

To explain the derivation of the LLC, LSC, and SSC question types, consider Equation 5.

$$V(t) = \frac{\alpha}{1 + \kappa(\delta - t)}, \quad t \le \delta \tag{5}$$

Equation 5 is an alternative way to show delay-discounting curves by plotting value against position in time (as opposed to delay, as was done in Figure 1 for Equation 1). Equation 5 gives value against time for rewards which become available with δ set to some offset from t = 0. Plotting two such curves on the same graph, one for a small reward, α = a = 500 and δ = d = 50, and one for a larger reward, α = A = 2a and δ = D = 90, can reveal a theoretically interesting characteristic of Mazur’s equation, namely the presence of a preference reversal zone. This is shown in Figure 2 panel LSC. The implication is that someone might prefer an LL reward (e.g., better health) over an SS reward (e.g., smoking a cigarette) at t = 0, but this preference switches as t → 50 when the cigarette is imminently available.
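The example in this paragraph can be checked numerically. The text does not state the discount rate used for the illustration, so κ = 0.05 below is a hypothetical value, chosen because any κ above 1/40 produces a reversal for these amounts and delays when both rewards share one rate.

```python
def hyperbolic_value_at(t, alpha, kappa, delta):
    """Value at time t of a reward of size alpha that arrives at time delta."""
    return alpha / (1.0 + kappa * (delta - t))

KAPPA = 0.05            # hypothetical discount rate shared by both rewards
a, d = 500.0, 50.0      # smaller-sooner reward
A, D = 2 * a, 90.0      # larger-later reward

def prefers_ll_at(t):
    """True when the larger-later reward is worth more at time t."""
    return hyperbolic_value_at(t, A, KAPPA, D) > hyperbolic_value_at(t, a, KAPPA, d)

# Setting the two values equal and solving for t (valid for A = 2a)
# gives the time at which the curves cross.
crossover = 1.0 / KAPPA + 2 * d - D
```

Under these assumed values the larger-later reward is preferred at t = 0, the curves cross at t = 30, and the smaller-sooner reward is preferred from then until it becomes available at t = 50.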

Figure 2 Each panel shows two hyperbolic delay-discounting curves (Equation 5), one for a smaller-sooner reward and another for a larger-later reward. Panels LLC, LSC, and SSC illustrate theoretical curves for the three different conditions of the Stage 2 preference reversal zone test (see the section Stage 2: Preference Reversal Zone Test). The critical region is t ∈ [0, d]. In panel LLC, all choices made in the critical region result in a preference for the larger-later reward. The critical region in panel LSC has a preference reversal zone where preferences switch from the larger-later reward to the smaller-sooner reward as t → d. In panel SSC, all critical region choices result in a preference for the smaller-sooner reward. Panel SSC is annotated to show positions of a, A, d, and D. The vertical lines in panel LSC give LD and UD, the boundaries on D which mark the presence of a preference reversal zone. The ab criterion panel illustrates the derivation of the optimization criterion used to compute d for the Stage 2 questions (see the section Choice of x and d: the “ab” Optimization Criterion).

Figure 2 panel LLC shows two hyperbolic delay-discounting curves constructed using Equation 5. It can be seen here that a prediction can be made that at t = 0 a participant would choose L and that the same choice would be made at t = d – an LLC choice pattern. In panel LSC, the prediction is that the participant would choose L at t = 0, but at t = d they would choose S – an LSC choice pattern which is a prediction for a preference reversal. Panel LSC also shows two vertical lines, marked LD and UD. These give boundaries on the value of D within which there is a preference reversal zone, and details of calculation of LD and UD are given below. In panel SSC, the prediction is for an SSC choice pattern, and the SS reward should be chosen at t = 0 and at t = d.

Of course Equation 2 can also be written as a function of t as in Equation 6.

$$V(t) = \alpha e^{-\kappa(\delta - t)}, \quad t \le \delta \tag{6}$$

Using Equation 6, the choice patterns shown in Figure 2 (panels LLC, LSC, and SSC) can be qualitatively reproduced if κ for the SS reward (k) and the LL reward (K) are allowed to differ and if k > K. As described in the introduction (e.g., Green et al., 1997), there is evidence that smaller rewards may be discounted more heavily than larger rewards. This puts the hyperbolic and exponential models on an even footing – with two discount parameters, both models can predict preference reversals. We now outline how the value of D, for the LL reward, can be calculated to generate model predictions for both models, for each of the choice patterns, using k and K computed from Stage 1.
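A small numerical check shows why two rates matter for the exponential model. The rates, front-end delay, and D below are hypothetical; only x = 0.82 and L = £1,000 come from the study design.

```python
import math

def exponential_value_at(t, alpha, kappa, delta):
    """Exponential value at time t of a reward of size alpha arriving at delta."""
    return alpha * math.exp(-kappa * (delta - t))

def choice_pattern(k, K, a=820.0, A=1000.0, d=50.0, D=130.0):
    """Preferred reward at t = 0 and at t = d, e.g. 'LS' for a reversal."""
    pattern = ""
    for t in (0.0, d):
        v_ss = exponential_value_at(t, a, k, d)
        v_ll = exponential_value_at(t, A, K, D)
        pattern += "L" if v_ll > v_ss else "S"
    return pattern

two_rates = choice_pattern(k=0.01, K=0.004)   # smaller reward discounted faster
one_rate = choice_pattern(k=0.01, K=0.01)     # single shared rate
```

With k > K the pattern is a reversal (L first, then S); with a single shared rate the same option is chosen at both time points.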

LLC Question Pairs

From Figure 2, it can be seen that as the delay D to the LL reward increases at some point a preference reversal is predicted. To design the LLC question pairs, it is necessary to define an upper bound on D below which the LLC choice pattern is predicted. This upper bound is the delay D at which A has the delay-discounted value a. Given K from Stage 1 for an individual, we set d, A (for an LL reward), x, and a (for an SS reward), where a = xA, and using Equation 1, we solve Equation 7 to find the required D.

$$xA = \frac{A}{1 + K(D - d)} \tag{7}$$
$$D = d + \frac{1 - x}{xK} \tag{8}$$

Therefore, referring to Equation 7, xA is the “now” value of the SS reward and following rearrangement Equation 8 sets the upper bound on D below which the now value of the SS reward is less than the value of the LL reward delivered with a delay of D − d. Thus, we have the limit on an LLC zone in which all choices should be for the LL reward. This limit is designated LDh because it marks the lower bound on D for the preference reversal zone for the hyperbolic model.

Following the logic set out above, we now derive LDe, the lower bound on D for the preference reversal zone for the exponential model, as given in Equation 9.

$$LD_e = d - \frac{\ln x}{K} \tag{9}$$

We can now construct LLC question pairs for the hyperbolic and exponential models for which we expect all choices to be for the LL reward:

  • “Would you like S in d or L in D?” D ∈ [d, LDh)
  • “Would you like S now or L in D?” D ∈ [0, LDh − d) (hyperbolic questions)
  • “Would you like S in d or L in D?” D ∈ [d, LDe)
  • “Would you like S now or L in D?” D ∈ [0, LDe − d) (exponential questions)
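The lower bounds can be computed directly from the verbal definition: LD is the delay D at which the LL reward, delayed by D − d, is worth exactly xA. The closed forms below are our rearrangement of that condition for each model, and the parameter values are hypothetical.

```python
import math

def ld_hyperbolic(x, K, d):
    """Solve x*A = A / (1 + K*(D - d)) for D."""
    return d + (1.0 - x) / (x * K)

def ld_exponential(x, K, d):
    """Solve x*A = A * exp(-K*(D - d)) for D."""
    return d - math.log(x) / K

# Hypothetical participant: K is the discount rate for the large reward.
x, K, d, A = 0.82, 0.004, 50.0, 1000.0
ld_h = ld_hyperbolic(x, K, d)
ld_e = ld_exponential(x, K, d)

# At each bound the delayed LL value should equal the immediate SS value.
ll_value_h = A / (1.0 + K * (ld_h - d))
ll_value_e = A * math.exp(-K * (ld_e - d))
```

Only K enters these bounds, because the SS reward in the second question of each pair is immediate and therefore undiscounted.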

SSC Question Pairs

From Figure 2, it can be seen that when D is large all choices are predicted to be for the SS reward. To select appropriate values for D for the SSC questions, we need to identify a lower bound on D above which the SSC choice pattern is predicted. This lower bound is the delay D at which A has the same delay-discounted value as a discounted for delay d, and we designate this limit UDh because it marks the upper bound on D for the preference reversal zone. Given k and K from Stage 1 for an individual and using Equation 1, we solve Equation 10 for D as in Equation 11.

$$\frac{xA}{1 + kd} = \frac{A}{1 + KD} \tag{10}$$
$$D = \frac{1 + kd - x}{xK} \tag{11}$$

The corresponding UDe for the exponential model is given by Equation 12.

$$UD_e = \frac{kd - \ln x}{K} \tag{12}$$

We can now construct SSC question pairs for the hyperbolic and exponential models for which we expect all choices to be for the SS reward:

  • “Would you like S in d or L in D?” D > UDh
  • “Would you like S now or L in D?” D > UDh − d (hyperbolic questions)
  • “Would you like S in d or L in D?” D > UDe
  • “Would you like S now or L in D?” D > UDe − d (exponential questions)
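The upper bounds follow from the matching condition at t = 0: UD is the delay D at which the LL reward is worth the same as the SS reward discounted over d. The closed forms are our rearrangement of that condition, and the parameter values are hypothetical.

```python
import math

def ud_hyperbolic(x, k, K, d):
    """Solve x*A / (1 + k*d) = A / (1 + K*D) for D."""
    return (1.0 + k * d - x) / (x * K)

def ud_exponential(x, k, K, d):
    """Solve x*A * exp(-k*d) = A * exp(-K*D) for D."""
    return (k * d - math.log(x)) / K

# Hypothetical participant: k for the small reward, K for the large reward.
x, k, K, d, A = 0.82, 0.01, 0.004, 50.0, 1000.0
ud_h = ud_hyperbolic(x, k, K, d)
ud_e = ud_exponential(x, k, K, d)

ss_value_h = x * A / (1.0 + k * d)
ll_value_h = A / (1.0 + K * ud_h)
ss_value_e = x * A * math.exp(-k * d)
ll_value_e = A * math.exp(-K * ud_e)
```

Unlike the lower bounds, these depend on both k and K, since both rewards are delayed in the first question of each pair.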

LSC Question Pairs

We now have LDh, UDh, LDe, and UDe which form the lower and upper boundaries of preference reversal zones for the hyperbolic and exponential discounting models, so now we can construct LSC question pairs as follows (in each pair, we expect an L choice for the first question and an S choice for the second question):

  • “Would you like S in d or L in D?” D ∈ [LDh, UDh)
  • “Would you like S now or L in D?” D ∈ (LDh − d, UDh − d] (hyperbolic questions)
  • “Would you like S in d or L in D?” D ∈ [LDe, UDe)
  • “Would you like S now or L in D?” D ∈ (LDe − d, UDe − d] (exponential questions)
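The three zones can be checked end to end for the hyperbolic model with a short sketch. The participant parameters are hypothetical, and the boundary formulas are our rearrangements of the conditions stated above.

```python
def hyperbolic(amount, kappa, delay):
    """Equation 1."""
    return amount / (1.0 + kappa * delay)

def predicted_pattern(D, d, x, k, K, A=1000.0):
    """Predicted choice for 'S in d or L in D' followed by 'S now or L in D - d'."""
    first = "L" if hyperbolic(A, K, D) > hyperbolic(x * A, k, d) else "S"
    second = "L" if hyperbolic(A, K, D - d) > x * A else "S"
    return first + second

# Hypothetical participant.
x, k, K, d = 0.82, 0.01, 0.004, 50.0
LD = d + (1.0 - x) / (x * K)           # lower edge of the reversal zone
UD = (1.0 + k * d - x) / (x * K)       # upper edge of the reversal zone

below_zone = predicted_pattern((d + LD) / 2.0, d, x, k, K)
inside_zone = predicted_pattern((LD + UD) / 2.0, d, x, k, K)
above_zone = predicted_pattern(2.0 * UD - LD, d, x, k, K)
```

A delay below LD yields the LL pattern, a delay between LD and UD yields the LS reversal, and a delay above UD yields the SS pattern.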

Stage 2: Design Optimization

The preceding gives us boundaries for setting D|x, d, k, K to produce question pairs for which each model predicts one of the LLC, LSC, and SSC choice patterns, with the LSC pattern corresponding to the preference reversal effect which we are particularly interested in. However, the precise values of D, x, and d (we have no control over k and K) can be optimized to increase the theoretical likelihood that we will be able to observe preference reversals, and we now describe the two optimization criteria that we used. First, for each question pair type, there is a range of values for D for which the predictions hold and, given there is some degree of imprecision in our estimates of the boundaries and variability in the judgment of participants, we wish to select a value for D to maximize the likelihood that the model prediction is met. For this, we choose D to be at the midpoint of the ranges as specified above for the LLC and LSC patterns: For example, for LLC, we have D = (d + LD)/2 for the question “S in d or L in D.” However, for the SSC questions, there is no upper bound on D, so in this case we will choose D = 2UD − LD, that is, D is set to UD plus the width of the preference reversal zone (UD − LD).
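The first criterion amounts to simple arithmetic on the zone boundaries. In this sketch the function name and the example boundary values are hypothetical; the second delay in each pair is the first delay minus d, as in the question-pair definition.

```python
def stage2_delays(d, LD, UD):
    """Delay to L for both questions of each pair type, given d and the zone bounds."""
    first_question = {
        "LLC": (d + LD) / 2.0,     # midpoint of [d, LD)
        "LSC": (LD + UD) / 2.0,    # midpoint of [LD, UD)
        "SSC": 2.0 * UD - LD,      # UD plus the zone width
    }
    # Second question of each pair: S is immediate, so L is delayed by D - d.
    return {pair: (D, D - d) for pair, D in first_question.items()}

delays = stage2_delays(d=50.0, LD=100.0, UD=180.0)
```

For these example bounds the LLC pair uses delays of 75 and 25, the LSC pair 140 and 90, and the SSC pair 260 and 210.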

Second, the values of x, d, k, K all impact the value of D calculated to construct the intervals for which the LLC, LSC, and SSC predictions hold. Since our primary interest was in the LSC preference reversal pattern, we sought to choose values of x and d to maximize the likelihood of preference reversals for each model.

We had to choose x before the experiment began as it was needed in Stage 1. However, d was decided on-the-fly for each participant once Stage 1 was complete and we had calculated k and K.

Choice of x and d: The “ab” Optimization Criterion

We used the ab optimization criterion to make best estimates for x and d before the experiment started. The ab optimization criterion maximizes the difference between the values of LL and SS (a = VLL − VSS) at t = 0 and the difference between the values of SS and LL (b = VSS − VLL) at t = d, as illustrated in the ab criterion panel in Figure 2. Values of x and d that maximize the product ab given k and K were found using optimization algorithms to arrive at our choice of x = 0.82 (see supplementary materials, Glautier et al., 2022).

Once each participant completed Stage 1, the missing values (ke, kh, Ke, Kh) for the calculation of the best d ∈ [5, 180] were computed. The best d maximized the ab criterion, and once this value was known, the values of D for the Stage 2 questions were calculated for each model as described in the section Stage 2: Preference Reversal Zone Test.
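The search for the best d can be sketched as a grid search over the allowed range. Several details here are assumptions rather than the authors' procedure: the hyperbolic model is used, d takes whole-number values, and for each candidate d the delay D is placed at the midpoint of the predicted reversal zone before the product ab is evaluated. The optimization actually used is described in the supplementary materials.

```python
def hyperbolic(amount, kappa, delay):
    """Equation 1."""
    return amount / (1.0 + kappa * delay)

def ab_product(d, x, k, K, A=1000.0):
    """Product of the value gaps at t = 0 and t = d, or None without a reversal zone."""
    LD = d + (1.0 - x) / (x * K)
    UD = (1.0 + k * d - x) / (x * K)
    if UD <= LD:
        return None
    D = (LD + UD) / 2.0                                        # assumed placement of D
    gap_a = hyperbolic(A, K, D) - hyperbolic(x * A, k, d)      # LL minus SS at t = 0
    gap_b = x * A - hyperbolic(A, K, D - d)                    # SS minus LL at t = d
    return gap_a * gap_b

def best_front_end_delay(x, k, K, candidates=range(5, 181)):
    """Candidate d with the largest ab product, or None if no reversal is predicted."""
    scored = [(ab_product(d, x, k, K), d) for d in candidates]
    scored = [(score, d) for score, d in scored if score is not None]
    return max(scored)[1] if scored else None

best_d = best_front_end_delay(x=0.82, k=0.01, K=0.004)       # hypothetical rates
no_zone = best_front_end_delay(x=0.82, k=0.003, K=0.004)     # k too small for a zone
```

Under these formulas a hyperbolic reversal zone exists only when k > xK, which is why the second call returns None.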

Filler Questions

Our main aim in this experiment was to test the predictive capacity of the hyperbolic and exponential models, and this test relies on participants treating Stage 1 and Stage 2 as equivalent. Although we did not explicitly signal a transition between stages, the fact that none of the Stage 1 questions had a front-end delay and half of the Stage 2 questions did have a front-end delay could have been sufficiently salient to change participant behavior. To ensure good matching of stages, we therefore included a set of 38 filler questions in Stage 1 which included front-end delays. For each filler question, a, d, A, and D were chosen at random from the sets {£50, £100, £150, £200, £250, £300}, {1 week, 2 weeks, 1 month, 2 months, 3 months, 4 months}, {£400, £500, £600, £700, £800, £900}, and {6 months, 9 months, 1 year, 18 months, 2 years, 3 years}, respectively. Nineteen fillers were mixed in with the first repeat of the Stage 1 questions, and 19 were mixed with the second repeat, making 198 Stage 1 questions in total. Filler questions were not included in any of the following analyses.
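Filler generation as described can be sketched with independent random draws from the four sets. The function name, the fixed seed, and the tuple layout are illustrative assumptions.

```python
import random

SS_AMOUNTS = [50, 100, 150, 200, 250, 300]
SS_DELAYS = ["1 week", "2 weeks", "1 month", "2 months", "3 months", "4 months"]
LL_AMOUNTS = [400, 500, 600, 700, 800, 900]
LL_DELAYS = ["6 months", "9 months", "1 year", "18 months", "2 years", "3 years"]

def make_fillers(n, rng):
    """Draw n filler questions as (a, d, A, D) tuples, each element chosen at random."""
    return [(rng.choice(SS_AMOUNTS), rng.choice(SS_DELAYS),
             rng.choice(LL_AMOUNTS), rng.choice(LL_DELAYS))
            for _ in range(n)]

rng = random.Random(0)                     # seed fixed only for reproducibility
fillers = make_fillers(38, rng)
first_repeat_fillers = fillers[:19]
second_repeat_fillers = fillers[19:]
total_stage1_questions = 160 + len(fillers)
```

Because the largest SS amount and delay are below the smallest LL amount and delay, every filler is a genuine smaller-sooner versus larger-later choice with a front-end delay.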

Uncertainty of κ Estimates

For Stage 2 predictions to hold, estimates of κ must be accurate. Thus, before the experiment began, we examined the sensitivity of predictions to error in the estimation of κ (see supplementary materials, Glautier et al., 2022). We found that underestimation of the true κ would have much less impact than overestimation. A substantial overestimation such that the true κ was actually on the lower boundary of the 95% CI of the estimated κ would be catastrophic but would only occur infrequently. A substantial underestimation would have a much less severe impact but, again, would only occur infrequently. Since we had no reason to expect systematic overestimation or underestimation, we concluded that in the large majority of cases the preference reversal zone based on estimated κ would overlap substantially with the preference reversal zone based on the true κ.

Analysis

Data Selection

Although the Stage 1 fitting of Equations 1 and 2 to obtain maximum likelihood estimates of κ was expected to proceed smoothly in the majority of cases, some cases were expected where delay-discounting was not well-described by the equations. This may be due to participant inattention, lack of understanding of the task, or some other reason. Whatever the reason, we did not wish to add noise to our critical tests in Stage 2 – either due to continued poor performance in the task or due to inaccurate Stage 2 delays calculated from poor Stage 1 data. Therefore, participants’ Stage 1 data were used to select valid cases for Stage 2 analysis. For this purpose, we used the inclusion criteria of Johnson and Bickel (2008). These criteria specify that (i) the last indifference point is lower than the first indifference point by at least 10% of LL (here, for L, £100) and (ii) no indifference point is greater than that at the previous delay by more than 20% of LL (here, for L, £200). These inclusion criteria were applied to both S and L. Meeting these criteria requires an approximately monotonic decrease in value as a function of delay. In a meta-analysis, Smith et al. (2018) found that about 18% of delay-discounting curves failed to meet the criteria of Johnson and Bickel (2008).
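The two inclusion checks can be sketched as a single function. The sketch reads criterion (i) as requiring the last indifference point to fall below the first by at least 10% of the delayed amount, which is what an approximately decreasing discounting curve implies; the function name and the example curves are hypothetical.

```python
def meets_inclusion_criteria(indifference_points, delayed_amount):
    """Check both criteria on indifference points ordered from shortest to longest delay."""
    first, last = indifference_points[0], indifference_points[-1]
    # (i) overall decline of at least 10% of the delayed amount
    declines_enough = (first - last) >= 0.10 * delayed_amount
    # (ii) no rise of more than 20% of the delayed amount between adjacent delays
    no_large_rise = all(
        later - earlier <= 0.20 * delayed_amount
        for earlier, later in zip(indifference_points, indifference_points[1:])
    )
    return declines_enough and no_large_rise

orderly = meets_inclusion_criteria([950, 800, 500, 300, 150], 1000)
flat = meets_inclusion_criteria([900, 900, 900, 900, 880], 1000)
jumpy = meets_inclusion_criteria([900, 500, 750, 300, 100], 1000)
```

The orderly curve passes, the flat curve fails criterion (i), and the curve with a £250 rise between adjacent delays fails criterion (ii).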

A second data selection procedure was also applied. As noted earlier, for the exponential model, no preference reversal is predicted unless ke > Ke. Therefore, the model comparison described below is confined to those cases for whom this inequality holds based on Stage 1 performance.

Model Comparison

Our analysis had two parts. In the first part, the predictive performances of the two discount parameter hyperbolic and exponential models were compared. Note that we introduced an additional parameter g in Equation 4; hence, in the results that follow, we refer to these models as H3 and E3, respectively. In the second part, the data fitting performances of hyperbolic and exponential models were compared, including one and two discount parameter versions of each model. The one discount parameter models are referred to as H2 and E2 as these also incorporate g. Our two discount parameter models were piece-wise forms of Equations 1 and 2 using k when modeling S choices and using K when modeling L choices. Our one discount parameter models used a single κ value for modeling S and L choices. In addition, a baseline random-choice model, designated G below, was used in the data fitting analysis.

In the predictive performance analysis, the Stage 2 data were analyzed using the GCM described above. Because this test uses model predictions based on maximum likelihood parameters calibrated in Stage 1 (as opposed to parameters fitted to the Stage 2 data), the fitting advantage of more complex models does not come into play. For each model, an overall analysis was carried out using all 12 Stage 2 questions and a focused analysis was carried out for the four Stage 2 LSC trials only. A log-likelihood statistic, ℓ, was computed for each participant for each analysis. The best model for each participant is determined by the smallest ℓ, and the best model overall can be determined by comparing the average difference D = ℓ_h − ℓ_e with zero (e.g., McDaniel et al., 2009). Here, subscript h indicates the hyperbolic model and subscript e indicates the exponential model. When D < 0, the hyperbolic model is a better fit, and when D > 0, the exponential model is a better fit. Two-tailed Student’s t-tests were used to compare D with zero for Stage 2 overall and for the LSC trials alone. The analysis of the LSC data provides information about the model’s capacity to capture preference reversals, but we also simply compared the number of preference reversals observed on the hyperbolic and exponential model LSC trials using a Wilcoxon signed-rank test.
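Comparing D with zero amounts to a one-sample t-test on per-participant differences. The following is a minimal sketch, assuming each participant contributes one log-likelihood value per model, with smaller values indicating better prediction. The function name and the toy values are hypothetical.

```python
import math
import statistics


def one_sample_t(differences):
    """t statistic and degrees of freedom for H0: mean difference = 0."""
    n = len(differences)
    mean_d = statistics.fmean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return mean_d / standard_error, n - 1


# Hypothetical per-participant values (smaller = better prediction)
hyperbolic = [5.9, 6.4, 7.1, 5.2, 6.8]
exponential = [6.5, 6.2, 7.9, 6.0, 7.0]

# D is hyperbolic minus exponential, so negative values favor the hyperbolic model
diffs = [h - e for h, e in zip(hyperbolic, exponential)]
t_stat, df = one_sample_t(diffs)
```

Converting the statistic to a p-value additionally requires the t distribution, which is not in the Python standard library; a statistics package would be needed for that step.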

In the second part of our analysis, we compared model fits using maximum likelihood parameters obtained from Equation 3. Fits were obtained for the Stage 1 and Stage 2 data combined, computing log-likelihoods for five models (H3, E3, H2, E2, and G). Our guessing model, G, had one free parameter, g_s ∈ [0, 1], the probability of selecting the SS reward. This was a baseline model included alongside four delay-discounting models. Once log-likelihoods were computed, we performed an Akaike weight analysis following Wagenmakers and Farrell (2004). This analysis can be used for comparisons of non-nested models, incorporates an adjustment for model complexity in terms of the number of parameters, and does not assume the true model is in the set of models under consideration. The results of this analysis allow assessment of the relative likelihood of each of the five models based on their descriptive performance.
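The Akaike weight calculation can be illustrated as follows. This is a minimal sketch, assuming the standard definitions AIC = −2 ln L + 2k and w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), where Δ_i is the difference between a model's AIC and the smallest AIC in the set. The function name and the input values are hypothetical and do not reproduce the study's fits.

```python
import math


def akaike_weights(neg_log_likelihoods, n_params):
    """Akaike weights from negative log-likelihoods and parameter counts."""
    aic = [2.0 * nll + 2.0 * k for nll, k in zip(neg_log_likelihoods, n_params)]
    best = min(aic)
    # Relative likelihood of each model given its AIC difference from the best
    rel = [math.exp(-0.5 * (a - best)) for a in aic]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical fits for models with 3, 2, and 1 free parameters
weights = akaike_weights([100.0, 100.5, 130.0], [3, 2, 1])

# Evidence ratio of the best model against the first model
evidence_ratio = max(weights) / weights[0]
```

In this toy case, the second model fits slightly worse than the first but has one fewer parameter, so it receives the largest weight.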

Statistical Power

We aimed to obtain a minimum total sample of 109 participants after the exclusions described above. A sample of 109 would allow a small–medium effect size (Cohen’s d = 0.35; Cohen, 1988) to be detected with the probability of a Type I error = .05 and power (1 − β) = .95, as computed using G*Power (version 3.1.7; Faul et al., 2007), for our critical test on the log-likelihood differences D in the predictive performance analysis of Stage 2. A sample of 109 would still give acceptable power (1 − β ≥ .8) for a smaller effect size down to Cohen’s d = 0.271. Data collection was stopped after recruitment of 109 participants who met the inclusion criteria.
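The power figures can be checked approximately without specialist software. The following is a minimal sketch that uses a normal approximation to the two-tailed one-sample test. It is not the noncentral t computation that G*Power performs, so the values can differ slightly from those reported. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-tailed one-sample test (normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * math.sqrt(n)  # approximate noncentrality
    # Probability of exceeding the critical value in either tail
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


power_main = approx_power(0.35, 109)    # close to .95
power_small = approx_power(0.271, 109)  # close to .80
```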

Results

Of the 282 participants completing Stage 1, 89 (31.6%) failed to meet the inclusion criteria set out by Johnson and Bickel (2008). Of the 193 participants whose data were processed further, 84 (43.5%) were excluded because k_e ≤ K_e. Two of the 109 participants who met the inclusion criteria and completed both stages had an optimization error during the calculation of d, so they were excluded. This left 107 participants for analysis: 31 males, 71 females, four who reported another gender, and one who declined to answer. The mean age was 33.2 years (range 18–63).

Stage 1 Summary

Figure 3 gives the mean Stage 1 indifference points. There was a clear decline in the value of rewards as a function of delay, and the smaller delayed reward was valued less than the larger delayed reward. The average maximum likelihood κ values calculated for the Stage 1 trials for models H3 and E3 (n = 160, excluding fillers) were larger for S than for L, and this was true for the hyperbolic and for the exponential models, consistent with previous research. The mean κ values (S vs. L) were 0.26 vs. 0.168 (t(106) = 4.08, p < .001) for the hyperbolic model and 0.166 vs. 0.115 (t(106) = 4.08, p < .001) for the exponential model. The average maximum likelihood g values were 0.017 and 0.016 for the hyperbolic and exponential models, respectively. The average minimized log-likelihood values were 17.3 and 17.4 for the hyperbolic and exponential models, respectively.

Figure 3 Stage 1 mean indifference points as a function of delay and reward magnitude (± 1 standard error).

Stage 2 Model Predictive Performance

The main results of interest from Stage 2 are shown in Figure 4. For the LLC question pairs, the vast majority of choices were for L, and for the SSC question pairs, the vast majority of choices were for S, and this was true for the questions with the delays derived from both the hyperbolic and exponential models, as described in the Method subsection Stage 2: Preference Reversal Zone Test. These results were expected but, contrary to expectations, the vast majority of choices were also for S in the LSC condition. In the LSC condition, we expected to see preference reversals with L chosen for the first question and S chosen for the second question – both models predicted preference reversals, as shown in Figure 4, but very few were observed.

Figure 4 Observed and expected number of choice patterns made in each question pair condition (LLC, LSC, and SSC) of Stage 2 for the hyperbolic and exponential trials. Choice pattern LL, L chosen for both questions; pattern Ls, L chosen for the first question (“Would you like S in d or L in D?”) and S chosen for the second question (“Would you like S now or L in D − d?”); pattern sL, S chosen for the first question and L chosen for the second question; pattern ss, S chosen for both questions. The preference reversals of interest are represented in choice pattern Ls. The expected number of choice patterns was computed for each model (H3 and E3) and condition as 2Σp(pattern), where p(pattern) was computed using Equation 4. The summation is over participants using maximum likelihood parameters from Stage 1. Questions for each model/condition were repeated twice, hence the multiplication by 2 for scaling.

For all 12 Stage 2 questions, the mean log-likelihoods for models H3 and E3 were ℓ_h = 6.31 and ℓ_e = 7.023 and the average log-likelihood difference between the models, D, was −0.713 (t(106) = 1.6, p = .11). For the four LSC trials, the mean log-likelihoods were ℓ_h = 2.971 and ℓ_e = 3.178 and the mean log-likelihood difference between the models, D, was −0.207 (t(106) = 0.692, p = .49). However, the outcome of this test was impacted by some outliers, and there were considerably fewer participants for whom the log-likelihoods favored the exponential model.

For example, in the analysis of all 12 Stage 2 questions, there were 8 cases where the difference between models H3 and E3 was more than 2 SDs from the mean. Following this up, a Wilcoxon test produced a highly significant result in favor of the hyperbolic model (V = 1708, p < .001). Furthermore, for this 12-question analysis, there were only 32 participants out of 107 for whom ℓ_e < ℓ_h. In a binomial test with p(success) = .5, the probability of 32 or fewer successes in 107 trials is less than 0.00002.
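The binomial tail probability can be computed exactly by summing the probability mass function. The following is a minimal sketch using only the standard library; the function name is hypothetical.

```python
from math import comb


def binomial_lower_tail(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Probability of 32 or fewer successes in 107 trials when p(success) = .5
p_value = binomial_lower_tail(32, 107)
```

The result falls below the 0.00002 bound stated in the text.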

There was no difference between the number of preference reversals (LL switch to SS, the “Ls” choice pattern in Figure 4) in the condition with the LSC delays chosen using the hyperbolic model and the number of Ls choice patterns with delays chosen using the exponential model (Wilcoxon signed-rank test W = 5,455, p = .42).

Stage 2 Model Fitting

Poor model performance in the generalization criterion test can arise if model parameters depend on experimental design parameters, and this is not captured in the model. Therefore, we checked to see if there was any evidence that model parameters changed between Stage 1 and Stage 2. In a post hoc model fitting exercise, we obtained maximum likelihood parameter estimates for models H3 and E3 for the Stage 2 trials for comparison with the H3 and E3 model parameters obtained in Stage 1. The values are given in the supplementary material Table 2 (Glautier et al., 2022). The κ values estimated for Stage 1 did not differ from those estimated for Stage 2 (ts(106) < 1.36, ps > .17 in all four cases), but there was evidence that the g values (cf. Equation 4) were larger in Stage 2 than in Stage 1 (ts(106) > 2.8, ps < .01 in both cases).

Stage 1 and Stage 2 Model Fitting Performance

The results of the Akaike weight analysis are summarized in Table 1. The Akaike weights in column wAIC provide conclusive evidence in favor of model E2, the exponential model with one discount rate parameter. The evidence ratios (Burnham & Anderson, 2002) provide information on the weight of evidence in favor of model E2, and in all cases, the ratios exceed one million. Although the results of this overall analysis are clear-cut, the picture is less clear at the level of individual cases where there are more individuals for whom the hyperbolic model fares better. Table 2 shows the number of participants for whom each model holds a particular rank based on the Akaike weights wAIC. For example, there were 42 cases where model H2 was best and only 36 cases where model E2 was best (the guessing model was ranked 5 in all cases, so it was not considered further). To follow up this observation, the frequencies in Table 2 were collapsed into four cells of a 2 × 2 contingency table. The cells counted the number of cases in which the two exponential models were best (ranked 1 or 2, n = 36 + 14 + 25 + 26 = 101), the number of cases in which the two hyperbolic models were best (n = 113), the number of cases in which the hyperbolic models were worst (ranked 3 or 4, n = 101), and the number of cases in which the exponential models were worst (n = 113). Although the pattern suggested an advantage for the hyperbolic models, this was not borne out in a χ² test, which produced χ² = 1.346 (1 df, p = .246). Thus, the hyperbolic and exponential models are on an approximately even footing in this ranking analysis. However, the ranking analysis ignores the magnitude of model differences, which is evident in the overall analysis summarized in Table 1, where the single discount parameter exponential model is the clear winner.
Table 2 in the supplementary material gives the mean maximum likelihood parameters for the five models fitted in this analysis, and supplementary Figures 5 and 6 provide detailed distributional data in the form of box and whisker plots (Glautier et al., 2022).
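The χ² statistic for the collapsed ranking table can be reproduced by hand. The following is a minimal sketch, assuming that the four counts are arranged as model class (exponential, hyperbolic) by rank group (ranked 1 or 2, ranked 3 or 4) and that no continuity correction was applied. The function name is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 df, the upper-tail probability equals erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: exponential, hyperbolic; columns: ranked 1 or 2, ranked 3 or 4
stat, p_value = chi_square_2x2([[101, 113], [113, 101]])
```

Under these assumptions, the statistic and p-value agree with the reported values to three decimal places.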

Table 1 Overall Akaike weight analyses for data combined over Stage 1 and Stage 2
Table 2 Cross-tabulation showing the number of participants and ranks, based on wAIC (Rank 1 is best model), of fitted models

Discussion

Although numerous studies suggest that preference reversals are well-described by hyperbolic models of delay-discounting (e.g., Green et al., 1994; Kirby & Herrnstein, 1995), we found little evidence of preference reversals using delays designed a priori, with a two-discount-parameter version of Equation 1, to produce them. Our results (cf. Figure 4) showed that the LLC condition produced predominantly LL choice patterns, but these switched over to SS choice patterns in the LSC condition, indicating that preference reversals may have been seen if we had used shorter LSC delays, in between those we actually used in the LLC and LSC conditions. We saw the same pattern using a two-discount-parameter version of Equation 2 for an exponential model. Thus, participants were considerably more impulsive than anticipated by the models. Why should the model parameters obtained in Stage 1 lead to such overestimation of delays for preference reversals?

One possibility is that the model parameters were estimated in Stage 1 at shorter delays (average 4.3 months) than those in Stage 2 (average 54.4 months). Assuming that increasing delay increases discounting more than implied by Equations 1 and 2, we asked whether the observed data might be better understood in these terms. In the supplementary materials (Glautier et al., 2022), we examined models considered by Rodriguez and Logue (1988) and Takahashi et al. (2008) (see also McKerchar et al., 2009; Peters et al., 2012) in which delay is exponentiated according to a sensitivity parameter s, as shown in Equation 13 for a hyperbolic model and in Equation 14 for an exponential model.

$$V = \frac{\alpha}{1 + \kappa\delta^{s}} \tag{13}$$
$$V = \alpha e^{-\kappa\delta^{s}} \tag{14}$$

For the hyperbolic model, when our Stage 1 data were fitted with Equation 13, we found similar κ values to those obtained using Equation 1. In contrast, for the exponential model, when our Stage 1 data were fitted with Equation 14, we found larger κ values than those obtained using Equation 2 (cf. Glautier et al., 2022, supplementary Table 2). Thus, had we used Equations 13 and 14 to obtain the Stage 2 delays, the preference reversal zone questions for the exponential model would have used shorter delays than those we actually used and may have captured the preference reversals which we assume would be occurring at some point where the pattern of choices switches from LL to SS. It is therefore of interest to use Equation 14 in a follow-up study, with another attempt to identify a priori the temporal location of individual preference reversal zones.
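The role of the sensitivity parameter can be illustrated numerically. The following is a minimal sketch, assuming that Equation 13 takes the form V = α / (1 + κδ^s) and Equation 14 the form V = α·exp(−κδ^s), so that both reduce to Equations 1 and 2 when s = 1. These forms are inferred from the description of delay being exponentiated by s. The function names and all parameter values are hypothetical.

```python
import math


def hyperbolic_value(alpha, kappa, delay, s=1.0):
    """Assumed form of Equation 13: alpha / (1 + kappa * delay**s)."""
    return alpha / (1.0 + kappa * delay**s)


def exponential_value(alpha, kappa, delay, s=1.0):
    """Assumed form of Equation 14: alpha * exp(-kappa * delay**s)."""
    return alpha * math.exp(-kappa * delay**s)


# Hypothetical parameters; delays are the Stage 1 and Stage 2 averages in months
values = {}
for delay in (4.3, 54.4):
    values[delay] = {
        "s = 1.0": exponential_value(1000, 0.05, delay),
        "s = 1.2": exponential_value(1000, 0.05, delay, s=1.2),
    }
```

For delays greater than one time unit, s > 1 lowers the discounted value relative to s = 1, and the proportional reduction is larger at the longer delay. This is the direction needed to account for choices that were more impulsive than the unmodified equations predicted.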

Another perspective on this apparent failure of these models to adequately capture sensitivity to delay is suggested by our finding that our estimates for parameter g from Equation 4 were larger for Stage 2 than for Stage 1. Parameter g reflects sensitivity to the difference between the values of the small and larger rewards, amplifying differences as g increases. Thus, changes in g between stages could have worked in concert with underestimation of the effect of delay, as discussed above, to produce the observed patterns. However, some caution is needed here because there were only 12 Stage 2 trials upon which to base parameter estimates for each of the models.

In addition to looking at model predictions for preference reversals, we evaluated the overall predictive and fitting performance of hyperbolic and exponential models. Stage 2 of the experiment examined the predictive performance of our two-discount-parameter models, H3 and E3, and the hyperbolic model performed better than the exponential model. Using parameters estimated in Stage 1, the average log-likelihoods of the Stage 2 data were lower for the hyperbolic than for the exponential model, and there were significantly more participants than would be expected by chance for whom the hyperbolic model performed better than the exponential model. Poorer predictive model performance is linked to model complexity and indicates that the exponential model may overfit as compared to the hyperbolic model (e.g., Myung, 2000; Zucchini, 2000).

Following on from this, the fitting performance of the exponential models was better than hyperbolic models in the overall analysis of Stage 1 and Stage 2 with the Akaike weight analysis showing conclusive evidence in favor of the single discount parameter exponential model, E2, against the other four models in our set. However, this result did not extend to the individual level using model ranks (cf. Table 2). Recall that in the Stage 2 predictive performance analysis, we found significantly more participants for whom the hyperbolic model was the winner. In contrast, in the Stage 1 and Stage 2 fitting analysis, the number of participants for whom the exponential models were the best was in fact slightly lower than for the hyperbolic models despite the fact that the exponential models came out on top overall.

In summary, we found (1) little evidence that hyperbolic and exponential models could predict the temporal location of preference reversals, (2) that the predictive performance of the hyperbolic models was better than the exponential models, and (3) that a single discount parameter exponential model provided the best overall fits to the data. In relation to (1), although there was some evidence that participant behavior may have differed between the Stage 1 parameter calibration and the generalization criterion test in Stage 2, it is not clear whether this could explain the results: only the sensitivity parameter g changed between stages, whereas the discount parameters themselves remained constant. An exploratory analysis showed that a modified exponential model (Equation 14) would be worth testing in a future investigation. In relation to (2), although there was no overall difference between the exponential and hyperbolic models in terms of their predictive performance in Stage 2, exclusion of some outlying cases showed that the hyperbolic model was better than the exponential model, and at the level of individual participants, there was a significant majority for whom the hyperbolic model was better. In relation to (3), fitting models to the complete data set, Stage 1 and Stage 2 combined, resulted in a clear overall win for the exponential model, despite the fact that the hyperbolic and exponential models were evenly matched at the level of individual participant fits, counting case numbers where one model or the other won on the basis of Akaike weight ranking.

In conclusion, we note three points. First, our finding of better overall fitting performance for the exponential model runs somewhat against the literature that has generally suggested better fits for hyperbolic models than for exponential models. However, as pointed out by Cavagnaro et al. (2016), the majority of existing studies have used least-squares regression to model value functions estimated with indifference points. We are not aware of any clear reason why this methodology should favor hyperbolic models and why using maximum likelihood methods should favor exponential models, but there is a growing number of studies that have used maximum likelihood methods, and among these studies, there is no clear advantage for hyperbolic models (e.g., Cavagnaro et al., 2016; Hofmeyr et al., 2017).

Second, one motivation for this work was to provide a strong test for models of preference reversals. This is especially important since these models may be used to guide applied work. However, in this study, as with the majority of delay-discounting studies, we used hypothetical rewards. Use of hypothetical rewards is sufficient to provide an internally valid experimental test of our theoretical models, but despite the fact that there has been little reliable evidence of differences between real and hypothetical reward methodologies (e.g., Green & Lawyer, 2014), a note of caution is still required if inferences about real rewards and applications are to be drawn.

Finally, our sample was highly selected. To provide common ground for a comparison of hyperbolic and exponential models, we included participants for whom k_e > K_e because without this constraint the two-discount-parameter exponential model does not produce preference reversals. This selection criterion excluded 43.5% of participants and, assuming that participants could show preference reversals without meeting this constraint, this is a serious limitation on the generality of this model.

However, our preliminary exploration of the modified exponential model given in Equation 14 showed that this model can generate preference reversals without this constraint. Given this model might also explain the increased impulsivity that we saw at the longer Stage 2 delays, we are encouraged to use this model in continuation of our search for the preference reversal zone.

References