The colour of money

26/05/2009

Are you disappointed because ITV's "most stressful game show on TV", The colour of money, seems to have been pulled? Do you think that you had just the right strategy to win? Then check out if you were right with John Haigh's analysis of best play.

The game consists of 20 boxes, each identified by its own colour, and each containing one of the integers from 1 to 20, which have been randomly allocated to the boxes. The contestant, let's call her Louise, is given a target number $T,$ at least 50 and up to about 80. She chooses a box at random, not knowing which integer $K$ it contains, and then sees the sequence $1, 2, 3, . . ., K b e i n g g e n e r a t e d i n f r o n t o f h e r . I f s h e c a l l s o u t " S t o p! ", t h e n t h e l a s t n u m b e r s h o w n i s b a n k e d t o w a r d s h e r t a r g e t . I f s h e f a i l s t o c a l l o u t a t$ K $o r e a r l i e r, t h e s e q u e n c e i s e x t i n g u i s h e d, a n d t h a t b o x c o n t r i b u t e s z e r o t o w a r d s h e r t a r g e t . I f s h e c a l l e d s t o p b e f o r e$ K $w a s r e a c h e d, t h a t i s, i f s h e b a n k e d, t h e n t h e t r u e v a l u e o f$ K $i s r e v e a l e d . S h e m a y s e l e c t u p t o t e n b o x e s, o n e a t a t i m e, a n d w i n s$ T$ thousand pounds if she reaches her target, otherwise she wins nothing.

Louise's aim is obviously to maximise her chance of reaching the target. Whatever her long-term strategy, she has only one decision to make in any round — at what value does she intend to call "Stop"? Is there an optimal strategy Louise should follow?

Starting at the end

What's the best strategy to get your hands on this?

There is indeed and we can find it by backward induction. We'll use the word position to describe the state of the game at a particular moment: the position tells us how far Louise is from her target value (that is, it tells us the residual target), and which boxes she has left.

Now suppose that Louise is just about to begin the final round. Her strategy is clear — wait for the residual target, then bank. At any position in the penultimate round, there is a finite list of decisions: there is a finite number of boxes to choose from, and for each box there is a finite number of values at which Louise can say "Stop". For each of these decisions you can find the distribution of positions reached in the final round, and thus you can find the chance of reaching the target. Hence the optimal decision, and its success chance, can be found for all these positions. It's possible to continue this argument all the way back to the initial round. Thus an optimal strategy to reach any target exists: but as its description must specify the decision to be made in each round for all possible residual targets, and boxes remaining, it is infeasible to write it down here. However, we can offer some pointers to possible long-term strategies, and show how to find the best strategy near the end of the game.

The first ever game

Let's look at the first game seen on ITV to illustrate the optimal end game strategy. With two rounds to go, the residual target was 15, and the twelve boxes left contained the values 1, 4, 5, 6, 9, 10, 12, 13, 15, 17, 19, 20. Four boxes have value of 15 or more, so there are two routes to success: the "bold play" of aiming for 15 immediately (that is not saying "stop" until 15 is reached), or intending to take two steps, such as 9, then 6. We compare their respective merits by counting how often each route would succeed. It does no harm to assume that both boxes are used, even if the target is met first go.

What's the best end game?

There are 12 choices for the first step, and 11 for the second, so there are $12 \times 11 = 132$ choices altogether. With bold play, Louise's first choice succeeds four times, and fails eight times. In the first case, she wins whatever happens with the 11 possibilities on her second go. In the second case she wins four times, as the same chance of success applies to second attempt as did to the unsuccessful first attempt. So altogether, $(4 \times 11) + (8 \times 4) = 76$ of the $132$ choices will win

Now suppose Louise aims to bank 9 first, that is, she does not call stop until 9 is reached. With eight of the remaining boxes this strategy succeeds, leaving her with eight boxes that score the residual 6 or better, giving $8 \times 8 = 64$ chances of success. There are four boxes which fail to score 9, in which case she must hope to select one of the four boxes that score 15 in the last round, giving another $4 \times 4 = 16$ chances. So we have a total of $64 + 16 = 80$ chances of success by this route. Similarly, the success frequency if aiming to bank 10 first go is $(7 \times 9) + (5 \times 4) = 83$ ; and going for 12 first wins $(6 \times 10) + (6 \times 4) = 84$ times - this is the best option.

We will show here that it is never better to try to bank the lower of two scores before the higher, so, by a margin of just one chance in 132 over its nearest competitor, best tactics are to aim for 12 first, winning 84/132 = 7/11 of the time. The answer may be counter-intuitive, as this intended route would lead to a total of 16, not 15 — but it is indeed optimal!

Two Rounds Left

Consider the penultimate round, with residual target $T_{0}$ . We'll assume that it is actually possible to achieve the target. Suppose that $x_{1} < x_{2} < . . . < x_{12}$ are the numbers in the remaining boxes. We'll show how to find the best strategy to reach the target. Suppose that $t$ of the boxes have value at least $T_{0}$ , and assume $t < 12$ - otherwise the target will definitely be reached in one selection. One possible strategy when $t > 0$ is to have two shots at reaching $T_{0}$ with one box. The success frequency out of the 132 possible box choices, using the argument deployed above, is $t \times 11 + (12 - t) t = t (23 - t) .$ \ Now suppose Louise aims to reach the target in two steps, $x_{i}$ and $x_{j}$ . Suppose $i < j$ with $x_{j} < T_{0}$ and $x_{i} + x_{j} \geq T_{0}$ , but $2 x_{i} < T_{0}$ : we will prove that aiming to stop at $x_{j}$ , then at $x_{i}$ , is at least as good as aiming for $x_{i}$ , then $x_{j}$ . The first strategy works if first $x_{j}$ and then $x_{i}$ are banked; or if $x_{j}$ is not reached, but $T_{0}$ is achieved in the last round. Simple counting, as in the example above, shows that the success frequency is $(13 - j) (12 - i) + (j - 1) t .$ The second one works if both $x_{i}$ and then $x_{j}$ are banked, or if $x_{i}$ is not reached, but $T_{0}$ is scored in the last round; when $x_{i}$ is banked, we must separate the cases when the first box has a value less than $x_{j}$ , or is at least $x_{j}$ . The overall success frequency is thus $(j - i) (13 - j) + (13 - j) (12 - j) + (i - 1) t = (13 - j) (12 - i) + (i - 1) t .$ Inspection shows that this cannot exceed (2), as $j > i$ . So our claim is proved. There may be several candidate pairs $(i, j)$ satisfying the given conditions; the best choice amongst them is that pair that maximises the value of (2). (When $t = 0$ , so that it is not possible to reach $T_{0}$ in one round, then aiming for $x_{i}$ , then $x_{j}$ , or for $x_{j}$ , then $x_{i},$ have the same chance of success.) Finally, suppose that $x_{i} < x_{j} < T_{0}$ and $x_{i} + x_{j} \geq T_{0}$ , but also $2 x_{i} \geq T_{0}$ . Here seeking to bank $x_{i}$ twice is clearly better than aiming for $x_{j}$ first. The success frequency is $(13 - i) (12 - i) + (i - 1) t .$ Optimal play in the penultimate round comes from whichever of the expressions (1), (2) and (3) is largest, across all relevant choices of $i$ and $j$ .

Three rounds left - illustration

Things become quite a lot more complicated when there are three rounds left. To get a feel for what's going on, let's look at a particular example. Suppose that the 13 remaining boxes contain the twelve amounts noted above for the first game played, along with a box containing the number 3. For any residual target, we can find optimal play by considering as putative stop values each of the 13 box values, and exploring the consequences for whichever of them is selected next. We used a programmed computer to find the best initial stop value and the overall success chance. The results are shown in table 1. Optimal play reaches any residual target of 6 or below, scores above 56 are impossible. There are 13×12×11=1716 possible choices for the three boxes.

Table 1: Suppose values {1, 3, 4, 5, 6, 9, 10, 12, 13, 15, 17, 19, 20} are in the last 13 boxes. For the residual targets shown, the optimal quantity to seek to bank on the first move is shown ('Best aim'), as is the frequency (out of 1716 occasions) that this strategy, followed by optimal strategy in the remaining two rounds, would achieve the target. For some targets, there are several optimal strategies; only that with the lowest initial target is given.

The calculations for table 1 apply specifically to 13 boxes listed. The subtle changes demonstrate how impractical it is to find best play in a TV studio, with no pre-programmed computer at hand to run through the possibilities. However, the margin of superiority of the best play, over its nearest rivals, may be very thin. The corresponding calculations for a range of other collections of 13 boxes reinforce a message seen in the table: Louise must not be too cautious - even for a target as low as 14 over three rounds, she should not "safely" bank 5 or 6. She can ill afford to waste boxes with high values.

Starting at the beginning - maximising the mean

Using backward induction to work out the best strategy rapidly becomes very hard, so let's take another approach, starting at the beginning.

One strategy might be to look at all the possible stopping values in each round and to choose the one that, on average, gives the highest score. Suppose that, in the first round, Louise intends to stop at $R$ . There are $21 - R$ boxes whose value is at least $R$ , and she succeeds, but $R - 1$ boxes where she would fail, so her mean score is $R (21 - R) / 20.$ This is maximised at 5.5 when $R$ is either 10 or 11. In later rounds, suppose the values $x_{1} < x_{2} < . . . < x_{K}$ remain in the last $K$ boxes. It cannot be optimal to stop at a value not in this list, and the mean score obtained is maximised by stopping at that value $x_{R}$ for which $x_{R} \times (K + 1 - R)$ is largest. Detailed analysis shows that using this for the first two rounds gives an overall mean score of $4198 / 380 \approx 11.047$ from these rounds, just over twice the best mean value from the first box.

How much should you aim to bank first?

We have run computer simulations of this strategy for all ten rounds: we found that the mean score was 55.7 (with standard deviation of 13.6), and that the respective chances of reaching targets of 50, 60, 70 and 80 are about 72\%, 44\%, 20\% and 5\%. This strategy is not as good as one that pays regard to the actual target, but it indicates floors for Louise’s chances of success.

We saw earlier that, where a desired sum could arise both in one step, or as the sum of two values, in the last two rounds, it might not be best to go for two roughly equal values. By contrast, earlier in the game, suppose there are several pairs $(i, j)$ for which the sums $x_{i} + x_{j}$ are equal. Among these pairs, the highest chance of banking both sums in the next two rounds will arise when $i$ and $j$ are as close together as possible. This suggests aiming to bank similar amounts in each of the early rounds, when it's difficult to work out an optimal strategy from the backward induction.

Starting at the beginning - going for the same R

So suppose that, over the full ten rounds, Louise uses the same stop value $R$ each time, irrespective of the unfolding outcomes. Since we'll be thinking in multiples of $R$ , we'll assume that $T = K R$ for some integer $K$ - if $T$ is not exactly divisible by $R$ , then we simply "round it up" to the next biggest integer that is. As Louise chooses ten boxes at random, she has $N = (\binom{20}{10}) = 184, 756$ equally likely choices. The chance she achieves her target $T$ is just the fraction of times her selections of stop values lead to a total of $T$ or more over the full ten rounds.

Recall that $21 - R$ boxes contain $R$ or more, and $R - 1$ boxes have less than $R$ . Thus, with a fixed stop value of $R$ , $x_{m} = (\binom{21 - R}{m}) (\binom{R - 1}{10 - m})$ choices will score exactly $m R$ . and so $y_{K} = x_{K} + x_{K + 1} + . . . + x_{10}$ choices will score at least $K R$ . Louise's chance of reaching $T = K R$ is $y_{K} / N$ . We can use this to work out optimal stop values $R$ for each target $T$ . For some targets there are several possible fixed stop values – e.g. 72 might come from six 12s, eight 9s, or nine 8s. Table 2 shows the figures only for the best methods in such cases.

Table 2: For a range of targets, the best fixed stop values, and the success frequencies (out of N =184,756). (Considerations (1), (2) below are ignored.)

We can do much better than these figures in places: for example, the stop value of 11 achieves a total of 66 in 60,626 ways, so totals of 63, 64 or 65 can also be reached at least that often. Two obvious modifications, that can only increase her chance of success, are:

if ever the box containing R is revealed, the stop value in later rounds can safely be increased to the next lowest score in the remaining boxes;
if her residual target falls below R, so her stop value drops.

Practical strategies

Table 2 shows that using a constant stop value can succeed, at least half the time, for all targets up to 60. However, table 1 illustrates the difficulties of making the best decision, with only three rounds left, even for a modest residual score. So what is Louise's best strategy?

Before each round, she knows the average sum needed to meet the target: if there are $K$ rounds to go, then the average target per round is $T / K$ . One possible strategy is to use the minimum stop value that would bank That amount. Computer simulations of this strategy are not encouraging, especially for low targets. It does not get good value from high scoring boxes in the early rounds, where the remaining average is 8 or less. It turns out to be uniformly inferior to the two strategies we discuss next.

The calculations needed for optimum play in the last two rounds are not too hard, so we assume that Louise uses the best strategy from that stage, and that she always pays regard to factors (a) and (b). An easy to recall strategy is to simply seek to bank the value that maximises the mean score obtained for the first eight rounds.

Alternatively, in a gesture at tailoring the strategy to the target, and based on aiming to bank a similar amount each round, Louise could use the following strategy:
A. Select a stop value for round one, and decrease (increase) that value by one unit if ahead of (behind) the pro rata target during round two to seven;
B. In round eight, note the residual target, and use a stop value based on a "smoothed" version of table 1.

By "smooth" in step B, we acknowledge that table 1 was constructed for a specific list of the last 13 boxes, but we will be guided by the pattern shown therein: for residual targets up to 28, gradually increase the stop value to 12, and use table 1 for higher targets.

Computer simulations lead to the following recommendations as stop values for step A:

For $50 \leq T \leq 56$ , use 9 or 10;
for $57 \leq T \leq 62$ , use 10;
for $63 \leq T \leq 65$ , use 10 or 11;
for $66 \leq T \leq 69$ , use 11;
for $70 \leq T \leq 81$ , use 11 or 12.

The success rates for these two strategies are very similar, ranging from around 85\% when $T = 50$ to about 10\% when $T = 81$ . The success rate does not quite fall linearly: in round figures, each time $T$ increases by one unit, the percentage chances of success drop by 2.5 up to $T = 60$ , then by 3 up to $T = 70$ , and finally by 2 up to $T = 81$ . Neither strategy is optimal, but we suggest that perfect play will do only slightly better. Luck plays a large role, and optimal play, even if identified, is by no means a guarantee of success!

Variations

The extra difficulty of hitting higher targets is not matched by a corresponding increase in prize value. If the target will always be in the range 50 to 80, then the utility of the potential prize to most contestants will not change dramatically. And an 85\% chance to win £50K is far more attractive than a 30\% chance to win £70K. Perhaps the prize for reaching $T$ might increase more rapidly than linearly, for $T > 50$ ?

One can imagine two (or more) players competing in a contest to see who could obtain the highest total score, for a given set of boxes (which they would take turns to select). Each player would have a hidden button to press, indicating when they wished to bank the current total displayed; the scores and box values would be revealed round by round. As in the TV programme Countdown the winners could take on new opponents, with periodic championships between the most successful players.

About this article

When not thinking about TV game shows, John Haigh teaches mathematics, especially probability, at Sussex University. He thanks David Spiegelhalter and Paul Twomey for their comments on earlier versions of this account.