
Game theory and the Cuban missile crisis

Steven J. Brams
January 2001


Theory of Moves

"We're eyeball to eyeball, and I think the other fellow just blinked" were the eerie words of Secretary of State Dean Rusk at the height of the Cuban missile crisis in October 1962. He was referring to signals by the Soviet Union that it desired to defuse the most dangerous nuclear confrontation ever to occur between the superpowers, which many analysts have interpreted as a classic instance of nuclear "Chicken".

Chicken is the usual game used to model conflicts in which the players are on a collision course. The players may be drivers approaching each other on a narrow road, in which each has the choice of swerving to avoid a collision or not swerving. In the film Rebel Without a Cause, starring James Dean, the drivers were two teenagers, but instead of bearing down on each other they both raced toward a cliff, with the object being not to be the first driver to slam on his brakes and thereby "chicken out", while, at the same time, not plunging over the cliff.

While ostensibly a game of Chicken, the Cuban missile crisis is in fact not well modelled by this game. Another game more accurately represents the preferences of American and Soviet leaders, but even for this game standard game theory does not explain their choices.

On the other hand, the "theory of moves," which is founded on game theory but radically changes its standard rules of play, does retrodict, or make past predictions of, the leaders' choices. More important, the theory explicates the dynamics of play, based on the assumption that players think not just about the immediate consequences of their actions but their repercussions for future play as well.

I will use the Cuban missile crisis to illustrate parts of the theory, which is not just an abstract mathematical model but one that mirrors the real-life choices, and underlying thinking, of flesh-and-blood decision makers. Indeed, Theodore Sorensen, special counsel to President John Kennedy, used the language of "moves" to describe the deliberations of Excom, the Executive Committee of key advisors to Kennedy during the Cuban missile crisis:

"We discussed what the Soviet reaction would be to any possible move by the United States, what our reaction with them would have to be to that Soviet action, and so on, trying to follow each of those roads to their ultimate conclusion."

Classical Game Theory and the Missile Crisis

Game theory is a branch of mathematics concerned with decision-making in social interactions. It applies to situations (games) where there are two or more people (called players) each attempting to choose between two or more ways of acting (called strategies). The possible outcomes of a game depend on the choices made by all players, and can be ranked in order of preference by each player.

In some two-person, two-strategy games, there are combinations of strategies for the players that are in a certain sense "stable". This will be true when neither player, by departing from its strategy, can do better. Two such strategies are together known as a Nash equilibrium, named after John Nash, a mathematician who received the Nobel prize in economics in 1994 for his work on game theory. Nash equilibria do not necessarily lead to the best outcomes for one, or even both, players. Moreover, for the games that will be analyzed - in which players can only rank outcomes ("ordinal games") but not attach numerical values to them ("cardinal games") - they may not always exist. (While they always exist, as Nash showed, in cardinal games, Nash equilibria in such games may involve "mixed strategies," which will be described later.)

The Cuban missile crisis was precipitated by a Soviet attempt in October 1962 to install medium-range and intermediate-range nuclear-armed ballistic missiles in Cuba that were capable of hitting a large portion of the United States. The goal of the United States was immediate removal of the Soviet missiles, and U.S. policy makers seriously considered two strategies to achieve this end [see Figure 1 below]:

  1. A naval blockade (B), or "quarantine" as it was euphemistically called, to prevent shipment of more missiles, possibly followed by stronger action to induce the Soviet Union to withdraw the missiles already installed.
  2. A "surgical" air strike (A) to wipe out the missiles already installed, insofar as possible, perhaps followed by an invasion of the island.

The alternatives open to Soviet policy makers were:

  1. Withdrawal (W) of their missiles.
  2. Maintenance (M) of their missiles.

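| | Soviet Union (S.U.): Withdrawal (W) | Soviet Union (S.U.): Maintenance (M) |
|---|---|---|
| United States (U.S.): Blockade (B) | Compromise (3,3) | Soviet victory, U.S. defeat $\underline{(2,4)}$ |
| United States (U.S.): Air strike (A) | U.S. victory, Soviet defeat $\underline{(4,2)}$ | Nuclear war (1,1) |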

Figure 1: Cuban missile crisis as Chicken

Key: (x, y) = (payoff to U.S., payoff to S.U.)
4=best; 3=next best; 2=next worst; 1=worst
Nash equilibria underscored

These strategies can be thought of as alternative courses of action that the two sides, or "players" in the parlance of game theory, can choose. They lead to four possible outcomes, which the players are assumed to rank as follows: 4=best; 3=next best; 2=next worst; and 1=worst. Thus, the higher the number, the greater the payoff; but the payoffs are only ordinal, that is, they indicate an ordering of outcomes from best to worst, not the degree to which a player prefers one outcome over another. The first number in the ordered pairs for each outcome is the payoff to the row player (United States), the second number the payoff to the column player (Soviet Union).

Needless to say, the strategy choices, probable outcomes, and associated payoffs shown in Figure 1 provide only a skeletal picture of the crisis as it developed over a period of thirteen days. Both sides considered more than the two alternatives listed, as well as several variations on each. The Soviets, for example, demanded withdrawal of American missiles from Turkey as a quid pro quo for withdrawal of their own missiles from Cuba, a demand publicly ignored by the United States.

Nevertheless, most observers of this crisis believe that the two superpowers were on a collision course, which is actually the title of one book describing this nuclear confrontation. They also agree that neither side was eager to take any irreversible step, such as one of the drivers in Chicken might take by defiantly ripping off the steering wheel in full view of the other driver, thereby foreclosing the option of swerving.

Although in one sense the United States "won" by getting the Soviets to withdraw their missiles, Premier Nikita Khrushchev of the Soviet Union at the same time extracted from President Kennedy a promise not to invade Cuba, which seems to indicate that the eventual outcome was a compromise of sorts. But this is not game theory's prediction for Chicken, because the strategies associated with compromise do not constitute a Nash equilibrium.

To see this, assume play is at the compromise position (3,3), that is, the U.S. blockades Cuba and the S.U. withdraws its missiles. This pair of strategies is not stable, because both players would have an incentive to defect to their more belligerent strategy. If the U.S. were to defect by changing its strategy to air strike, play would move to (4,2), improving the payoff the U.S. received; if the S.U. were to defect by changing its strategy to maintenance, play would move to (2,4), giving the S.U. a payoff of 4. (This classic game theory setup gives us no information about which of the two equilibrium outcomes, (4,2) or (2,4), would be chosen, because the table of payoffs is symmetric for the two players. This is a frequent problem in interpreting the results of a game theoretic analysis, where more than one equilibrium position can arise.) Finally, should the players be at the mutually worst outcome of (1,1), that is, nuclear war, both would obviously desire to move away from it, making the strategies associated with it, like those with (3,3), unstable.
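The stability check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary `CHICKEN` encodes the ordinal ranks of Figure 1, and the function name `pure_nash_equilibria` is a hypothetical helper, not part of any library. A state counts as an equilibrium when neither player can reach a higher-ranked outcome by switching its own strategy alone.

```python
# Ordinal ranks from Figure 1 (Chicken): (U.S. rank, S.U. rank), 4 = best.
# Keys are (U.S. strategy, S.U. strategy); the names are illustrative only.
CHICKEN = {
    ("B", "W"): (3, 3),  # compromise
    ("B", "M"): (2, 4),  # Soviet victory, U.S. defeat
    ("A", "W"): (4, 2),  # U.S. victory, Soviet defeat
    ("A", "M"): (1, 1),  # nuclear war
}


def pure_nash_equilibria(game):
    """Return strategy pairs from which neither player gains by deviating alone."""
    rows = sorted({row for row, _ in game})
    cols = sorted({col for _, col in game})
    equilibria = []
    for row in rows:
        for col in cols:
            us_rank, su_rank = game[(row, col)]
            # The U.S. changes rows; the S.U. changes columns.
            us_can_improve = any(game[(r, col)][0] > us_rank for r in rows if r != row)
            su_can_improve = any(game[(row, c)][1] > su_rank for c in cols if c != col)
            if not us_can_improve and not su_can_improve:
                equilibria.append((row, col))
    return equilibria


print(pure_nash_equilibria(CHICKEN))  # [('A', 'W'), ('B', 'M')]
```

The two states returned are (4,2) and (2,4); neither (3,3) nor (1,1) passes the test, in line with the argument in the text.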

Theory of Moves and the Missile Crisis

Using Chicken to model a situation such as the Cuban missile crisis is problematic not only because the (3,3) compromise outcome is unstable but also because, in real life, the two sides did not choose their strategies simultaneously, or independently of each other, as assumed in the game of Chicken described above. The Soviets responded specifically to the blockade after it was imposed by the United States. Moreover, the fact that the United States held out the possibility of escalating the conflict to at least an air strike indicates that the initial blockade decision was not considered final - that is, the United States considered its strategy choices still open after imposing the blockade.

As a consequence, this game is better modelled as one of sequential bargaining, in which neither side made an all-or-nothing choice but rather both considered alternatives, especially should the other side fail to respond in a manner deemed appropriate. In the most serious breakdown in the nuclear deterrence relationship between the superpowers that had persisted from World War II until that point, each side was gingerly feeling its way, step by ominous step. Before the crisis, the Soviets, fearing an invasion of Cuba by the United States and also needing to bolster their international strategic position, concluded that installing the missiles was worth the risk. They thought that the United States, confronted by a fait accompli, would be deterred from invading Cuba and would not attempt any other severe reprisals. Even if the installation of the missiles precipitated a crisis, the Soviets did not reckon the probability of war to be high (President Kennedy estimated the chances of war to be between 1/3 and 1/2 during the crisis), thereby making it rational for them to risk provoking the United States.

There are good reasons to believe that U.S. policymakers did not view the confrontation to be Chicken-like, at least as far as they interpreted and ranked the possible outcomes. I offer an alternative representation of the Cuban missile crisis in the form of a game I will call Alternative, retaining the same strategies for both players as given in Chicken but presuming a different ranking and interpretation of outcomes by the United States [see Figure 2]. These rankings and interpretations fit the historical record better than those of "Chicken", as far as can be told by examining the statements made at the time by President Kennedy and the U.S. Air Force, and the type and number of nuclear weapons maintained by the S.U. (more on this below).

  1. BW: The choice of blockade by the United States and withdrawal by the Soviet Union remains the compromise for both players - (3,3).
  2. BM: In the face of a U.S. blockade, Soviet maintenance of their missiles leads to a Soviet victory (its best outcome) and U.S. capitulation (its worst outcome) - (1,4).
  3. AM: An air strike that destroys the missiles that the Soviets were maintaining is an "honourable" U.S. action (its best outcome) and thwarts the Soviets (their worst outcome) - (4,1).
  4. AW: An air strike that destroys the missiles that the Soviets were withdrawing is a "dishonourable" U.S. action (its next-worst outcome) and thwarts the Soviets (their next-worst outcome) - (2,2).

| | Soviet Union (S.U.): Withdrawal (W) | | Soviet Union (S.U.): Maintenance (M) |
|---|---|---|---|
| United States (U.S.): Blockade (B) | Compromise **(3,3)** | $\rightarrow$ | Soviet victory, U.S. capitulation (1,4) |
| | $\uparrow$ | | $\downarrow$ |
| United States (U.S.): Air strike (A) | "Dishonourable" U.S. action, Soviets thwarted (2,2) | $\leftarrow$ | "Honourable" U.S. action, Soviets thwarted (4,1) |

Figure 2: Cuban missile crisis as Alternative

Key: (x, y) = (payoff to U.S., payoff to S.U.)
4 = best; 3 = next best; 2 = next worst; 1 = worst
Nonmyopic equilibria in bold
Arrows indicate direction of cycling

Even though an air strike thwarts the Soviets at both outcomes (2,2) and (4,1), I interpret (2,2) to be less damaging for the Soviet Union. This is because world opinion, it may be surmised, would severely condemn the air strike as a flagrant overreaction - and hence a "dishonourable" action of the United States - if there were clear evidence that the Soviets were in the process of withdrawing their missiles anyway. On the other hand, given no such evidence, a U.S. air strike, perhaps followed by an invasion, would be viewed as an "honourable" action to dislodge the Soviet missiles.

The statements of U.S. policy makers support Alternative. In responding to a letter from Khrushchev, Kennedy said,

"If you would agree to remove these weapons systems from Cuba . . . we, on our part, would agree . . . (a) to remove promptly the quarantine measures now in effect and (b) to give assurances against an invasion of Cuba,"
which is consistent with Alternative, since (3,3) is preferred to (2,2) by the United States, whereas (3,3) is not preferred to (4,2) in Chicken.

If the Soviets maintained their missiles, the United States preferred an air strike to the blockade. As Robert Kennedy, a close adviser to his brother during the crisis, said,

"If they did not remove those bases, we would remove them,"
which is consistent with Alternative, since the United States prefers (4,1) to (1,4) but not (1,1) to (2,4) in Chicken.

Finally, it is well known that several of President Kennedy's advisers felt very reluctant about initiating an attack against Cuba without exhausting less belligerent courses of action that might bring about the removal of the missiles with less risk and greater sensitivity to American ideals and values. Pointedly, Robert Kennedy claimed that an immediate attack would be looked upon as "a Pearl Harbor in reverse, and it would blacken the name of the United States in the pages of history," which is again consistent with Alternative, since the United States ranks AW next worst (2) - a "dishonourable" U.S. action - rather than best (4) - a U.S. victory - as it does in Chicken.

If Alternative provides a more realistic representation of the participants' perceptions than Chicken does, standard game theory offers little help in explaining how the (3,3) compromise was achieved and rendered stable. As in Chicken, the strategies associated with this outcome are not a Nash equilibrium, because the Soviets have an immediate incentive to move from (3,3) to (1,4).

However, unlike Chicken, Alternative has no outcome at all that is a Nash equilibrium, except in "mixed strategies". These are strategies in which players randomize their choices, choosing each of their two so-called pure strategies with specified probabilities. But mixed strategies cannot be used to analyse Alternative, because to carry out such an analysis, there would need to be numerical payoffs assigned to each of the outcomes, not the rankings I have assumed.

The instability of outcomes in Alternative can most easily be seen by examining the cycle of preferences, indicated by the arrows going in a clockwise direction in this game. Following these arrows shows that this game is cyclic, with one player always having an immediate incentive to depart from every state: the Soviets from (3,3) to (1,4); the United States from (1,4) to (4,1); the Soviets from (4,1) to (2,2); and the United States from (2,2) to (3,3). Again we have indeterminacy, but not because of multiple Nash equilibria, as in Chicken, but rather because there are no equilibria in pure strategies in Alternative.
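The cycle just described can be traced mechanically. The following is a minimal sketch, assuming the ranks of Figure 2; the names `ALTERNATIVE`, `immediate_departure` and `trace_cycle` are hypothetical and chosen only for illustration. At each state it looks for a player who gains immediately by switching strategy alone, and follows that move until a state repeats.

```python
# Ordinal ranks from Figure 2 (Alternative): (U.S. rank, S.U. rank), 4 = best.
ALTERNATIVE = {
    ("B", "W"): (3, 3),  # compromise
    ("B", "M"): (1, 4),  # Soviet victory, U.S. capitulation
    ("A", "W"): (2, 2),  # "dishonourable" U.S. action
    ("A", "M"): (4, 1),  # "honourable" U.S. action
}


def other(options, current):
    """Return the strategy in a two-element sequence that is not `current`."""
    return options[1] if current == options[0] else options[0]


def immediate_departure(game, state):
    """Return (player, new_state) for a player who gains by switching alone, else None.

    The U.S. is checked first; in Alternative only one player wants to move
    from any given state, so the order does not matter here.
    """
    row, col = state
    us_rank, su_rank = game[state]
    alt_row = (other("BA", row), col)
    if game[alt_row][0] > us_rank:
        return ("U.S.", alt_row)
    alt_col = (row, other("WM", col))
    if game[alt_col][1] > su_rank:
        return ("S.U.", alt_col)
    return None


def trace_cycle(game, start):
    """Follow immediate incentives from `start` until a state repeats or play rests."""
    steps, state, seen = [], start, set()
    while state not in seen:
        seen.add(state)
        move = immediate_departure(game, state)
        if move is None:
            break  # a pure-strategy Nash equilibrium: nobody wants to leave
        player, new_state = move
        steps.append((player, game[state], game[new_state]))
        state = new_state
    return steps


for player, before, after in trace_cycle(ALTERNATIVE, ("B", "W")):
    print(player, before, "->", after)
```

Starting from compromise, the trace visits all four states and returns to (3,3), so no state is a resting point under immediate incentives alone.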

Rules of Play in Theory of Moves

How, then, can we explain the choice of (3,3) in Alternative, or Chicken for that matter, given its nonequilibrium status according to standard game theory? It turns out that (3,3) is a "nonmyopic equilibrium" in both games, and uniquely so in Alternative, according to the theory of moves (TOM). By postulating that players think ahead not just to the immediate consequences of making moves, but also to the consequences of countermoves to these moves, counter-countermoves, and so on, TOM extends the strategic analysis of conflict into the more distant future.

To be sure, game theory allows for this kind of thinking through the analysis of "game trees," where the sequential choices of players over time are described. But the game tree continually changed with each development in the crisis. By contrast, what remained more or less constant was the configuration of payoffs of Alternative, though where the players were in the matrix changed. In effect, TOM, by describing the payoffs in a single game but allowing players to make successive calculations of moves to different positions within it, adds nonmyopic thinking to the economy of description offered by classical game theory.

The founders of game theory, John von Neumann and Oskar Morgenstern, defined a game to be "the totality of rules of play which describe it." While the rules of TOM apply to all games between two players, here I will assume that the players each have just two strategies. The four rules of play of TOM describe the possible choices of the players at each stage of play:

Rules of Play

  1. Play starts at an initial state, given at the intersection of the row and column of a payoff matrix.
  2. Either player can unilaterally switch its strategy, or make a move, and thereby change the initial state into a new state, in the same row or column as the initial state. The player who switches is called player 1 (P1).
  3. Player 2 (P2) can respond by unilaterally switching its strategy, thereby moving the game to a new state.
  4. The alternating responses continue until the player (P1 or P2) whose turn it is to move next chooses not to switch its strategy. When this happens, the game terminates in a final state, which is the outcome of the game.

Termination Rule

  5. A player will not move from an initial state if this move (i) leads to a less preferred outcome, or (ii) returns play to the initial state, making this state the outcome.

Precedence Rule

  6. If it is rational for one player to move and the other player not to move from the initial state, the move takes precedence: it overrides staying, so the outcome will be induced by the player that moves.

Note that the sequence of moves and countermoves is strictly alternating: first, say, the row player moves, then the column player, and so on, until one player stops, at which point the state reached is final and, therefore, the outcome of the game. I assume that no payoffs accrue to players from being in a state unless it becomes the outcome (which could be the initial state if the players choose not to move from it).

To assume otherwise would require that payoffs be numerical, rather than ordinal ranks, which players can accumulate as they pass through states. But in many real-life games, payoffs cannot easily be quantified and summed across the states visited. Moreover, the big reward in many games depends overwhelmingly on the final state reached, not on how it was reached. In politics, for example, the payoff for most politicians is not in campaigning, which is arduous and costly, but in winning.

Rule 1 differs drastically from the corresponding rule of play in standard game theory, in which players simultaneously choose strategies in a matrix game that determines its outcome. Instead of starting with strategy choices, TOM assumes that players are already in some state at the start of play and receive payoffs from this state only if they stay. Based on these payoffs, they must decide, individually, whether or not to change this state in order to try to do better.

Of course, some decisions are made collectively by players, in which case it is reasonable to say that they choose strategies from scratch, either simultaneously or by coordinating their choices. But if, say, two countries are coordinating their choices, as when they agree to sign a treaty, the important strategic question is what individualistic calculations led them to this point. The formality of jointly signing the treaty is the culmination of their negotiations and does not reveal the move-countermove process that preceded the signing. It is precisely these negotiations, and the calculations underlying them, that TOM is designed to uncover.

To continue this example, the parties that sign the treaty were in some prior state from which both desired to move - or, perhaps, only one desired to move and the other could not prevent this move from happening (rule 6). Eventually they may arrive at a new state, after, say, treaty negotiations, in which it is rational for both countries to sign the treaty that was previously negotiated.

As with a treaty signing, almost all outcomes of games that we observe have a history. TOM seeks to explain strategically the progression of (temporary) states that lead to a (more permanent) outcome. Consequently, play of a game starts in an initial state, at which players collect payoffs only if they remain in that state so that it becomes the final state, or outcome, of the game.

If they do not remain, they still know what payoffs they would have collected had they stayed; hence, they can make a rational calculation of the advantages of staying or moving. They move precisely because they calculate that they can do better by switching strategies, anticipating a better outcome when the move-countermove process finally comes to rest. The game is different, but not the configuration of payoffs, when play starts in a different state.

Rules 1 - 4 (rules of play) say nothing about what causes a game to end, only when: termination occurs when a "player whose turn it is to move next chooses not to switch its strategy" (rule 4). But when is it rational not to continue moving, or not to move at all from the initial state?

Rule 5 (termination rule) says when a player will not move from an initial state. While condition (i) of this rule needs no defence, condition (ii) requires justification. It says that if it is rational, after P1 moves, for play of the game to cycle back to the initial state, P1 will not move in the first place. After all, what is the point of initiating the move-countermove process if play simply returns to "square one," given that the players receive no payoffs along the way to the outcome?

Backward Induction

To determine where play will end up when at least one player wants to move from the initial state, I assume the players use backward induction. This is a reasoning process by which the players, working backward from the last possible move in a game, anticipate each other's rational choices. For this purpose, I assume that each has complete information about the other's preferences, so each can calculate the other player's rational choices, as well as its own, in deciding whether to move from the initial state or any subsequent state.

To illustrate backward induction, consider again the game Alternative in Figure 2. After the missiles were detected and the United States imposed a blockade on Cuba, the game was in state BM, which is worst for the United States (1) and best for the Soviet Union (4). Now consider the clockwise progression of moves that the United States can initiate by moving to AM, the Soviet Union to AW, and so on, assuming the players look ahead to the possibility that the game makes one complete cycle and returns to the initial state (state 1):

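| | State 1 | | State 2 | | State 3 | | State 4 | | State 1 |
|---|---|---|---|---|---|---|---|---|---|
| U.S. starts | U.S. (1,4) | $\rightarrow$ | S.U. (4,1) | $\rightarrow$ | U.S. $\underline{(2,2)}$ | $\rightarrow \vert$ | S.U. (3,3) | $\rightarrow$ | (1,4) |
| Survivor | (2,2) | | (2,2) | | (2,2) | | (1,4) | | |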

This is a game tree, though drawn horizontally rather than vertically. The survivor is a state selected at each stage as the result of backward induction. It is determined by working backward from where play, theoretically, can end up (state 1, at the completion of the cycle).

Assume the players' alternating moves have taken them clockwise in Alternative from (1,4) to (4,1) to (2,2) to (3,3), at which point S.U. in state 4 must decide whether to stop at (3,3) or complete the cycle by returning to (1,4). Clearly, S.U. prefers (1,4) to (3,3), so (1,4) is listed as the survivor below (3,3): because S.U. would move the process back to (1,4) should it reach (3,3), the players know that if the move-countermove process reaches this state, the outcome will be (1,4).

Knowing this, would U.S. at the prior state, (2,2), move to (3,3)? Because U.S. prefers (2,2) to the survivor at (3,3) - namely, (1,4) - the answer is no. Hence, (2,2) becomes the survivor when U.S. must choose between stopping at (2,2) and moving to (3,3) - which, as I just showed, would become (1,4) once (3,3) is reached.

At the prior state, (4,1), S.U. would prefer moving to (2,2) to stopping at (4,1), so (2,2) again is the survivor if the process reaches (4,1). Similarly, at the initial state, (1,4), because U.S. prefers the previous survivor, (2,2), to (1,4), (2,2) is the survivor at this state as well.

The fact that (2,2) is the survivor at the initial state, (1,4), means that it is rational for U.S. to move to (4,1), and S.U. subsequently to (2,2), where the process will stop, making (2,2) the rational choice if U.S. moves first from the initial state, (1,4). That is, after working backwards from S.U.'s choice of completing the cycle or not from (3,3), the players can reverse the process and, looking forward, determine what is rational for each to do. I indicate that it is rational for the process to stop at (2,2) by the vertical line blocking the arrow emanating from (2,2), and underscoring (2,2) at this point.
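The survivor row for the U.S.-initiated progression can be reproduced with a short backward-induction loop. This is a sketch under stated assumptions, not the author's own procedure: the list `CLOCKWISE_FROM_BM` and the function `survivors` are illustrative names, and the code covers only the single clockwise pass discussed above, in which the mover at each state compares stopping there with the survivor of the next state.

```python
# States of Alternative in clockwise order starting at BM,
# written as (U.S. rank, S.U. rank) with 4 = best.
CLOCKWISE_FROM_BM = [(1, 4), (4, 1), (2, 2), (3, 3)]
US, SU = 0, 1  # position of each player's rank inside a payoff pair


def survivors(cycle, first_mover):
    """Backward induction over one full pass around `cycle`.

    The mover at state k chooses between stopping at that state and moving on,
    which yields the survivor of state k+1. A move from the last state
    completes the cycle and ends play at the initial state.
    """
    n = len(cycle)
    result = [None] * n
    anticipated = cycle[0]  # outcome if the last mover completes the cycle
    for k in reversed(range(n)):
        mover = (first_mover + k) % 2  # players strictly alternate
        stay = cycle[k]
        # Move only if the anticipated outcome is strictly better for the mover.
        result[k] = anticipated if anticipated[mover] > stay[mover] else stay
        anticipated = result[k]
    return result


print(survivors(CLOCKWISE_FROM_BM, US))
# [(2, 2), (2, 2), (2, 2), (1, 4)]
```

The first entry, (2,2), is the survivor at the initial state, matching the conclusion that play stops at (2,2) if the U.S. moves first.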

Observe that (2,2) at state AW is worse for both players than (3,3) at state BW. Can S.U., instead of letting U.S. initiate the move-countermove process at (1,4), do better by seizing the initiative and moving, counterclockwise, from its best state of (1,4)? Not only is the answer yes, but it is also in the interest of U.S. to allow S.U. to start the process, as seen in the following counterclockwise progression of moves from (1,4):

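| | State 1 | | State 2 | | State 3 | | State 4 | | State 1 |
|---|---|---|---|---|---|---|---|---|---|
| S.U. starts | S.U. (1,4) | $\rightarrow$ | U.S. (3,3) | $\rightarrow \vert$ | S.U. (2,2) | $\rightarrow$ | U.S. (4,1) | $\rightarrow$ | (1,4) |
| Survivor | (3,3) | | (3,3) | | (2,2) | | (4,1) | | |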

S.U., by acting "magnanimously" in moving from victory (4) at BM to compromise (3) at BW, makes it rational for U.S. to terminate play at (3,3), as seen by the blocked arrow emanating from state 2. This, of course, is exactly what happened in the crisis, with the threat of further escalation by the United States, including the forced surfacing of Soviet submarines as well as an air strike (the U.S. Air Force estimated it had a 90 percent chance of eliminating all the missiles), being the incentive for the Soviets to withdraw their missiles.

Applying TOM

As with any scientific theory, TOM's calculations may not take into account the empirical realities of a situation. In the second backward-induction calculation, for example, it is hard to imagine a move by the Soviet Union from state 3 to state 4, involving maintenance (via reinstallation?) of their missiles after their withdrawal and an air strike. However, if a move to state 4, and later back to state 1, were ruled out as infeasible, the result would be the same: commencing the backward induction at state 3, it would be rational for the Soviet Union to move initially to state 2 (compromise), where play would stop.

Compromise would also be rational in the first backward-induction calculation if the same move (a return to maintenance), which in this progression is from state 4 back to state 1, were believed infeasible: commencing the backward induction at state 4, it would be rational for the United States to escalate to air strike to induce moves that carry the players to compromise at state 4. Because it is less costly for both sides if the Soviet Union is the initiator of compromise - eliminating the need for an air strike - it is not surprising that this is what happened.
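The effect of ruling out the return to maintenance in the first (U.S.-initiated) progression can be checked with a truncated backward induction. Again this is only a sketch: `survivors_truncated` is a hypothetical helper, and the one modelling assumption is that no move is possible from the last state on the path, so play must stop there if it is reached.

```python
# Clockwise progression from BM, as (U.S. rank, S.U. rank) with 4 = best.
CLOCKWISE_FROM_BM = [(1, 4), (4, 1), (2, 2), (3, 3)]
US, SU = 0, 1  # position of each player's rank inside a payoff pair


def survivors_truncated(path, first_mover):
    """Backward induction along `path` when no move is feasible from its last state."""
    result = [None] * len(path)
    result[-1] = path[-1]  # play must stop at the final state if it gets there
    for k in reversed(range(len(path) - 1)):
        mover = (first_mover + k) % 2  # players strictly alternate
        stay, ahead = path[k], result[k + 1]
        # Move only if the anticipated outcome is strictly better for the mover.
        result[k] = ahead if ahead[mover] > stay[mover] else stay
    return result


print(survivors_truncated(CLOCKWISE_FROM_BM, US))
# [(3, 3), (3, 3), (3, 3), (3, 3)]
```

With the final move removed, (3,3) survives at every state, so each player finds it rational to keep moving until compromise at state 4 is reached.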

To sum up, the Theory of Moves renders game theory a more dynamic theory. By postulating that players think ahead not just to the immediate consequences of making moves, but also to the consequences of countermoves to those moves, counter-countermoves, and so on, it extends the strategic analysis of conflicts into the more distant future. TOM has also been used to elucidate the role that different kinds of power - moving, order and threat - may play in conflict outcomes, and to show how misinformation can affect player choices. These concepts and the analysis have been illustrated by numerous cases, ranging from conflicts in the Bible to disputes and struggles today.

Further Reading

  1. "Theory of Moves", Steven J. Brams. Cambridge University Press, 1994.
  2. "Game Theory and Emotions", Steven J. Brams in Rationality and Society, Vol. 9, No. 1, pages 93-127, February 1997.
  3. "Long-term Behaviour in the Theory of Moves", Stephen J. Willson, in Theory and Decision, Vol. 45, No. 3, pages 201-240, December 1998.
  4. "Catch-22 and King-of-the-Mountain Games: Cycling, Frustration and Power", Steven J. Brams and Christopher B. Jones, in Rationality and Society, Vol. 11, No. 2, pages 139-167, May 1999.
  5. "Modeling Free Choice in Games", Steven J. Brams in Topics in Game Theory and Mathematical Economics: Essays in Honor of Robert J. Aumann, pages 41-62. Edited by Myrna H. Wooders. American Mathematical Society, 1999.

About the author

Steven J. Brams is professor of politics at New York University. He is the author or co-author of 13 books that involve applications of game theory and social choice theory to voting and elections, bargaining and fairness, international relations, and the Bible and theology. His most recent books are Fair Division: From Cake-Cutting to Dispute Resolution (1996) and The Win-Win Solution: Guaranteeing Fair Shares to Everybody (1999), both co-authored with Alan D. Taylor. He is a Fellow of the American Association for the Advancement of Science and of the Public Choice Society, a Guggenheim Fellow, and was a Russell Sage Foundation Visiting Scholar and a President of the Peace Science Society (International).

Comments

Fascinating article, and a brilliant illustration to the Theory of Moves.

My only issue was that I took exception to your explanation of rule 5 (ii), as I believed that a couple of crucial points had been missed.

Firstly, should we find that it is the player who made the first move who would choose to shift the state back to its origin, the opponent would then have the option of moving or staying. Not restricting to a 2x2 board, the second player may well have alternatives other than the first move that the original player used, which could lead the game in a different direction.

However, in the full rules of the game this is not a plausible situation (at least not with 2 players).

In a game, a player can only move one dimensionally. Whilst implied, this isn't explicitly stated in the rules, causing my brief confusion. However, since a player can only move one dimensionally (on any sized grid), with each player alternating in moves, there must be an even number of moves to return the game back to its original state.

Secondly, there is the question of what happens when the grid is expanded. As you have said, if player one knows that his original move will result in the same set of moves as occurred before, then he will not choose it. On a 2x2 grid, his original move was his only move from the original state.

As you hinted at earlier in the article, options are usually more wide ranging (especially in situations such as the Cuban Missile Crisis). Suppose then that Player 1 wants to consider the possibility of taking a different route? This move may be his "second best" choice to the first move he could've made (thereby explaining why it wasn't chosen to begin with), and results (as per rule 5(i)) in a state better than the one he started with. Given the implications of Rule 5(ii), the game would still finish at this point and negotiations would not continue, and you can have no "Plan B", which is not usually the case. I'd be interested to know if there is an argument against an amendment to Rule 5(ii) for such a situation.

However, when you do expand to show the progression of possible moves in the Cuban Missile Crisis example, knowing what the other side would choose in each case and logically working through all possible paths, you will come to the same conclusion as before. Any alternatives not mentioned that had been options for the US or the Soviet Union would have led to unfavourable consequences for both sides, or circled back to the original starting point, which is why they chose the moves that they did.