Bluffing and exploitation: An introduction to poker maths

by John Billingham

If you've never played poker, you probably think that it's a game for degenerate gamblers and cigar-chomping hustlers in cowboy hats. That's certainly what I used to think. It turns out that poker is actually a very complicated game indeed.

Hand of cards

Poker originated in Europe in the middle ages.

The early forerunners of poker originated in Europe in the middle ages, including brag in England and pochen (meaning to bluff) in Germany. The French game of poque spread to America in the late 18th century, where it developed into modern poker. The derivation of the word poker suggests that it's mainly a game about bluffing, which is perhaps an indicator that it's a game of psychology and cunning rather than a sophisticated and challenging pastime. However, as I will show you below, bluffing is not a low and tricky manoeuvre, but a mathematically essential part of the game.

I'm going to teach you how to play one of the simplest possible versions of poker. Instead of using a full deck, we're going to use just three cards — A♠, K♠ and Q♠ — and we're only going to allow two players to play — let's call them John and Tom. Each player puts £1 on the table, and is then randomly dealt one of the three cards. Tom can now either bet £1 or check. If he checks, each player shows his card, and the player with the best one wins the £2 on the table (A♠ > K♠ > Q♠). If Tom bets, John can either fold, in which case Tom wins the £2 on the table and neither player has to show their card, or he can call by matching Tom's bet of £1, after which the cards are shown and the best card wins the £4 on the table.

  • If Tom has A♠, he will bet, hoping that John calls. If John has A♠ he will call a bet from Tom. A player with A♠ knows that he has the best hand.
  • If John has Q♠, he will fold if Tom bets — he knows that he has the worst hand.
  • If Tom has K♠, he will check. If he bets, John will fold with Q♠ and call with A♠ — he cannot expect John to call with a worse hand or fold a better hand.
  • What should Tom do with Q♠? If he bets, he may be able to get John to fold K♠, and he can't win by checking — he knows that he has the worst hand. A bet from Tom with Q♠ is a bluff.
  • What should John do with K♠ if Tom bets? He'd like to fold if Tom has A♠, but if Tom is bluffing with Q♠, he'd like to call.

(Anybody who has played poker seriously should be able to recognise a bluff, a value bet, a hand with showdown value and a bluff catcher in this discussion.)

In the absence of any frantic ear tugging or eyelid twitching from either player, John and Tom need to think mathematically about this game in order to work out sensible strategies when they play repeatedly.

If Tom always bluffs with Q♠, John may notice that he is facing lots of bets, and decide to call whenever he has K♠. However, Tom may then realise that John is calling his bluffs every time and stop bluffing, so that John will give Tom an extra £1 whenever Tom has A♠, but Tom will save his money when he has Q♠. A rational response is for John to stop calling with K♠, but then Tom can start bluffing again, and round and round the cycle goes.

Unexploitable bluffing

Whenever a cycle like this arises in a game, the mathematician John Nash showed that there exists what's known as an unexploitable mixed strategy. Tom should bluff some fraction of the time that he has Q♠, whilst John should call to catch a bluff some fraction of the time that he has K♠. The strategy is called unexploitable because neither player can improve their win rate by choosing a different strategy, so neither has an incentive to deviate from it. Below I will explain the mathematics that leads us to conclude that the unexploitable mixed strategy for Tom is to bluff 1/3 of the time that he is dealt Q♠ and for John to call 1/3 of the time that he is dealt K♠. If either player uses this strategy, Tom makes a profit in the long run of about 5.5p per hand. The game is skewed in Tom's favour since he has the option of checking and showing down K♠, an option that John does not have. The game can be made fair by forcing John and Tom to alternate roles from hand to hand.

We can get some idea of the complexity of this game by looking at its decision tree, shown below. The tree shows all the possible paths the game can take. (If you don't want to see all the maths, skip ahead.)

Decision tree

The decision tree for the AKQ game.

The open circles in the decision tree represent points where different random events can happen. We'll call these random nodes. In the AKQ game, the random nodes represent John and Tom's cards: the first random node at the top represents John receiving one of the three cards and the second row of random nodes represents Tom receiving one of the remaining two cards. The probability of each card being dealt is given along the arrows. The solid nodes are decision nodes, where one of the players must choose an action. The only decisions are whether Tom bluffs with Q♠ (at frequency b) and whether John bluffcatches (calls) with K♠ (at frequency c).

John Nash

Nobel laureate John Forbes Nash made fundamental contributions to game theory.

The additional amount that John wins at the end of the game on top of what he would have won had they shown their cards straight away (his ex-showdown winnings) is given at the bottom of each leaf of the decision tree. For example, if John has A and Tom has Q and decides to bluff (the path shown in red in the tree), then John will catch the bluff and win the £4 on the table, making a profit of £2. Had they shown their cards straight away, then John would have won the £2 on the table, making a profit of £1. Therefore John's ex-showdown winnings are £1.

By multiplying out the numbers along the arrows of each possible path through the decision tree, we can calculate the probability that the game takes this path. So the probability that John has A and Tom has Q and bluffs is $\frac{1}{3} \times \frac{1}{2} \times b = \frac{b}{6}.$ Multiplying this probability by the number of the corresponding leaf gives us the value of the game to John. In our example this is $\frac{b}{6} \times 1 = \frac{b}{6}.$ From this value a computer (or a human for a small decision tree like this) can work out an optimal strategy. The dotted lines connect decision nodes where the decision is the same — these are called information sets. As you can see, the AKQ tree has nine nodes, ten leaves and two information sets.

In order to work out $E(J),$ the amount that John can expect to win on average (if the game were played many, many times) in addition to what he would win at showdown (his ex-showdown winnings), we have to add up the results we get from the individual paths through the decision tree. This gives

  \[ E(J) = \frac{1}{6}bc - \frac{1}{3}b\left(1-c\right)-\frac{1}{6}c+\frac{1}{6}b = \frac{1}{2}b\left(c-\frac{1}{3}\right)-\frac{1}{6}c. \]    

The amount that Tom can expect to win, $E(T)$, is $-E(J)$, since John’s loss is Tom’s gain, so Tom wants to make $E(J)$ as small as possible.

If John calls with a fraction of his Kings greater than $\frac{1}{3},$ so that $c - \frac{1}{3}$ is positive, then

  \[  E(J) = \frac{1}{2}b\left(c-\frac{1}{3}\right)-\frac{1}{6}c \geq -\frac{1}{6}c \]    

and the minimum is achieved when $b=0$ so Tom can maximally exploit him (minimise $E(J)$) by never bluffing ($b=0$). Similarly you can work out that if John calls less often than this, so that $c-\frac{1}{3}$ is negative, Tom can maximally exploit John by always bluffing ($b=1$). The only way that John can defend himself against exploitation is to use $c=\frac{1}{3}$. This is his unexploitable calling (bluff catching) frequency. His bluff catches with K♠ and value calls with A♠ are in the appropriate ratio to make Tom indifferent, i.e. however often he bluffs, his win rate does not change.

We can also rearrange the expression for $E(J)= -E(T)$ to get

  \[ E(T) = \frac{1}{2}c\left(\frac{1}{3}-b\right)+\frac{1}{6}b. \]    

A similar argument shows that Tom’s unexploitable bluffing frequency is $b= \frac{1}{3}.$ John can maximally exploit any deviation from this strategy by either always calling with K♠ ($c=1$), if Tom bluffs too often, or always folding with K♠ ($c=0$), if Tom doesn’t bluff often enough.

If either player uses his unexploitable strategy we have

  \[ E(T) = \frac{1}{18} \]    

so Tom wins on average £1/18 = 5.5p per hand.

Only the tip of the tree

A much more complex variant of poker is two player Fixed Limit Rhode Island HoldEm. In this game each player is dealt a card from a full deck, there are three rounds of betting, and a card is dealt on the table between each round. The players try to make the best three card hand. The decision tree for this game has about three billion nodes, and it's the largest game for which the unexploitable strategy has been calculated (see the further reading list below for more information).

Cowboy accessories

Poker: not just for cowboys.

The simplest poker variant that people actually play for money is two player Fixed Limit Texas HoldEm. Each player starts with two cards, there are four rounds of betting and a total of five cards are dealt on the table between each of these rounds. The decision tree for this game has about 1018 nodes (that's 1 with 18 zeros after it, or a billion billion). Although there are computer algorithms that can calculate a good approximation to the optimal strategy, the exact solution has yet to be found.

The most popular variant of poker nowadays is No Limit Texas HoldEm, in which the players can bet any amount, which adds another layer of complexity, but even this is simpler than Pot Limit Omaha, a game that is growing in popularity, in which each player is dealt four cards. In addition, poker is usually played by more than two players at a time, sometimes with as many as ten. Once you start to think about the strategic complexity of these larger games, it really becomes quite mind boggling ... and we've only talked about finding the unexploitable strategy. Real human players don't play an unexploitable strategy. They can be exploited for a greater profit by playing a non-optimal, exploitable counterstrategy. How can you find out what strategy a player is using and determine the appropriate counterstrategy? What if your opponent changes his strategy? Programming a computer to adjust its strategy to maximally exploit an opponent is much harder than using it to calculate the unexploitable strategy, and the best human players can outperform the best computer programs, something that is now impossible for even the very best chess players. Poker is the king of games, and I've only been able to scratch the surface of its mathematical structure in this short article.

Further reading

About the author

John Billingham is a Professor of Theoretical Mechanics at the University of Nottingham. When he's not solving partial differential equations, he likes to play poker, both live and online. You can read about his feeble attempts to learn how to play better poker, along with a more in-depth discussion of the mathematics of poker in The Education of a Modern Poker Player, a book that he wrote with the help of Thomas Tiroch and Emanuel Cinca, both of whom really know their way around a decision tree.


Poker Theory

Please, more articles on bluffing theory! This is awesome.