# Natural selection, maths and milk

March 2008

Let's recall the definitions:

$F_X$ = observed frequency of the allele $X$ (where $X$ is one of T, C, A or B), regardless of which other allele it occurs with;

$F_{XY}$ = observed frequency of allele $X$ occurring together with allele $Y$ (where $X$ is one of T or C and $Y$ is one of A or B).

Each of the quantities above lies between zero and one, because they are frequencies: $F_X$ is the number of times that the allele $X$ was observed in our sample, divided by the total size of the sample.

Now, since in each individual case exactly one of the alleles T or C must be observed, we have

$$F_T + F_C = 1.$$

Similarly,

$$F_A + F_B = 1.$$

We also have:

$$F_{TA} + F_{TB} = F_T;$$

$$F_{CA} + F_{CB} = F_C;$$

$$F_{TA} + F_{CA} = F_A;$$

$$F_{TB} + F_{CB} = F_B.$$
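The identities above can be checked on a small example. The following is a minimal sketch, assuming a made-up sample of twelve observations (the counts are hypothetical, not data from the article); the names `sample` and `F` are chosen only for illustration. Exact fractions are used so that the equalities hold without rounding error.

```python
from fractions import Fraction

# Hypothetical sample: one (first-site allele, second-site allele) pair per observation.
sample = [("T", "A")] * 5 + [("T", "B")] * 1 + [("C", "A")] * 2 + [("C", "B")] * 4
n = len(sample)

# F_X: count each allele on its own, ignoring its partner.
F = {x: Fraction(sum(1 for pair in sample if x in pair), n) for x in "TCAB"}
# F_XY: count each pair of alleles observed together.
F.update({x + y: Fraction(sample.count((x, y)), n) for x in "TC" for y in "AB"})

# The six identities from the text.
assert F["T"] + F["C"] == 1 and F["A"] + F["B"] == 1
assert F["TA"] + F["TB"] == F["T"] and F["CA"] + F["CB"] == F["C"]
assert F["TA"] + F["CA"] == F["A"] and F["TB"] + F["CB"] == F["B"]
print({name: str(value) for name, value in F.items()})
```

The single-allele frequencies are counted directly from the sample rather than derived from the pair frequencies, so the assertions are a genuine check and not true by construction.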

First we will show that

$$D = F_{TA} - F_T F_A = F_{CB} - F_C F_B.$$

This means that the value $D$ can be calculated in two different ways and this will be useful later.

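Clearly,

$$F_{TA} = F_T - F_{TB} = 1 - F_C - F_{TB}.$$

So

$$\begin{aligned}
D &= F_{TA} - F_T F_A \\
&= 1 - F_C - F_{TB} - (1 - F_C)(1 - F_B) \\
&= F_B - F_{TB} - F_C F_B \\
&= F_{CB} - F_C F_B,
\end{aligned}$$

which is what we wanted to show. (The last step uses $F_{TB} + F_{CB} = F_B$.)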
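As a numerical illustration of this result, here is a short sketch that computes $D$ both ways. The four pair frequencies are hypothetical values chosen only so that they sum to 1; the variable names are likewise just for this example.

```python
from fractions import Fraction

# Hypothetical pair frequencies (they sum to 1); not data from the article.
F_TA, F_TB, F_CA, F_CB = (Fraction(k, 12) for k in (5, 1, 2, 4))

# Single-allele frequencies follow from the identities in the text.
F_T, F_C = F_TA + F_TB, F_CA + F_CB
F_A, F_B = F_TA + F_CA, F_TB + F_CB

D_from_TA = F_TA - F_T * F_A  # first expression for D
D_from_CB = F_CB - F_C * F_B  # second expression for D

print(D_from_TA, D_from_CB)  # prints: 1/8 1/8
```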

Now note that $F_{TA}$ can take any value between 0 and $F_T$. If $F_{TA} = 0$, then this means that T was never observed with A, and if $F_{TA} = F_T$, then T was only ever observed with A.

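Since $D = F_{TA} - F_T F_A$ and $0 \le F_{TA} \le F_T$, we get

$$-F_T F_A \le D \le F_T - F_T F_A = F_T F_B,$$

where the last equality uses $1 - F_A = F_B$.

Similarly, since $D$ is also equal to $F_{CB} - F_C F_B$ and since $0 \le F_{CB} \le F_C$, we get

$$-F_C F_B \le D \le F_C - F_C F_B = F_C F_A.$$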

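For $D^2$ this means

$$0 \le D^2 \le F_T F_C F_A F_B.$$

Indeed, if $D \ge 0$ then multiplying the two upper bounds gives $D^2 \le F_T F_B \cdot F_C F_A$, and if $D < 0$ then multiplying the two lower bounds gives $D^2 \le F_T F_A \cdot F_C F_B$, which is the same product.

Therefore $r^2$, which is $D^2$ divided by $F_T F_C F_A F_B$, always lies between 0 and 1.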
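The bounds can also be confirmed by brute force. The sketch below enumerates every possible table of pair counts for a small hypothetical sample size `N` and checks the inequalities with exact arithmetic. It assumes that $r^2$ is defined as $D^2$ divided by $F_T F_C F_A F_B$, and it skips tables in which some allele never appears, since that quotient is then undefined.

```python
from fractions import Fraction
from itertools import product

N = 10  # hypothetical sample size, small enough to enumerate every table of counts

checked = 0
for n_TA, n_TB, n_CA in product(range(N + 1), repeat=3):
    n_CB = N - n_TA - n_TB - n_CA
    if n_CB < 0:
        continue  # the four counts must add up to N
    F_T = Fraction(n_TA + n_TB, N)
    F_A = Fraction(n_TA + n_CA, N)
    F_C, F_B = 1 - F_T, 1 - F_A
    if 0 in (F_T, F_C, F_A, F_B):
        continue  # r^2 is undefined when an allele never appears
    D = Fraction(n_TA, N) - F_T * F_A
    # The two pairs of bounds on D derived in the text.
    assert -F_T * F_A <= D <= F_T * F_B
    assert -F_C * F_B <= D <= F_C * F_A
    # Assumed definition: r^2 = D^2 / (F_T F_C F_A F_B).
    r2 = D * D / (F_T * F_C * F_A * F_B)
    assert 0 <= r2 <= 1
    checked += 1

print(checked, "tables checked")
```

An exhaustive check for one sample size is not a proof, but it is a useful sanity test of the algebra.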

The last thing to notice is that $r^2$ can indeed take the value 1, but only when $F_{TA} = 0$ and $F_{CB} = 0$ (so that T only occurs with B and C only occurs with A), or when $F_{TA} = F_T$ and $F_{CB} = F_C$ (so that T only occurs with A and C only occurs with B).

In other words, $r^2 = 1$ precisely when there is complete association between alleles.
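To see this with numbers, here is a minimal sketch that evaluates $r^2$ for two completely associated samples and one mixed sample. The function `r_squared` and all the counts are hypothetical, and the formula again assumes $r^2 = D^2 / (F_T F_C F_A F_B)$.

```python
from fractions import Fraction

def r_squared(n_TA, n_TB, n_CA, n_CB):
    """r^2 from the four pair counts, assuming r^2 = D^2 / (F_T F_C F_A F_B)."""
    n = n_TA + n_TB + n_CA + n_CB
    F_T = Fraction(n_TA + n_TB, n)
    F_A = Fraction(n_TA + n_CA, n)
    D = Fraction(n_TA, n) - F_T * F_A
    return D * D / (F_T * (1 - F_T) * F_A * (1 - F_A))

print(r_squared(6, 0, 0, 4))  # T only with A, C only with B: prints 1
print(r_squared(0, 6, 4, 0))  # T only with B, C only with A: prints 1
print(r_squared(5, 1, 2, 4))  # mixed pairings: prints 9/35
```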