Natural selection, maths and milk

March 2008

Let's recall the definitions:

F_X = observed frequency of the allele X (where X is one of T, C, A or B), regardless of which other allele it occurs with;

F_XY = observed frequency of the allele X occurring together with the allele Y (where X is one of T or C and Y is one of A or B).

Each of the quantities above lies between zero and one, because they are frequencies: F_X, for example, is the number of times the allele X was observed in our sample, divided by the total size of the sample.

Now, since in each individual case exactly one of the two alleles T and C is observed, we have

F_T + F_C = 1.

Similarly,

F_A + F_B = 1.

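We also have the following relations, because each T or C allele is observed together with exactly one of A or B: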

F_TA + F_TB = F_T;

F_CA + F_CB = F_C;

F_TA + F_CA = F_A;

F_TB + F_CB = F_B.
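As an illustration, the frequencies and the relations above can be computed from a list of observed haplotypes. This is a minimal Python sketch; the function name `allele_frequencies` and the ten-haplotype sample are hypothetical, made up for the example rather than taken from any real data.

```python
import math
from collections import Counter

def allele_frequencies(haplotypes):
    """Compute F_X and F_XY from a list of (first, second) allele pairs.

    Hypothetical helper: the first allele is "T" or "C",
    the second allele is "A" or "B".
    """
    n = len(haplotypes)
    pair_counts = Counter(haplotypes)
    first_counts = Counter(first for first, _ in haplotypes)
    second_counts = Counter(second for _, second in haplotypes)
    F = {}
    for x in "TC":
        F[x] = first_counts[x] / n
        for y in "AB":
            F[x + y] = pair_counts[(x, y)] / n  # Counter returns 0 for unseen pairs
    for y in "AB":
        F[y] = second_counts[y] / n
    return F

# Hypothetical sample of ten haplotypes, not real data.
sample = [("T", "A")] * 5 + [("T", "B")] + [("C", "A")] * 2 + [("C", "B")] * 2
F = allele_frequencies(sample)

# Check the relations from the text (math.isclose absorbs rounding error).
assert math.isclose(F["T"] + F["C"], 1)
assert math.isclose(F["A"] + F["B"], 1)
assert math.isclose(F["TA"] + F["TB"], F["T"])
assert math.isclose(F["CA"] + F["CB"], F["C"])
assert math.isclose(F["TA"] + F["CA"], F["A"])
assert math.isclose(F["TB"] + F["CB"], F["B"])
```

In this sample F_T = 0.6 and F_A = 0.7, so the four pair frequencies 0.5, 0.1, 0.2 and 0.2 add up along rows and columns exactly as the relations require.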

First we will show that

D = F_TA - F_T F_A = F_CB - F_C F_B.

This means that D can be calculated in two different ways, which will be useful later.

Clearly,

F_TA = F_T - F_TB = 1 - F_C - F_TB.

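Substituting this into the definition of D, and using F_T = 1 - F_C and F_A = 1 - F_B, we get

D = F_TA - F_T F_A

= 1 - F_C - F_TB - (1 - F_C)(1 - F_B)

= 1 - F_C - F_TB - 1 + F_C + F_B - F_C F_B

= F_B - F_TB - F_C F_B

= F_CB - F_C F_B (because F_TB + F_CB = F_B),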

which is what we wanted to show.
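A quick numerical check of this identity, as a Python sketch; the four haplotype frequencies below are hypothetical values, chosen only so that they are non-negative and sum to one.

```python
import math

# Hypothetical haplotype frequencies (non-negative, summing to 1).
F_TA, F_TB, F_CA, F_CB = 0.5, 0.1, 0.2, 0.2

# Single-allele frequencies follow from the relations in the text.
F_T = F_TA + F_TB
F_C = F_CA + F_CB
F_A = F_TA + F_CA
F_B = F_TB + F_CB

D_via_TA = F_TA - F_T * F_A  # 0.5 - 0.6 * 0.7
D_via_CB = F_CB - F_C * F_B  # 0.2 - 0.4 * 0.3

# Both expressions give D = 0.08, up to floating-point rounding.
assert math.isclose(D_via_TA, D_via_CB)
assert math.isclose(D_via_TA, 0.08)
```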

Now note that F_TA can take any value between 0 and F_T. If F_TA = 0, then T was never observed with A; if F_TA = F_T, then T was only ever observed with A.

Since D = F_TA - F_TF_A, we get

-F_T F_A ≤ D ≤ F_T - F_T F_A = F_T(1 - F_A) = F_T F_B.

Similarly, since D is also equal to F_CB - F_CF_B and since 0 ≤ F_CB ≤ F_C, we get

-F_C F_B ≤ D ≤ F_C - F_C F_B = F_C(1 - F_B) = F_C F_A.
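Both pairs of bounds can be spot-checked on randomly generated frequency tables. The Python sketch below is only a numerical check, not a proof; the helper `d_and_bounds` and the random sampling scheme are assumptions made for illustration.

```python
import random

def d_and_bounds(F_TA, F_TB, F_CA, F_CB):
    """Return D together with the two (lower, upper) bound pairs from the text."""
    F_T, F_C = F_TA + F_TB, F_CA + F_CB
    F_A, F_B = F_TA + F_CA, F_TB + F_CB
    D = F_TA - F_T * F_A
    first = (-F_T * F_A, F_T * F_B)    # from 0 <= F_TA <= F_T
    second = (-F_C * F_B, F_C * F_A)   # from 0 <= F_CB <= F_C
    return D, first, second

random.seed(1)
TOL = 1e-12  # slack for floating-point rounding
for _ in range(10_000):
    # Random haplotype frequencies: four non-negative numbers scaled to sum to 1.
    weights = [random.random() for _ in range(4)]
    total = sum(weights)
    D, (lo1, hi1), (lo2, hi2) = d_and_bounds(*(w / total for w in weights))
    assert lo1 - TOL <= D <= hi1 + TOL
    assert lo2 - TOL <= D <= hi2 + TOL
```

When T only occurs with A and C only with B, for example d_and_bounds(0.6, 0.0, 0.0, 0.4), D reaches the upper bound F_T F_B = 0.24.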

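For D² this means the following. If D ≥ 0, then D ≤ F_T F_B and D ≤ F_C F_A, so D² ≤ (F_T F_B)(F_C F_A). If D < 0, then -D ≤ F_T F_A and -D ≤ F_C F_B, so D² ≤ (F_T F_A)(F_C F_B). In both cases

0 ≤ D² ≤ F_T F_C F_A F_B.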

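Therefore, since r² = D²/(F_T F_C F_A F_B), r² always lies between 0 and 1 (whenever none of the four allele frequencies is zero, so that r² is defined).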

The last thing to notice is that r² can indeed take the value 1, but only when F_TA = 0 and F_CB = 0 (so that T only occurs with B, and C only occurs with A), or when F_TA = F_T and F_CB = F_C (so that T only occurs with A, and C only occurs with B).

In other words, r² = 1 precisely when there is complete association between alleles.
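The statements about r² can be tried out in the same way. This Python sketch assumes the usual definition r² = D²/(F_T F_C F_A F_B), which requires all four allele frequencies to be non-zero; the function name `r_squared` and the example frequencies are hypothetical.

```python
import math
import random

def r_squared(F_TA, F_TB, F_CA, F_CB):
    """r² = D² / (F_T F_C F_A F_B), assuming no allele frequency is zero."""
    F_T, F_C = F_TA + F_TB, F_CA + F_CB
    F_A, F_B = F_TA + F_CA, F_TB + F_CB
    D = F_TA - F_T * F_A
    return D * D / (F_T * F_C * F_A * F_B)

# r² stays between 0 and 1 on random frequency tables (small rounding slack).
random.seed(2)
for _ in range(10_000):
    weights = [random.random() + 1e-9 for _ in range(4)]  # keep every entry positive
    total = sum(weights)
    r2 = r_squared(*(w / total for w in weights))
    assert -1e-12 <= r2 <= 1 + 1e-12

# Complete association: T only with A, C only with B (F_TB = F_CA = 0) ...
assert math.isclose(r_squared(0.6, 0.0, 0.0, 0.4), 1.0)
# ... or T only with B, C only with A (F_TA = F_CB = 0).
assert math.isclose(r_squared(0.0, 0.6, 0.4, 0.0), 1.0)
# A table without complete association gives r² < 1 (here about 0.127).
assert r_squared(0.5, 0.1, 0.2, 0.2) < 1
```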
