# Natural selection, maths and milk

Issue 46

Let's recall the definitions:

*F_{X}* = observed frequency of the allele **X** (where **X** is one of **T**, **C**, **A** or **B**), regardless of which other allele it occurs with;

*F_{XY}* = observed frequency of the **X** allele occurring with the **Y** allele (where **X** is one of **T** or **C** and **Y** is one of **A** or **B**).

Each of the quantities above lies between zero and one, because they are *frequencies*: *F_{X}* is the number of times that the allele **X** was observed in our sample, divided by the total size of the sample.

Now, since in each individual case exactly one of the **T** or **C** alleles must be observed, we have

*F_{T} + F_{C}* = 1.

Similarly,

*F_{A} + F_{B}* = 1.

We also have:

*F_{TA} + F_{TB} = F_{T}*;

*F_{CA} + F_{CB} = F_{C}*;

*F_{TA} + F_{CA} = F_{A}*;

*F_{TB} + F_{CB} = F_{B}*.
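The bookkeeping above can be checked on a small example. The following is a minimal Python sketch, assuming a made-up sample of eight two-letter observations; the data, the `sample` list and the `frequencies` helper are illustrative and do not come from the article. It counts single alleles and allele pairs separately, then confirms that the sum rules hold.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical sample (invented for illustration): each string is one
# observation, first letter T or C, second letter A or B.
sample = ["TA", "TA", "TA", "TB", "CA", "CB", "CB", "CB"]


def frequencies(haplotypes):
    """Return (allele_freq, pair_freq) as exact fractions of the sample size."""
    n = len(haplotypes)
    pair_counts = Counter(haplotypes)
    allele_counts = Counter()
    for first, second in haplotypes:
        # Count each allele on its own, ignoring its partner.
        allele_counts[first] += 1
        allele_counts[second] += 1
    allele_freq = {x: Fraction(allele_counts[x], n) for x in "TCAB"}
    pair_freq = {xy: Fraction(pair_counts[xy], n) for xy in ("TA", "TB", "CA", "CB")}
    return allele_freq, pair_freq


F, FF = frequencies(sample)

# Each site always shows exactly one of its two alleles.
assert F["T"] + F["C"] == 1
assert F["A"] + F["B"] == 1

# Summing pair frequencies over the partner allele recovers the allele frequency.
assert FF["TA"] + FF["TB"] == F["T"]
assert FF["CA"] + FF["CB"] == F["C"]
assert FF["TA"] + FF["CA"] == F["A"]
assert FF["TB"] + FF["CB"] == F["B"]

print(F["T"], FF["TA"])  # prints: 1/2 3/8
```

Exact fractions are used instead of floating-point numbers so that the equalities can be tested without rounding error.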

First we will show that

*D = F_{TA} - F_{T}F_{A} = F_{CB} - F_{C}F_{B}*.

This means that the value *D* can be calculated in two different ways and this will be useful later.

Clearly,

*F_{TA} = F_{T} - F_{TB} = 1 - F_{C} - F_{TB}*.

So

*D = F_{TA} - F_{T}F_{A}*

*= 1 - F_{C} - F_{TB} - (1 - F_{C})(1 - F_{B})*

*= F_{B} - F_{TB} - F_{C}F_{B} = F_{CB} - F_{C}F_{B}*,

where the last step uses *F_{TB} + F_{CB} = F_{B}*.
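As a numerical sanity check of this identity, here is a short Python sketch. The function name `d_two_ways` and the four pair frequencies are assumptions chosen for illustration; any four non-negative values summing to one would do.

```python
from fractions import Fraction


def d_two_ways(f_ta, f_tb, f_ca, f_cb):
    """Compute D from the TA pair and from the CB pair."""
    # Allele frequencies follow from the pair frequencies.
    f_t = f_ta + f_tb
    f_c = f_ca + f_cb
    f_a = f_ta + f_ca
    f_b = f_tb + f_cb
    d_from_ta = f_ta - f_t * f_a
    d_from_cb = f_cb - f_c * f_b
    return d_from_ta, d_from_cb


# Hypothetical pair frequencies (they sum to 1).
d1, d2 = d_two_ways(Fraction(2, 5), Fraction(1, 5), Fraction(1, 10), Fraction(3, 10))
print(d1, d2)  # prints: 1/10 1/10
```

Both routes give the same value, as the algebra predicts.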

Now note that *F_{TA}* can take any value between 0 and *F_{T}*. If *F_{TA}* = 0, then this means that **T** was never observed with **A**, and if *F_{TA}* = *F_{T}*, then **T** was only ever observed with **A**.

Since *D* = *F_{TA} - F_{T}F_{A}*, we get

*-F_{T}F_{A} ≤ D ≤ F_{T} - F_{T}F_{A} = F_{T}F_{B}*.

Similarly, since *D* is also equal to *F_{CB} - F_{C}F_{B}* and since 0 ≤ *F_{CB}* ≤ *F_{C}*, we get

*-F_{C}F_{B} ≤ D ≤ F_{C} - F_{C}F_{B} = F_{C}F_{A}*.

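If *D* ≥ 0, multiplying the two upper bounds gives *D^{2} ≤ F_{T}F_{B} · F_{C}F_{A}*; if *D* < 0, multiplying the two lower bounds gives the same product. Either way, for *D^{2}* this means

*0 ≤ D^{2} ≤ F_{T}F_{C}F_{A}F_{B}*.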
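These inequalities can also be checked by brute force. The sketch below is only an illustration, with a hypothetical helper `check_bounds`. It walks through every way of splitting a sample of size `n` among the four pairs **TA**, **TB**, **CA** and **CB**, and asserts the bounds on *D* and *D^{2}* for each split.

```python
from fractions import Fraction
from itertools import product


def check_bounds(n=10):
    """Assert the bounds on D for every table of pair counts summing to n.

    Returns the number of tables checked.
    """
    checked = 0
    for ta, tb, ca in product(range(n + 1), repeat=3):
        cb = n - ta - tb - ca
        if cb < 0:
            continue  # not a valid split of the sample
        f_ta, f_tb, f_ca, f_cb = (Fraction(k, n) for k in (ta, tb, ca, cb))
        f_t, f_c = f_ta + f_tb, f_ca + f_cb
        f_a, f_b = f_ta + f_ca, f_tb + f_cb
        d = f_ta - f_t * f_a
        # Bounds obtained from 0 <= F_TA <= F_T and 0 <= F_CB <= F_C.
        assert -f_t * f_a <= d <= f_t * f_b
        assert -f_c * f_b <= d <= f_c * f_a
        # Combined bound on D squared.
        assert 0 <= d * d <= f_t * f_c * f_a * f_b
        checked += 1
    return checked


print(check_bounds(10))  # prints: 286
```

A finite check like this is not a proof, but it is a quick way to catch a sign or index slip in the derivation.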

Therefore *r^{2}*, which is *D^{2}* divided by *F_{T}F_{C}F_{A}F_{B}*, always lies between 0 and 1.

The last thing to notice is that *r^{2}* can indeed take the value 1, but only when *F_{TA}* = 0 and *F_{CB}* = 0 (so that **T** only occurs with **B** and **C** only occurs with **A**), or when *F_{TA}* = *F_{T}* and *F_{CB}* = *F_{C}* (so that **T** only occurs with **A** and **C** only occurs with **B**).

In other words, *r^{2}* = 1 precisely when there is complete association between alleles.
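To see the two extreme cases in numbers, here is a minimal Python sketch. It assumes the usual definition of *r^{2}* as *D^{2}* divided by *F_{T}F_{C}F_{A}F_{B}*, which this section does not restate. The function name `r_squared` and the example frequencies are hypothetical.

```python
from fractions import Fraction


def r_squared(f_ta, f_tb, f_ca, f_cb):
    """r^2 = D^2 / (F_T F_C F_A F_B), assuming that definition of r^2."""
    f_t, f_c = f_ta + f_tb, f_ca + f_cb
    f_a, f_b = f_ta + f_ca, f_tb + f_cb
    denominator = f_t * f_c * f_a * f_b
    if denominator == 0:
        # If an allele never appears, the ratio is not defined.
        raise ValueError("r^2 is undefined when an allele has frequency 0")
    d = f_ta - f_t * f_a
    return d * d / denominator


# T only with A, C only with B: complete association.
print(r_squared(Fraction(3, 5), 0, 0, Fraction(2, 5)))  # prints: 1

# T only with B, C only with A: complete association the other way round.
print(r_squared(0, Fraction(3, 5), Fraction(2, 5), 0))  # prints: 1

# Partial association gives a value strictly between 0 and 1.
print(r_squared(Fraction(2, 5), Fraction(1, 5), Fraction(1, 10), Fraction(3, 10)))  # prints: 1/6

# No association at all: D = 0, so r^2 = 0.
print(r_squared(Fraction(1, 4), Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)))  # prints: 0
```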