Tails, Bayes loses: Appendix


This appendix gives some technical detail for the article Tails, Bayes loses (and invents data assimilation in the process). You'll need some knowledge of probability theory and statistics to follow it.

Technical explanation

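When we use data which has errors in it to estimate an event $E$, we want to choose the event $E$ which has the maximum chance of having happened given the data that we have observed. This is often called the maximum likelihood estimate or MLE. (Strictly speaking, maximising the probability of the event given the data, with a prior estimate included, is what statisticians call the maximum a posteriori or MAP estimate; we will keep the name MLE here.)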

We can do this using Bayes’ theorem. We already learned that if the probability of an event $E$ happening is $P(E),$ and the probability of the data being observed given the event has happened is $P(D|E),$ then the probability $P(E|D)$ of that event happening, given the observation of the data, is given by Bayes’ theorem and is

  \[ P(E|D) = P(D|E) P(E)/P(D). \]    

Here $P(D)$ is the probability of recording the data from the system we are studying, which can occur whether or not we see the event.

To find the MLE we want to choose the estimate of the event which makes $P(E|D)$ as large as possible. Since $P(D)$ does not depend on the choice of $E$, it follows from Bayes’ theorem that we want to find the event $E$ for which the Bayesian estimate $M$ given by

  \[ M = P(D|E) P(E) \]    

is as large as possible.
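As a small illustration of this step, here is a minimal Python sketch with two possible events. The events ("warm" and "cold"), the probabilities and the function name `posterior` are all made up for illustration and do not come from the article. It shows that the event which maximises $P(E|D)$ is the same as the event which maximises $M = P(D|E)P(E)$, because dividing by $P(D)$ rescales every candidate by the same amount.

```python
def posterior(prior, likelihood):
    """Return P(E|D) for every event E, given P(E) and P(D|E)."""
    # P(D) is the sum of P(D|E) P(E) over all possible events E.
    p_data = sum(likelihood[e] * prior[e] for e in prior)
    return {e: likelihood[e] * prior[e] / p_data for e in prior}


# Hypothetical numbers: the room is either warm or cold, and the
# data D is "the thermometer reads a low temperature".
prior = {"warm": 0.8, "cold": 0.2}        # P(E)
likelihood = {"warm": 0.3, "cold": 0.9}   # P(D|E)

post = posterior(prior, likelihood)

# M = P(D|E) P(E), without dividing by P(D).
M = {e: likelihood[e] * prior[e] for e in prior}

best_from_posterior = max(post, key=post.get)
best_from_M = max(M, key=M.get)
print(post, best_from_posterior, best_from_M)
```

With these made-up numbers the strong prior belief that the room is warm outweighs the cold reading, so both ways of choosing pick "warm".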

Now let’s suppose that the event $E$ is a one-dimensional variable (so it takes on values that are single numbers, such as temperature) which has a mean value $e$, which is our prior estimate for $E$, and that its errors have variance $\sigma$. We also suppose that the measured data is a one-dimensional variable which has mean $E$ (so that the data is an unbiased estimate of the event value) and its errors have variance $\nu .$ Note that $\sigma$ and $\nu$ stand for variances here, not standard deviations.

We now consider the special, but very common, case which arises when both the data errors and the errors in the event are independent Gaussian random variables (also called Normal random variables). Many realistic examples are well described in this way. In this case we can estimate $P(E)$ and $P(D|E)$ by the expressions

  \[ P(E) = \frac{1}{\sqrt{2\pi \sigma }} e^{-(E-e)^2/2\sigma } \]    


  \[ P(D|E) = \frac{1}{\sqrt{2\pi \nu }} e^{-(D-E)^2/2\nu }. \]    

It then follows from the formula for the Bayesian estimate above that

  \[ M=\frac{e^{-J(E)}}{2\pi \sqrt{\sigma \nu }}, \]    

where

  \[ J(E)=\frac{(E-e)^2}{2\sigma }+ \frac{(D-E)^2}{2\nu }. \]    

We want to make $M$ as large as possible. We do this by finding the value of $E$ which makes $J(E)$ as small as possible. The minimiser of $J(E)$ is then our MLE of the value of $E.$

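For this one-dimensional example where $E$ is a single variable we can find the minimum of $J(E)$ exactly. Differentiating $J(E)$ with respect to $E$ and setting the result to zero gives $(E-e)/\sigma = (D-E)/\nu ,$ and solving this for $E$ we have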

  \[ E=\frac{\nu e + \sigma D}{\nu +\sigma } \]    

as the best estimate for $E.$
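The formula can be checked numerically. The following Python sketch uses made-up values for $e$, $\sigma$, $D$ and $\nu$ (they are assumptions for illustration, not taken from the article), and the function names `cost` and `best_estimate` are likewise hypothetical. It compares the closed-form estimate with a brute-force search for the smallest value of $J(E)$ on a fine grid.

```python
def cost(E, e, sigma, D, nu):
    """The function J(E) from the text; sigma and nu are variances."""
    return (E - e) ** 2 / (2 * sigma) + (D - E) ** 2 / (2 * nu)


def best_estimate(e, sigma, D, nu):
    """Closed-form minimiser of J(E)."""
    return (nu * e + sigma * D) / (nu + sigma)


# Hypothetical values: prior estimate 20 with error variance 1,
# measured data 16 with error variance 3.
e, sigma = 20.0, 1.0
D, nu = 16.0, 3.0

E_best = best_estimate(e, sigma, D, nu)

# Brute-force check: evaluate J on a grid from 10 to 30 in steps of 0.001
# and keep the grid point with the smallest cost.
grid = [10 + 0.001 * k for k in range(20001)]
E_grid = min(grid, key=lambda E: cost(E, e, sigma, D, nu))

print(E_best, E_grid)
```

Because the data here is assumed to be three times less reliable than the prior estimate, the best estimate sits three quarters of the way from the data towards the prior.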

See the following example for how this works in the case of the thermometer problem and how much we need to nudge our estimates by.

An explicit example

Suppose that we want to predict the temperature of a room. We write $ T_{true}$ for the true temperature of the room.

Using our knowledge of the weather over the last few days we make an unbiased prior prediction $T_{pred}$ of what we think the temperature should be. (As an example, a simple prediction of tomorrow’s weather is given by today’s weather. This prediction is right 70% of the time!) The (prior) prediction of the temperature has an error $T_{pred} - T_{true}$, and this error has variance $E_{pred}.$

We then look at the thermometer and it is recording a temperature of $T_{data}.$ The thermometer measurement says that the room is cold. We suspect that it may be wrong because everyone in the room is dressed in summer clothing and is fanning themselves to keep cool.

The data error $T_{data} - T_{true}$ has variance $E_{data},$ which from the above considerations is likely to be large. We will assume that the prediction error and the data error are independent random variables, each of which follows the normal distribution.

To nudge the prediction in the direction of the data we construct a new measurement, the analysis, $T_{analysis}$ given by

  \[  T_{analysis}=\lambda T_{pred} + (1-\lambda )T_{data}. \]    

The nudging parameter $\lambda $ controls how much we nudge the prediction in the direction of the data, and we want to choose $\lambda $ so that the error in the analysis has as small a variance as possible. This error is given by $T_{analysis}- T_{true}$ and a little algebra shows us that

  \[ T_{analysis}- T_{true} = \lambda (T_{pred}-T_{true}) + (1-\lambda ) (T_{data}-T_{true}). \]    

We can see that this is made up of the prediction error and the data error which we have assumed to be independent random variables. A standard result in probability theory then states that the variance $E$ of

  \[ T_{analysis}-T_{true} \]    

is given by

  \[  E=\lambda ^2 E_{pred} +(1-\lambda )^2 E_{data}. \]    
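This variance formula can be tested with a quick simulation. The sketch below is only an illustration: the values of $\lambda$, $E_{pred}$ and $E_{data}$, the sample size and the function name `simulated_analysis_variance` are all assumptions. It draws independent normally distributed prediction and data errors, combines them as in the expression for $T_{analysis}-T_{true}$, and compares the variance of the result with the formula.

```python
import random


def simulated_analysis_variance(lam, var_pred, var_data, n=50_000, seed=0):
    """Estimate the variance of lam*pred_err + (1-lam)*data_err by sampling."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        # random.gauss takes a standard deviation, so take square roots.
        pred_err = rng.gauss(0.0, var_pred ** 0.5)
        data_err = rng.gauss(0.0, var_data ** 0.5)
        errors.append(lam * pred_err + (1 - lam) * data_err)
    mean = sum(errors) / n
    return sum((x - mean) ** 2 for x in errors) / n


# Hypothetical values for the nudging parameter and the two variances.
lam, var_pred, var_data = 0.7, 1.0, 4.0

formula = lam ** 2 * var_pred + (1 - lam) ** 2 * var_data
simulated = simulated_analysis_variance(lam, var_pred, var_data)

print(formula, simulated)
```

The two numbers should agree to within the sampling error of the simulation, which shrinks as the number of samples grows.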

We then want to find the value of $\lambda $ which minimises $E.$ If you know a bit of calculus, you’ll know that you can do this by differentiating $E$ with respect to $\lambda $ and setting the result to zero. This gives the optimal value of $\lambda $ as

  \[ \lambda _{opt}=\frac{E_{data}}{E_{pred}+E_{data}} \]    

and substituting this into the formula for the analysis gives

  \[ T_{analysis}=\frac{E_{data} T_{pred}}{E_{pred}+E_{data}} + \frac{E_{pred} T_{data}}{E_{pred}+E_{data}}. \]    
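For readers who would like to see the calculus step written out, here is a short sketch of it, using the same symbols as above. Differentiating the variance $E$ with respect to $\lambda $ and setting the result to zero gives

  \[ \frac{dE}{d\lambda } = 2\lambda E_{pred} - 2(1-\lambda )E_{data} = 0, \]

so that $\lambda (E_{pred}+E_{data}) = E_{data},$ which rearranges to the value of $\lambda _{opt}$ given above. The second derivative is $2(E_{pred}+E_{data}),$ which is positive, so this is a minimum rather than a maximum. Putting $\lambda _{opt}$ back into the formula for $E$ gives the smallest possible variance

  \[ E_{min} = \frac{E_{pred}E_{data}}{E_{pred}+E_{data}}, \]

which is smaller than both $E_{pred}$ and $E_{data}.$ In other words, under the assumptions made here, the analysis is more reliable than either the prediction or the data on its own.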

This is our best estimate of the temperature of the room which nicely combines the prediction and the data into one formula. As a sanity check, if the data error variance $E_{data}$ is large compared to the prediction error variance $E_{pred}$ (as we suspect from looking at the room occupants), then $\lambda $ will be close to one, and we place much more reliance on the prediction than on the data. Also if the data error variance and the prediction error variance are the same (so that the prediction is as good an estimate as the data) then we have

  \[ T_{analysis} = \frac{T_{pred}}{2} + \frac{T_{data}}{2}, \]    

which looks very reasonable.
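To see the formulas in action, here is a minimal Python sketch. The temperatures and variances are invented for illustration (the article gives no numbers), and the function name `analysis` is hypothetical. It computes the optimal nudging parameter, the analysis and the variance of the analysis error, first for an unreliable thermometer and then for the case where the two variances are equal.

```python
def analysis(T_pred, E_pred, T_data, E_data):
    """Combine a prediction and a measurement using the optimal lambda.

    E_pred and E_data are the error variances of the prediction and the data.
    Returns (lambda, T_analysis, variance of the analysis error).
    """
    lam = E_data / (E_pred + E_data)
    T_analysis = lam * T_pred + (1 - lam) * T_data
    E_analysis = lam ** 2 * E_pred + (1 - lam) ** 2 * E_data
    return lam, T_analysis, E_analysis


# Hypothetical case 1: we predict 25 degrees with variance 1, while a
# suspect thermometer reads 15 degrees with variance 9.
lam, T_an, E_an = analysis(25.0, 1.0, 15.0, 9.0)
print(lam, T_an, E_an)   # roughly 0.9, 24.0 and 0.9

# Hypothetical case 2: prediction and data are equally reliable,
# so the analysis is the average of the two.
lam_eq, T_eq, E_eq = analysis(25.0, 2.0, 15.0, 2.0)
print(lam_eq, T_eq, E_eq)
```

In the first case the analysis moves only one degree away from the prediction, because the thermometer is trusted so little, and the variance of the analysis error is smaller than that of either the prediction or the data.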


About this article

Chris Budd

Chris Budd OBE is Professor of Applied Mathematics at the University of Bath, Vice President of the Institute of Mathematics and its Applications, Chair of Mathematics for the Royal Institution and an honorary fellow of the British Science Association. He is particularly interested in applying mathematics to the real world and promoting the public understanding of mathematics.

He has co-written the popular mathematics book Mathematics Galore!, published by Oxford University Press, with C. Sangwin, and features in the book 50 Visions of Mathematics ed. Sam Parc.