Tails, Bayes loses: Appendix

    9 May, 2024

    This appendix provides some technical detail for the article Tails, Bayes loses (and invents data assimilation in the process). You'll need some knowledge of probability theory and statistics to follow it.

    Technical explanation

    When we use data that contains errors to estimate an event $E$, we want to choose the event $E$ which has the maximum chance of happening given the data that we have observed. This is often called the maximum likelihood estimate, or MLE.

    We can do this using Bayes' theorem. We already learned that if the probability of an event $E$ happening is $P(E),$ and the probability of the data being observed given the event has happened is $P(D|E),$ then the probability $P(E|D)$ of that event happening, given the observation of the data, is given by Bayes' theorem and is $$P(E|D) = P(D|E) P(E)/P(D).$$
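
    As a quick numerical illustration, here is a tiny Python sketch of this formula; the probabilities used are made up purely for illustration.

```python
# Bayes' theorem: P(E|D) = P(D|E) * P(E) / P(D).
# All three input probabilities below are invented example values.
p_E = 0.3           # prior probability of the event
p_D_given_E = 0.8   # probability of observing the data if the event happens
p_D = 0.5           # overall probability of observing the data

p_E_given_D = p_D_given_E * p_E / p_D
print(p_E_given_D)  # 0.48 -- observing the data makes the event more likely
```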

    Here $P(D)$ is the probability of recording the data from the system we are studying, which can occur whether or not we see the event. To find the MLE we want to choose the estimate of the event which makes $P(E|D)$ as large as possible. It follows from Bayes' theorem that we want to find the event $E$ for which the Bayesian estimate $M$ of the MLE given by $$M = P(D|E) P(E)$$ is as large as possible.

    Now let's suppose that the event $E$ is a one-dimensional variable (so it takes on values that are single numbers, such as temperature) which has a mean value $e$, which is our prior estimate for $E$, and that its errors have variance $\sigma$. We also suppose that the measured data is a one-dimensional variable with mean $E$ (so that the data is an unbiased estimate of the event value) and that its errors have variance $\nu$. We now consider the special, but very common, case which arises when both the data errors and the errors in the event are independent Gaussian random variables (also called normal random variables). Most realistic examples are like this. In this case we can estimate $P(E)$ and $P(D|E)$ by the expressions $$P(E) = \frac{1}{\sqrt{2\pi \sigma}} e^{-(E-e)^2/2\sigma}$$ and $$P(D|E) = \frac{1}{\sqrt{2\pi \nu}} e^{-(D-E)^2/2\nu}.$$

    It then follows from the formula for the Bayesian estimate above that $$M=\frac{e^{-J(E)}}{2\pi \sqrt{\sigma \nu}},$$ where $$J(E)=\frac{(E-e)^2}{2\sigma}+ \frac{(D-E)^2}{2\nu}.$$

    We want to make $M$ as large as possible. We do this by finding the value of $E$ which makes $J(E)$ as small as possible. The minimiser of $J(E)$ is then our MLE of the value of $E.$ For this one-dimensional example, where $E$ is a single variable, we can find the minimum of $J(E)$ exactly. In particular we have $$E=\frac{\nu e + \sigma D}{\nu +\sigma}$$ as the best estimate for $E$. See the following example for how this works in the case of the thermometer problem and how much we need to nudge our estimates by.
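
    For readers who like to experiment, here is a minimal Python sketch that minimises $J(E)$ by brute force and checks the result against the closed-form minimiser above; the values of $e$, $\sigma$, $D$ and $\nu$ are invented purely for illustration.

```python
import numpy as np

# A minimal sketch checking the closed-form MLE against a brute-force
# minimisation of J(E). The values of e, sigma, D and nu are invented.
e, sigma = 20.0, 4.0   # prior mean of the event and prior error variance
D, nu = 15.0, 1.0      # observed data value and data error variance

def J(E):
    """The cost function whose minimiser is the MLE of E."""
    return (E - e) ** 2 / (2 * sigma) + (D - E) ** 2 / (2 * nu)

# Brute-force search over a fine grid of candidate values of E.
grid = np.linspace(0.0, 40.0, 400001)
E_numerical = grid[np.argmin(J(grid))]

# Closed-form minimiser derived in the text.
E_exact = (nu * e + sigma * D) / (nu + sigma)

print(E_numerical, E_exact)  # both close to 16.0
```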

    An explicit example

    Suppose that we want to predict the temperature of a room. We write $T_{true}$ for the true temperature of the room. Using our knowledge of the weather over the last few days we make an unbiased prior prediction $T_{pred}$ of what we think the temperature should be. (As an example, a simple prediction of tomorrow's weather is given by today's weather. This prediction is right 70\% of the time!) The (prior) prediction of the temperature has an error $T_{pred} - T_{true}$. This error has variance $E_{pred}.$ We then look at the thermometer and it is recording a temperature of $T_{data}.$ The thermometer measurement says that the room is cold. We suspect that it may be wrong because everyone in the room is dressed in summer clothing and is fanning themselves to keep cool.

    The data error $T_{data} -T_{true}$ has variance $E_{data},$ which from the above considerations is likely to be large. We will assume that the prediction error and the data error are independent random variables, which each follow the normal distribution (find out what this means here).

    To nudge the prediction in the direction of the data we construct a new measurement, the analysis, $T_{analysis}$ given by $$ T_{analysis}=\lambda T_{pred} + (1-\lambda)T_{data}.$$

    The nudging parameter $\lambda$ controls how much we nudge the prediction in the direction of the data, and we want to choose $\lambda$ so that the error in the analysis has as small a variance as possible. This error is given by $T_{analysis}- T_{true}$ and a little algebra shows us that $$T_{analysis}- T_{true} = \lambda (T_{pred}-T_{true}) + (1-\lambda) (T_{data}-T_{true}).$$

    We can see that this is made up of the prediction error and the data error which we have assumed to be independent random variables. A standard result in probability theory then states that the variance $E$ of $$T_{analysis}-T_{true}$$ is given by $$ E=\lambda^2 E_{pred} +(1-\lambda)^2 E_{data}.$$
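
    If you want to convince yourself of this variance formula, here is a small Monte Carlo sketch in Python; the variances and the value of $\lambda$ are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of E = lambda^2 * E_pred + (1 - lambda)^2 * E_data
# for independent errors. The variances and lambda are illustrative values.
rng = np.random.default_rng(0)
E_pred, E_data, lam = 1.0, 4.0, 0.6
n = 1_000_000

pred_err = rng.normal(0.0, np.sqrt(E_pred), n)   # samples of T_pred - T_true
data_err = rng.normal(0.0, np.sqrt(E_data), n)   # samples of T_data - T_true
analysis_err = lam * pred_err + (1 - lam) * data_err

print(analysis_err.var())                         # approximately 1.0
print(lam**2 * E_pred + (1 - lam)**2 * E_data)    # exactly 1.0
```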

    We then want to find the value of $\lambda$ which minimises $E.$ If you know a bit of calculus, you'll know that you can do this by differentiating $E$ with respect to $\lambda$ and setting the result to zero. This gives the optimal value of $\lambda$ as $$\lambda_{opt}=\frac{E_{data}}{E_{pred}+E_{data}}$$

    and $$T_{analysis}=\frac{E_{data} T_{pred}}{E_{pred}+E_{data}} + \frac{E_{pred} T_{data}}{E_{pred}+E_{data}}.$$

    This is our best estimate of the temperature of the room which nicely combines the prediction and the data into one formula. As a sanity check, if the data error variance $E_{data}$ is large compared to the prediction error variance $E_{pred}$ (as we suspect from looking at the room occupants), then $\lambda$ will be close to one, and we place much more reliance on the prediction than on the data. Also if the data error variance and the prediction error variance are the same (so that the prediction is as good an estimate as the data) then we have $$T_{analysis} = \frac{T_{pred}}{2} + \frac{T_{data}}{2},$$

    which looks very reasonable.
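
    To see the whole recipe in numbers, here is a short Python sketch of the thermometer example; all the numerical values are invented for illustration, since the article itself gives none.

```python
# A sketch of the thermometer example in numbers. Every value here is
# invented for illustration; the article gives no specific numbers.
T_pred, E_pred = 22.0, 1.0   # prior prediction and its error variance
T_data, E_data = 16.0, 9.0   # thermometer reading and its (large) error variance

lam_opt = E_data / (E_pred + E_data)                   # optimal nudging parameter
T_analysis = lam_opt * T_pred + (1 - lam_opt) * T_data

print(lam_opt)      # 0.9 -- close to 1 because the data is much less reliable
print(T_analysis)   # 21.4 -- mostly the prediction, nudged towards the data
```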

    Back to the article Tails, Bayes loses (and invents data assimilation in the process).


    About this article

    Chris Budd


    Chris Budd OBE is Professor of Applied Mathematics at the University of Bath, Vice President of the Institute of Mathematics and its Applications, Chair of Mathematics for the Royal Institution and an honorary fellow of the British Science Association. He is particularly interested in applying mathematics to the real world and promoting the public understanding of mathematics.

    He has co-written the popular mathematics book Mathematics Galore!, published by Oxford University Press, with C. Sangwin, and features in the book 50 Visions of Mathematics ed. Sam Parc.
