plus.maths.org

    Maths in a minute: Variance

    18 December, 2017
    1 comment

    When faced with a list of numbers we often feel the urge to compute its average to get a rough idea of how big or small those numbers are. What the average doesn't tell us, however, is how spread out the numbers are. As an example, imagine you ask five of your friends what their yearly salaries are and you get the answers

    $$\begin{array}{c} 20000\\ 20000\\ 20000\\ 20000\\ 100000. \end{array}$$

    The average of this set of numbers is $(4 \times 20000 + 100000)/5 = 36000,$ but knowing this number doesn't give you any indication of the fact that there's one salary that's a lot bigger than the other four.

    Figure: Two populations with the same average (100) and different variances. You can think of the horizontal axis as measuring salaries and the vertical axis as measuring how many people in a population receive the corresponding salary. The red curve has a smaller variance because the salaries are not as spread out as they are for the blue one.

    To get a sense of how spread out a data set is, you can use something called its sample variance. First, work out the difference between each number on the list and the average of the list and then square that difference (the reason we square it is that we're only interested in the size of the difference and not whether it's positive or negative). Now take the average of these squared differences. In our example above, this gives us the following sample variance:

    $$(4 \times (36000-20000)^2+(36000-100000)^2)/5 = (4 \times 16000^2+64000^2)/5 = 1024000000.$$
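The recipe above translates directly into a few lines of Python. The sketch below is only an illustration: the names `sample_variance` and `salaries` are made up for this example, and the function divides by $n$, as in the calculation above. (Python's built-in `statistics.pvariance` also divides by $n$, whereas `statistics.variance` divides by $n-1$.)

```python
def sample_variance(values):
    """Average of the squared differences from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


# The five yearly salaries from the example.
salaries = [20000, 20000, 20000, 20000, 100000]

print(sum(salaries) / len(salaries))  # the average of the list
print(sample_variance(salaries))      # the spread around that average
```

For the five salaries this gives an average of 36000 and a sample variance of 1024000000, matching the calculation by hand.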

    That's a very large number, which indicates that all of our five numbers lie quite far away from their average: an indication that the data set is very spread out. By contrast, look at the set

    $$\begin{array}{c} 20000\\ 20000\\ 20000\\ 20000\\ 20001. \end{array}$$

    The average is

    $$(4 \times 20000 + 20001)/5 = 100001/5 = 20000.2.$$

    The sample variance in this case is

    $$(4 \times (20000.2 - 20000)^2 + (20000.2 - 20001)^2)/5 = 0.16.$$

    That's a small number, indicating that the set of numbers isn't very spread out at all.

    Here's the formal definition of the sample variance $v$ of a list $x_1$, $x_2$, $x_3$, ..., $x_n$ of $n$ numbers whose average is $\bar{x}$:

    $$v= \frac{1}{n}\left((x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \ldots + (x_n-\bar{x})^2\right).$$

    This definition works for a given list of numbers, but there's also a definition of variance that works when you are dealing with a random process, such as rolling dice, and would like to know how spread out a list of outcomes of the process is likely to be.

    Suppose your random process has $n$ outcomes which we label $x_1$, $x_2$, $x_3$, etc, up to $x_n.$ If you're rolling a die, then $n=6$ and $x_1=1$, $x_2=2$, ..., $x_6=6.$ Also suppose you know the probability $p_1$, $p_2$, etc, up to $p_n$ of each outcome. In the case of a fair die we have $p_1=p_2=\ldots=p_6=1/6$, but generally the probabilities of different outcomes could be different. The expected value is defined as

    $$E=p_1x_1+p_2x_2+\ldots+p_nx_n.$$

    It's a sort of idealised average; find out more here. The population variance is defined as

    $$\mathrm{var} = p_1(x_1-E)^2 + p_2(x_2-E)^2 + \ldots + p_n(x_n-E)^2.$$

    The population variance gives you an idea of how spread out the outcomes are likely to be if you repeat the random process a large number of times. You can try and work out for yourself what its value is for a fair die. Conversely, if you don't know the population variance of a process, then the sample variance you get from repeating the process a large number of times can be used to estimate it.
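If you'd like to check your answer for the fair die, or see the estimation idea in action, here is a rough Python sketch. The function names, the number of simulated rolls (100,000) and the random seed are all arbitrary choices made for this illustration. It computes the expected value and population variance from the two formulas above, then compares the result with the sample variance of simulated rolls.

```python
import random
from fractions import Fraction


def expected_value(outcomes, probs):
    """E = p_1*x_1 + p_2*x_2 + ... + p_n*x_n."""
    return sum(p * x for x, p in zip(outcomes, probs))


def population_variance(outcomes, probs):
    """var = p_1*(x_1 - E)^2 + ... + p_n*(x_n - E)^2."""
    e = expected_value(outcomes, probs)
    return sum(p * (x - e) ** 2 for x, p in zip(outcomes, probs))


# A fair die: outcomes 1 to 6, each with probability 1/6.
# Fraction keeps the arithmetic exact.
faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
exact = population_variance(faces, probs)

# Simulate many rolls and take the sample variance (dividing by n).
rng = random.Random(0)  # arbitrary seed, only so that runs are repeatable
rolls = [rng.choice(faces) for _ in range(100_000)]
mean = sum(rolls) / len(rolls)
estimate = sum((r - mean) ** 2 for r in rolls) / len(rolls)

print(exact)     # exact population variance, as a fraction
print(estimate)  # sample variance of the simulated rolls
```

The second number should land close to the first, and it usually gets closer as the number of rolls grows.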

    The positive square root of the variance is called the standard deviation.
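As a quick illustration, here is the standard deviation of the five salaries from the first example, computed in Python. This is a sketch rather than a recommended implementation: the function names are invented for this example, and the variance again divides by $n$.

```python
import math


def sample_variance(values):
    """Average of the squared differences from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def standard_deviation(values):
    """Positive square root of the variance."""
    return math.sqrt(sample_variance(values))


salaries = [20000, 20000, 20000, 20000, 100000]
print(standard_deviation(salaries))
```

For the salaries the result is 32000. Unlike the variance, this figure is in the same units as the salaries themselves.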

    You can also define the variance of an infinite or continuous random variable. See here to find out more.

    Comments

    Jet

    21 December 2017

    An alternative measure of spread is the mean absolute deviation (MAD). It's almost the same as variance, but instead of squaring the differences, you take their absolute value. This has many advantages over variance:

    1. You're now actually finding the average distance from the mean.
    2. Variance can overweight extreme values due to the squaring.
    3. The units of MAD are the same as the original quantity, rather than quantity squared.
    4. MAD is more intuitive for non-statisticians. What does it actually mean for a sample to have a variance of 1 024 000 000 pounds squared?

    Isn't it time statistics adopted MAD as the default measure of spread, rather than the outdated variance?
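For comparison, the measure described in this comment can be sketched in Python as follows. The function name is hypothetical and the data are the five salaries from the article's first example; the only change from the variance calculation is that each difference is replaced by its absolute value instead of its square.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


salaries = [20000, 20000, 20000, 20000, 100000]
print(mean_absolute_deviation(salaries))
```

For the salaries this gives 25600, in the same units as the data, compared with a variance of 1024000000.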

    Plus is part of the family of activities in the Millennium Mathematics Project.
    Copyright © 1997 - 2025. University of Cambridge. All rights reserved.
