plus.maths.org

      Maths in a minute: Variance

      18 December, 2017
      1 comment

      When faced with a list of numbers we often feel the urge to compute its average to get a rough idea of how big or small those numbers are. What the average doesn't tell us, however, is how spread out the numbers are. As an example, imagine you ask five of your friends what their yearly salaries are and you get the answers

      20000, 20000, 20000, 20000, 100000.

      The average of this set of numbers is (4×20000+100000)/5=36000, but knowing this number doesn't give you any indication of the fact that there's one salary that's a lot bigger than the other four.

      [Figure: Two populations with the same mean and different variances.]

      Here are two populations with the same average (100) and different variances. You can think of the horizontal axis as measuring salaries and the vertical axis as measuring how many people in a population receive the corresponding salary. The red curve has a smaller variance because the salaries are not as spread out as they are for the blue one.

      To get a sense of how spread out a data set is, you can use something called its sample variance. First, work out the difference between each number on the list and the average of the list and then square that difference (the reason we square it is that we're only interested in the size of the difference and not whether it's positive or negative). Now take the average of these squared differences. In our example above, this gives us the following sample variance:

      $(4\times(36000-20000)^2+(36000-100000)^2)/5=(4\times 16000^2+64000^2)/5=1024000000.$
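      The same calculation can be sketched in a few lines of Python. The function name `variance_of_list` is made up for this illustration. Note that Python's standard library calls the divide-by-n version used here `statistics.pvariance`, while `statistics.variance` divides by n−1 instead and gives a different number.

```python
from statistics import mean, pvariance


def variance_of_list(values):
    """Average of the squared differences from the mean (divides by n)."""
    average = mean(values)
    return sum((x - average) ** 2 for x in values) / len(values)


salaries = [20000, 20000, 20000, 20000, 100000]

print(mean(salaries))               # the average salary
print(variance_of_list(salaries))   # variance computed step by step
print(pvariance(salaries))          # the same quantity from the standard library
```

      Both calls give 1024000000 for the salary list, matching the value worked out by hand above.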

      That's a very large number, which indicates that all of our five numbers lie quite far away from their average: an indication that the data set is very spread out. By contrast, look at the set

      20000, 20000, 20000, 20000, 20001.

      The average is (4×20000+20001)/5=100001/5=20000.2. The sample variance in this case is

      $(4\times(20000.2-20000)^2+(20000.2-20001)^2)/5=0.16.$

      That's a small number, indicating that the set of numbers isn't very spread out at all.

      Here's the formal definition of the sample variance $v$ of a list $x_1, x_2, x_3, ..., x_n$ of $n$ numbers whose average is $\bar{x}$:

      $v=\frac{1}{n}\left((x_1-\bar{x})^2+(x_2-\bar{x})^2+...+(x_n-\bar{x})^2\right).$

      This definition works for a given list of numbers, but there's also a definition of variance that works when you are dealing with a random process, such as rolling dice, and would like to know how spread out a list of outcomes of the process is likely to be.

      Suppose your random process has $n$ outcomes which we label $x_1, x_2, x_3,$ etc., up to $x_n$. If you're rolling a die, then $n=6$ and $x_1=1, x_2=2, ..., x_6=6$. Also suppose you know the probability $p_1, p_2,$ etc., up to $p_n$ of each outcome. In the case of a fair die we have $p_1=p_2=...=p_6=1/6$, but generally the probabilities of different outcomes could be different. The expected value is defined as

      $E=p_1x_1+p_2x_2+...+p_nx_n.$

      The expected value is a sort of idealised average; find out more here. The population variance is defined as

      $\mathrm{var}=p_1(x_1-E)^2+p_2(x_2-E)^2+...+p_n(x_n-E)^2.$
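      As a rough sketch, the two definitions translate directly into Python. The function names and the loaded die below are invented for this illustration: the die is assumed to show a six half of the time and each other face one time in ten. Exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction


def expected_value(outcomes, probabilities):
    """E = p1*x1 + p2*x2 + ... + pn*xn."""
    return sum(p * x for x, p in zip(outcomes, probabilities))


def population_variance(outcomes, probabilities):
    """var = p1*(x1-E)^2 + p2*(x2-E)^2 + ... + pn*(xn-E)^2."""
    e = expected_value(outcomes, probabilities)
    return sum(p * (x - e) ** 2 for x, p in zip(outcomes, probabilities))


die_outcomes = [1, 2, 3, 4, 5, 6]

# Hypothetical loaded die: faces 1 to 5 have probability 1/10, a six has 1/2.
loaded_probabilities = [Fraction(1, 10)] * 5 + [Fraction(1, 2)]

print(expected_value(die_outcomes, loaded_probabilities))       # 9/2
print(population_variance(die_outcomes, loaded_probabilities))  # 13/4
```

      Swapping in six probabilities of `Fraction(1, 6)` gives the corresponding values for a fair die.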

      The population variance gives you an idea of how spread out the outcomes are likely to be if you repeat the random process a large number of times. You can try and work out for yourself what its value is for a fair die. Conversely, if you don't know the population variance of a process, then the sample variance you get from repeating the process a large number of times can be used to estimate it.
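      The link between the two kinds of variance can be checked with a small simulation. The sketch below assumes a fair die, 100000 simulated rolls and a fixed random seed, all chosen for this illustration only. It compares the sample variance of the rolls with the population variance computed from the probabilities.

```python
import random
from fractions import Fraction


def population_variance(outcomes, probabilities):
    """Variance of a random process from its outcomes and their probabilities."""
    expected = sum(p * x for x, p in zip(outcomes, probabilities))
    return sum(p * (x - expected) ** 2 for x, p in zip(outcomes, probabilities))


def sample_variance(values):
    """Average squared difference from the mean of a list (divides by n)."""
    average = sum(values) / len(values)
    return sum((x - average) ** 2 for x in values) / len(values)


die_outcomes = [1, 2, 3, 4, 5, 6]
fair_probabilities = [Fraction(1, 6)] * 6
exact = population_variance(die_outcomes, fair_probabilities)

rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [rng.choice(die_outcomes) for _ in range(100_000)]
estimate = sample_variance(rolls)

print(float(exact))  # population variance of a fair die
print(estimate)      # sample variance of the simulated rolls
```

      The two printed numbers should agree to about two decimal places, and the agreement typically improves as the number of rolls grows.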

      The positive square root of the variance is called the standard deviation.
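      For the salary example above, the standard deviation can be computed as follows. This is a minimal sketch using Python's standard library, where `statistics.pstdev` is the function that matches the divide-by-n variance used in this article.

```python
import math
from statistics import pstdev

salaries = [20000, 20000, 20000, 20000, 100000]

average = sum(salaries) / len(salaries)
variance = sum((x - average) ** 2 for x in salaries) / len(salaries)
standard_deviation = math.sqrt(variance)

print(standard_deviation)  # 32000.0
print(pstdev(salaries))    # same quantity from the standard library
```

      Unlike the variance, the standard deviation is in the same units as the original numbers, so 32000 here can be read directly as an amount of salary.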

      You can also define the variance of an infinite or continuous random variable. See here to find out more.

      Comments

      Jet

      21 December 2017

      An alternative measure of spread is the mean absolute deviation (MAD). It's almost the same as variance, but instead of squaring the differences, you take their absolute value. This has many advantages over variance:

      1. You're now actually finding the average distance from the mean.
      2. Variance can overweight extreme values due to the squaring.
      3. The units of MAD are the same as the original quantity, rather than quantity squared.
      4. MAD is more intuitive for non-statisticians. What does it actually mean for a sample to have a variance of 1 024 000 000 pounds squared?

      Isn't it time statistics adopted MAD as the default measure of spread, rather than the outdated variance?

      Plus Magazine is part of the family of activities in the Millennium Mathematics Project.
      Copyright © 1997 - 2025. University of Cambridge. All rights reserved.
