plus.maths.org

    Maths in a minute: Linear regression

    by Wim Hordijk, 20 March 2017

    A linear regression tries to estimate a linear relationship that best fits a given set of data. For example, we might want to find out how the number of tropical storms has changed over the years. In this case, we can plot the number of storms against time. The linear regression will find the straight line that best fits the plotted data, and calculate several statistics indicating how well the line fits the data and whether the slope of the line is significantly different from zero (i.e., whether there is a real trend or not). (See this article for more on the tropical storm example.)

    Figure: Yearly tropical storms. The blue line indicates the result of a linear regression on the number of storms over time.

    In general, when you perform a linear regression, a dependent variable (say $y$, the number of tropical storms in our example) is assumed to be linearly dependent on one or more explanatory variables (say $x_1$, $x_2$, and so on, up to $x_n$). The general equation for the linear relationship is $$y = a_0+a_1x_1+a_2x_2+\dots+a_nx_n.$$ Given a set of observed values for the dependent variable $y$ and the corresponding explanatory variables $x_1$ to $x_n$, the linear regression then estimates values for the model parameters $a_0$ to $a_n$ such that the total error, made up of the differences between the observed values $y$ and the model-predicted values $\hat{y} = a_0+a_1x_1+a_2x_2+\dots+a_nx_n$, is minimised.
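For readers who want to see the estimation step itself, here is a minimal R sketch of ordinary least squares in matrix form. It assumes the total error is measured as the sum of squared differences (the criterion linear regression actually uses), puts a column of ones in front of the explanatory variables so that $a_0$ acts as the intercept, and solves the resulting least-squares problem with R's `qr.solve()`. The helper name `ols_fit` is hypothetical and introduced only for illustration; in practice R's built-in `lm()` does the same job.

```r
# Hypothetical helper: least-squares estimates of a0, a1, ..., an in
# y = a0 + a1*x1 + ... + an*xn, where X has one column per explanatory variable.
ols_fit <- function(X, y) {
  X <- as.matrix(X)
  D <- cbind(1, X)      # design matrix: a column of ones for the intercept a0
  a <- qr.solve(D, y)   # least-squares solution of D %*% a = y (minimises the sum of squared errors)
  names(a) <- paste0("a", 0:ncol(X))
  a
}

# Tropical storm counts, 1999 to 2016, as listed in this article's R commands
year <- 1999:2016
hurricanes <- c(20, 32, 29, 24, 29, 27, 42, 25, 23, 32, 26, 26, 30, 36, 31, 28, 30, 36)

coefs <- ols_fit(year, hurricanes)
print(coefs)
```

Passing a matrix with several columns (one per explanatory variable) returns estimates for $a_0$ to $a_n$ in the same way; with the single column `year` the result should agree with `lm(hurricanes ~ year)`.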

    For example, in the case of tropical storms the dependent variable is the number of storms each year (labelled hurricanes in the model below), and the (single) explanatory variable is time (year). For a set of 18 observations from the Monthly Storm Reports, one for each year from 1999 to 2016, a linear regression results in the following estimated model:

    $$\text{hurricanes} = -637.9+0.3323 \times \text{year}.$$ In this data set, the observed number of storms for the year 2001 is 29. However, the model-predicted value for this year, using the rounded coefficients above, is $-637.9+0.3323\times2001 = 27.03$. In other words, the error is $29-27.03 = 1.97$. In the model estimation, the values of the parameters were chosen in such a way that the total error (summed over all years) is minimised. Strictly speaking, a linear regression minimises the sum of the squared errors, so that positive and negative errors cannot cancel each other out.
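The arithmetic above can be checked with a short R sketch. With a single explanatory variable the least-squares estimates have a closed form: the slope $a_1$ is the sum of the products of the deviations of year and storm count from their means, divided by the sum of the squared year deviations, and the intercept is $a_0 = \bar{y} - a_1\bar{x}$. The data are the storm counts listed in the R commands later in this article; the helper `sse` is a hypothetical name, used only to show that nudging the fitted line away from the estimates increases the sum of squared errors.

```r
# Tropical storm counts, 1999 to 2016 (the data listed in this article's R commands)
year <- 1999:2016
hurricanes <- c(20, 32, 29, 24, 29, 27, 42, 25, 23, 32, 26, 26, 30, 36, 31, 28, 30, 36)

# Closed-form least-squares estimates for a single explanatory variable
x_bar <- mean(year)
y_bar <- mean(hurricanes)
a1 <- sum((year - x_bar) * (hurricanes - y_bar)) / sum((year - x_bar)^2)
a0 <- y_bar - a1 * x_bar

# Model-predicted value and error for 2001, using the unrounded coefficients
obs_2001 <- hurricanes[year == 2001]
pred_2001 <- a0 + a1 * 2001
err_2001 <- obs_2001 - pred_2001

# Hypothetical helper: sum of squared errors for any candidate line b0 + b1 * year
sse <- function(b0, b1) sum((hurricanes - (b0 + b1 * year))^2)

cat(sprintf("a0 = %.4f, a1 = %.4f\n", a0, a1))
cat(sprintf("2001: observed %.0f, predicted %.2f, error %.2f\n", obs_2001, pred_2001, err_2001))
cat(sprintf("SSE at the fit: %.2f; with slope nudged up by 0.001: %.2f\n",
            sse(a0, a1), sse(a0, a1 + 0.001)))
```

Working at full precision rather than with the rounded coefficients, the prediction for 2001 comes out at about 27.06, so the error is about 1.94 rather than 1.97.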
    Figure: Linear regression can help you spot trends in your data.

    A statistic that is often used to indicate how well the estimated model fits the given data is the coefficient of determination, denoted $R^2$, which measures the proportion of the variance in the dependent variable that is explained by the explanatory variable(s). For a least-squares fit that includes the constant term $a_0$, this value lies between zero and one, with zero indicating no explanatory value (i.e., complete unpredictability) and one indicating full explanatory value (i.e., complete predictability). The $R^2$ value in the regression performed here is $R^2=0.11$, which is low. This indicates that the estimated model has very little explanatory power: most of the variation in the number of storms is not explained by a linear dependence on time and looks largely random.
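As a sketch of where the $R^2$ figure comes from, the R code below computes it directly as one minus the ratio of the residual sum of squares to the total sum of squares (the usual definition for a least-squares fit with an intercept) and compares it with the value reported by `summary()`. It uses the same data and model as the Do it yourself section; variable names such as `ss_res` and `ss_tot` are illustrative.

```r
# Tropical storm counts, 1999 to 2016, and the fitted linear model
year <- 1999:2016
hurricanes <- c(20, 32, 29, 24, 29, 27, 42, 25, 23, 32, 26, 26, 30, 36, 31, 28, 30, 36)
myModel <- lm(hurricanes ~ year)

# R^2 = 1 - (residual sum of squares) / (total sum of squares about the mean)
ss_res <- sum(residuals(myModel)^2)
ss_tot <- sum((hurricanes - mean(hurricanes))^2)
r_squared <- 1 - ss_res / ss_tot

cat(sprintf("R^2 computed by hand: %.4f\n", r_squared))
cat(sprintf("R^2 from summary():   %.4f\n", summary(myModel)$r.squared))
```

For this data set both lines should report a value of roughly 0.11, matching the figure quoted above.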

    Finally, a linear regression analysis also tests the null hypothesis that the slope of the regression line is zero (i.e., that the dependent variable does not actually depend on the explanatory variables). The statistic used to decide whether to reject this null hypothesis is the p-value. This statistic (a value between zero and one) is the probability of obtaining data at least as extreme as those actually observed, under the assumption that there is no dependence. It is not the probability that the null hypothesis is true. Instead, if we decide in advance to reject the null hypothesis whenever the p-value falls below some threshold, that threshold is the probability of making a mistake by rejecting a true null hypothesis.

    A commonly used threshold (or significance level) for the p-value is 0.01, corresponding to a 1% probability of rejecting a true null hypothesis. This means that a p-value larger than 0.01 is not considered enough statistical evidence to reject the null hypothesis. Sometimes a more "forgiving" significance level of 0.05 (or 5%) is used, but the main idea is the same: the higher the p-value of the regression, the weaker the evidence that the slope of the linear model differs from zero. (You can find out more about p-values and significance levels in this article.)
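The p-value for the slope is typically obtained from a t-test, and the following R sketch reproduces the one that `summary()` reports for a simple linear regression. It assumes the standard conditions behind that test (independent, normally distributed errors with constant variance) and uses the storm data from the Do it yourself section; the names `se_slope`, `t_stat` and `alpha` are illustrative.

```r
# Tropical storm counts, 1999 to 2016, and the fitted linear model
year <- 1999:2016
hurricanes <- c(20, 32, 29, 24, 29, 27, 42, 25, 23, 32, 26, 26, 30, 36, 31, 28, 30, 36)
myModel <- lm(hurricanes ~ year)

n <- length(year)
slope <- coef(myModel)[["year"]]

# Standard error of the slope: sqrt(residual variance / sum of squared deviations of year)
sigma2 <- sum(residuals(myModel)^2) / (n - 2)
se_slope <- sqrt(sigma2 / sum((year - mean(year))^2))
t_stat <- slope / se_slope

# Two-sided p-value under the null hypothesis that the true slope is zero
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

alpha <- 0.01  # significance level; 0.05 is the more forgiving alternative
cat(sprintf("t = %.3f, p = %.3f, reject the null hypothesis at %.2f? %s\n",
            t_stat, p_value, alpha, p_value < alpha))
```

For the storm data the p-value is well above both 0.01 and 0.05, so at either significance level there is not enough evidence to reject the null hypothesis of a zero slope, which is consistent with the low $R^2$.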

    Do it yourself

    If you'd like to perform your own linear regression, you might want to use the R program. For example, for the hurricane data (after extracting the number of storms for each year from the public database), the regression can be performed with the following R commands:

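    # Observation years and the number of tropical storms in each year
    year <- c(1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016)
    hurricanes <- c(20,32,29,24,29,27,42,25,23,32,26,26,30,36,31,28,30,36)
    # Fit the linear model hurricanes = a0 + a1 * year
    myModel <- lm(hurricanes ~ year)
    # Print parameter estimates, R^2, p-values and other statistics
    summary(myModel)
    # Model-predicted number of storms for each observed year
    predict(myModel)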

    The summary command prints out a summary of the linear regression, including the estimated values for the model parameters, the $R^2$ value and the p-value, and several other statistics. The predict command provides the model predicted values (with which the blue line in the figure above was plotted).
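The `predict()` function can also evaluate the fitted model at chosen values of the explanatory variable through its `newdata` argument, and the fitted values can be drawn over the data to recreate a version of the figure. The sketch below assumes the storm data from the commands above; writing the plot to a temporary PDF file is just a choice that lets it run without a display, and names such as `fitted_values` are illustrative.

```r
# Tropical storm counts, 1999 to 2016, and the fitted linear model
year <- 1999:2016
hurricanes <- c(20, 32, 29, 24, 29, 27, 42, 25, 23, 32, 26, 26, 30, 36, 31, 28, 30, 36)
myModel <- lm(hurricanes ~ year)

fitted_values <- predict(myModel)                                # one prediction per observed year
pred_2001 <- predict(myModel, newdata = data.frame(year = 2001)) # prediction at a chosen year
resid_2001 <- residuals(myModel)[year == 2001]                   # observed minus predicted

# Draw the data points with the fitted line in blue, written to a temporary PDF file
pdf(tempfile(fileext = ".pdf"))
plot(year, hurricanes, xlab = "Year", ylab = "Number of tropical storms")
lines(year, fitted_values, col = "blue")
invisible(dev.off())
```

In an interactive R session you can leave out the `pdf()` and `dev.off()` calls to draw the plot on screen instead.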


    About the author

    Wim Hordijk

    Wim Hordijk is a computer scientist currently on a fellowship at the Konrad Lorenz Institute in Klosterneuburg, Austria. He has worked on many research and computing projects all over the world, mostly focusing on questions related to evolution and the origin of life. More information about his research can be found on his website.


    Plus is part of the family of activities in the Millennium Mathematics Project.
    Copyright © 1997 - 2025. University of Cambridge. All rights reserved.
