The mathematics of diseases

Matthew Keeling

Diseases are a ubiquitous part of human life. Many, such as the common cold, have minor symptoms and are purely an annoyance; but others, such as Ebola or AIDS, fill us with dread. It is the unseen and seemingly unpredictable nature of diseases, infecting some individuals while others escape, that has gripped our imagination. From prehistory to the present day, diseases have been a source of fear and superstition. Over the past one hundred years, mathematics has been used to understand and predict the spread of diseases, relating important public-health questions to basic infection parameters. Here, we shall review the simplest of disease models and consider some of the more mathematical developments that have improved our understanding and predictive ability.

The mathematics of diseases is, of course, a data-driven subject. Although some purely theoretical work has been done, the key element in this field of research is being able to link mathematical models and data. Case reports from doctors provide us with one of the most detailed sources of biological data; we often know the number of weekly disease cases for a variety of communities over many decades. This data also contains the signature of social effects, such as changes in birth rate or the increased mixing rates during school terms. Therefore, a comprehensive picture of disease dynamics requires a variety of mathematical tools, from model creation to solving differential equations to statistical analysis.

Basic Models

Almost all mathematical models of diseases start from the same basic premise: that the population can be subdivided into a set of distinct classes, dependent upon their experience with respect to the disease. The simplest of these models classifies each individual as susceptible, infectious or recovered. This is termed the SIR model. Individuals are born into the susceptible class. Susceptible individuals have never come into contact with the disease and are able to catch it, after which they move into the infectious class. Infectious individuals spread the disease to susceptibles, and remain in the infectious class for a given period of time (the infectious period) before moving into the recovered class. Finally, individuals in the recovered class are assumed to be immune for life.

Movement rates between classes of the SIR model

We can make this description more mathematical by formulating a differential equation for the proportion of individuals in each class (the equations are shown at the end of the article). Computer simulations of this mathematical model agree well with mathematical theory, predicting decaying oscillations (you might want to compare this with the damped oscillations observed in a spring). Therefore, although initially this model shows large epidemics occurring at regular intervals, eventually the level of the disease reaches a constant value.

[IMAGE: graph of damped oscillations]

Damped oscillations of the SIR model

The Epidemiological Parameter R0

Many interesting and useful results have been proved for the simple SIR model, but before we can explore this rich subject area, we need one further piece of epidemiological notation. One fundamental parameter governs the spread of diseases, and is also related to the long-term behaviour and the level of vaccination necessary for eradication. This parameter is called the basic reproductive ratio, $R_0$. $R_0$ is defined by epidemiologists as "the average number of secondary cases caused by an infectious individual in a totally susceptible population". As such $R_0$ tells us about the initial rate of increase of the disease over a generation. When $R_0$ is greater than 1, the disease can enter a totally susceptible population and the number of cases will increase, whereas when $R_0$ is less than 1, the disease will always fail to spread. Therefore, in its simplest form $R_0$ tells us whether a population is at risk from a given disease.

The value of R0 for some well-known diseases

Disease     R0
AIDS        2 to 5
Smallpox    3 to 5
Measles     16 to 18
Malaria     > 100

[IMAGE: graphs for measles and plague]

The fit between cases and the SIR epidemic for bubonic plague and measles

A second use for $R_0$ is looking at the behaviour of a single epidemic outbreak. Consider the situation when a new strain of influenza enters a totally susceptible population. Simple intuition tells us that the disease will spread rapidly through the population, infecting a large proportion of the population in a very short time. It is therefore plausible to ignore births and deaths in the population and solely concentrate on the disease dynamics. For these short-term epidemics, initially the number of cases increases exponentially ($I(t) \propto \exp[(R_0-1)t]$, with time $t$ measured in units of the infectious period). However, as more of the population enters the recovered class and there are fewer susceptibles, the disease spreads less well and eventually the number of cases declines. Due to this decline not everyone will be infected before the disease dies out. By looking at the long-term behaviour of the SIR model, Kermack and McKendrick were able to predict the proportion of individuals ($S_{\infty}$) who would escape the infection, \[S_{\infty} = \exp ([S_{\infty}-1] R_0).\]

[IMAGE: graph of S_\infty]

A graphical method for calculating the percentage that escape infection
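The relation for $S_{\infty}$ has no closed-form solution, but it is easy to solve on a computer. The following is a minimal Python sketch, not part of the original article: the function name `escaping_fraction` and the choice of simple fixed-point iteration are illustrative assumptions, and any standard root-finding method would do equally well.

```python
import math

def escaping_fraction(r0, tol=1e-12, max_iter=10_000):
    """Solve S = exp((S - 1) * r0) for the root with 0 < S < 1.

    Hypothetical helper for illustration; requires r0 > 1, because
    otherwise the only solution is the trivial one, S = 1.
    """
    if r0 <= 1:
        raise ValueError("a root below 1 exists only when r0 > 1")
    s = 0.0  # starting from 0, the iterates increase towards the smallest root
    for _ in range(max_iter):
        s_next = math.exp((s - 1.0) * r0)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

for r0 in (2, 5):
    print(f"R0 = {r0}: about {100 * escaping_fraction(r0):.1f}% escape infection")
```

Starting the iteration at zero matters: the equation is always satisfied by $S_{\infty}=1$ (nobody infected), and starting from zero steers the iteration to the other, epidemiologically meaningful root.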

Although the equation for $S_{\infty}$ cannot be solved analytically, by examining the two sides graphically (plotting $S_{\infty}$ and $\exp ([S_{\infty}-1] R_0)$ on the same graph) it is clear that as $R_0$ increases, fewer individuals escape the disease. Calculating $S_{\infty}$ numerically, when $R_0=2$ we find that $S_{\infty}$ is approximately $20\%$, whereas when $R_0=5$ we find that $S_{\infty}$ is approximately $0.7\%$. Therefore, increasing $R_0$ has a dramatic effect on the proportion that escape the outbreak.

Finally, if we wish to model a disease that is endemic, that is, persists indefinitely in the population, our SIR model must also include births to replenish the level of susceptibles. In this case the long-term behaviour of the disease can again be related to the parameter $R_0$. The long-term proportion of susceptible individuals in the population, once the oscillations have died away, is given by \[ S^{\star}= 1/R_0.\] Therefore, the diseases with the largest $R_0$ leave the smallest proportion of susceptible individuals. (It is interesting to note that the long-term level of infection, $I^{\star}$, depends only weakly on $R_0$ when $R_0$ is large, and is instead set mainly by the birth rate and the infectious period.)

The concept behind vaccination is to reduce the proportion of susceptibles until the disease cannot survive. At the long-term level of susceptibles, $S^{\star}$, each infectious individual on average causes one further secondary case. (If each infectious individual caused more or fewer than one case, then the level of infection would either rise or fall and the disease wouldn't be stable.) Therefore, if we can reduce the proportion of susceptibles even further, so that the disease does less well, we can begin to eradicate the disease. The threshold level of vaccination ($V_T$) necessary to eradicate the disease is therefore \[ V_T = 1- S^{\star} = 1 - 1/R_0.\]

It should now be clear why vaccination has allowed us to completely eradicate smallpox ($R_0$ is approximately 4), whereas there are still cases of measles in Britain and the USA ($R_0$ is approximately 17) despite mass vaccination, and why it is so very difficult to control malaria ($R_0 >100$). It is important to realize that we don't need to vaccinate everybody to eradicate a disease; by a process known as herd immunity, for each person that is vaccinated the risk of infection for the rest of the community decreases. Therefore vaccination does not just protect the individual, but also offers some protection to the whole community.
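To put numbers on the threshold $V_T = 1 - 1/R_0$, here is a short Python sketch. It is an illustration rather than anything from the original article: the function name `vaccination_threshold` is made up, the $R_0$ values are single representative figures taken from the ranges quoted above, and the calculation assumes a vaccine that gives complete, lifelong protection.

```python
def vaccination_threshold(r0):
    """Return V_T = 1 - 1/R0, the proportion that must be immunised.

    Illustrative helper; assumes a perfectly effective vaccine.
    """
    if r0 <= 1:
        return 0.0  # the disease cannot spread even without vaccination
    return 1.0 - 1.0 / r0

# Representative R0 values chosen from the ranges quoted in the article.
examples = {"smallpox": 4, "measles": 17, "malaria (lower bound)": 100}
for disease, r0 in examples.items():
    share = 100 * vaccination_threshold(r0)
    print(f"{disease}: R0 = {r0}, vaccinate at least {share:.1f}%")
```

The threshold rises steeply at first and then flattens: moving from $R_0=4$ to $R_0=17$ pushes the required coverage from 75% to about 94%, which leaves very little room for missed vaccinations.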

Foot and Mouth Disease

[IMAGE: incinerating cattle]

Incinerating cattle to help contain the spread of foot and mouth disease

Recently there has been a great deal of attention focused on the spread of foot and mouth disease in the UK. This gives the mathematical modeller a prime opportunity to put all of the above theory into practice.

Foot and mouth is a disease of cattle, pigs, sheep and some other livestock, although fortunately not humans. It is common in areas of Africa and Asia, but it has been over 30 years since the last major outbreak in Britain. Foot and mouth spreads very rapidly and can be transmitted either by close contact within farms or at markets, or it can be wind-borne over much longer distances. In cattle and pigs the disease has disastrous consequences, and therefore modelling and understanding its spread is economically very important.

Foot and mouth disease can again be described by a simple SIR model. However, because its spread within a farm is so rapid, most models classify the entire farm as either susceptible, infectious or recovered. These farm-level models have an $R_0$ of around 50. Control of the disease is a difficult problem - very stringent measures need to be taken to overcome this large value of $R_0$. Vaccination is not a useful policy as it only offers partial protection and must be repeated every 4-6 months. Instead, by destroying all infected animals and limiting the movement of livestock it is hoped that the transmission between farms can be reduced sufficiently that the disease will die out.

[IMAGE: foot and mouth disease virus]

Computer-generated image of the foot and mouth virus


The theory outlined here has only scratched the surface of the research done into epidemic spread and persistence. The study of diseases has been such a successful application of mathematical theory because most diseases conform to the assumptions behind the simple models. However, many complications have been introduced into the SIR-type models which allow them to better capture the observed dynamics and answer more applied questions. The following is a list of practical issues that have been incorporated into SIR-type models.

  • Many diseases, such as measles or chickenpox, are primarily diseases of children. By further subdividing the population into differing age-classes researchers have been able to capture age-structured transmission in more detail.

  • For such childhood infections there is greater mixing (the contact rate is larger) during school terms. Such seasonal forcing leads to regular epidemics or more complex dynamics, as the disease oscillates between the high-contact and low-contact solutions.

  • When modelling the spread of HIV it is vital to subdivide the population by sexual orientation and drug use.

  • For some diseases other organisms are involved in the transmission, e.g. the mosquito is essential for transmission of malaria, and together rats and fleas are responsible for the majority of bubonic plague cases. For such diseases we need to couple an SIR model for humans with an SIR model for the other organisms.

The Equations

For those interested in a bit more detail, the mathematical equations which describe the proportions of the population in the three classes ($S$ susceptible, $I$ infectious and $R$ recovered) as functions of time $t$ are \begin{eqnarray*} \frac{dS}{dt} &= B - \beta SI - dS;\\ \frac{dI}{dt} &= \beta SI - gI - dI;\\ \frac{dR}{dt} &= gI - dR;\\ R_0 &= \frac{\beta}{g}. \end{eqnarray*} Here $B$ is the birth rate, $d$ is the death rate, $1/g$ is the infectious period, and $\beta$ is the contact rate. In the majority of models, the birth and death rates are assumed to be equal, so that the population size remains constant. (The expression for $R_0$ neglects deaths during the infectious period, which is reasonable when $d$ is much smaller than $g$.)
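For readers who would like to experiment, the equations can be integrated numerically. The sketch below is one possible way to do it in plain Python, using a hand-written fourth-order Runge-Kutta step; it is not the code behind the article's figures. All parameter values are assumptions chosen to look roughly measles-like (a 13-day infectious period, a 50-year mean lifespan, $R_0=17$), and time is measured in years.

```python
def sir_derivatives(state, beta, g, B, d):
    """Right-hand sides of the SIR equations with births and deaths."""
    S, I, R = state
    dS = B - beta * S * I - d * S
    dI = beta * S * I - g * I - d * I
    dR = g * I - d * R
    return (dS, dI, dR)

def rk4_step(state, dt, *params):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(base, k, h):
        return tuple(x + h * kx for x, kx in zip(base, k))
    k1 = sir_derivatives(state, *params)
    k2 = sir_derivatives(shifted(state, k1, dt / 2), *params)
    k3 = sir_derivatives(shifted(state, k2, dt / 2), *params)
    k4 = sir_derivatives(shifted(state, k3, dt), *params)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + e)
        for x, a, b, c, e in zip(state, k1, k2, k3, k4)
    )

def simulate(state, years, dt, beta, g, B, d):
    """Integrate from the given (S, I, R) proportions and return the final state."""
    for _ in range(int(round(years / dt))):
        state = rk4_step(state, dt, beta, g, B, d)
    return state

# Assumed, illustrative parameters (rates per year); not taken from the article.
g = 365 / 13      # infectious period of 13 days
d = B = 1 / 50    # births balance deaths, mean lifespan of 50 years
R0 = 17
beta = R0 * g     # from R0 = beta / g

S_end, I_end, R_end = simulate((0.1, 0.001, 0.899), years=100, dt=0.002,
                               beta=beta, g=g, B=B, d=d)
print(f"S after 100 years: {S_end:.4f}  (1/R0 = {1 / R0:.4f})")
```

Recording the state at every step instead of only at the end, and plotting $I$ against time, should show the damped oscillations described earlier. The long-run value of $S$ settles at $(g+d)/\beta$, which is very close to $1/R_0$ because $d$ is so much smaller than $g$.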

About the author

Matt Keeling is now a Professor at the University of Warwick where he is also Director of the Zeeman Institute for Systems Biology & Infectious Disease Epidemiology Research. He also co-leads the JUNIPER network.

He wrote this article back in 2001 when he was a Royal Society University Research Fellow working in the Zoology Department at Cambridge University. It remains one of our most useful articles on disease modelling.



I have looked at your SIR model; this is really useful for us. Thanks.


This was very interesting as I am studying a degree and looking ahead for jobs to apply mathematics to real-world problems that help people, but am not interested in making profit for business.

Do you think these sorts of things you discuss would be better approached with a mathematics-only degree, a joint maths and stats degree, or a joint degree of computing with either maths or stats?


To Matt Keeling,
I wish you had your email on your article.
With disease, there is the infectious period, and then each person, or other being, has their own immune system, which may or may not be able to deal with the disease.
Some diseases, such as Epidemic Pleurodynia (Coxsackie Virus), are very infectious; however, once you get the illness, you get lifelong immunity. The descriptions of Epidemic Pleurodynia are not so good. In reality, some people get this illness for a few days, while some get it for a few months. By taking indocin, most of the symptoms go away within the hour; however, they may return every few days to 10 days, for up to 3 or even 4 months, and of course, taking the indocin will again relieve the symptoms. Then, when the disease finally leaves, the person is immune for the rest of their life. The Epstein-Barr virus causes mono, and it may last for a few months and then the immune system will kill it all. However, some people get chronic Epstein-Barr virus and have severe fatigue and weakness for the rest of their lives. As a physician, I have been in a typhoid epidemic and a cholera epidemic. I have also been in an epidemic of Epidemic Pleurodynia. Some patients get subclinical cases of the disease, so they get the disease but have no symptoms. Sometimes these people can spread the disease and not even know it (Typhoid Mary). There is a famous woman in the LA area who is immune to AIDS: she gets the AIDS virus, and it has been proven, then in a few weeks it is completely gone, and she is always asymptomatic.
There is a push for immune boosting agents, such as Dr Doug Brodie had developed and used with great service. See the Cancer Control Society web page. If one can get the immune system into high gear, a patient's own immune system can cure almost any illness.
It would be nice to have a model that was easily set up for people, so that they can plug in some numbers and then take a look at the epidemic potential. To my understanding, the bubonic plague, which is endemic to the US South West, and I have seen patients with it, is not transferable from human to human. When they get the pneumonic plague, then a cough can spread the Yersinia pestis via droplets, and it is the pneumonic plague that can kill in hours.


What does the variable "t" mean in these formulas? I am a 5th grader researching this for a school project.


Is there anything in mathematics which can help predict who will end up in hospital? And following on from this, looking at populations, can we use this predictive work to reduce hospital admissions and therefore save money?


What kind of equation would one use to create a 1-10 scale indicating the importance ranking of a certain disease, and/or disease causing pathogen? This equation should account for factors such as the mortality rate of the disease, as well as the incidence, and the degree of resistance of the involved pathogen to various treatments.


I am in eighth grade, and am doing a school project.
Could someone provide some more information on this topic or direct me to another helpful website? I am able to understand up to algebra 2 and some basic precalculus, so nothing above calculus, please!

Help is appreciated
Thank You


Hello there. I am writing an essay on the spread of zombies and using the SIR model. I'm still confused on how to get the transmission coefficient/contact rate. What is it based on? How can I get it?


I most commonly see 'I' used for charge in physics, however that wouldn't make sense in this case. It can't be the infectious period because that's already represented as 1/g. I'm giving a presentation on this in AP calculus in 3 days and this is by far the best source I've found, however all of it is currently void due to that single variable.


I just spent the last couple hours looking at your equations from all kinds of angles and I finally found a correct equation that shows your graph and numbers. It should be S-1 in the brackets, not vice versa how you have it written.

I've only looked at this Web page this week, and I agree with you. The equation for S(infinity) as displayed is missing a minus sign, or alternatively should be changed as you suggest. The legend on the graph following for the y-axis correctly includes the minus sign.


how do you find the value of the contact rate?


Hello. I am doing my thesis on dynamical system approach for childhood disease which is measles and I am using SIR model. Thank you for the interesting information.


I understand how these models can work for most traditional diseases, where infected individuals are no longer contagious after some period, and are generally immune, themselves. However, for a condition such as AIDS, where infected individuals are permanently infected (and without intervention, permanently contagious), what happens to the "gI" term in the equation for dI/dt (R being nonexistent).

If one were modelling HIV/AIDS, would they want to do something different with g?

Furthermore, in the equation for dS/dt, if beta is multiplied by S and I, shouldn't B be multiplied by the total population, and dS be multiplied by S?

Any clarification would be much appreciated. Thanks.


Have you considered updating this page or creating a new one specifically about COVID-19 and the issues that are arising?


How can you account for the fact that different infected people might transmit the disease to the same susceptible individual within the same time-step? Early on in an outbreak this would not be that much of a factor, but if the sizes of the susceptible and infected groups are similar later on, would there not be some overlap between transmissions? I don't think that Beta*I*S accounts for this, and I don't want to overestimate the number of infected individuals in my model.


Your equation S(infinity) as displayed (on my computer, at least) is incorrect, there is a minus sign missing. Immediately below the equation you have a plot which correctly includes the minus sign on the label for the y-axis.

It should read S(infinity) = exp( (S(infinity) - 1)R(zero)) I believe.