Plus Advent Calendar Door #12: Machine learning and neural networks

We're not anywhere close to the sci-fi concept of strong artificial intelligence, where a machine can learn any task and react to almost any situation, indistinguishably from a human. But we are surrounded by examples of weak artificial intelligence – such as the speech recognition in our phones – where a machine is trained to do a specific, yet complex, job.

One of the most significant recent developments in weak artificial intelligence is machine learning. Rather than a human teaching a machine explicitly how to do a complex task (in the sense of a traditional computer program), the machine learns directly from the experience of repeatedly doing the task itself.

Advances in engineering and computer science are key to progress in this area. But the real nuts and bolts of machine learning are done with mathematics. A machine learning algorithm boils down to constructing a mathematical function that takes a certain type of input and reliably gives the desired output.

Machine learning algorithms can learn how to tell a picture of a cat from a picture of a dog.

For example, suppose you want a machine that can distinguish between digital images of cats and dogs. An image can be treated as a mathematical object – each pixel is actually represented as a number identifying its colour, and the image as a whole is represented by the long list of the values of the pixels (acting as a vector in some very high-dimensional vector space). The machine then applies a complicated function to this mathematical version of the picture and outputs either "cat" or "dog".
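As a rough illustration of what "an image is a long list of numbers" means, here is a small Python sketch. The 2×2 greyscale image, the function names and the brightness rule inside `classify` are all made up for this example. In particular, `classify` is only a hand-written placeholder for the complicated function a real machine would have learnt.

```python
# A tiny 2x2 greyscale "image": each pixel is a number from 0 (black) to 255 (white).
# Real photos have millions of pixels, usually with three colour values per pixel.
image = [
    [0, 255],
    [128, 64],
]

def image_to_vector(pixels):
    """Flatten the rows of pixels into one long list of numbers (a vector)."""
    return [value for row in pixels for value in row]

def classify(vector):
    """Hypothetical placeholder for the learnt function: a crude rule on mean brightness."""
    mean_brightness = sum(vector) / len(vector)
    return "cat" if mean_brightness < 128 else "dog"

vector = image_to_vector(image)
print(vector)            # [0, 255, 128, 64]
print(classify(vector))  # cat
```

The point is only the shape of the problem: a list of numbers goes in, and one of two labels comes out.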

The key idea of machine learning is that a human hasn't specified this mathematical cat-or-dog function. Instead, the machine learnt on its own which mathematical function was best suited to distinguishing between cats and dogs. It did this by looking at lots and lots of pictures of cats and dogs, each labelled with the correct answer (the training data), and using an optimisation process to learn which mathematical function worked best.

We are used to functions having parameters: for example, the general equation for a line, $f(x) = ax + b$, has the parameter $a$, which gives the slope of the line, and the parameter $b$, which tells you where the line crosses the vertical axis. Similarly, functions resulting from machine learning algorithms have some general structure which is tailored by parameters, but these functions are more complicated and have many, many parameters.
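To see how parameters tailor a general structure, here is a minimal Python sketch of the line example. The helper name `make_line` and the particular values of the parameters are our own choices for illustration.

```python
def make_line(a, b):
    """Return the function f(x) = a*x + b for the given slope a and intercept b."""
    def f(x):
        return a * x + b
    return f

# Same general structure, different parameters, different functions.
f = make_line(2.0, 1.0)    # slope 2, crosses the vertical axis at 1
g = make_line(-0.5, 3.0)   # slope -0.5, crosses the vertical axis at 3

print(f(0.0), f(1.0))  # 1.0 3.0
print(g(0.0), g(2.0))  # 3.0 2.0
```

A function produced by machine learning works the same way in principle, except that it may have millions of parameters rather than two.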

At the start of the learning process the parameters are set to some initial values, and the algorithm processes the training data, returning the answer "cat" or "dog" for each picture. Initially it is likely to get a lot of these answers wrong, and it detects this by comparing its answers with the correct answers supplied with the training data. The algorithm then starts tweaking the parameters until eventually it has found values for them that return the correct answers with a high probability.

This learning process is generally done by something called a gradient descent algorithm. As the learning machine works through lots and lots of training data, the algorithm repeatedly nudges each parameter in the direction that reduces the error in the machine's answers, and in this way trains the machine to do a specific task. (You can read an introduction to gradient descent algorithms here.)
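Gradient descent is easiest to see on a small problem. The sketch below uses it to learn the two parameters of a line from example points, rather than to tell cats from dogs. The data, the squared-error measure, the learning rate and the number of steps are all assumptions chosen for this illustration.

```python
# Made-up training data: points lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

def mean_squared_error(a, b):
    """Average squared gap between the line's answers and the correct answers."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(a, b):
    """Slope of the error with respect to each parameter."""
    n = len(xs)
    grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_a, grad_b

a, b = 0.0, 0.0            # arbitrary starting values for the parameters
learning_rate = 0.05       # how big a step to take each time
start_error = mean_squared_error(a, b)

for step in range(2000):
    grad_a, grad_b = gradient(a, b)
    # Step downhill: move each parameter against the slope of the error.
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

print(round(a, 3), round(b, 3))               # close to 2 and 1
print(start_error, mean_squared_error(a, b))  # the error has shrunk
```

Nobody told the program that the slope is 2 and the intercept is 1. It found values very close to these by repeatedly reducing its error on the training data.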

There are various approaches to machine learning, but one of the most widely used is something called a neural network. This is really a way to structure a complicated mathematical function. Artificial neurons, each of which is itself a mathematical function, are arranged in a series of layers. Each neuron takes as its input a linear (weighted) sum of the outputs of the neurons in the previous layer, applies a nonlinear function to that input, and passes the result on to the neurons in the next layer.
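The structure just described can be sketched in a few lines of Python. This is only an illustration: the choice of `tanh` as the nonlinear function, the network's size and all the weights are arbitrary assumptions, not values from any trained network.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs, then a nonlinear function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(weighted_sum)

def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, weights, bias) for weights, bias in zip(weight_rows, biases)]

def network(inputs, layers):
    """Pass the outputs of each layer on as the inputs of the next."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations

# Arbitrary parameters for a tiny network: 3 inputs -> 2 neurons -> 1 neuron.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.2]),
]

output = network([1.0, 0.5, -1.0], layers)
print(output)  # a list containing one number between -1 and 1
```

The weights and biases are the parameters of the network. Change them and the same structure computes a different function.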

The network is then trained by working through lots of training data, and applying a machine learning algorithm that tweaks the parameters in the linear sums that link one layer to the next. With enough training data the parameters are tuned until the neural network has settled on a complicated mathematical function that carries out its given task, such as correctly distinguishing between cats and dogs.
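Here is a minimal sketch of such training for a very small network with two inputs, two hidden neurons and one output. Everything specific in it is an assumption made for illustration: the four made-up training examples, the starting parameters, the `tanh` and sigmoid functions, the cross-entropy error measure and the learning rate. The parameters are tweaked after each example, a variant known as stochastic gradient descent.

```python
import math

# Made-up training data: two numbers per example, with label 1 when they are both large.
data = [([0.0, 0.0], 0), ([0.2, 0.3], 0), ([0.9, 0.8], 1), ([1.0, 0.6], 1)]

# Arbitrary starting parameters: 2 inputs -> 2 hidden neurons -> 1 output neuron.
w1 = [[0.1, -0.2], [0.3, 0.2]]   # weights in the linear sums feeding the hidden layer
b1 = [0.0, 0.0]
w2 = [0.2, -0.1]                 # weights in the linear sum feeding the output neuron
b2 = 0.0

def forward(x):
    """Return the hidden-layer outputs and the network's probability of label 1."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, 1.0 / (1.0 + math.exp(-z))

def average_loss():
    """Cross-entropy error: small when the correct label gets a high probability."""
    total = 0.0
    for x, y in data:
        _, p = forward(x)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(data)

initial_loss = average_loss()
learning_rate = 0.2

for epoch in range(1000):
    for x, y in data:
        hidden, p = forward(x)
        error = p - y  # slope of the loss with respect to the output neuron's linear sum
        for j in range(2):
            # Slope for hidden neuron j, computed before w2[j] is changed.
            grad_hidden = error * w2[j] * (1.0 - hidden[j] ** 2)
            w2[j] -= learning_rate * error * hidden[j]
            for k in range(2):
                w1[j][k] -= learning_rate * grad_hidden * x[k]
            b1[j] -= learning_rate * grad_hidden
        b2 -= learning_rate * error

final_loss = average_loss()
print(initial_loss, final_loss)  # the error after training is lower than before
```

Real networks have vastly more parameters and training data, and rely on specialised software to compute the slopes, but the loop has the same shape: make a prediction, compare it with the correct answer, and nudge the parameters.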

Machines have successfully taught themselves how to beat us at games like chess and Go, rather than us teaching them all our best moves. Machine learning is now part of our everyday life – you use it when you speak to your digital devices, when you click on a recommended product from an online store, or when you use language translation apps and websites. Machine learning is also playing important roles in medicine and particle physics, as well as in monitoring traffic and analysing tree cover.

Read a more detailed introduction to machine learning with this great collection of articles by Chris Budd (which this article is partly based on). And find out more about the many applications of machine learning on Plus.

Return to the Plus advent calendar 2021.

This article was produced as part of our collaboration with the Isaac Newton Institute for Mathematical Sciences (INI) – you can find all the content from the collaboration here.

The INI is an international research centre and our neighbour here on the University of Cambridge's maths campus. It attracts leading mathematical scientists from all over the world, and is open to all. Visit www.newton.ac.uk to find out more.