Seeing traffic through new eyes

Roads are the arteries of modern life, but traffic comes with problems. It's bad for the environment and for our climate, and it can also be bad for our health. Road accidents cause deaths and injuries, pollution contributes to lung and other respiratory diseases as well as cancers, and the noise and stress of traffic can affect people's mental health.

Good planning and effective urban policies can help reduce these risks, but to devise a good strategy you first need to know what kinds of vehicles (bikes, cars, tuk-tuks, trucks) are on the road, in what proportions, and how they tend to behave. Such data is difficult and expensive to collect, and in places where cities are expanding rapidly, such as India, very little of it is available.

Some members of the interdisciplinary INTEGRAL team and its partners. From top left: Carola-Bibiane Schönlieb, Rahul Goel, Kelly Kokka, Rihuan Ke, Angelica Aviles-Rivero, Sanjay Kumar, James Woodcock.

"There is a growing awareness of the adverse impacts of motor vehicles on human health and the benefits that use of walking and cycling can achieve. However, data on usage of different modes of transport are rarely available for the cities," says public health expert James Woodcock from the MRC Epidemiology Unit at the University of Cambridge. "The lack of data is of greater concern in low- and middle-income countries that are witnessing an exponential growth in motor vehicles."

The challenge has inspired mathematicians and other scientists at the University of Cambridge, including Woodcock, to form INTEGRAL, which stands for INdia remoTE ImaGery anALysis. One of the project's aims is to look at traffic in some of India's major cities through new eyes: to teach computers to understand photos and videos from Google Street View, Google Earth and traffic cameras. "We want to develop an open-source tool that can help relevant stakeholders in India (and eventually world-wide) to quantify traffic volume and determine traffic type in cities," says INTEGRAL co-Head Carola-Bibiane Schönlieb.

Pooling expertise

It's the artificial intelligence aspect of the project, getting computers to understand image data, that requires input from mathematicians such as Schönlieb and turns INTEGRAL into a truly interdisciplinary project. Along with Schönlieb and Woodcock, the team comprises scientists from the Cambridge Department of Plant Sciences and Cambridge Public Health. There is also a range of partners, including C40, which connects major cities around the world to take action on climate change, the Indian Institute of Technology Delhi, Cambridge Global Challenges, the German Aerospace Centre, Indian technology company Kritikal Solutions and the environmental advisory group IORA.

"I enjoy working at the interface of machine learning and public health," says INTEGRAL member Kelly Kokka. "Its global application and the potential to fill a major data gap is hugely satisfying! Also it's exciting to collaborate with such an amazing team and learn from each other."

Together with her colleagues, Kokka has carried out some proof-of-concept work showing that the idea of using publicly available image data to assess traffic does work. "From our earlier work using Google Street View images, we found that images of people on the road in different modes of transport are a strong predictor of overall traffic patterns in the cities," explains INTEGRAL member Rahul Goel. "The ability to predict this information using globally available imagery data sources is a great step towards understanding travel patterns especially in places where surveys are not available."

Learning to see

The tricky part of the project will be to get machines to recognise different types of vehicles, cyclists and pedestrians in images and videos. We humans are very good at processing visual information. We don't have trouble telling apart two cars of the same colour, recognising a bus even when it's partly hidden by a tree, or noting that the cyclist that was here before we blinked has now moved over there.

A computer doesn't have these instinctive abilities. To a computer, an image is just an array of numbers encoding the colours of its individual pixels. The INTEGRAL team are developing algorithms that will enable a computer to examine each pixel and its neighbours, and then allocate the pixel to one of a number of classes, such as "car", "truck", "background", "road", etc. The result will be an image in which pixels belonging to the same class are given the same colour, as shown below. Processing an image in this way is called semantic segmentation. Once an image or video has been segmented, it becomes much easier to count the number and types of vehicles in it.
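To make that last step concrete, here is a minimal sketch in Python. It assumes a segmentation model has already produced a label map (a grid holding one class number per pixel; the class numbering below is made up) and counts vehicles by treating each connected blob of "car" or "truck" pixels as one vehicle, a standard technique called connected-component labelling. The article doesn't say how INTEGRAL does its counting, so this is an illustration only.

```python
from collections import deque

# Hypothetical class numbers for a toy label map; a real model defines its own.
BACKGROUND, ROAD, CAR, TRUCK = 0, 1, 2, 3
CLASS_NAMES = {CAR: "car", TRUCK: "truck"}


def count_objects(label_map, vehicle_classes=(CAR, TRUCK)):
    """Count connected blobs of each vehicle class in a per-pixel label map.

    Pixels are connected if they share an edge (4-connectivity). Each blob is
    assumed to be one vehicle, so touching vehicles of the same class merge.
    """
    rows, cols = len(label_map), len(label_map[0])
    seen = [[False] * cols for _ in range(rows)]
    counts = {CLASS_NAMES[c]: 0 for c in vehicle_classes}
    for r in range(rows):
        for c in range(cols):
            cls = label_map[r][c]
            if cls not in vehicle_classes or seen[r][c]:
                continue
            # A new blob: count it, then flood-fill so its pixels aren't recounted.
            counts[CLASS_NAMES[cls]] += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and label_map[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return counts


# A tiny 5x8 "segmented frame": two cars and one truck on a road.
frame = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 2, 2, 1, 1, 3, 3, 3],
    [1, 2, 2, 1, 1, 3, 3, 3],
    [1, 1, 1, 1, 2, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(count_objects(frame))  # {'car': 2, 'truck': 1}
```

One weakness is visible in the code: two vehicles of the same class that touch in the image merge into a single blob and are counted once. This is one reason practical systems often use instance segmentation or object detection, which separate individual vehicles, alongside or instead of plain semantic segmentation.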

This is an example of a video that has been segmented. The top shows the raw video footage and the bottom shows the segmented version, where features of the same kind are given the same colour.

To get a computer algorithm to correctly perform such semantic segmentation, the INTEGRAL team are pushing the boundaries of machine learning. This area of artificial intelligence involves a computer algorithm "learning" how to spot patterns within data that correspond to a particular feature, for example, to spot the patterns in the pixels of an image that mean the image depicts a car.

In its simplest guise, machine learning requires lots of training data. For example, to "teach" an algorithm to tell pictures of cars from pictures of trucks, you would first give it a large number of pictures of cars and trucks and tell it which show a car and which show a truck. The algorithm analyses the pixel values of each image and, based on the patterns it finds, returns the output "it's a car" or "it's a truck". It then compares its output to the correct answer, and if the output turns out to be wrong a lot of the time, it tunes some internal parameters to see if it gets a better result. Eventually, the parameters will be just right to get the correct answer almost all of the time. (To find out more about this type of machine learning, see this article.)
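The compare-and-adjust loop described above can be illustrated with a perceptron, one of the oldest learning rules. This is a hypothetical toy, not the INTEGRAL team's method: instead of pixels, each vehicle is described by two made-up numbers (length and height in metres), and the two weights and the bias play the role of the "internal parameters". Modern image classifiers use neural networks with millions of parameters tuned by gradient descent, but the basic idea of adjusting parameters to reduce mistakes on labelled examples is the same.

```python
# Hypothetical, hand-made training set: (length in metres, height in metres).
# Label 0 = car, 1 = truck. Real systems learn from raw pixels, not two numbers.
TRAINING_DATA = [
    ((4.0, 1.5), 0), ((4.5, 1.5), 0), ((3.5, 1.5), 0),
    ((10.0, 3.5), 1), ((12.0, 4.0), 1), ((9.0, 3.0), 1),
]


def predict(weights, bias, features):
    """Return 1 ("it's a truck") if the weighted score is positive, else 0 ("it's a car")."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_perceptron(examples, max_epochs=10_000):
    """Perceptron rule: after every wrong answer, nudge the parameters towards the right one."""
    weights, bias = [0.0, 0.0], 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for features, label in examples:
            error = label - predict(weights, bias, features)  # -1, 0 or +1
            if error:
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
        if mistakes == 0:  # every training example is now answered correctly
            return weights, bias, epoch
    return weights, bias, max_epochs


weights, bias, epochs = train_perceptron(TRAINING_DATA)
correct = sum(predict(weights, bias, f) == y for f, y in TRAINING_DATA)
print(f"{correct}/{len(TRAINING_DATA)} correct after {epochs} passes")
```

Because this toy data can be separated by a straight line, the perceptron convergence theorem guarantees that the loop stops making mistakes on the training examples after finitely many updates. On data that can't be separated this way, it would keep adjusting until `max_epochs` runs out.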

Learning with little teaching

That an algorithm can learn at all is quite amazing, but there is a catch. This sort of supervised learning requires a lot of training images, all of which have to be labelled by a human. For the purpose of semantic segmentation this doesn't involve just looking at the image and saying "it's a car" or "it's a truck", but looking at each individual pixel and deciding what class it belongs to. What's more, when video footage is involved, a human needs to label the video frame by frame. That is far too time-consuming and expensive a task.
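A quick back-of-the-envelope calculation shows the scale of the problem. The resolution, frame rate and clip length below are assumptions chosen for illustration, not figures from the INTEGRAL project.

```python
def pixel_labels_needed(width, height, fps, seconds):
    """Frames and individual pixel labels needed to hand-annotate a whole video clip."""
    frames = fps * seconds
    return frames, width * height * frames


# Assumed example: one minute of full-HD (1920 x 1080) footage at 25 frames per second.
frames, labels = pixel_labels_needed(1920, 1080, 25, 60)
print(f"{frames:,} frames, {labels:,} pixel labels")
# 1,500 frames, 3,110,400,000 pixel labels
```

Even a single minute of traffic camera footage would, under these assumptions, mean deciding the class of over three billion pixels.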

This is why the INTEGRAL team are pursuing a new approach, using something called semi-supervised learning. Here you hand-label only a very small collection of images (or video frames). A clever algorithm then attaches labels to a large number of previously unlabelled images, by squeezing as much statistical information as possible out of the set of images. The now much larger set of labelled images can then be used as training data. (See this short introduction for more on semi-supervised learning.)
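One simple form of semi-supervised learning is self-training, also called pseudo-labelling: train on the few labelled examples, let the model label the unlabelled ones, keep only the guesses it is confident about, and repeat. The sketch below is a hypothetical toy using a nearest-centroid classifier on made-up (length, height) features; the confidence threshold `min_margin` is an arbitrary choice. It is not the INTEGRAL team's framework, which combines several semi-supervised techniques.

```python
import math


def centroids(labelled):
    """Mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(features, cents):
    """Nearest-centroid label, plus a confidence margin: the gap to the runner-up class."""
    dists = sorted((math.dist(features, c), label) for label, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(labelled, unlabelled, min_margin=2.0, max_rounds=10):
    """Repeatedly add confident pseudo-labels to the training set and refit."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(max_rounds):
        cents = centroids(labelled)  # "train" on everything labelled so far
        confident, remaining = [], []
        for features in pool:
            label, margin = classify(features, cents)
            if margin >= min_margin:
                confident.append((features, label))
            else:
                remaining.append(features)
        if not confident:  # nothing new we're sure about: stop
            break
        labelled.extend(confident)
        pool = remaining
    return labelled, pool


# Two hand-labelled examples and six unlabelled ones (all made up).
labelled = [((4.0, 1.5), "car"), ((11.0, 3.5), "truck")]
unlabelled = [(4.4, 1.6), (3.8, 1.4), (10.2, 3.3), (12.5, 4.1), (6.0, 2.0), (7.5, 2.6)]
grown, still_unlabelled = self_train(labelled, unlabelled)
print(len(grown), still_unlabelled)  # 7 [(7.5, 2.6)]
```

In this run the ambiguous example (7.5, 2.6) stays unlabelled because it sits close to halfway between the two classes. That caution matters: if confident but wrong pseudo-labels are added, the model can end up reinforcing its own mistakes, a known weakness of self-training.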

"Our framework is based on a holistic principle that leverages several semi-supervised learning techniques for a more meaningful prediction on the unlabelled part of the data," explains INTEGRAL member Rihuan Ke. "It significantly improves the accuracy of existing approaches and allows [us] to learn from very limited manual annotations."

The work the INTEGRAL team have done so far is only the beginning. "[Ultimately] the goal is to develop innovative, efficient, robust and generalisable tools for the analysis of complex urban level video scenes," says Angelica Aviles-Rivero, another member of INTEGRAL. "These tools will provide an easier way to analyse vast amounts of data in an [incredibly] short period of time."

And this isn't all. While India's cities are expanding, the biodiversity of its forests is potentially in danger. Another strand of the INTEGRAL project will use the kind of methods developed for analysing traffic footage to understand satellite images of forests. The aim is to count trees of different species and thereby understand the state of India's biodiversity (you can read more about this strand of the project in this article).

Once a machine has learnt how to analyse images and videos reliably, there is almost no limit to the uses you can put it to.

About this article

Marianne Freiberger is Editor of Plus. She spoke to the INTEGRAL team in December 2020.

This article now forms part of our coverage of a major research programme on deep learning held at the Isaac Newton Institute for Mathematical Sciences (INI) in Cambridge. The INI is an international research centre and our neighbour here on the University of Cambridge's maths campus. It attracts leading mathematical scientists from all over the world, and is open to all. Visit www.newton.ac.uk to find out more.