"We all appreciate the value of forests in combating climate change. They are important because they store carbon, and if we chop them down then that carbon is released into the atmosphere. Forests are also a home for biodiversity. Having natural forests is good for many organisms."
These are the words of David Coomes, an ecologist at the University of Cambridge, who currently co-leads INTEGRAL, a project which aims to understand the diversity of forests in India. But most of the researchers involved in this project aren't ecologists or conservationists: they are mathematicians. That's because artificial intelligence (AI) will be crucial in assessing biodiversity, but the algorithms that are needed push the boundaries of AI, so new mathematical techniques are required to develop them.
Mapping the world
And this is not all: modern life produces an abundance of images, taken by anything from traffic cameras to satellites, containing information that could never be extracted by humans alone. "Remote sensing, [understanding an area through images], is a major means for mapping our world," says mathematician Carola-Bibiane Schönlieb, who co-leads INTEGRAL. "This data by itself is useless if we do not have the means to analyse it, to extract the information from it that we are interested in."
The mathematical techniques that make remote sensing possible can be used in any context, whether the object you'd like to recognise is a tree in an aerial photograph, a vehicle in an image from a traffic camera, or a tumour in a medical scan. This synergy is something the INTEGRAL project exploits. While India's ever-expanding cities threaten its forests, the vast amounts of traffic within those cities also threaten human health. Another strand of the project is to use data from traffic cameras to assess the composition of traffic in those cities, in order to inform the decisions of planning authorities. You can read more about the traffic part of the project in this article.
Teaching machines to learn
The kind of artificial intelligence being employed by the INTEGRAL team is called machine learning. This involves an algorithm learning to spot patterns in a data set that correspond to structures hiding inside that data set. You can read a more detailed introduction to machine learning in this article. In the case of remote sensing, the data sets are images (which, in a computer, are represented as arrays of numbers) and the patterns indicate whether the image depicts a particular type of tree or, in the case of traffic, a particular vehicle.
The trouble with machine learning in its simplest form is that it needs to learn from a set of training data which is already labelled with the correct answer, for example whether an image shows a mango tree or a palm tree. But producing such annotated data in the first place requires a lot of expensive human input.
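To make the idea of learning from labelled images concrete, here is a minimal sketch in Python. It is a toy illustration, not the INTEGRAL team's method: the tiny 2x2 "images", their pixel values and the function names are all invented for this example, and the classifier (nearest centroid) is simply one of the simplest possible choices. Each image is flattened into a list of numbers, the algorithm learns one typical pattern per label, and a new image gets the label of the closest pattern.

```python
# Toy illustration only: 2x2 "images" stored as grids of brightness values.
# The pixel values and labels are invented for this sketch.

def flatten(image):
    """Turn a 2D grid of pixel values into one flat list of numbers."""
    return [pixel for row in image for pixel in row]

def mean_vector(vectors):
    """Average several equal-length vectors component by component."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(labelled_images):
    """Learn one typical pixel pattern (a centroid) for each label."""
    by_label = {}
    for image, label in labelled_images:
        by_label.setdefault(label, []).append(flatten(image))
    return {label: mean_vector(vecs) for label, vecs in by_label.items()}

def classify(centroids, image):
    """Assign the label whose typical pattern is closest to the image."""
    vector = flatten(image)
    return min(centroids, key=lambda label: squared_distance(centroids[label], vector))

# Every training image comes with a human-provided label.
training_data = [
    ([[0.9, 0.8], [0.9, 0.7]], "mango"),
    ([[0.8, 0.9], [0.8, 0.8]], "mango"),
    ([[0.2, 0.1], [0.3, 0.2]], "palm"),
    ([[0.1, 0.2], [0.2, 0.1]], "palm"),
]

centroids = train(training_data)
prediction = classify(centroids, [[0.85, 0.75], [0.9, 0.8]])
print(prediction)
```

Real remote sensing images contain millions of pixels and many spectral channels, so practical classifiers are far more elaborate, but the dependence on human-labelled examples is the same.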
Some of the members of the INTEGRAL team. From top left: Carola-Bibiane Schönlieb, James Woodcock, Angelica Aviles-Rivero, Saurabh Pandey, Sanjay Bisht, Debmita Bandyopadhyay, Rihuan Ke, David Coomes.
"State of the art AI approaches come with a price," explains Schönlieb. "They need a lot of very high quality annotated data to be trained on. In applications where we are dealing with real data such annotation is very costly and time consuming to obtain, either because expert knowledge is required to do the annotations, and/or because there is a lot of manual work involved in collecting the data on the ground or sitting in front of a computer doing the annotations. This is where the mathematical motivation of INTEGRAL comes in."
To deal with this challenge the INTEGRAL team are developing so-called semi-supervised learning techniques. Here algorithms make maximal use of information inherent in the training data to make do with a much smaller amount of annotated data. It seems like magic, but it does work. (You can find out more about semi-supervised learning in this short introduction.)
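One generic way to get by with few labels is self-training, sketched below in Python. This is an assumption chosen for illustration, not the algorithm the INTEGRAL team uses: the data points, labels and function names are invented. The model starts from just two labelled examples, labels the unlabelled points it is most confident about, and then refits using those extra "pseudo-labels".

```python
# Self-training sketch: a generic semi-supervised idea, not INTEGRAL's algorithm.
# All numbers and labels below are invented for illustration.

def centroid(points):
    """Component-wise average of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_centroids(labelled):
    """Compute one centroid per label from (point, label) pairs."""
    groups = {}
    for point, label in labelled:
        groups.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in groups.items()}

def predict(centroids, point):
    """Return the label of the nearest centroid."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], point))

def margin(centroids, point):
    """Confidence score: gap between the two nearest centroid distances."""
    distances = sorted(sq_dist(c, point) for c in centroids.values())
    return distances[1] - distances[0]

def self_train(labelled, unlabelled, rounds=5):
    """Repeatedly pseudo-label the most confident unlabelled points and refit."""
    labelled = list(labelled)
    remaining = list(unlabelled)
    for _ in range(rounds):
        if not remaining:
            break
        centroids = fit_centroids(labelled)
        # Most confident points first; adopt the top half in each round.
        remaining.sort(key=lambda p: margin(centroids, p), reverse=True)
        take = max(1, len(remaining) // 2)
        confident, remaining = remaining[:take], remaining[take:]
        labelled += [(p, predict(centroids, p)) for p in confident]
    return fit_centroids(labelled)

# Only two points carry human labels; the other six do not.
labelled = [((0.9, 0.8), "mango"), ((0.2, 0.1), "palm")]
unlabelled = [(0.8, 0.9), (0.7, 0.8), (0.85, 0.7),
              (0.1, 0.2), (0.3, 0.2), (0.25, 0.3)]

model = self_train(labelled, unlabelled)
print(predict(model, (0.7, 0.7)), predict(model, (0.3, 0.3)))
```

The unlabelled points cost nothing to annotate, yet they still shape the final model. The risk is that a wrong pseudo-label reinforces itself, which is one reason why real data calls for more careful mathematical techniques.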
"It's a very new technique we are applying to real time data sets,” says INTEGRAL member Debmita Bandyopadhyay of the University of Cambridge. "[This is] is a huge challenge where a lot of misinterpretation can take place: the forests in India are mixed forests so from one [image] pixel to the other species can change. So we are facing challenges, but we are reaching there."
Connecting the world
A key part of the INTEGRAL project is the collaboration between India and the UK. Apart from scientists and mathematicians at the University of Cambridge, the project comprises a range of organisations, including the environmental advisory group IORA Ecological Solutions, Forest Survey of India, Indian Institute of Technology Delhi, and the Indian technology company KritiKal Solutions.
Experts on the ground in India play an important role. "To train the AI [still requires] some human [input]," says Saurabh Pandey from KritiKal Solutions. "We can say whether the red patch you see [in an image] is mango trees, or something else entirely. The sensors will tell you the colour of a particular pattern and the data that we collect will tell you that the pattern belongs to a particular species."
"We're at a very exciting moment, where we have the field data, the [images] from aircraft, and Debmita working very hard to test the methods we have developed on these data sets," says Coomes.
"Once we have got these classifications working there are all sorts of opportunities to apply them elsewhere. The UN has declared a decade of forest restoration around the world, so there's a huge appetite for these species maps, which are based on INTEGRAL work."
Further reading
To find out more about the traffic strand of INTEGRAL, see Seeing traffic through new eyes. To hear from the INTEGRAL team, listen to our podcast.
For more about machine learning see