This article is part of a series celebrating the 20th birthday of the Isaac Newton Institute in Cambridge. The Institute is a place where leading mathematicians from around the world can come together for weeks or months at a time to indulge in what they like doing best: thinking about maths and exchanging ideas without the distractions and duties that come with their normal working lives. And as you'll see in our articles, what starts out as abstract mathematics scribbled on the back of a napkin can have a major impact in the real world.
In 1997 the Isaac Newton Institute hosted a programme on neural networks and machine learning (NNM). Organised by Christopher M. Bishop (currently Distinguished Scientist at Microsoft Research, Cambridge), the programme attracted over 180 participants and was the largest international gathering of its kind at the time. It has since been hailed as a landmark event.
Mimicking the brain
Artificial neural networks grew out of researchers' attempts to mimic the human brain.
The neural networks and machine learning programme took place at a time when the field found itself at a crucial juncture. Since the mid-1980s, researchers' efforts to build intelligent machines had focused on trying to mimic the brain's vastly complex network of billions of individual neurons. In the much smaller artificial neural networks, the "neurons" are mini processing units that can receive information and transform it according to a set of mathematical rules. A set of input data, say an image of some hand-written text on a page, is broken up and coded into mathematically digestible pieces, which then flow through the network, being transformed on their path from neuron to neuron, and eventually emerge as an output, for example a transcription of the hand-written text into ASCII characters.
A crucial feature of artificial neural networks is their ability to learn. When presented with an example set of input-output data, say a set of hand-written pages and their correct transcriptions, the artificial network can compare its own outputs with the desired ones. If its own outputs are not good enough, it can adjust the parameters that govern its mathematical transformations until it achieves satisfactory results. Using this automated learning-by-example process, artificial neural networks can learn to recognise and classify patterns they have never seen before.
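This adjust-until-satisfactory loop can be sketched in a few lines of Python. The sketch below is only a toy illustration, not any system discussed in the article: it trains a single artificial "neuron" with the classic perceptron rule, nudging its parameters whenever its output disagrees with the desired one.

```python
# A minimal sketch of learning-by-example: one "neuron" with a step
# activation, trained by the perceptron rule. Real networks chain many
# such units and use more sophisticated, gradient-based updates.

def train_neuron(examples, epochs=20, lr=0.1):
    """Adjust weights until the neuron reproduces the desired outputs."""
    w = [0.0, 0.0]   # one weight per input
    b = 0.0          # bias term
    for _ in range(epochs):
        for (x1, x2), target in examples:
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - output      # compare with the desired output
            w[0] += lr * error * x1      # nudge parameters to reduce error
            w[1] += lr * error * x2
            b += lr * error
    return w, b

# Example: learn the logical OR pattern from four input-output pairs.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_neuron(data)
predictions = [1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
               for (x1, x2), _ in data]
```

After training, the neuron reproduces all four desired outputs, even though it was never told the rule explicitly, only shown examples.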
Vast amounts of training data are essential for the learning process, but large data sets tend to come with quite a lot of "noise": errors and variability. In the 1990s it became clear that the future of neural networks hinged not so much on neurobiology, but on their ability to make the most of noisy data sets, for example by recognising statistical patterns and quantifying uncertainty using probabilities. The NNM programme grew out of the recognition that the probabilistic aspects of neural networks needed to be put on a sound mathematical footing. The organisers recognised the strongly interdisciplinary nature of the field, bringing together experts from computer science and statistics, as well as other fields with an interest in the area, such as physics and dynamical systems. The programme afforded these experts the time and space to exchange ideas.
Probabilistic graphical models
One particularly fruitful, though rather unexpected, outcome of the programme was the convergence of neural network theory and what is called graphical modelling. In a graphical model the elements of a data set, say all the people on a social networking site, are represented by nodes in a network, with the links between nodes representing relationships inferred from statistical information extracted from the data. A graphical model thus provides a way of representing additional structural information about a data set.
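The essential idea can be shown with a deliberately tiny example. The sketch below is hypothetical (the two variables and their probability tables are invented): a directed graph with one link, Rain → WetGrass, where the link records that one variable directly influences the other, and the probability tables attached to the nodes let us compute quantities of interest.

```python
# A toy probabilistic graphical model with two nodes, Rain -> WetGrass.
# The graph structure says which variables directly influence which;
# the (made-up) tables quantify those influences.

p_rain = {True: 0.2, False: 0.8}            # P(Rain)
p_wet_given_rain = {True: 0.9, False: 0.1}  # P(WetGrass = True | Rain)

# Marginalise over the parent node to get the overall P(WetGrass = True):
# sum P(Rain = r) * P(WetGrass = True | Rain = r) over both values of r.
p_wet = sum(p_rain[r] * p_wet_given_rain[r] for r in (True, False))
```

In a realistic model there are thousands or millions of nodes, and the same marginalisation is carried out efficiently by exploiting the graph structure rather than by brute-force summation.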
Researchers at the NNM programme took the crucial first steps towards incorporating into neural networks the extra structural information that comes with graphical models. This approach has enabled neural networks to tackle much richer and harder problems than was previously possible. "The benefit of bringing the two communities together has been to provide us with a new paradigm for machine learning," says the programme organiser Christopher Bishop. "This is having a major impact, including a major commercial impact."
TrueSkill and adPredict
Following on from the NNM programme, two practical, and commercially extremely powerful, applications have been developed at Microsoft Research. Both applications learn in real time from large data networks comprising millions, or billions, of nodes.
TrueSkill is a system for the Xbox Live internet gaming environment. Image: Evan-Amos.
TrueSkill is a system for the Xbox Live internet gaming environment. It takes the results from the large network of players competing online and uses this information to estimate players' skills and to match up players with similar skill levels for the next round of games.
adPredict is a mechanism for pricing advertisements that appear in the Microsoft search engine Bing. Analysing users' "click behaviour", it estimates the probability that users click on an advertisement, which directly influences advertising revenue.
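The core computation in any click-prediction system is mapping features of an ad impression to a probability of a click. The sketch below is not adPredict (which is a Bayesian model); it is a plain logistic-regression illustration, with hypothetical feature names and weights, of how such a mapping can work.

```python
import math

# A hedged sketch of click-probability prediction: combine weighted
# features of an ad impression into a score, then squash the score into
# a probability in (0, 1). Feature names and weights are invented.

def click_probability(features, weights, bias):
    """Logistic model: P(click) = sigmoid(bias + sum of weighted features)."""
    score = bias + sum(weights[name] * value
                       for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))

weights = {"position_top": 1.2, "query_match": 0.8}   # hypothetical weights
p = click_probability({"position_top": 1.0, "query_match": 1.0},
                      weights, bias=-3.0)
```

In a deployed system the weights themselves are learned from logged click behaviour, and the predicted probabilities feed directly into ad pricing.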
"TrueSkill is to the best of my knowledge the first planet-scale application of Bayesian models," says Bishop. "AdPredict resulted from an internal competition within Microsoft, in which the Bing search engine team provided a training data set and invited teams to compete to produce the best predictor of the probability that a user would click on a particular advertisement. The adPredict system from Microsoft Research Cambridge was the winner in terms of accuracy of prediction jointly with another entry. However, the adPredict system was chosen for use in the product, as it was simpler and more scalable. It is now used on 100% of US traffic and is being rolled out world-wide."
By bringing together two separate scientific communities to work on a common theme (a "deliberate act of social engineering" as described by Bishop), the NNM programme not only paved the way for the applications developed at Microsoft, but also impacted on other areas, including the statistical analysis of DNA sequences, face recognition technologies and computer vision.
But the NNM programme did not just benefit the machine learning community. One participant who was able to put ideas from neural network theory back into statistical analysis is David Spiegelhalter, Winton Professor of the Public Understanding of Risk and Senior Scientist at the Biostatistics Unit in the University of Cambridge. After extensive interaction with other programme participants, Spiegelhalter experienced the crucial "Eureka moment" he needed to perfect a statistical software package called BUGS. The package is widely used to fit probabilistic graphical models to real-world data. It has a wide range of applications, from the modelling of animal populations to appraisals of new medical interventions.
"The programme was an enormously stimulating time," says Spiegelhalter. "The free time and space, and the opportunity to talk to people provided a great atmosphere for ideas."
Spiegelhalter's work eventually resulted in the paper Bayesian measures of model complexity and fit (with discussion) by DJ Spiegelhalter, NG Best, BP Carlin, and A van der Linde (JRSS, Series B, 64:583-640, 2002). By October 2010 this paper had over 2000 citations on Google Scholar and 1269 on Web of Science. According to Essential Science Indicators from Thomson Reuters it has become the third highest cited paper in all the mathematical sciences over the ten years ending in October 2009. As Spiegelhalter says, "All this is due to the Isaac Newton Institute providing a retreat for inspiration and concentration."