
"Since you are a computer scientist, I have an optimisation problem for you!", my colleague said, half jokingly. As an ecologist, one of the things my colleague studies is invasive plant species. The question he was facing is how to reconstruct the most likely routes along which these species travel when they invade new territory, based on historical records on when and where they first appeared. As it turns out, this question is an instance of a known mathematical optimisation problem called the minimum cost arborescence problem.
Invasive species: Spotted knapweed

Centaurea stoebe (spotted knapweed), a member of the daisy family. Photo: Matt Lavin, CC BY-SA 2.0.
Due to increased human activity and mobility, many invasive species have been introduced to various parts of the world. Examples are possums in New Zealand, rabbits in Australia, and spotted knapweed in North America. Spotted knapweed (scientific name: centaurea stoebe) is considered a pest in North America, outcompeting many native species. It was most likely brought over from Europe on ships in the late 19th century. In fact, there were at least two introduction sites: one on the east coast near Boston, and one on the west coast near Vancouver.
My ecologist colleague was interested in reconstructing the most likely dispersal routes of spotted knapweed after its (dual) introduction in North America. He already had data on when this species was first observed in many different locations across the continent, but did not know how to analyse this data mathematically to get the result he wanted. The key to the problem is a mathematical object known as a minimum cost arborescence.
Minimum cost arborescence
In mathematics a directed graph is a collection of nodes with arrows between them, which point from one node to another. An arborescence is a directed graph in which there is one unique node
A directed graph G with root node r and a cost assigned to each arrow. The minimum cost arborescence for this graph is indicated in bold. For example, the path {r, v2, v1, v4} from the root node r to node v4 has a lower cost (1+2+2=5) than the two shorter paths {r, v1, v4} (7+2=9) and {r, v2, v4} (1+6=7).
Reconstructing dispersal routes
When you are trying to reconstruct the route along which a species, such as the spotted knapweed, has dispersed you usually have the following pieces of information:
- A number
of geographic locations where the species has been observed. The locations are points on the map, so we can give them coordinates , , etc, up to . - For each location there is a time at which the particular species of interest was first observed at that location. We write
for the time corresponding to the location , etc, up to for the time corresponding to the location .
Using this information we can construct a directed graph
Assuming that a location is always invaded from the nearest earlier occupied location, it doesn't take much thought to see that finding the minimum cost arborescence for the graph

The most likely dispersal routes of spotted knapweed in North America, as reconstructed using the minimum cost arborescence method. The two red dots indicate the two (independent) introduction sites.
My colleague Olivier Broennimann and I published this formal correspondence between the dispersal routes reconstruction problem in ecology and the minimum cost arborescence problem in mathematics in an article in the Journal of Theoretical Biology.
Generalisations and other applications
Some of the assumptions underlying the method above are rather simplistic. For example, an invasion event may not always happen from the nearest already occupied location. There may be a barrier between two locations, such as a river or a mountain, that prevents the dispersal from one location to another. Or dispersal may depend on predominant wind direction (such as with plant seeds), which may preclude dispersal in certain directions. Such cases can be dealt with by setting the cost value of the arrow connecting the two locations to a very large value or even infinity. In other words, the cost function can be easily adjusted according to "real world" situations, to take geographical, climatic, or biological constraints into account.

Cute or not, the common bushtail possum is an invader.
There is also likely to be some noise in the data in terms of the first occurrence times. In the data set, the times
Why do we do it?
Being able to reconstruct the most likely dispersal routes of invasive species has important practical applications. For example, it can help in the design and implementation of quarantine strategies, or of conservation actions such as preventing the establishment of new "seed" populations, rather than focusing on established invasion fronts. Furthermore, the method as described here can be used to study various "what if?" scenarios. Both the data and the cost function can be adjusted to reflect hypothetical (possibly even future) scenarios, for which it can then be studied how that affects the most likely dispersal of invasive species.
In conclusion, the ecological dispersal routes reconstruction problem can be stated in terms of a known mathematical problem, for which there exists an efficient algorithm to find the optimal solution. This provides a mathematically sound and versatile tool to study the dynamics of invasive species, thus turning an ecologist's nightmare into a mathematician's dream.
About the author

Wim Hordijk is an independent scientist currently visiting the Santa Fe Institute in the USA. He has worked on many research and computing projects all over the world, mostly focusing on questions related to evolution and the origin of life. More information about his research can be found on his website.