*This article is part of our Information about information project, run in collaboration with FQXi. Click here to find out about other ways of measuring information. *

How much information is there in this article? It's hard to tell. Its length clearly isn't an indicator because that depends on my choice of words. One thing you could do to measure information is to condense the article into bullet points. If you end up with many of them, then that's because the article is pretty dense. If there's only one, then the information it conveys is rather basic, or at least not very complex.

### Measuring complexity

Andrey Kolmogorov (1903 - 1987).

In the 1960s mathematicians developed a way of measuring the complexity of a piece of information that reflects this intuition. They were looking at information from a computer's point of view. To a machine, a piece of information is just a string of symbols: letters, numbers, or, at the most basic level, 0s and 1s. A computer can't make bullet points, but it can be programmed to output a given string of symbols. If the information in the string can be condensed quite a lot, then there should be a short program that does this. If it can't, then the program will be longer.

To see how this works, imagine the string consists of a hundred zeros. A program that outputs that string only needs to say something like "print 0; repeat 100 times". Different programming languages will use different commands to do this, but the general idea will be the same. So that's a short program for a pretty simple string. If, on the other hand, your string contains 0s and 1s without any pattern at all, then there is no structure to exploit. Your program would have to say "print:" and then quote the entire string, making the program slightly longer than the string itself.

The length of
the shortest computer program that produces a given string and then
halts (we don't want infinite loops) is called the *Kolmogorov
complexity* of the string, after Andrey Kolmogorov, one of three people who came up with the idea (the other two were Ray Solomonoff and Gregory Chaitin). If you interpret complexity
as lack of structure, then the name makes sense.

"Kolmogorov complexity measures the maximum compressibility of a string via any computable process whatsoever," says Scott Aaronson, a computer scientist at the Massachusetts Institute of Technology. "Compressibility is something we're all familiar with. There are all sorts of data compression programs, which play a crucial role on the internet, for example for streaming videos." The programs exploit the fact that lots of the information we deal with has some sort of structure, like the string of 100 zeroes above. The English language obviously contains structure, and so do images and pieces of music (which to a computer are also strings of 0s and 1s). Just as with the string of 0s above, you can exploit this structure to compress a file.

### Computing complexity...or not

So, what's the Kolmogorov complexity of your favourite song or your favourite movie? Before working it out, you need to fix which programming language you use, since a program could be shorter in one language than in another. This is a bit like fixing units of measurement such as inches or centimetres. It's important to know which units you're using, but once you've agreed on them, you can stop referring to them all the time. Which is why we won't specify any particular language here. The only restriction is that you're not allowed to cheat, for example by saving your favourite string under the name "s" and then writing a program that simply says "print s" โ we require that the program contains all the instructions on how to compress and decompress information.

Next, you need to find the shortest program that outputs your song
or movie, and that's a problem. "Kolmogorov
completely left open the question of how you actually find the shortest
program to output a given string," says Aaronson. That's bad news, but
things get worse. "You can actually prove
that this is an *uncomputable problem*." Informally, this means
that there is no general method for computing the Kolmogorov
complexity of any given string. You might be able to compute it for
some strings, but there's no general recipe that works for all.

Bertrand Russell (1872 - 1970).

There's a very elegant proof of this fact, based on a logical
conundrum called *Berry's paradox*. First notice that I could
use ordinary English words to specify a number. For example, the
sentence "The number of fingers on a generic person's hand" specifies
the number five. Clearly not every number can be specified using a
short sentence, say of at most 20 words. I, for one, can't think of a brief way to specify the number 2,354,899,123,657. Amongst the numbers that can't be specified by a sentence of at most 20 words
be, there must be a smallest one, which I'll call *n*.

But hang on โ I've
just specified *n* using the sentence

Sentence A: "*n* is the smallest whole number that cannot
be specified
using at most 20 words",

*n*can and can't be specified using at most 20 words โ it's a paradox. Bertrand Russell was the first to write about this puzzle, but he had learnt it from a junior librarian in Oxford by the name of G. G. Berry.

The way out of the conundrum lies in the ambiguity of
language. Sentence A doesn't give
you any numerical value, it merely gives a name, *n*, to a number you
believe exists. If giving a name to a number you believe exist counts as specifying
it, then you can specify any number at all,
so there isn't a smallest
one you can't specify. If you require that specifying a number
involves something more concrete, then sentence A doesn't actually specify
*n*. In either case, the paradox unravels.

There is no such ambiguity in computer languages though, which is why an analogue of Berry's paradox proves that Kolmogorov complexity is uncomputable. "Let's suppose there were a [general method for] computing the Kolmogorov complexity of any given string," explains Aaronson. "Then it turns out that you could use this method to design a very small computer program that finds a string of enormous Kolmogorov complexity, much bigger than the length of the program itself." That's a contradiction. The Kolmogorov complexity of the output string is the length of the shortest program that produces it. Since our program does produce the string, the string's Kolmogorov complexity can't possibly exceed the program's length: it's got to be the same, or smaller. We've encountered a contradiction, and this time there's no way out by appealing to ambiguity. We have no choice but to conclude that our original assumption, that Kolmogorov complexity is computable, was false.

### Uncomputable but useful

What's the use of a measure you can't actually compute? "Kolmogorov complexity is useful because it tells you the ideal situation you would like to approach with the imperfect tools you have in practice," says Aaronson. "In situations where the maths calls for Kolmogorov complexity, you can instead feed your data into a compression program, say using gzip or saving your image as a .gif file, and use that as a crude approximation of Kolmogorov complexity. Kolmogorov complexity would be the theoretical ideal compression. What you're getting with standard compression methods are various non-ideal compressions, which use various regularities in your data, but not all regularities."

How to compare two strings representing DNA?

The theoretical concept has inspired some very practical
applications. Imagine, for example, you want to compare two DNA
strings, or two music or image files, to see if they are correlated in
some way. One thing to do is to take the
strings *x* and *y* representing the two files
and to stick one to the end of the other, creating one very long
string *xy*. If *x* and *y* are exactly the same,
so they are maximally correlated, then the Kolmogorov complexity of
*xy* will be very close to that of the individual Kolmogorov complexity
of *x* and *y*. That's because a computer program that
outputs *xy* would only need to include the program that outputs
*x* and print the output twice.

If, on the other hand,
*x* and *y* aren't correlated at all, then the shortest
computer program to produce *xy* would have to consist of the
shortest program that outputs *x* and the shortest program that
outputs *y*. So in that case the Kolmogorov complexity of
*xy* equals the Kolmogorov complexity of *x* plus
the Kolmogorov complexity of *y*.

In practice you can't compute the Kolmogorov complexity, but you
can use compression programs to approximate it. If the approximated
complexity of *xy* is close to the sum of the approximated
individual complexities of *x* and *y*, then that's evidence they are not
very correlated. If it's close to the complexity of an individual
string, it's evidence that they are.

"There are all sorts of reasons why you might want to know the general relatedness of two different products, and this is often a useful technique," says Aaronson. "These are not direct applications because you cannot compute Kolmogorov complexity. But the maths gives you the ideal, which then gives you inspiration for what you can do with the tools you have."

But how does Kolmogorov complexity hold up as a measure of information content? The complexity of a string is low if it contains a lot of regularities that can be exploited for compression. If it doesn't contain regularities, in other words if it is random, then the Kolmogorov complexity is high. But can random strings really be thought of as carrying a lot of information? Aren't they in some sense meaningless? This is what we'll look at in the next article.

### About this article

Marianne Freiberger is co-editor of *Plus*. She would like to thank Scott Aaronson, a computer scientist at the Massachusetts Institute of Technology, for a very useful conversation about information. And she recommends the book Information Theory: A Tutorial Introduction by James V. Stone as a mathematical introduction to the subject.