How much information is there in this article? It's hard to tell. Its length clearly isn't an indicator because that depends on my choice of words. One thing you could do to measure information is to condense the article into bullet points. If you end up with many of them, then that's because the article is pretty dense. If there's only one, then the information it conveys is rather basic, or at least not very complex.
Andrey Kolmogorov (1903 - 1987).
In the 1960s mathematicians developed a way of measuring the complexity of a piece of information that reflects this intuition. They were looking at information from a computer's point of view. To a machine, a piece of information is just a string of symbols: letters, numbers, or, at the most basic level, 0s and 1s. A computer can't make bullet points, but it can be programmed to output a given string of symbols. If the information in the string can be condensed quite a lot, then there should be a short program that does this. If it can't, then the program will be longer.
To see how this works, imagine the string consists of a hundred zeros. A program that outputs that string only needs to say something like "print 0; repeat 100 times". Different programming languages will use different commands to do this, but the general idea will be the same. So that's a short program for a pretty simple string. If, on the other hand, your string contains 0s and 1s without any pattern at all, then there is no structure to exploit. Your program would have to say "print:" and then quote the entire string, making the program slightly longer than the string itself.
The length of the shortest computer program that produces a given string and then halts (we don't want infinite loops) is called the Kolmogorov complexity of the string, after Andrey Kolmogorov, one of three people who came up with the idea (the other two were Ray Solomonoff and Gregory Chaitin). If you interpret complexity as lack of structure, then the name makes sense.
"Kolmogorov complexity measures the maximum compressibility of a string via any computable process whatsoever," says Scott Aaronson, a computer scientist at the Massachusetts Institute of Technology. "Compressibility is something we're all familiar with. There are all sorts of data compression programs, which play a crucial role on the internet, for example for streaming videos." The programs exploit the fact that lots of the information we deal with has some sort of structure, like the string of 100 zeroes above. The English language obviously contains structure, and so do images and pieces of music (which to a computer are also strings of 0s and 1s). Just as with the string of 0s above, you can exploit this structure to compress a file.
Computing complexity...or not
So, what's the Kolmogorov complexity of your favourite song or your favourite movie? Before working it out, you need to fix which programming language you use, since a program could be shorter in one language than in another. This is a bit like fixing units of measurement such as inches or centimetres. It's important to know which units you're using, but once you've agreed on them, you can stop referring to them all the time. Which is why we won't specify any particular language here. The only restriction is that you're not allowed to cheat, for example by saving your favourite string under the name "s" and then writing a program that simply says "print s" — we require that the program contains all the instructions on how to compress and decompress information.
Next, you need to find the shortest program that outputs your song or movie, and that's a problem. "Kolmogorov completely left open the question of how you actually find the shortest program to output a given string," says Aaronson. That's bad news, but things get worse. "You can actually prove that this is an uncomputable problem." Informally, this means that there is no general method for computing the Kolmogorov complexity of any given string. You might be able to compute it for some strings, but there's no general recipe that works for all.
Bertrand Russell (1872 - 1970).
There's a very elegant proof of this fact, based on a logical conundrum called Berry's paradox. First notice that I could use ordinary English words to specify a number. For example, the sentence "The number of fingers on a generic person's hand" specifies the number five. Clearly not every number can be specified using a short sentence, say of at most 20 words. I, for one, can't think of a brief way to specify the number 2,354,899,123,657. Amongst the numbers that can't be specified by a sentence of at most 20 words be, there must be a smallest one, which I'll call n.
But hang on — I've just specified n using the sentence
Sentence A: "n is the smallest whole number that cannot be specified using at most 20 words",which itself has less than 20 words. Therefore n can and can't be specified using at most 20 words — it's a paradox. Bertrand Russell was the first to write about this puzzle, but he had learnt it from a junior librarian in Oxford by the name of G. G. Berry.
The way out of the conundrum lies in the ambiguity of language. Sentence A doesn't give you any numerical value, it merely gives a name, n, to a number you believe exists. If giving a name to a number you believe exist counts as specifying it, then you can specify any number at all, so there isn't a smallest one you can't specify. If you require that specifying a number involves something more concrete, then sentence A doesn't actually specify n. In either case, the paradox unravels.
There is no such ambiguity in computer languages though, which is why an analogue of Berry's paradox proves that Kolmogorov complexity is uncomputable. "Let's suppose there were a [general method for] computing the Kolmogorov complexity of any given string," explains Aaronson. "Then it turns out that you could use this method to design a very small computer program that finds a string of enormous Kolmogorov complexity, much bigger than the length of the program itself." That's a contradiction. The Kolmogorov complexity of the output string is the length of the shortest program that produces it. Since our program does produce the string, the string's Kolmogorov complexity can't possibly exceed the program's length: it's got to be the same, or smaller. We've encountered a contradiction, and this time there's no way out by appealing to ambiguity. We have no choice but to conclude that our original assumption, that Kolmogorov complexity is computable, was false.
Uncomputable but useful
What's the use of a measure you can't actually compute? "Kolmogorov complexity is useful because it tells you the ideal situation you would like to approach with the imperfect tools you have in practice," says Aaronson. "In situations where the maths calls for Kolmogorov complexity, you can instead feed your data into a compression program, say using gzip or saving your image as a .gif file, and use that as a crude approximation of Kolmogorov complexity. Kolmogorov complexity would be the theoretical ideal compression. What you're getting with standard compression methods are various non-ideal compressions, which use various regularities in your data, but not all regularities."
How to compare two strings representing DNA?
The theoretical concept has inspired some very practical applications. Imagine, for example, you want to compare two DNA strings, or two music or image files, to see if they are correlated in some way. One thing to do is to take the strings x and y representing the two files and to stick one to the end of the other, creating one very long string xy. If x and y are exactly the same, so they are maximally correlated, then the Kolmogorov complexity of xy will be very close to that of the individual Kolmogorov complexity of x and y. That's because a computer program that outputs xy would only need to include the program that outputs x and print the output twice.
If, on the other hand, x and y aren't correlated at all, then the shortest computer program to produce xy would have to consist of the shortest program that outputs x and the shortest program that outputs y. So in that case the Kolmogorov complexity of xy equals the Kolmogorov complexity of x plus the Kolmogorov complexity of y.
In practice you can't compute the Kolmogorov complexity, but you can use compression programs to approximate it. If the approximated complexity of xy is close to the sum of the approximated individual complexities of x and y, then that's evidence they are not very correlated. If it's close to the complexity of an individual string, it's evidence that they are.
"There are all sorts of reasons why you might want to know the general relatedness of two different products, and this is often a useful technique," says Aaronson. "These are not direct applications because you cannot compute Kolmogorov complexity. But the maths gives you the ideal, which then gives you inspiration for what you can do with the tools you have."
But how does Kolmogorov complexity hold up as a measure of information content? The complexity of a string is low if it contains a lot of regularities that can be exploited for compression. If it doesn't contain regularities, in other words if it is random, then the Kolmogorov complexity is high. But can random strings really be thought of as carrying a lot of information? Aren't they in some sense meaningless? This is what we'll look at in the next article.
About this article
Marianne Freiberger is co-editor of Plus. She would like to thank Scott Aaronson, a computer scientist at the Massachusetts Institute of Technology, for a very useful conversation about information. And she recommends the book Information Theory: A Tutorial Introduction by James V. Stone as a mathematical introduction to the subject.