Our channels of communication aren't perfect. Whether you send a message by email or by smoke signal, there's always the chance that some random interference mangles it up on the way. Surprisingly though, you can always reduce the error rate to nearly zero, simply by choosing a clever way of encoding the message.
How do you cope with errors?
To get a feel for this, imagine you want to transmit the outcome of a coin flip to me via some long-distance communication channel, writing a 0 for heads and a 1 for tails. You know there's a chance that a 0 will get changed into a 1 on the way, and vice versa. Can you come up with a code that is secure against such bit flipping? You might try using two bits per outcome, writing 00 for heads and 11 for tails. If a 01 or a 10 arrives at the other end, then I will know that an error has happened. But I won't know which of the two bits was flipped, so I can't deduce whether you originally sent a 00 or a 11.
Let's instead try a code using three bits; 000 for heads and 111 for tails. If I see a sequence that isn't one of those two I won't only know that there has been an error. Assuming that only one bit per three can be flipped, I will also know what triple was the original: if the erroneous string has two 0s then the original was 000, otherwise it was 111. Such a way of encoding, which secures against errors, is called an error correcting code.
The example illustrates an important idea. You can get to grips with errors by introducing redundancy: extra symbols that reduce the chance that an error corrupts the message. "This idea shows up all over the place," says Scott Aaronson, a computer scientist at the Massachusetts Institute of Technology. "The English language, or any other language, has lots of redundancy built into it. If you miss one little piece of what I said you can still understand the sentence. This is a property of human languages: we don't maximally compress the information for a good reason."
Errors versus redundancy
Just how error-proof can you make things using such error-correcting codes? In our example above the error rate would be zero if we could we sure that the channel only ever flips one bit per block of three. In reality things aren't that predictable: you might know that your channel flips one bit per three on average, but this doesn't mean that it can't occasionally change a 000 standing for heads into a 011, leaving the receiver none the wiser as to what the intended message was. You might look for a cleverer code to guard against this, but the point is that whatever code you use, there's still a chance of ambiguity, leading to an error in the decoded message at the other end.
It seems that to reduce that chance, all you can do is add more redundancy. This would also make your transmission much less efficient since you need more bits to transmit a single symbol. It also puts you in danger of entering an arms race. If I ask you to produce a code that reduces the error rate to some small number, you need to add a certain amount of redundancy, if I reduce my required error rate further, you need to add more redundancy, an even smaller error rate requires even more redundancy, and so on, beyond all limits. A near-zero error rate would then require a near-infinite amount of redundancy, which means near-zero efficiency.
Radio transmissions can be noisy.
In the earlier days of long-distance communication this was indeed what people thought: when you're dealing with a noisy channel, you have no choice but trade your error rate for redundancy. In 1948, however, the mathematician Claude Shannon proved them wrong. In his ground-breaking Mathematical theory of communication Shannon showed that given any error rate, no matter how small, it's possible to find a code that produces this tiny error rate and enables you to transmit messages at a transmission rate that is only limited by the channel's capacity. This result is known as Shannon's noisy channel coding theorem. It shows that near-error-free transmission doesn't lead to near-zero efficiency.
If you have read the previous article, you won't be surprised to hear that the noisy channel theorem involves the entropy of the source that produces the information (such as a coin flipping machine). Loosely speaking, the entropy measures the average amount of information carried by a symbol produced by the machine. Adding redundancy to a message reduces the amount of information per symbol and therefore lowers the entropy. What Shannon showed is that once the entropy is below some critical value, a code with an arbitrarily small rate of decoding error can always be found.
Finding the best code
How do you find that magic code? Unfortunately, Shannon doesn't tell us. "Shannon didn't give a single concrete example of what code you would use," says Aaronson. His proof merely shows that if you pick a code at random from a set of suitable candidate codes, then with overwhelming probability that code will do. It's quite an amazing result, but not a very useful one. "Picking a code at random would not be computationally efficient. You might need a prohibitive amount of computational power to decode it." But Shannon wasn't concerned with that; his aim was to see what could be done with noisy channels in principle.
Which brings us right up to date. "A lot of the research in information theory in the 60-plus years since Shannon's paper, you can think of as an attempt to make Shannon's result explicit: trying to come up with an explicit error correcting code that is able to transmit information at close to the [maximal] rate that Shannon's theoretical construction gives you," says Aaronson. "There has been a lot of progress, especially in the last twenty-five years. There are now codes that can be encoded and decoded very efficiently and come close to [Shannon's limit]. These codes have major applications. They are used in CDs, in sending messages to space probes, etc. They are used all over the place in electrical engineering."
Shannon's ideas have been hugely influential, but his way of measuring information has a curious feature. The entropy measures the average information of a symbol produced by a particular information source. It tells us nothing about the information contained in one string of symbols produced by the source compared to another. Surely there must be a way of measuring that too? Find out more in the next article.
About this article
Marianne Freiberger is co-editor of Plus. She would like to thank Scott Aaronson, a computer scientist at the Massachusetts Institute of Technology, for a very useful conversation about information. And she recommends the book Information Theory: A Tutorial Introduction by James V. Stone as a mathematical introduction to the subject.