*This article is part of our Information about information project, run in collaboration with FQXi.*

Our channels of communication aren't perfect. Whether you send a message by email or by smoke signal, there's always the chance that some random interference mangles it up on the way. Surprisingly though, you can always reduce the error rate to nearly zero, simply by choosing a clever way of encoding the message.

### Flipping bits

How do you cope with errors?

To get a feel for this, imagine you want to transmit the outcome of
a coin flip to me via some long-distance communication channel, writing a 0 for heads and a 1 for tails. You know there's a chance that a 0 will get
changed into a 1 on the way, and vice versa. Can you come up with a code that is
secure against such *bit flipping*? You might try using two bits per
outcome, writing 00 for heads and 11 for tails. If a 01 or a 10
arrives at the other end, then I will know that an error has
happened. But I won't know which of the two bits was flipped, so I
can't deduce whether you originally sent a 00 or a 11.

Let's instead try a code using three bits: 000 for heads and 111 for
tails. If I see a sequence that isn't one of those two, I will not
only know that there has been an error. Assuming that at most one
bit per three can be flipped, I will also know what the original
triple was: if the erroneous string has two 0s then the
original was 000, otherwise it was 111. Such a way of encoding, which
secures against errors, is called an *error correcting code*.
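This three-bit repetition scheme can be sketched in a few lines of Python. The function names `encode` and `decode` are just for illustration; they aren't from the article:

```python
def encode(bit):
    # Repetition code: send each bit three times,
    # e.g. heads (0) becomes "000" and tails (1) becomes "111".
    return str(bit) * 3

def decode(triple):
    # Majority vote: whichever symbol appears at least twice
    # is taken to be the original bit.
    return "0" if triple.count("0") >= 2 else "1"

print(decode("010"))  # a single flipped bit is corrected: prints "0"
```

Majority voting works precisely because any single flip leaves two of the three copies intact.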

The example illustrates an important idea. You can get to
grips with errors by introducing *redundancy*: extra symbols
that reduce the chance that an error corrupts the
message. "This idea shows up all over the place," says Scott Aaronson, a computer scientist at the Massachusetts Institute of Technology. "The
English language, or any other language, has lots of redundancy
built into it. If you miss one little piece of what I said you can
still understand the sentence. This is a property of human languages:
we don't maximally compress the information for a good reason."

### Errors versus redundancy

Just how error-proof can you make things using such error-correcting codes? In our example above the error rate would be zero if we could be sure that the
channel only
ever flips one bit per block of three. In reality things aren't
that predictable: you might know that your channel flips one bit per three
*on average*, but this doesn't mean
that it can't occasionally change a 000 standing for heads into a
011, leaving the receiver none the wiser as to what the intended
message was.
You might look for a
cleverer code to guard against this, but the point is that whatever code you use, there's still a chance
of ambiguity, leading to an error in the decoded message at the other end.
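To put a number on that residual chance, suppose (a simplifying assumption, for illustration) that each bit is flipped independently with probability p. The three-bit code is only decoded wrongly when two or three of the copies flip, which is rarer than a single flip but never impossible:

```python
def repetition3_error(p):
    # The three-bit repetition code is decoded wrongly exactly when
    # two or three of the three transmitted bits are flipped.
    return 3 * p**2 * (1 - p) + p**3

# With a 10% chance of flipping each bit, the decoded error rate
# drops to 2.8%, but it is still not zero.
print(repetition3_error(0.1))
```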

It seems that to reduce that chance, all you can do is add more redundancy. This would also make your transmission much less efficient, since you need more bits to transmit a single symbol. It also puts you in danger of entering an arms race. If I ask you to produce a code that reduces the error rate to some small number, you need to add a certain amount of redundancy; if I reduce my required error rate further, you need to add more redundancy; an even smaller error rate requires even more redundancy, and so on, beyond all limits. A near-zero error rate would then require a near-infinite amount of redundancy, which means near-zero efficiency.
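The apparent arms race can be seen in longer repetition codes (again assuming, for illustration, that bits flip independently with probability p). Sending each bit n times drives the error rate down, but the transmission rate, 1/n useful bits per bit sent, shrinks towards zero:

```python
from math import comb

def repetition_error(p, n):
    # Majority-vote decoding of a length-n repetition code (n odd)
    # fails when more than half of the n copies are flipped.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (1, 3, 5, 7):
    # rate = useful bits per transmitted bit; it shrinks as the errors do
    print(n, 1 / n, repetition_error(0.1, n))
```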

Radio transmissions can be noisy.

In the earlier days of long-distance communication this was indeed what people thought: when you're dealing with a noisy channel, you have no choice but to trade your error rate for redundancy. In 1948, however, the mathematician Claude Shannon proved them wrong. In his ground-breaking *A Mathematical Theory of Communication* Shannon showed that given any error rate, no matter how small, it's possible to find a code that produces this tiny error rate and enables you to transmit messages at a transmission rate that is only limited by the channel's capacity. This result is known as
Shannon's *noisy channel coding theorem*. It shows that near-error-free transmission doesn't lead to near-zero efficiency.

If you have read the previous article, you won't be surprised to hear that the noisy channel theorem involves the entropy of the source that produces the information (such as a coin flipping machine). Loosely speaking, the entropy measures the average amount of information carried by a symbol produced by the machine. Adding redundancy to a message reduces the amount of information per symbol and therefore lowers the entropy. What Shannon showed is that once the entropy is below some critical value, a code with an arbitrarily small rate of decoding error can always be found.
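For a source with just two outcomes, like a coin, the entropy is easy to compute. The sketch below uses the standard binary entropy function; the binary symmetric channel framing in the final comment is an assumption added for illustration, not something spelled out in the article:

```python
from math import log2

def binary_entropy(p):
    # Entropy, in bits per symbol, of a source that emits one symbol
    # with probability p and the other with probability 1 - p.
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.5))  # a fair coin carries exactly 1 bit per flip
# For a channel that flips each bit with probability p (the binary
# symmetric channel), Shannon's capacity works out to 1 - H(p) bits
# per transmitted symbol.
print(1 - binary_entropy(0.1))
```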

### Finding the best code

How do you find that magic code? Unfortunately, Shannon doesn't tell us. "Shannon didn't give a single concrete example of what code you would use," says Aaronson. His proof merely shows that if you pick a code at random from a set of suitable candidate codes, then with overwhelming probability that code will do. It's quite an amazing result, but not a very useful one. "Picking a code at random would not be computationally efficient. You might need a prohibitive amount of computational power to decode it." But Shannon wasn't concerned with that; his aim was to see what could be done with noisy channels in principle.

Which brings us right up to date. "A lot of the research in information theory in the 60-plus years since Shannon's paper, you can think of as an attempt to make Shannon's result explicit: trying to come up with an explicit error correcting code that is able to transmit information at close to the [maximal] rate that Shannon's theoretical construction gives you," says Aaronson. "There has been a lot of progress, especially in the last twenty-five years. There are now codes that can be encoded and decoded very efficiently and come close to [Shannon's limit]. These codes have major applications. They are used in CDs, in sending messages to space probes, etc. They are used all over the place in electrical engineering."

Shannon's ideas have been hugely influential, but his way of measuring information has a curious feature. The entropy measures the average information of a symbol produced by a particular information source. It tells us nothing about the information contained in one string of symbols produced by the source compared to another. Surely there must be a way of measuring that too? Find out more in the next article.

### About this article

Marianne Freiberger is co-editor of *Plus*. She would like to thank Scott Aaronson, a computer scientist at the Massachusetts Institute of Technology, for a very useful conversation about information. And she recommends the book Information Theory: A Tutorial Introduction by James V. Stone as a mathematical introduction to the subject.