

Information is bits.
We send information over the internet every day. All that information, whether it's a birthday email or your homework, is encoded into sequences of 0s and 1s. What's the best way of doing this? The way that uses the fewest 0s and 1s, and therefore requires the smallest amount of computer memory.
Let's do a back-of-the-envelope calculation. Suppose we want to encode the 26 (lower case) letters of the alphabet, plus an additional symbol that denotes a space: each of the 27 symbols is to be represented by a string of 0s and 1s. How long do these strings need to be? Clearly they must be long enough to give a total of at least 27 distinct strings, so that we can associate them uniquely to our 27 symbols. There are $2^n$ distinct strings of 0s and 1s of length $n$, so we need the length $n$ to satisfy $2^n \geq 27$, that is, $n \geq \log_2{(27)} \approx 4.75$. Since the length must be a whole number, strings of length 5 will do: that's 5 bits (0s and 1s) per character.
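To see the arithmetic in one place, here is a quick sketch in Python (not part of the article, purely illustrative) of the calculation above.

```python
import math

symbols = 27  # 26 lower case letters plus the space

# A string of n 0s and 1s can take 2**n distinct values, so we need the
# smallest whole number n with 2**n >= 27.
print(math.log2(symbols))             # about 4.75
print(math.ceil(math.log2(symbols)))  # 5, since 2**4 = 16 < 27 <= 32 = 2**5
```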
Exploiting patterns

The calculation above is neat, but we can do better. "Any reasonable [code] would take advantage of the fact that some letters, like the letter "e" in English, occur much more frequently than others," explains Scott Aaronson, a computer scientist at the Massachusetts Institute of Technology. "So we can use a smaller number of bits for those." It's an idea that's been used in Morse code for over 150 years: here the more common letters are encoded using shorter strings of dots and dashes than the rarer ones.
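As a toy illustration of this idea, here is a sketch in Python with a made-up four-symbol alphabet (the frequencies and code words are assumptions for illustration, not taken from the article): giving the most frequent symbol the shortest code word lowers the average number of bits.

```python
# Hypothetical four-symbol alphabet with made-up frequencies.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# A fixed-length code needs 2 bits per symbol (2**2 = 4 distinct strings).
# This prefix code gives shorter code words to more frequent symbols,
# in the spirit of Morse code.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

average_bits = sum(p * len(code[s]) for s, p in probs.items())
print(average_bits)  # 1.75 bits per symbol on average, beating the fixed-length 2 bits
```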

How much better can we do? Suppose that the $i$th symbol of our alphabet occurs with frequency (probability) $p_i$. Then, it turns out, the best we can possibly do is an average of

$$p_1\log_2{\left(\frac{1}{p_1}\right)} + p_2\log_2{\left(\frac{1}{p_2}\right)} + \dots + p_{27}\log_2{\left(\frac{1}{p_{27}}\right)}$$

bits per character. If all 27 symbols are equally likely, each occurring with probability $1/27$, this expression equals $\log_2{(27)} \approx 4.75$. That's the same calculation as the one above. In particular, we can get away with fewer bits per character, on average, as soon as some symbols occur more frequently than others.
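Here is a small Python sketch (again illustrative, with one made-up distribution) that evaluates this sum: for 27 equally likely symbols it reproduces the 4.75 bits per character found above, and for a skewed distribution it comes out smaller.

```python
import math

def average_bits(probs):
    """The smallest achievable average number of bits per symbol: sum of p * log2(1/p)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# 27 equally likely symbols: the same 4.75 bits per character as above.
print(average_bits([1 / 27] * 27))            # about 4.75

# A made-up skewed distribution over 27 symbols needs fewer bits on average.
print(average_bits([0.5] + [0.5 / 26] * 26))  # about 3.35
```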
The birth of the bit
Thinking of information in terms of bits — 0s and 1s — seems like something that firmly belongs to the modern era, but the idea has been around for over 60 years. It was the mathematician Claude Shannon who in 1948 popularised the word "bit", which is shorthand for "binary digit".

Claude Shannon.

Shannon also came up with the formula we found above. Given an alphabet of $n$ symbols, the $i$th of which occurs with frequency $p_i$, the quantity

$$p_1\log_2{\left(\frac{1}{p_1}\right)} + p_2\log_2{\left(\frac{1}{p_2}\right)} + \dots + p_n\log_2{\left(\frac{1}{p_n}\right)}$$

is called the entropy of the distribution (see this article to find out more). Shannon proved that the average number of bits needed per symbol cannot be smaller than the entropy, no matter how cleverly you encode them. For the vast majority of alphabets and distributions, the average number of bits needed is in fact only slightly bigger than the entropy. (One encoding that comes very close to the entropy for most alphabets and distributions is the so-called Huffman code.)
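As a rough sketch of how such a code can be built (this is the standard greedy construction, not code from the article, and the frequencies below are made up), here is a tiny Huffman coder whose average code length can be compared with the entropy.

```python
import heapq
import math
from itertools import count

def entropy(probs):
    """Shannon entropy: the sum of p * log2(1/p) over the alphabet."""
    return sum(p * math.log2(1 / p) for p in probs.values() if p > 0)

def huffman_code(probs):
    """Build a Huffman code by repeatedly merging the two least likely nodes."""
    tiebreak = count()  # avoids comparing dicts when two probabilities are equal
    heap = [(p, next(tiebreak), {symbol: ""}) for symbol, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)
        p2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

# Made-up frequencies for a five-symbol alphabet, purely for illustration.
probs = {"e": 0.5, "t": 0.2, "a": 0.15, "q": 0.1, "z": 0.05}
code = huffman_code(probs)
average = sum(p * len(code[s]) for s, p in probs.items())

print(code)            # the common "e" gets 1 bit, the rare "z" gets 4 bits
print(entropy(probs))  # about 1.92 bits per symbol
print(average)         # 1.95: close to, but never below, the entropy
```

In fact, the Huffman construction always comes within one bit per symbol of the entropy, which is why it is described above as coming very close.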
If you're in the business of sending messages long-distance, the entropy is a useful number to know. If you know you can transmit some number $C$ of bits per second, and that each symbol of your alphabet needs, on average, at least $H$ bits, where $H$ is the entropy, then the best you can hope for is to transmit $C/H$ symbols per second. Shannon's source coding theorem says that there are codes which get you as close to this maximal rate as you like.

Shannon published this result in A mathematical theory of communication, "A paper that is justly celebrated as one of the greatest of the twentieth century," says Aaronson. It's a highly theoretical effort. The source coding theorem, for example, doesn't tell you which code gets you within a specified distance of the maximal rate of $C/H$ symbols per second — it only says that there is one.
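For example, with a made-up channel speed just to show the arithmetic: a channel carrying $C = 1000$ bits per second, used for an alphabet with entropy $H = \log_2{(27)} \approx 4.75$ bits per symbol, allows at most around 210 symbols per second.

```python
import math

C = 1000           # hypothetical channel speed in bits per second (an assumption)
H = math.log2(27)  # entropy of 27 equally likely symbols, about 4.75 bits

print(C / H)       # about 210 symbols per second at best, whatever code you use
```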
When it comes to practical applications, the theorem also has a major shortcoming: it assumes that your communication channels transmit messages perfectly and never introduce any errors. In real life this just doesn't happen: whether you send a message in a telegram, via email or using SMS, there is always a chance that it gets scrambled on the way and arrives at the other end corrupted. This is where things get interesting. How do you deal with so-called noisy channels, and how much information can you transmit over them? Find out in the next article.
About this article
Marianne Freiberger is co-editor of Plus. She would like to thank Scott Aaronson, a computer scientist at the Massachusetts Institute of Technology, for a very useful conversation about information. And she recommends the book Information Theory: A Tutorial Introduction by James V. Stone as a mathematical introduction to the subject.

This article is part of our Information about information project, run in collaboration with FQXi. Click here to find out about other ways of measuring information.