Here's a strange fact: if you look up some numbers, for example the
numbers in your tax return, population sizes of Chinese provinces, or
the lengths of the world's rivers, then most likely around 30% of these
numbers start with the digit 1, around 18% start with the digit 2,
around 12.5% start with a 3, and so on, all the way to 9 (which heads up
only around 5% of the numbers). The larger the digit, the fewer numbers in
your list start with it. This fact, known as *Benford's law*, applies to
so many different kinds of data sets that it's often used to detect
fraud. But why does it work?
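
The percentages quoted above can be reproduced from the usual statement of Benford's law, which says that the leading digit d turns up with probability log10(1 + 1/d). The article itself doesn't spell this formula out, so take the snippet below as a small illustrative sketch rather than part of the original argument; the name `benford` is just a label chosen here.

```python
import math

# Benford probability that the leading digit is d: log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.1%}")
```

Running it gives 30.1% for the digit 1, 17.6% for 2, 12.5% for 3 and 4.6% for 9, in line with the rounded figures in the text.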

Well, if the processes that give rise to your list of numbers do
produce a universal distribution of first digits, then this
distribution should apply no matter what units you use. It should work
whether you do your tax return in pounds or in euros, or measure
your rivers in metres or miles: it's universal, after all. This means
that the distribution of first digits remains the same when you
multiply your numbers by whatever constant you need to convert between
units. And it turns out that the only distribution with this property
of *scale invariance* is precisely the Benford distribution.
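
One direction of this claim can be sketched in a few lines. The notation is introduced here for illustration: x is a positive number from the list, D its first digit, c the conversion factor, and {t} the fractional part of t. The assumption is that the fractional part of log10 x is spread evenly over the interval from 0 to 1.

$$
\begin{aligned}
D = d \quad &\Longleftrightarrow \quad \log_{10} d \le \{\log_{10} x\} < \log_{10}(d+1), \\
P(D = d) &= \log_{10}(d+1) - \log_{10} d = \log_{10}\!\left(1 + \frac{1}{d}\right), \\
\{\log_{10}(c\,x)\} &= \{\log_{10} x + \log_{10} c\}.
\end{aligned}
$$

Multiplying by c only shifts the fractional part by a fixed amount, wrapping around from 1 back to 0, and an even spread shifted in this way is still an even spread. So the Benford distribution survives any change of units. Showing that it is the only distribution that does takes more work and is not attempted in this sketch.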

As an example, imagine that your first digits are distributed equally, with roughly the same proportion of numbers beginning with each of the digits 1, 2, 3 and so on, so NOT according to the Benford distribution. Is this distribution scale invariant? Let's see what happens when we multiply by 2. All numbers starting with 5, 6, 7, 8 or 9, when multiplied by 2, give a number starting with 1. By contrast, the only way to end up with a number beginning with, say, 3 is to start out with a number starting with 1 (and even then only those starting with 15, 16, 17, 18 or 19 will do). In other words, the resulting distribution of first digits, after multiplying by 2, is skewed towards 1. It's not uniform, so your original distribution is not scale invariant.

It's not too hard to show that in order to be scale invariant, the first digits have to be distributed in the way stipulated by the Benford distribution. It's worth noting though that Benford's law only applies to data sets that are neither too random, nor too constrained: alas, it doesn't work for lottery numbers.
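
The doubling experiment is easy to try on a computer. The sketch below is only an illustration under assumptions chosen here: the whole numbers 100 to 999 stand in for a list with equally distributed first digits, and a grid of numbers evenly spaced on a logarithmic scale stands in for a list that follows Benford's law. The helper names `first_digit` and `digit_shares` are made up for this example.

```python
import math
from collections import Counter

def first_digit(x):
    """Leading decimal digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def digit_shares(values):
    """Fraction of the values that start with each digit 1..9."""
    counts = Counter(first_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]

# Assumed stand-in for "equal first digits": 100 numbers per leading digit.
uniform = list(range(100, 1000))

# Assumed stand-in for a Benford list: evenly spaced on a log scale from 1 to 10.
n = 90_000
log_spaced = [10 ** (k / n) for k in range(n)]

uniform_doubled = digit_shares([2 * v for v in uniform])
benford_doubled = digit_shares([2 * v for v in log_spaced])
expected = [math.log10(1 + 1 / d) for d in range(1, 10)]

for d in range(1, 10):
    print(f"{d}: uniform x2 = {uniform_doubled[d - 1]:.1%}, "
          f"log-spaced x2 = {benford_doubled[d - 1]:.1%}, "
          f"Benford = {expected[d - 1]:.1%}")
```

With this set-up, five ninths (about 56%) of the doubled whole numbers start with a 1, while every other digit gets only about 5.6%. The doubled log-spaced list, on the other hand, stays very close to the Benford percentages.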

To find out more about Benford's law, read our article *Looking out for number one*. This news story explores an application of Benford's law to uncover potentially fraudulent elections.


## Wonderful one

Hi and thanks for this great story about Benford's law. I'm wondering if the distribution would be different in other bases, since it appears from your explanation that it has a lot to do with the point at which the scaling pushes values to the next order of magnitude. Any thoughts on that?