Benford's law

Benford's law, also called the first-digit law, states that in lists of numbers from many real-life sources, the leading digit 1 occurs much more often than the others (about 30% of the time). Furthermore, the higher the digit, the less likely it is to occur as the leading digit of a number. This applies to a wide variety of figures from the natural world and from social life, such as numbers taken from electricity bills, newspaper articles, street addresses, stock prices, population numbers, death rates, areas or lengths of rivers, and physical and mathematical constants.

Mathematical statement

More precisely, Benford's law states that the leading digit n (n = 1, ..., 9) occurs with probability log10(n + 1) − log10(n) = log10(1 + 1/n), which gives the following values:
Leading digit   Probability
1               30.1 %
2               17.6 %
3               12.5 %
4                9.7 %
5                7.9 %
6                6.7 %
7                5.8 %
8                5.1 %
9                4.6 %
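As a quick check, the table can be reproduced directly from the formula. The following minimal Python sketch (the function name benford_probability is ours, chosen for illustration) computes log10(n + 1) − log10(n) for each digit and rounds it to one decimal place in percent:

```python
import math


def benford_probability(n: int) -> float:
    """Probability that the leading digit is n under Benford's law."""
    if not 1 <= n <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(n + 1) - math.log10(n)


# Reproduce the table, rounded to one decimal place in percent.
for digit in range(1, 10):
    print(f"{digit}  {100 * benford_probability(digit):.1f} %")

# Consecutive terms cancel, so the nine probabilities sum to log10(10) - log10(1) = 1
# (up to floating-point rounding).
print(sum(benford_probability(d) for d in range(1, 10)))
```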

One can also formulate a law for the first two digits: the probability that the first two digits, read together as a two-digit number, equal n (n = 10, ..., 99) is log10(n + 1) − log10(n).
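The two-digit law is consistent with the single-digit one: summing the two-digit probabilities over all ten possible second digits telescopes to log10(10d + 10) − log10(10d) = log10(d + 1) − log10(d). A small Python sketch, with hypothetical helper names, checks this numerically:

```python
import math


def benford_two_digit(n: int) -> float:
    """Probability that the first two digits, read as a number, equal n (10..99)."""
    if not 10 <= n <= 99:
        raise ValueError("two-digit block must be between 10 and 99")
    return math.log10(n + 1) - math.log10(n)


def first_digit_from_two_digit(d: int) -> float:
    """Marginal probability of first digit d, summing over second digits 0-9."""
    return sum(benford_two_digit(10 * d + e) for e in range(10))


# Each marginal should match the single-digit law log10(d + 1) - log10(d).
for d in range(1, 10):
    print(d, round(first_digit_from_two_digit(d), 4),
          round(math.log10(d + 1) - math.log10(d), 4))
```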

Explanation

That the leading digit 1 should in general be more common than the other digits can be understood as follows. Start counting from 1: 1, 2, 3, ... When you reach 9, every digit has occurred equally often as a leading digit. But from 10 to 19 only the leading digit 1 occurs, so 1 gets a huge head start. Only when you reach 99 are all digits tied again, and then 1 gets another huge head start from 100 to 199. And so it continues: wherever you stop counting, 1 is never behind any other digit, and only at the rare stopping points 9, 99, 999, 9999, ... are all nine digits tied.
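The counting argument can be checked directly. This Python sketch (helper names are illustrative) tallies the leading digits of 1, 2, ..., N and tests that 1 is never behind any other digit at any stopping point:

```python
def leading_digit_tally(limit: int) -> list[int]:
    """Count leading digits among 1, 2, ..., limit; index d holds the count for digit d."""
    counts = [0] * 10  # index 0 is unused
    for k in range(1, limit + 1):
        counts[int(str(k)[0])] += 1
    return counts


def one_never_behind(limit: int) -> bool:
    """Check that at every stopping point up to limit, digit 1 is at least as common as any other."""
    counts = [0] * 10
    for k in range(1, limit + 1):
        counts[int(str(k)[0])] += 1
        if counts[1] < max(counts[1:]):
            return False
    return True


for n in (9, 19, 99, 199, 999, 1999):
    tally = leading_digit_tally(n)
    print(f"N = {n:>4}: counts {tally[1:]}, share of 1s = {tally[1] / n:.3f}")

print(one_never_behind(10_000))
```

The share of numbers starting with 1 swings between exactly 1/9 (at N = 9, 99, 999) and a little over 5/9 (at N = 19, 199, 1999), so counting alone explains why 1 is favoured but does not by itself settle on the figure of 30.1 %.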

Somewhat more precisely, suppose X is a random variable whose probability of being equal to any positive integer x is a constant times x^(−s), where s > 1. The constant must then be 1/ζ(s), where ζ is the Riemann zeta function (see zeta distribution). As s approaches 1 from above, the probability that the first digit of X is n approaches log10(n + 1) − log10(n).
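Summing the zeta probabilities exactly converges very slowly when s is close to 1, so the following Python sketch uses a heuristic instead. The assumption, ours and not part of the statement above, is that each block of integers [n·10^k, (n + 1)·10^k) can be replaced by the integral of x^(−s) over that interval, normalised by the integral over [1, ∞) rather than by ζ(s). The geometric factor from summing over k is the same for every n and cancels, leaving (n^(1−s) − (n + 1)^(1−s)) / (1 − 10^(1−s)), whose limit as s → 1 is log10(n + 1) − log10(n):

```python
import math


def zeta_first_digit_approx(n: int, s: float) -> float:
    """Heuristic P(first digit of X = n) for the zeta distribution with parameter s.

    Assumption: each block [n * 10**k, (n + 1) * 10**k) of integers is replaced by
    the integral of x**(-s) over that interval, and the total is normalised by the
    integral over [1, infinity) instead of by zeta(s). The sum over k is geometric
    with the same ratio 10**(1 - s) for every n, so it cancels in the ratio.
    """
    if not 1 <= n <= 9:
        raise ValueError("n must be a digit from 1 to 9")
    if s <= 1:
        raise ValueError("s must be greater than 1")
    a = 1.0 - s
    return (n ** a - (n + 1) ** a) / (1.0 - 10.0 ** a)


def benford(n: int) -> float:
    return math.log10(n + 1) - math.log10(n)


for s in (1.5, 1.1, 1.01, 1.001):
    print(f"s = {s}: P(first digit 1) ~ {zeta_first_digit_approx(1, s):.4f}, "
          f"Benford gives {benford(1):.4f}")
```

The approximation is crude for larger s, where most of the probability sits on the first few integers, but it shows how the Benford values emerge as s approaches 1.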

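The precise form of Benford's law can be explained if one assumes that the logarithms of the numbers are uniformly distributed. This means, for instance, that a number is just as likely to lie between 100 and 1000 (logarithm between 2 and 3) as between 10,000 and 100,000 (logarithm between 4 and 5). A number x has leading digit n exactly when the fractional part of log10(x) lies between log10(n) and log10(n + 1); if that fractional part is uniformly distributed between 0 and 1, this happens with probability log10(n + 1) − log10(n). For many sets of numbers, especially ones that grow geometrically, such as incomes and stock prices, this is a reasonable assumption.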
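The argument can be illustrated by simulation. The Python sketch below rests on our own arbitrary choices (the range [0, 5), the seed and the sample size): it draws numbers whose base-10 logarithm is uniform and, as an example of geometric growth, takes the powers 2^0, ..., 2^999. In both cases the leading-digit frequencies should come out close to the Benford values.

```python
import math
import random


def leading_digit(x) -> int:
    """First significant digit of a positive number."""
    if x <= 0:
        raise ValueError("x must be positive")
    if isinstance(x, int):
        return int(str(x)[0])  # exact for integers of any size
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def digit_frequencies(values) -> list[float]:
    """Share of values with leading digit 1, ..., 9 (list index 0 is digit 1)."""
    counts = [0] * 9
    for v in values:
        counts[leading_digit(v) - 1] += 1
    total = sum(counts)
    return [c / total for c in counts]


BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

rng = random.Random(12345)
# log10 of each number is uniform on [0, 5), so every decade from 1-10 up to
# 10,000-100,000 is equally likely.
log_uniform = [10 ** rng.uniform(0, 5) for _ in range(100_000)]
# Geometric growth: 2**0, 2**1, ..., 2**999 as exact Python integers.
powers_of_two = [2 ** k for k in range(1000)]

for d, (p, q, b) in enumerate(zip(digit_frequencies(log_uniform),
                                  digit_frequencies(powers_of_two),
                                  BENFORD), start=1):
    print(f"{d}: log-uniform {p:.3f}  powers of 2 {q:.3f}  Benford {b:.3f}")
```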

Note that for numbers drawn from many distributions, for example IQ scores, human heights, or other variables that follow a normal distribution, the law does not hold. However, if one "mixes" numbers from many such distributions, as happens for example when collecting numbers from newspaper articles, Benford's law reappears. Under suitable conditions this can be proven mathematically: if one repeatedly chooses a probability distribution "at random" and then draws a number at random from that distribution, the resulting list of numbers will obey Benford's law.
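The following Python sketch contrasts the two situations. It is only a toy under stated assumptions: the IQ-like scores are drawn from a normal distribution with mean 100 and standard deviation 15, and the hypothetical mixing scheme first picks one of four distribution shapes and a scale factor spread log-uniformly over six whole decades, then draws one number. The log-uniform choice of scale is what makes this particular mixture Benford-like; the sketch illustrates the idea rather than proving the general theorem.

```python
import math
import random


def leading_digit(x: float) -> int:
    """First significant digit of a positive number."""
    if x <= 0:
        raise ValueError("x must be positive")
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def digit_frequencies(values) -> list[float]:
    """Share of values with leading digit 1, ..., 9 (list index 0 is digit 1)."""
    counts = [0] * 9
    for v in values:
        counts[leading_digit(v) - 1] += 1
    total = sum(counts)
    return [c / total for c in counts]


BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
rng = random.Random(2024)


def iq_score() -> float:
    """One draw from a normal distribution with mean 100 and s.d. 15 (kept positive)."""
    while True:
        x = rng.gauss(100, 15)
        if x > 0:
            return x


def mixed_draw() -> float:
    """Hypothetical mixing scheme: pick a shape and a scale at random, then draw once."""
    shapes = (
        lambda: 1.0 - rng.random(),            # uniform on (0, 1]
        lambda: rng.expovariate(1.0),          # exponential with mean 1
        lambda: abs(rng.gauss(1.0, 0.3)),      # folded normal around 1
        lambda: rng.lognormvariate(0.0, 0.5),  # log-normal
    )
    shape = rng.choice(shapes)
    scale = 10 ** rng.uniform(0, 6)  # assumption: scale is log-uniform over 6 decades
    while True:
        y = shape()
        if y > 0:
            return scale * y


iq = digit_frequencies(iq_score() for _ in range(50_000))
mixed = digit_frequencies(mixed_draw() for _ in range(100_000))

for d in range(1, 10):
    print(f"{d}: IQ-like {iq[d - 1]:.3f}  mixed {mixed[d - 1]:.3f}  Benford {BENFORD[d - 1]:.3f}")
```

Roughly half of the IQ-like scores start with 1 (those between 100 and 199) and almost none start with 2 to 5, whereas the mixed sample stays close to the Benford table.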

Applications

Income tax agencies and accounting firms use Benford's law to spot fraud, because people who invent figures tend to spread the leading digits more uniformly than genuine data would.
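The text does not say which statistical test is applied in practice. As one common choice, assumed here for illustration, a Pearson chi-square goodness-of-fit test compares observed leading-digit counts with the Benford expectations and flags a data set when the statistic exceeds the 5 % critical value for 8 degrees of freedom (about 15.507). The sketch assumes amounts are given as whole numbers, for example in cents, and its example data are synthetic stand-ins, not real records.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# 95th percentile of the chi-square distribution with 8 degrees of freedom
# (9 digit categories minus 1 constraint: the counts sum to the total).
CHI2_CRITICAL_8DF_5PCT = 15.507


def first_digit_counts(amounts) -> list[int]:
    """Counts of leading digits 1..9 for whole-number amounts; zeros are skipped."""
    counts = [0] * 9
    for amount in amounts:
        amount = abs(int(amount))
        if amount == 0:
            continue
        counts[int(str(amount)[0]) - 1] += 1
    return counts


def benford_chi_square(amounts) -> float:
    """Pearson chi-square statistic of observed leading digits against Benford's law."""
    counts = first_digit_counts(amounts)
    total = sum(counts)
    if total == 0:
        raise ValueError("no non-zero amounts to test")
    return sum((observed - total * p) ** 2 / (total * p)
               for observed, p in zip(counts, BENFORD))


def looks_suspicious(amounts) -> bool:
    """Flag a data set whose leading digits deviate from Benford at the 5% level."""
    return benford_chi_square(amounts) > CHI2_CRITICAL_8DF_5PCT


# Synthetic stand-ins, not real records:
growth_like = [2 ** k for k in range(1000)]  # geometric growth, roughly Benford
invented = list(range(1000, 10000))          # every leading digit equally often

print(f"growth-like: {benford_chi_square(growth_like):.2f}, flagged = {looks_suspicious(growth_like)}")
print(f"invented:    {benford_chi_square(invented):.2f}, flagged = {looks_suspicious(invented)}")
```

A high statistic is only a reason to look more closely: genuine data sets can also deviate from the law, for example when amounts are capped or confined to a narrow range.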

History

The discovery of this fact goes back to 1881, when the American astronomer Simon Newcomb noticed that in books of logarithm tables (used at that time to perform calculations), the first pages, which contain numbers starting with 1, were much more worn than the other pages. The phenomenon was rediscovered in 1938 by the physicist Frank Benford, who checked it on a wide variety of data sets and was credited with it. In 1996, Ted Hill proved the result about mixed distributions mentioned above.
