Here’s a good article by Ted Hill:
Hill, TP. The First-Digit Phenomenon. American Scientist 86 (4), 358-363. (1998)
Here’s a list of papers
This law was first noticed in 1881 by the astronomer Simon Newcomb, then again in 1938 by the physicist Frank Benford. They both noticed that the leading digits of many real-world statistics are not evenly distributed but follow a logarithmic distribution: a leading digit d appears with probability log10(1 + 1/d), so numbers that start with a 1 appear just over 30% of the time.
A quick appeal to intuition will show this is true for data that grows exponentially (geometrically). If something grows by a constant multiplication factor, it takes as long to get from 1 to 2 as it does to get from 5 to 10, so it spends more of its time on the low leading digits. The leading digits soon settle into the logarithmic distribution, i.e. numbers starting with a 1 appear about 30% of the time. This explains the law for a lot of things that grow in this way, like prices and populations. The law also appears in other types of growth, including factorials and Fibonacci numbers.
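The claim about exponential growth is easy to check numerically. The following is a minimal sketch, not a proof: it counts the leading digits of the first 1000 powers of 2 and the first 1000 Fibonacci numbers and prints them next to the logarithmic prediction log10(1 + 1/d). The function names and the choice of 1000 terms are my own assumptions for illustration.

```python
import math
from collections import Counter


def benford_probability(d: int) -> float:
    """Probability that the leading digit is d under the logarithmic law."""
    return math.log10(1 + 1 / d)


def leading_digit_frequencies(values):
    """Share of each leading digit 1-9 among a list of positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


def powers(base: int, count: int):
    """base**1, base**2, ..., base**count (constant multiplication factor)."""
    return [base ** k for k in range(1, count + 1)]


def fibonacci(count: int):
    """The first `count` Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    terms, a, b = [], 1, 1
    for _ in range(count):
        terms.append(a)
        a, b = b, a + b
    return terms


doubling = leading_digit_frequencies(powers(2, 1000))
fib = leading_digit_frequencies(fibonacci(1000))

print("digit  Benford  2**k   Fibonacci")
for d in range(1, 10):
    print(f"{d:5d}  {benford_probability(d):.3f}    {doubling[d]:.3f}  {fib[d]:.3f}")
```

Changing the base in powers(2, 1000) to another growth factor such as 3 or 7 should give a similar table. A base that is a power of 10 is the exception, since every term then starts with a 1.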
However, remarkably, this law also describes what happens when you take data at random from a variety of sources, as you might if you took numbers from a newspaper. Although this data comes from many different distributions, not just exponential growth, it still follows Benford’s Law. Benford observed this fact in his original paper, but it was not proven until 1995, by Ted Hill.
However, you can still derive Benford’s Law without knowing this. If we assume that a universal law for leading digits exists, then it must be scale invariant, i.e. it would not matter which units we choose to make our measurements in (kilometres, miles, feet, centimetres or whatever). As I prove in this video, the only scale-invariant distribution of leading digits is the logarithmic distribution. Hence Benford’s Law is logarithmic. This proof was first put forward by Roger Pinkham in 1961.
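Scale invariance can also be illustrated numerically. The sketch below is an illustration under my own assumptions, not Pinkham's argument: it draws one sample whose logarithm is uniformly distributed (which gives the logarithmic law for leading digits) and one ordinary uniform sample, converts both from kilometres to miles, and measures how much the leading-digit shares move. The sample sizes, the seed and the helper names are arbitrary choices.

```python
import math
import random

KM_TO_MILES = 0.621371  # conversion factor used as the change of units


def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:.15e}"[0])


def digit_shares(values):
    """Share of each leading digit 1-9 in a list of positive numbers."""
    counts = {d: 0 for d in range(1, 10)}
    for v in values:
        counts[leading_digit(v)] += 1
    return {d: c / len(values) for d, c in counts.items()}


def max_shift(values, factor: float) -> float:
    """Largest change in any digit's share when every value is multiplied by factor."""
    before = digit_shares(values)
    after = digit_shares([v * factor for v in values])
    return max(abs(after[d] - before[d]) for d in range(1, 10))


def max_gap_from_benford(values) -> float:
    """Largest difference between an observed share and log10(1 + 1/d)."""
    shares = digit_shares(values)
    return max(abs(shares[d] - math.log10(1 + 1 / d)) for d in range(1, 10))


def log_uniform_sample(n: int, seed: int = 0):
    """Values whose base-10 logarithm is uniform over three whole decades."""
    rng = random.Random(seed)
    return [10 ** rng.uniform(0, 3) for _ in range(n)]


def uniform_sample(n: int, seed: int = 0):
    """Values spread evenly between 1 and 10."""
    rng = random.Random(seed)
    return [rng.uniform(1, 10) for _ in range(n)]


logarithmic = log_uniform_sample(100_000)
flat = uniform_sample(100_000)

print("shift after km -> miles, logarithmic sample:", round(max_shift(logarithmic, KM_TO_MILES), 4))
print("shift after km -> miles, uniform sample:    ", round(max_shift(flat, KM_TO_MILES), 4))
```

The logarithmic sample should barely move, apart from sampling noise, while the uniform sample should shift noticeably, because an even spread of leading digits in one unit does not stay even in another.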
In 1992, Mark Nigrini wrote his PhD thesis on the detection of income tax evasion using Benford’s Law, and his ideas are applied in the detection of fraud.
We see Benford’s Law in observational data because real data can be a complex mix of many distributions and because it is the distribution achieved when data is repeatedly multiplied, divided, or raised to integer powers. And, once achieved, the distribution persists under further multiplication, division, and raising to integer powers.
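The effect of repeated multiplication can be seen in a small simulation. This is a rough sketch under my own assumptions (uniform random factors between 1 and 10, 50,000 trials, a fixed seed), not a derivation: it multiplies together a growing number of random factors and reports how far the leading digits of the products are from the logarithmic distribution.

```python
import math
import random


def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:.15e}"[0])


def gap_from_benford(values) -> float:
    """Largest difference between an observed digit share and log10(1 + 1/d)."""
    counts = {d: 0 for d in range(1, 10)}
    for v in values:
        counts[leading_digit(v)] += 1
    return max(
        abs(counts[d] / len(values) - math.log10(1 + 1 / d)) for d in range(1, 10)
    )


def random_products(factors: int, n: int, seed: int = 0):
    """n products, each of `factors` independent uniform numbers between 1 and 10."""
    rng = random.Random(seed)
    return [
        math.prod(rng.uniform(1, 10) for _ in range(factors)) for _ in range(n)
    ]


for factors in (1, 2, 5, 10):
    gap = gap_from_benford(random_products(factors, 50_000))
    print(f"{factors:2d} factor(s): largest gap from Benford = {gap:.3f}")
```

With a single factor the digits are spread evenly, which is far from the logarithmic law. As more factors are multiplied in, the gap should shrink towards the level of sampling noise.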