Ivars Peterson's MathTrek

June 29, 1998

### First Digits

Take a look at a newspaper page listing stock market prices. You might think that each of the digits from 1 to 9 would occur equally often as the first digit of the listed prices. Instead, you're very likely to find that numbers starting with 1 come up more often than numbers starting with 2, numbers starting with 2 come up more often than numbers starting with 3, and so on. In fact, 1 comes up about 30 percent of the time, far more than the 11 percent (one in nine) that equal likelihood would predict. At the other end, the digit 9 occurs only about 5 percent of the time.

The earliest known report of this curious first-digit phenomenon was made by the astronomer and mathematician Simon Newcomb (1835-1909). He observed that the pages of heavily used books of logarithms were grimier at the beginning than at the end, suggesting that fellow scientists tended to look up smaller numbers more often than larger ones. In 1881 in a brief article in the American Journal of Mathematics, Newcomb conjectured that the occurrence of first digits follows a particular probability distribution.

The probability that the first significant digit is n, for n from 1 to 9, is log10(1 + 1/n), where log10 denotes the base-10 logarithm.
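As a quick check of the formula, the following minimal Python sketch tabulates the nine probabilities. The function name `first_digit_probability` is made up for this illustration and is not taken from Newcomb's or Hill's articles.

```python
import math


def first_digit_probability(n: int) -> float:
    """Probability that the first significant digit is n under the logarithm law."""
    if not 1 <= n <= 9:
        raise ValueError("a first significant digit must lie between 1 and 9")
    return math.log10(1 + 1 / n)


# Tabulate the law for every possible first digit.
probabilities = {n: first_digit_probability(n) for n in range(1, 10)}

for n, p in probabilities.items():
    print(f"first digit {n}: {p:.1%}")

# The nine probabilities form a complete distribution, so they add up to 1.
print(f"total: {sum(probabilities.values()):.3f}")
```

The digit 1 comes out at roughly 30 percent and the digit 9 at a little under 5 percent, in line with the stock-price figures quoted earlier.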

"That the digits are not equally likely to appear comes as something of a surprise," notes Ted Hill of the Georgia Institute of Technology in Atlanta, "but to claim an exact law describing their distribution is indeed striking." Hill describes the first-digit phenomenon and reviews its mathematical foundations and applications in the July-August American Scientist.

Fifty-seven years after the publication of Newcomb's article, the physicist Frank Benford of General Electric also noted that the initial pages of logarithm books were more worn and smudged than later pages. Apparently unaware of Newcomb's work, he ended up proposing the same logarithm law for the distribution of first digits.

Benford, however, went one step further. He tested his conjecture on a wide range of data sets, from river basin areas and population figures to baseball statistics and numbers appearing in Reader's Digest articles. The data fit the postulated logarithm law amazingly well.

A portion of Benford's first-digit data table (percentages):

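| Data set | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Rivers, Area | 31 | 16.4 | 10.7 | 11.3 | 7.2 | 8.6 | 5.5 | 4.2 | 5.1 |
| Population | 33.9 | 20.4 | 14.2 | 8.1 | 7.2 | 6.2 | 4.1 | 3.7 | 2.2 |
| American League | 32.7 | 17.6 | 12.6 | 9.8 | 7.4 | 6.4 | 4.9 | 5.6 | 3 |
| Reader's Digest | 33.4 | 18.5 | 12.4 | 7.5 | 7.1 | 6.5 | 5.5 | 4.9 | 4.2 |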

That logarithmic relationship is now often called Benford's law.
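To see how close the fit is, the rows of Benford's table above can be set against the percentages the logarithm law predicts. The sketch below copies the four rows from the table. The variable names and the choice of "largest gap in percentage points" as a summary are assumptions made for this illustration, not part of Benford's analysis.

```python
import math

# Percentages copied from the portion of Benford's table shown above,
# listed in order for first digits 1 through 9.
benford_table = {
    "Rivers, Area": [31, 16.4, 10.7, 11.3, 7.2, 8.6, 5.5, 4.2, 5.1],
    "Population": [33.9, 20.4, 14.2, 8.1, 7.2, 6.2, 4.1, 3.7, 2.2],
    "American League": [32.7, 17.6, 12.6, 9.8, 7.4, 6.4, 4.9, 5.6, 3],
    "Reader's Digest": [33.4, 18.5, 12.4, 7.5, 7.1, 6.5, 5.5, 4.9, 4.2],
}

# Percentages predicted by the logarithm law for first digits 1 through 9.
predicted = [100 * math.log10(1 + 1 / n) for n in range(1, 10)]

# For each data set, record the largest gap (in percentage points)
# between the observed and the predicted percentage.
largest_gap = {
    name: max(abs(obs - pred) for obs, pred in zip(row, predicted))
    for name, row in benford_table.items()
}

for name, gap in largest_gap.items():
    print(f"{name}: largest gap {gap:.1f} percentage points")
```

For these four rows, no observed percentage strays more than about four points from the predicted value.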

It's useful to note that many tables of numbers do not follow a logarithmic distribution of first digits. Lists of telephone numbers in a given region, for instance, usually start with certain digits peculiar to the area. Tables of square roots also don't fit.

At the same time, the law does work for tables of physical constants, numbers appearing on newspaper front pages, accounting data, and a variety of scientific calculations.

There is even a general significant-digit law that includes the second and subsequent digits of given numbers. According to this law, Hill says, the second significant digits, though decreasing in relative frequency through the digits from 0 to 9, are much more uniformly distributed than the first digits.

The general law also implies that the significant digits are not independent. Instead, knowledge of one digit affects the likelihood of another. For example, the unconditional probability that the second digit is 2 is about .109. However, the probability that the second digit is 2, given that the first digit is 1, is roughly .115.
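The two figures in the preceding paragraph can be reproduced from the general law, which assigns probability log10(1 + 1/m) to the event that a number's significant digits begin with the digits of the integer m (for example, m = 12 for numbers starting 12...). The Python sketch below is only an illustration, and its function names are invented for this purpose.

```python
import math


def leading_digits_probability(m: int) -> float:
    """Probability that the significant digits begin with the digits of m (m >= 1)."""
    return math.log10(1 + 1 / m)


def second_digit_probability(d2: int) -> float:
    """Unconditional probability that the second significant digit is d2 (0..9)."""
    # Sum over every possible first digit d1; 10 * d1 + d2 is the two-digit prefix.
    return sum(leading_digits_probability(10 * d1 + d2) for d1 in range(1, 10))


def second_given_first(d2: int, d1: int) -> float:
    """Probability that the second digit is d2, given that the first digit is d1."""
    return leading_digits_probability(10 * d1 + d2) / leading_digits_probability(d1)


print(f"P(second digit is 2)                = {second_digit_probability(2):.3f}")
print(f"P(second digit is 2 | first is 1)   = {second_given_first(2, 1):.3f}")
```

Because the two values differ, knowing the first digit changes the odds for the second, which is what it means for the digits not to be independent.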

Why Benford's law applies to a wide range of data (though not all) has proved a tricky question. It wasn't until the mid-1990s that Hill provided a mathematical basis for Benford's law.

One needs to think of the data as coming from many different distributions instead of from one huge table or set of numbers. Using this idea, Hill came up with a new statistical form of the significant-digit law: If distributions are selected at random and random samples are taken from each of these distributions, the significant-digit frequencies of the combined sample will converge to Benford's distribution, even though the individual distributions selected may not closely follow the law. Hill calls it the "random samples from random distributions" theorem.

For example, lottery numbers, numbers distributed according to the standard bell curve of statistics, and atomic weights don't fit Benford's law. However, when the percentages obtained from these data sets are averaged, the resulting digital frequencies fit a logarithmic distribution much more closely.
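The flavor of the "random samples from random distributions" idea can be tried out in a small simulation. The Python sketch below is not Hill's construction. It assumes a made-up family of distributions (uniform, exponential, and bell-shaped) whose scales are drawn evenly across six orders of magnitude, a choice that strongly favors the logarithmic pattern. The seed, sample sizes, and function names are likewise arbitrary.

```python
import math
import random
from collections import Counter


def first_digit(x: float) -> int:
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:.15e}"[0])


def sample_from_random_distribution(rng: random.Random, size: int) -> list[float]:
    """Pick a distribution at random, then draw `size` positive values from it."""
    # Assumption: the scale is spread evenly over six orders of magnitude.
    scale = 10 ** rng.uniform(0, 6)
    kind = rng.choice(["uniform", "exponential", "normal"])
    if kind == "uniform":
        draws = [rng.uniform(0, scale) for _ in range(size)]
    elif kind == "exponential":
        draws = [rng.expovariate(1 / scale) for _ in range(size)]
    else:
        draws = [abs(rng.gauss(scale, scale / 4)) for _ in range(size)]
    # Zero has no first significant digit, so discard it if it ever appears.
    return [x for x in draws if x > 0]


rng = random.Random(1998)  # fixed seed so that the run is repeatable
counts = Counter()
for _ in range(2000):  # 2,000 randomly chosen distributions
    for value in sample_from_random_distribution(rng, 20):  # 20 samples from each
        counts[first_digit(value)] += 1

total = sum(counts.values())
frequencies = {d: counts[d] / total for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}: observed {frequencies[d]:.3f}, logarithm law {math.log10(1 + 1 / d):.3f}")
```

No single distribution in this family follows the logarithm law on its own, yet the first-digit frequencies of the pooled sample should land close to it.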

"The random-sample theorem helps explain how the logarithm-table digital frequencies observed a century ago by Newcomb, and modern tax, census, and stock data, all lead to the same log distribution," Hill says. "The new theorem also helps predict the appearance of the significant-digit phenomenon in many different empirical contexts (including your morning paper) and thus helps justify some of the recent applications of Benford's law."

Those applications include the testing of mathematical models, the design of computers, and the detection of fraud or fabrication of data in financial documents and income tax returns.
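One simple way such a screening might work is to compare the first digits in a list of amounts with the logarithm law, using a chi-square goodness-of-fit statistic. This is a generic statistical technique chosen for illustration, not a method described by Hill or used by any particular auditor. The function names and the two test lists are hypothetical.

```python
import math

# Expected first-digit proportions under the logarithm law, for digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Standard 5 percent critical value of the chi-square distribution
# with 8 degrees of freedom (nine digit categories minus one).
CRITICAL_5_PERCENT_DF8 = 15.507


def first_digit(x: float) -> int:
    """First significant digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(float(x)):.15e}"[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic comparing the first digits of `values` with the law."""
    digits = [first_digit(v) for v in values if v != 0]
    total = len(digits)
    statistic = 0.0
    for d, p in zip(range(1, 10), BENFORD):
        observed = digits.count(d)
        expected = total * p
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical inputs: powers of 2 are known to track the law closely,
# while the integers 100..999 have perfectly even first digits.
doubling = [2 ** k for k in range(200)]
evenly_spread = list(range(100, 1000))

for name, data in [("powers of 2", doubling), ("100 to 999", evenly_spread)]:
    stat = benford_chi_square(data)
    verdict = "consistent with the law" if stat < CRITICAL_5_PERCENT_DF8 else "flag for review"
    print(f"{name}: chi-square = {stat:.1f} ({verdict})")
```

A large statistic only signals that the digits look unusual. As the examples of telephone numbers and square roots show, plenty of legitimate data sets depart from the law, so a flag is a prompt for closer inspection rather than proof of fabrication.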

Just a few decades ago, the significant-digit phenomenon was thought to be little more than a mathematical curiosity, one with neither real-life applications nor a satisfactory mathematical explanation. Hill concludes, "Today the answer is much less obscure, is firmly couched in the modern mathematical theory of probability, and is seeing important applications to society."

References:

Benford, F. 1938. The law of anomalous numbers. Proceedings of the American Philosophical Society 78:551.

Boyle, J. 1994. An application of the Fourier series to the most significant digit problem. American Mathematical Monthly 101(November):879.

Hill, T.P. 1998. The first digit phenomenon. American Scientist 86(July-August):358.

______. 1995. The significant-digit phenomenon. American Mathematical Monthly 102(April):322.

Newcomb, S. 1881. Note on the frequency of the use of digits in natural numbers. American Journal of Mathematics 4:39.

Raimi, R.A. 1976. The first digit phenomenon. American Mathematical Monthly 83:521.

______. 1969. The peculiar distribution of first digits. Scientific American 221(December):109.

A definition of Benford's law can be found at http://www.astro.virginia.edu/~eww6n/math/Benford'sLaw.html.

Examples of how the first-digit phenomenon is being used to detect fraud can be found at http://www.fm.co.za/97/0926/infotech/audit.htm and http://www.bham.ac.uk/EAA/eaa97/abstracts/BUSTA.HTM.