
An Introduction to Benford's Law

Arno Berger and Theodore P. Hill
Publisher: Princeton University Press
Publication Date: 2015
Number of Pages: 248
Format: Hardcover
Price: 75.00
ISBN: 9780691163062
Category: Monograph
[Reviewed by Frank Benford, on 10/28/2015]

To forestall potential confusion, let me say at the outset that the author of this review is not the physicist Frank Benford for whom Benford’s law is named, but one of his grandsons.

My grandfather studied the empirical distribution of “first digits” (i.e., the leftmost digit, or most significant digit) by compiling a large (for 1938) and heterogeneous data set of 20,229 (decimal) numbers culled from newspapers, almanacs, scientific handbooks, and other sources. He found that the proportion of these numbers with first digit \(d\) (for every \(d\in\left\{1,2,\ldots,9\right\}\)) was given to a close approximation by the formula \[ \Pr\left(D_1=d\right)=\log\left(1+\frac{1}{d}\right)\text{,}\tag*{(1)} \] where “\(\log\)” here (and henceforth, unless noted otherwise) denotes the “common logarithm,” i.e., logarithm to base 10. I will say that a random variable having the probability mass function given by eq. (1) on \(\left\{1,2,\ldots,9\right\}\) has a “discrete logarithmic distribution.” Berger and Hill simply say that eq. (1) defines a “logarithmic distribution,” but I have my reasons, which will soon become clear, for adding the word “discrete.” A bar chart showing \(\log\left(1+1/d\right)\) for each \(d\in\left\{1,2,\ldots,9\right\}\) shows a nice monotonic decrease from \(\log\left(1+1/1\right)=\log\left(2\right)\approx 0.301\) to \(\log\left(1+1/9\right)\approx 0.046\).

Several things need to be said immediately about this equation.

  1. Equation (1) is correctly named the “first digit law.” Benford’s law concerns the joint distribution of the first \(m\) significant digits, for every \(m\in\mathbb{N}\), so eq. (1) is only a part of Benford’s law (the most conspicuous part, to be sure).
  2. The function \(\Pr\) may be interpreted as either “proportion” or “probability.” At the moment it’s premature to interpret it as “probability,” but we’ll take care of that soon enough.
  3. If we interpret \(\Pr\) as “proportion,” then eq. (1) cannot hold exactly for any finite data set, as \(\Pr\left(D_1=d\right)\) is a rational number while \(\log\left(1+1/d\right)\) is irrational.
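Before moving on, here is a minimal Python tabulation of eq. (1), entirely my own sketch and not from the book. One pleasant feature worth noting: the nine probabilities sum to exactly 1, since \(\prod_{d=1}^{9}\left(1+1/d\right)=10\) telescopes.

```python
import math

def first_digit_prob(d: int) -> float:
    """P(D1 = d) under the first digit law, eq. (1)."""
    return math.log10(1 + 1 / d)

probs = {d: first_digit_prob(d) for d in range(1, 10)}
# The probabilities decrease monotonically from log10(2) ~ 0.301 down to
# log10(10/9) ~ 0.046, and they telescope: the product of (1 + 1/d) over
# d = 1..9 is 10, so the nine values sum to exactly 1.
```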

We’ll return to this last remark in a moment, but now it’s time to state Benford’s law in its most general form. This can be done in terms of the joint distribution of the first \(m\) significant digits, as noted above, but it is better to state it in terms of the significand function. For any \(x\neq 0\), the (decimal) significand of \(x\), denoted \(S\left(x\right)\), is defined to be the unique \(s\in[1,10)\) such that \(\left|x\right|=s\times 10^k\) for some (necessarily unique) integer \(k\). For example, \(S(314)=S\left(0.00314\right)=S\left(3.14\right)=3.14\). Explicitly, \[ S\left(x\right)=10^{\log\left|x\right|-\left\lfloor\log\left|x\right|\right \rfloor}=10^{\langle\log\left|x\right|\rangle},\tag*{(2)} \] where \(\langle y\rangle\equiv y-\left\lfloor y\right\rfloor\) denotes the fractional part of \(y\in\mathbb{R}\). Note that \(\langle y\rangle\) is necessarily an element of \([0,1)\). Hence, \[ \log S\left(x\right)=\langle\log\left|x\right|\rangle\quad\text{for all }x\neq 0.\tag*{(3)} \] As will soon be seen, this is a very useful equation. Following Berger and Hill, we adopt the conventions that \(S\left(0\right)\equiv 0\) and \(\log 0\equiv 0\). With these conventions, eq. (3) holds for all \(x\in\mathbb{R}\).
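Eq. (2) translates directly into code. The following Python sketch (my notation, not Berger and Hill's) computes the decimal significand, with the convention \(S(0)=0\):

```python
import math

def significand(x: float) -> float:
    """Decimal significand S(x) of eq. (2), with the convention S(0) = 0."""
    if x == 0:
        return 0.0
    y = math.log10(abs(x))            # log|x|
    return 10 ** (y - math.floor(y))  # 10^<log|x|>, which lies in [1, 10)
```

For example, `significand(314)`, `significand(0.00314)`, and `significand(3.14)` all return 3.14 up to floating-point rounding.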

For any \(x\neq 0\), let \(D_1\left(x\right),\,D_2\left(x\right),\,D_3\left(x\right),\) etc., denote the first, second, third, etc. significant digits of \(x\). For example, \(D_1\left(3.1416\right)=3\), \(D_2\left(3.1416\right)=1\), \(D_3\left(3.1416 \right)=4\), and so on. (I’m being informal; Berger and Hill give a precise definition of \(D_j\) for all \(j\in\mathbb{N}\).) Note that \(D_1\left(x\right)\in\left\{1,2,\ldots,9\right\}\), but \(D_j\left(x\right)\in\left\{0,1,\ldots,9\right\}\) for all \(j\geq 2\). Without going into the details, \(S\left(x\right)\) may be expressed in terms of \(\left(D_1\left(x\right),\,D_2\left(x\right),\,\ldots\right)\), and vice versa. In particular, it may be seen that \[ D_1\left(x\right)=\left\lfloor S\left(x\right)\right\rfloor,\tag*{(4)} \] so \[ D_1\left(x\right)=d\Leftrightarrow S\left(x\right)\in[d,d+1).\tag*{(5)} \]
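Eq. (4) extends to the later digits by repeatedly stripping off the leading digit and shifting; the following sketch is my informal version, not the book's precise definition of \(D_j\):

```python
import math

def significand(x: float) -> float:
    """Decimal significand S(x) for x != 0, as in eq. (2)."""
    y = math.log10(abs(x))
    return 10 ** (y - math.floor(y))

def significant_digits(x: float, m: int) -> list:
    """First m significant digits of x != 0: D1 = floor(S(x)) by eq. (4),
    then shift the remainder left one decimal place and repeat."""
    s = significand(x)
    out = []
    for _ in range(m):
        d = int(s)          # eq. (4)
        out.append(d)
        s = (s - d) * 10    # the next digit moves into the leading position
    return out
```

Note that the first entry is always in \(\{1,\ldots,9\}\) while later entries may be 0, matching the remark above.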

The significand function gives us a convenient and compact way to state Benford’s law. If \(x\) is a member of a large data set that satisfies Benford’s law (approximately), then \[ \Pr\left(S\left(x\right)\leq s\right)=\log s\qquad\text{for all }1\leq s<10.\tag*{(6)} \] The reader may confirm that eq. (6) implies eq. (1).

My grandfather considered what he had (re)discovered, namely eq. (1) and the rest of Benford’s law, to be a scientific law rather than a mathematical result. Nonetheless, mathematicians have been having fun with it for 77 years now, and there’s no sign that interest in the mathematics of these phenomena has tapered off. The book reviewed here is intended to be a fairly complete summary of this work. As Berger and Hill (page 223) note, “the main goal of this book has been to develop a solid and comprehensive theoretical basis for the appearance of Benford’s law in a broad range of mathematical settings.” So far as I can judge, they have succeeded.

As noted above, eq. (1) (and also eq. (6)) cannot be satisfied exactly by any finite data set. To explore the mathematical aspects of Benford’s law, it’s therefore necessary to examine mathematical objects for which eq. (6) might hold in an appropriate sense. There are three types of objects that Berger and Hill examine: sequences of real numbers indexed by \(\mathbb{N}\), functions from \(\mathbb{R}_{+}\) into \(\mathbb{R}\), and real-valued random variables.

Definition 1. A sequence \(\left(x_n\right)=\left(x_1,x_2,\ldots\right)\) of real numbers is said to be Benford, or a Benford sequence, if \[ \lim_{N\rightarrow\infty}\frac{\text{#}\left\{1\leq n\leq N\colon S\left(x_n\right)\leq s\right\}}{N}=\log s\qquad\text{for all }1\leq s<10.\tag*{(7)} \]

Examples (all discussed in Berger and Hill). (1) The sequence of natural numbers \(\left(n\right)\) is not Benford. (2) The sequence of prime numbers \(\left(p_n\right)=\left(2,3,5,\ldots\right)\) is not Benford. (3) The sequence of Fibonacci numbers \(\left(F_n\right)=\left(1,1,2,3,5,8,\ldots\right)\) is Benford. (4) Almost all geometric sequences are Benford. To be precise, the following may be shown: \(\left(a^n\right)\) is Benford if and only if \(\log a\notin\mathbb{Q}\). Hence, \(\left(2^n\right)\) is Benford, but \(\left(10^n\right)\) is not. (5) The sequence of factorials \(\left(n!\right)\) is Benford.
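Example (3) is easy to check empirically. Here is a quick sketch of mine, using exact integer arithmetic for the Fibonacci numbers and reading the first digit off the decimal string:

```python
import math
from collections import Counter

# First 1000 Fibonacci numbers, computed exactly (Python ints are unbounded).
fibs = [1, 1]
while len(fibs) < 1000:
    fibs.append(fibs[-1] + fibs[-2])

counts = Counter(int(str(f)[0]) for f in fibs)
freqs = {d: counts[d] / len(fibs) for d in range(1, 10)}

# Compare the empirical first-digit frequencies with log10(1 + 1/d).
max_err = max(abs(freqs[d] - math.log10(1 + 1 / d)) for d in range(1, 10))
```

Already at \(N=1000\) the agreement is very close; by contrast, the same tally for \(\left(10^n\right)\) would put every term in the digit-1 class.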

Definition 2. A (Borel measurable) function \(f\colon\mathbb{R}_{+}\rightarrow\mathbb{R}\) is Benford if \[ \lim_{T\rightarrow\infty}\frac{\lambda\left(\left\{t\in[0,T)\colon S\left(f\left(t\right)\right)\leq s\right\}\right)}{T}=\log s\qquad\text{for all }1\leq s<10,\tag*{(8)} \] where \(\lambda\) denotes Lebesgue measure.

Examples: (1) No linear function \(f\left(t\right)=at+b\) is Benford. (2) All non-trivial exponential functions (e.g., \(f\left(t\right)=e^{rt}\) or \(f\left(t\right)=10^{rt}\) with \(r\neq 0\)) are Benford.

To define what it means for a random variable to be Benford, it’s useful to give a preliminary definition.

Definition 3. Let \(S\) be a random variable taking values in \([1,10)\). We will say that \(S\) has a logarithmic distribution if \(S\) is absolutely continuous with distribution function given by \[ F\left(s\right)\equiv\Pr\left(S\leq s\right)=\log s\qquad\text{for all }1\leq s<10.\tag*{(9)} \] It follows that the pdf of a random variable \(S\) with a logarithmic distribution is given by \[ f\left(s\right)= \begin{cases} \Lambda/s&\text{if}\quad s\in[1,10),\\ 0&\text{otherwise,} \end{cases} \tag*{(10)} \] where \(\Lambda\equiv\log e=1/\ln 10\).

Proposition 1. If \(S\) has a logarithmic distribution, then \(\log S\sim U[0,1)\). Conversely, if \(U\sim U[0,1)\), then \(10^U\) has a logarithmic distribution. I leave the proof to the reader.
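Proposition 1 is also easy to confirm by simulation. A Monte Carlo sketch (the seed and sample size are arbitrary choices of mine): draw \(U\sim U[0,1)\), form \(10^U\), and compare the empirical distribution function with \(\log s\).

```python
import bisect
import math
import random

random.seed(0)
N = 100_000

# If U ~ U[0,1), then S = 10^U should have the logarithmic distribution:
# P(S <= s) = log10(s) for 1 <= s < 10.
samples = sorted(10 ** random.random() for _ in range(N))

def ecdf(s: float) -> float:
    """Empirical CDF of the simulated sample at s."""
    return bisect.bisect_right(samples, s) / N

max_err = max(abs(ecdf(s) - math.log10(s)) for s in (1.5, 2, 3, 5, 9))
```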

Definition 4. A real valued random variable \(X\) is said to be Benford if \(S\left(X\right)\) has a logarithmic distribution, so \(\Pr\left(S\left(X\right)\leq s\right)=\log s\) for all \(1\leq s<10\).

Examples: (1) If \(X\) has a logarithmic distribution, then \(S\left(X\right)=X\) and \(X\) is Benford. (2) If \(U\sim U[0,1)\), then \(10^U\) is Benford. (3) For any \(k\in\mathbb{Z},\) define \(f_k\) as \[ f_k\left(x\right)= \begin{cases} \Lambda/x&\text{if}\quad x\in[10^k,10^{k+1}),\\ 0&\text{otherwise.} \end{cases} \] If a random variable \(X\) is distributed on \(\mathbb{R}_{+}\) with pdf \(f_k\), then \(X\) is Benford. More generally, any random variable whose pdf is a convex combination of \(\left(f_k\right)_{k\in\mathbb{Z}}\) is Benford.

There is apparently nothing in Definition 4 that prevents a discrete random variable, say one defined on the natural numbers \(\mathbb{N}\), from being Benford. However, Berger and Hill show that no discrete random variable can be Benford.

None of the standard continuous random variables (normal, gamma, beta, etc.) are Benford. However, several of them are “asymptotically” Benford in the following sense. Let \(X_\alpha\) be a random variable whose distribution has a parameter \(\alpha\), and let \(F_\alpha\) denote the distribution function of \(\log S\left(X_\alpha\right)\). If \(X_\alpha\) were Benford, then \(\log S\left(X_\alpha\right)\) would have a \(U[0,1)\) distribution and \(F_\alpha\left(u\right)=u\) for all \(u\in[0,1)\). Hence, we may measure the deviation of \(X_\alpha\) from a Benford random variable by \[ R\left(\alpha\right)\equiv\max_{u\in[0,1)}\left|F_\alpha \left(u\right)-u\right|. \] We will say that \(X_\alpha\) is asymptotically Benford if \(R\left(\alpha\right)\) decreases to zero as \(\alpha\) approaches some limit.

Examples: (1) Suppose \(X_\alpha\sim\)Pareto\(\left(\alpha\right)\) where \(\alpha>0\), i.e., \(\Pr\left(X_\alpha>x\right)=x^{-\alpha}\) for all \(x\geq 1\). Then it may be shown that \(X_\alpha\) is asymptotically Benford as \(\alpha\) decreases to zero. (2) Suppose that \(X_\sigma\) is a log-normal random variable with parameters \(0\) and \(\sigma^2\). It may be shown that \(X_\sigma\) is asymptotically Benford as \(\sigma\rightarrow\infty\). In fact, the convergence of \(R\left(\sigma\right)\) to zero in this case is extremely rapid, so (quoting Berger and Hill) “a log-normal random variable with large variance is practically indistinguishable from a Benford random variable.”
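For the Pareto case, \(F_\alpha\) has a simple closed form: \(\log X_\alpha\) is exponential with rate \(\alpha\ln 10\), and summing the geometric series over the integer parts gives \(F_\alpha\left(u\right)=\left(1-10^{-\alpha u}\right)/\left(1-10^{-\alpha}\right)\). That formula is my own (easily checked) computation, not one quoted from the book; the sketch below uses it to watch \(R\left(\alpha\right)\) fall as \(\alpha\) decreases.

```python
def F(u: float, alpha: float) -> float:
    """CDF of <log10 X> for X ~ Pareto(alpha): the geometric sum of
    P(k <= log10 X <= k + u) over k = 0, 1, 2, ..."""
    return (1 - 10 ** (-alpha * u)) / (1 - 10 ** (-alpha))

def R(alpha: float, grid: int = 10_000) -> float:
    """Max deviation of F(., alpha) from the U[0,1) CDF, taken on a grid."""
    return max(abs(F(k / grid, alpha) - k / grid) for k in range(grid + 1))

# R(alpha) decreases toward 0 as alpha -> 0: asymptotically Benford.
deviations = [R(a) for a in (2.0, 1.0, 0.5, 0.1)]
```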

From the remarks given above, it might seem that Benford random variables are quite rare. On the other hand, data sets that are (approximately) Benford are very common, if not ubiquitous. The contrast between these two observations is one of several puzzles that motivate people to study Benford’s law.

From Proposition 1 and eq. (3) we obtain

Proposition 2. If \(X\) is Benford, then \[ \log S\left(X\right)=\langle\log\left|X\right|\rangle\sim U[0,1).\tag*{(11)} \] We say of eq. (11) that “\(\log\left|X\right|\) is uniformly distributed modulo one,” henceforth abbreviated u.d. mod 1. This is the “uniform distribution characterization of Benford’s law” for random variables. A similar transformation may be applied to Benford sequences and Benford functions. Berger and Hill say of these transformations:

The uniform distribution characterization of Benford’s law is undoubtedly the most basic and powerful of all characterizations, largely because the mathematical theory of uniform distributions modulo one is very well developed.

We begin with the formal definitions.

Definition 5. (a) A sequence of real numbers \(\left(y_n\right)\equiv\left(y_1,y_2,\ldots\right)\) is u.d. mod 1 if \[ \lim_{N\rightarrow\infty}\frac{\text{#}\left\{1\leq n\leq N\colon\langle y_n\rangle\leq u\right\}}{N}=u\qquad\text{for all }0\leq u<1. \] (b) A (Borel measurable) function \(g\colon\mathbb{R}_{+}\rightarrow\mathbb{R}\) is u.d. mod 1 if \[ \lim_{T\rightarrow\infty}\frac{\lambda\left(\left\{\tau\in[0,T)\colon\langle g\left(\tau\right)\rangle\leq u\right\}\right)}{T}=u\qquad\text{for all }0\leq u<1, \] where \(\lambda\) denotes Lebesgue measure. (c) A random variable \(Y\) is u.d. mod 1 if \[ \Pr\left(\langle Y\rangle\leq u\right)=u\qquad\text{for all }0\leq u<1. \]

The tight connections between Benford objects and objects that are uniformly distributed modulo one are listed in the following theorem.

Proposition 3. (Uniform distribution characterization). (a) A sequence of real numbers \(\left(x_n\right)\) is Benford iff the sequence \(\left(\log\left|x_n\right|\right)\) is u.d. mod 1. (b) A (Borel measurable) function \(f\colon\mathbb{R}_{+}\rightarrow\mathbb{R}\) is Benford iff the function \(g\colon\mathbb{R}_{+}\rightarrow\mathbb{R}\) defined as \(g\left(\tau\right)\equiv\log\left|f\left(\tau\right)\right|\) for all \(\tau\geq 0\) is u.d. mod 1. (c) A random variable \(X\) is Benford iff the random variable \(\log\left|X\right|\) is u.d. mod 1.

The many results known about objects that are u.d. mod 1 may hence be used to establish that many sequences, functions, and random variables are (or are not) Benford. For example, the following theorem is known: the random variable \(Y\) is u.d. mod 1 iff \(kY+b\) is u.d. mod 1 for every integer \(k\neq 0\) and every \(b\in\mathbb{R}\). Combined with Proposition 3, this implies the following.

Proposition 4. Suppose that \(X\) is a Benford random variable. Then, for all \(a\in\mathbb{R}\) and \(k\in\mathbb{Z}\) with \(ak\neq 0\), the random variable \(aX^k\) is Benford. For example, if \(X\) is Benford then so is \(X^{-1}\).
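Proposition 4 can be spot-checked by simulation. A sketch of mine (the particular choices \(a=3\), \(k=-2\), the seed, and the sample size are all arbitrary): draw a Benford \(X=10^U\) and test whether \(\langle\log_{10}\left|aX^k\right|\rangle\) looks uniform.

```python
import bisect
import math
import random

random.seed(1)
N = 100_000

def frac_log_dev(samples) -> float:
    """Max deviation (over a 20-point grid) of the empirical CDF of
    <log10 |x|> from the U[0,1) CDF."""
    fracs = sorted(math.log10(abs(x)) % 1.0 for x in samples)
    n = len(fracs)
    return max(abs(bisect.bisect_right(fracs, k / 20) / n - k / 20)
               for k in range(1, 20))

xs = [10 ** random.random() for _ in range(N)]  # X Benford, by Proposition 1

# a * X^k with a = 3, k = -2 should again be Benford, by Proposition 4.
dev = frac_log_dev(3 * x ** -2 for x in xs)
```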

Here’s another striking theorem that emerges from this analysis.

Proposition 5. Suppose that \(X\) and \(Y\) are independent random variables with \(\Pr\left(XY=0\right)=0\). If \(X\) is Benford, then so is \(XY\). Hence, if \(X_1,\ldots,X_n\) are independent random variables with \(\Pr\left(X_1=0\right)=\cdots=\Pr\left(X_n=0\right)=0\), then the product \[ \prod_{i=1}^nX_i \] is Benford if any one of the factors is Benford.
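Proposition 5 also simulates nicely. In the sketch below (mine, with arbitrary choices), the second factor is exponential, which is not itself Benford; the product nonetheless inherits the Benford property from the first factor.

```python
import bisect
import math
import random

random.seed(2)
N = 100_000

xs = [10 ** random.random() for _ in range(N)]    # Benford factor
ys = [random.expovariate(1.0) for _ in range(N)]  # independent, non-Benford factor

# Test uniformity of <log10(XY)> on a 20-point grid.
fracs = sorted(math.log10(x * y) % 1.0 for x, y in zip(xs, ys))
dev = max(abs(bisect.bisect_right(fracs, k / 20) / N - k / 20)
          for k in range(1, 20))
```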

We now turn to the topics of scale-, base-, and sum-invariance. In each of these, a mathematical model of some observed feature of a Benford data set is created and investigated. Throughout, we’ll assume the data set consists of \(N\) independent realizations of some Benford random variable \(X\).

It’s easiest to explain “scale-invariance” with an example. One of the series that went into grandfather’s data set was “Rivers, Area.” He didn’t record in his paper the units in which area was measured; he might have used square miles, or square kilometers, or acres, or hectares, or something else. However, if Benford’s law is in fact a real feature of the universe, the units of this measurement shouldn’t matter. In other words, if a different unit of measurement had been used, he expected that the first digit law would still hold. Mathematically, this amounts to the following (easily proven) assertion: if \(X\) is Benford, then so is \(cX\) for any \(c>0\).

Next, we consider “sum-invariance.” As first observed by Mark Nigrini, in a large data set approximately obeying Benford’s law, the sums of the significands in each “significant digit class” are all approximately the same. To express this formally, I’ll use the following nonstandard notation for indicator functions: \[ I\left(x\in A\right)= \begin{cases} 1&\text{if }x\in A\text{,}\\ 0&\text{if }x\notin A. \end{cases} \] In other words, \(I\left(P\right)\) is the truth value of the statement \(P\). (A more conventional notation (e.g., Ash (p. 35) or Casella and Berger (p. 113)) for this would be \(I_A\left(x\right)\). Chung (p. 38) uses the notation \(1_A\left(x\right)\). Berger and Hill replace “\(1\)” in Chung’s notation with a symbol that I can’t replicate with my word processor, but it looks like a “blackboard bold” 1.) Berger and Hill show that Nigrini’s observation may be “mathematized” into the assertion that if \(X\) is Benford the expected value \[ \mathbb{E}\left[S\left(X\right)I\left(d\leq S\left(X\right)<d+1\right)\right] \] does not depend on the digit \(d\in\left\{1,\ldots,9\right\}\). More generally, we may show that if \(X\) is Benford the expected value \[ \mathbb{E}\left[S\left(X\right)I\left(a\leq S\left(X\right)<b\right)\right] \] (where \(1\leq a<b\leq 10\)) depends only on the length \(b-a\) of the interval and not on the starting point \(a\). This too is easy to prove.
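The easy proof is a one-line integral: the pdf of \(S\left(X\right)\) is \(\Lambda/s\) on \([1,10)\), so \(\mathbb{E}\left[S\left(X\right)I\left(a\leq S\left(X\right)<b\right)\right]=\int_a^b s\cdot\left(\Lambda/s\right)ds=\Lambda\left(b-a\right)\). A Monte Carlo sketch of the digit-class version (my construction; each class should give \(\Lambda\approx 0.4343\)):

```python
import math
import random

random.seed(3)
LAMBDA = 1 / math.log(10)   # Lambda = log10(e) ~ 0.4343
N = 200_000

s = [10 ** random.random() for _ in range(N)]   # significands of a Benford X

# Estimate E[ S(X) * I(d <= S(X) < d+1) ] for each first-digit class d.
class_means = [sum(x for x in s if d <= x < d + 1) / N for d in range(1, 10)]

max_err = max(abs(m - LAMBDA) for m in class_means)
```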

Finally, we turn to “base-invariance.” The use of 10 as the base of our positional number system is an anatomical accident, and an arbitrary one. We might just as well have used 8, or 12, or 16, or 20, or, in fact, any integer \(b\geq 2\). As my grandfather says,

If the view is accepted that phenomena fall into geometric series, then it follows that the observed logarithmic relationship is not a result of the particular numerical system, with its base, 10, that we have elected to use. Any other base, such as 8, or 12, or 20, to select some of the numbers that have been suggested at various times, would lead to similar relationships.

Similarly, in a discussion of explanations for Benford’s law, Raimi says that “every argument that applies to 10 applies to \(b\) mutatis mutandis.” I don’t think my grandfather (or anyone else until recently) actually counted the first digit frequencies of data written with alternative bases, but at my suggestion Alex Kossovsky (personal communication) modified one of his Excel programs to do this, and found, for a variety of bases \(b\), that the base \(b\) modification of eq. (1) fitted the data very well, both with real data sets and for numbers created by simulating a log-normal random variable.

The Berger-Hill formulation of “base-invariant significant digits,” however, bears no resemblance to the concept described above. They prove that any Benford object satisfies their concept of base-invariance, but in fact it is quite easy to find examples of base 10 Benford objects that are not Benford in alternative bases. For example, the sequence \(\left(8^n\right)\) is Benford in base 10, but not in base 8. For a second example, let \(X\) have a logarithmic distribution, so \(X\) is Benford in base 10. Numbers between 8 and 10 have first digit equal to 1 when written in base 8, so \[ \Pr\left(D_1^{(8)}=1\right)=\Pr\left(X\in[1,2)\cup[8,10)\right)=\frac{1}{\ln 10}\left(\ln 2+\ln\frac{5}{4}\right)\approx 0.39794. \] On the other hand, \[ \log_8\left(1+\frac{1}{1}\right)=\frac{\ln 2}{\ln 8}=\frac{1}{3}, \] so \(X\) does not satisfy the first digit law in base 8. In summary, regardless of whatever virtues their notion of base-invariance may have, it doesn’t seem to correspond very well with the intuitive notion. In light of Kossovsky’s results, it’s clear that further work is required in this aspect of Benford’s law.
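The numbers in this example are easy to reproduce, both exactly and by simulation; the simulation details below (seed, sample size, the base-8 leading-digit helper) are my own.

```python
import math
import random

random.seed(4)

# Exact values from the computation above.
p_exact = (math.log(2) + math.log(5 / 4)) / math.log(10)  # = log10(2.5) ~ 0.39794
p_benford8 = math.log(2) / math.log(8)                    # = 1/3, base-8 first digit law

def first_digit_base8(x: float) -> int:
    """Leading base-8 digit of x > 0."""
    k = math.floor(math.log(x) / math.log(8))
    return int(x / 8 ** k)

N = 100_000
xs = (10 ** random.random() for _ in range(N))  # X with the (base-10) logarithmic distribution
p_hat = sum(1 for x in xs if first_digit_base8(x) == 1) / N
```

The empirical frequency lands near 0.398, well away from the 1/3 that a base-8 Benford variable would require.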

Up to this point I’ve summarized what I see as the highlights of Chapters 1 through 5 of Berger and Hill’s book. In Chapter 6 they go on to explore the Benford properties of sequences determined iteratively, chaotic systems, and dynamic systems governed by differential equations. In Chapter 7 they extend the analysis to sequences of matrices (with applications to Markov chains and to systems governed by higher order difference equations), and to systems governed by higher order linear differential equations. In Chapter 8 they examine the Benford properties of sequences of random variables and mixtures of distributions. Chapter 9 discusses the application of finitely additive probability (as opposed to the standard countably additive theory of probability) to the study of Benford’s law. Chapter 10 discusses some recent applications of Benford’s law.

I have one final, non-serious, complaint about the book: in some sense the title may be accused of false advertising, as there is very little of an introductory nature in this book. The prerequisites for reading it profitably include a thorough grounding in measure-theoretic probability. A thorough grounding in Fourier analysis is needed to fully comprehend some parts, and ergodic theory is used at points. It seems to me that the book is really aimed at graduate students looking for a dissertation topic, and at professional mathematicians who want a convenient one-volume summary of what is currently known about Benford’s law.


References

Ash, Robert (1972). Real Analysis and Probability. Academic Press.

Benford, Frank (1938). “The Law of Anomalous Numbers.” Proceedings of the American Philosophical Society, Vol. 78, No. 4.

Casella, George, and Roger Berger (2002). Statistical Inference (2nd ed.). Wadsworth Group.

Chung, Kai Lai (1974). A Course in Probability Theory (2nd ed.). Academic Press.

Raimi, Ralph (1976). “The First Digit Problem.” American Mathematical Monthly, Vol. 83, No. 7.


Frank Benford is a consulting applied mathematician. The website for his firm is BenfordAppliedMath.com.

Table of Contents

Preface vii
1 Introduction 1
1.1 History 3
1.2 Empirical evidence 4
1.3 Early explanations 6
1.4 Mathematical framework 7
2 Significant Digits and the Significand 11
2.1 Significant digits 11
2.2 The significand 12
2.3 The significand σ-algebra 14
3 The Benford Property 22
3.1 Benford sequences 23
3.2 Benford functions 28
3.3 Benford distributions and random variables 29
4 The Uniform Distribution and Benford's Law 43
4.1 Uniform distribution characterization of Benford's law 43
4.2 Uniform distribution of sequences and functions 46
4.3 Uniform distribution of random variables 54
5 Scale-, Base-, and Sum-Invariance 63
5.1 The scale-invariance property 63
5.2 The base-invariance property 74
5.3 The sum-invariance property 80
6 Real-valued Deterministic Processes 90
6.1 Iteration of functions 90
6.2 Sequences with polynomial growth 93
6.3 Sequences with exponential growth 97
6.4 Sequences with super-exponential growth 101
6.5 An application to Newton's method 111
6.6 Time-varying systems 116
6.7 Chaotic systems: Two examples 124
6.8 Differential equations 127
7 Multi-dimensional Linear Processes 135
7.1 Linear processes, observables, and difference equations 135
7.2 Nonnegative matrices 139
7.3 General matrices 145
7.4 An application to Markov chains 162
7.5 Linear difference equations 165
7.6 Linear differential equations 170
8 Real-valued Random Processes 180
8.1 Convergence of random variables to Benford's law 180
8.2 Powers, products, and sums of random variables 182
8.3 Mixtures of distributions 202
8.4 Random maps 213
9 Finitely Additive Probability and Benford's Law 216
9.1 Finitely additive probabilities 217
9.2 Finitely additive Benford probabilities 219
10 Applications of Benford's Law 223
10.1 Fraud detection 224
10.2 Detection of natural phenomena 225
10.3 Diagnostics and design 226
10.4 Computations and Computer Science 228
10.5 Pedagogical tool 230
List of Symbols 231
Bibliography 234
Index 245