A Selection of Problems from A.A. Markov’s Calculus of Probabilities: Problem 3 – Some Generalizations of Problem 2

Author(s): 
Alan Levine (Franklin and Marshall College)

 

This problem contains five generalizations of the previous problem. The first two are fairly straightforward, but later solutions become increasingly complex. Markov makes some interesting approximations involving Taylor series. There are ample opportunities for the reader to fill in details.

 

Задача 3ья. Из сосуда, содержащего \(n\) билетов с нумерами \(1, 2, 3, \dots, n\) и никаких других, вынимают одновременно \(m\) билетов, что мы назовём первым тиражем.

Затем вынутые билеты возвращают в сосуд и производят подобный же второй тираж m билетов. По окончании второго тиража вынутые билеты возвращают также в сосуд и производят третий тираж \(m\) билетов и т.д.

Требуется при \(k\) таких тиражах определить:

1) вероятность, что \(i\) определённых нумеров не появятся;

2) вероятность, что \(i\) определённых нумеров не появятся, а другие \(l\) определённых нумеров появятся;

3) вероятность, что \(l\) определённых нумеров появятся;

4) вероятность, что появятся только \(l\) определённых нумеров;

5) вероятность, что появятся все нумера.

3rd Problem. From a vessel containing \(n\) tickets with numbers \(1, 2, 3, \dots, n\), and no others, we select simultaneously \(m\) tickets, which we call the first draw.

Then the chosen tickets are returned to the vessel and we produce a similar second draw of \(m\) tickets. Upon finishing the second draw, the chosen tickets are again returned to the vessel and we produce a third draw of \(m\) tickets, etc.

It is required to determine, for \(k\) such draws:

1) the probability that \(i\) specific numbers do not appear;

2) the probability that \(i\) specific numbers do not appear, but another \(l\) specific numbers do appear;

3) the probability that \(l\) specific numbers appear;

4) the probability that only \(l\) specific numbers appear;

5) the probability that all numbers appear.

 

Continue to Markov's solution of Problem 3, Parts 1 and 2.

Continue to Markov's solution of Problem 3, Parts 3, 4, and 5.

Skip to statement of Problem 4.

 

A Selection of Problems from A.A. Markov’s Calculus of Probabilities: Problem 3 – Solution to Parts 1 and 2

Author(s): 
Alan Levine (Franklin and Marshall College)

 

Review statement of Problem 3.

 

Solution:   For brevity, let \[\left\lbrace \frac{p(p-1) \cdots (p-m+1)}{1\cdot2\cdot\cdots\cdot m}\right\rbrace^k = Z_p ,\] for any number \(p\).

On each draw, the number of selected tickets can represent any set of \(m\) numbers from all \(n\) numbers \(1, 2, \dots, n\).

Corresponding to this, on one draw we distinguish \(\frac{n(n-1)\cdots(n-m+1)}{1\cdot2\cdot\cdots\cdot m}\) equally likely cases, and on \(k\) draws, we distinguish  \(\left\lbrace \frac{n(n-1)\cdots(n-m+1)}{1\cdot2\cdot\cdots\cdot m}\right \rbrace^k = Z_n\) equally likely cases.9

Each of the preceding exhaustive and disjoint cases consists of the appearance of \(k\) specific sets of \(m\) numbers on the \(k\) draws considered.

Establishing in this way those cases that we will consider, and showing the total number of them, we proceed to determine the probabilities of the events referred to in the problem by calculating the number of cases favorable to them.

If \(i\) specific numbers \(\alpha_1, \alpha_2, \dots, \alpha_i\) do not appear, then for one draw, instead of  \(\frac{n(n-1)\cdots(n-m+1)}{1\cdot2\cdot\cdots\cdot m}\), there remain \(\frac{(n-i)(n-i-1)\cdots(n-i-m+1)}{1\cdot2\cdot\cdots\cdot m}\) cases, and for \(k\) draws, we have  \(\left\lbrace\frac{(n-i)(n-i-1)\cdots(n-i-m+1)}{1\cdot2\cdot\cdots\cdot m}\right\rbrace^k = Z_{n-i}\) cases, instead of \(\left\lbrace \frac{n(n-1)\cdots(n-m+1)}{1\cdot2\cdot\cdots\cdot m}\right \rbrace^k = Z_n\).

Thus, the probability that, for the \(k\) draws considered, \(i\) specific numbers do not appear is given by the fraction \[\frac{Z_{n-i}}{Z_n} = \left\lbrace\frac{(n-i)(n-i-1)\cdots(n-i-m+1)}{n(n-1)\cdots(n-m+1)}\right\rbrace^k.\]

Then the number of cases for which the \(i\) specific numbers \(\alpha_1, \alpha_2, \dots, \alpha_i\) do not appear and one specific number \(\beta_1\) does appear can be expressed by the difference \(\Delta Z_{n-i-1} = Z_{n-i} - Z_{n-i-1}\), where according to what we just said, \(Z_{n-i}\) represents the number of cases in which the numbers \(\alpha_1, \alpha_2, \dots, \alpha_i\) do not appear, and \(Z_{n-i-1}\) the number of those cases in which, besides the numbers \(\alpha_1, \alpha_2, \dots, \alpha_i\), the number \(\beta_1\) also does not appear.

In a similar manner, the number of cases in which the numbers \(\alpha_1, \alpha_2, \dots, \alpha_i\) do not appear and two specific numbers do appear can be expressed by the second difference \(\Delta^2 Z_{n-i-2} = \Delta Z_{n-i-1} - \Delta Z_{n-i-2}\), where \(\Delta Z_{n-i-1}\) represents the number of all cases in which the number \(\beta_1\) does appear and the numbers \(\alpha_1, \alpha_2, \dots, \alpha_i\) do not appear, and \(\Delta Z_{n-i-2}\) the number of those cases in which, besides \(\alpha_1, \alpha_2, \dots, \alpha_i\), the number \(\beta_2\) also does not appear.

In view of the possibility of continuing similar reasoning, it is not difficult to conclude that, in general, the number of cases in which \(i\) specific numbers do not appear and another \(l\) specific numbers do appear can be represented by the \(l\)th order difference \(\Delta^l Z_{n-i-l}\), which is equal to: \[Z_{n-i} - \frac{l}{1}Z_{n-i-1} + \frac{l(l-1)}{1\cdot2} Z_{n-i-2} - \cdots \pm Z_{n-i-l}.\]

Then the probability that, in the \(k\) draws considered, \(i\) specific numbers do not appear and another \(l\) specific numbers do appear is equal to10 \[\frac{\Delta^l Z_{n-i-l}}{Z_n}.\]

 

Continue to Markov's solution of Problem 3, Parts 3, 4, and 5.

Skip to statement of Problem 4.

 


[9] In other words, \(Z_p = \binom{p}{m}^k\) for any \(p\), so that \(Z_n = \binom{n}{m}^k\).

[10] For example, suppose \(n = 10, m = 3,\) and \(k = 5\). The total number of outcomes is \(Z_{10}=\binom{10}{3}^5=120^5\). For question (1), let \(i = 2\), so that we consider the number of selections without two numbers \(\alpha_1\) and \(\alpha_2\) in any of the 5 draws. There are \(Z_8=\binom{8}{3}^5=56^5\) such selections. Hence, the probability that \(\alpha_1\) and \(\alpha_2\) do not appear is \(\frac{Z_8}{Z_{10}}=\frac{56^5}{120^5} \approx 0.02213\). For question (2), the number of selections without \(\alpha_1\) and \(\alpha_2\) and containing another number \(\beta_1\) is: \[\Delta Z_7 = Z_8 - Z_7 = \binom{8}{3}^5-\binom{7}{3}^5 = 498~209~901.\] Of these, \[\Delta Z_6 = Z_7 - Z_6 = \binom{7}{3}^5 - \binom{6}{3}^5 = 49~321~875\] do not contain some other number \(\beta_2\). So, the number of selections without \(\alpha_1\) and \(\alpha_2\) and containing \(\beta_1\) and \(\beta_2\) is: \[\Delta^2 Z_6 = \Delta Z_7 - \Delta Z_6 = Z_8 - 2 Z_7 + Z_6 = 448~888~026,\] and the probability of this event is \(\frac{\Delta^2 Z_6}{Z_{10}} = \frac{448~888~026}{120^5} \approx 0.01804\). Similarly, \(\frac{\Delta^3 Z_5}{Z_{10}} = \frac{Z_8 - 3Z_7 + 3Z_6 - Z_5}{120^5} \approx 0.01618\).

 

A Selection of Problems from A.A. Markov’s Calculus of Probabilities: Problem 3 – Solution to Parts 3, 4 and 5

Author(s): 
Alan Levine (Franklin and Marshall College)

 

Review statement of Problem 3.

 

The other probabilities referred to in the problem represent three particular cases of the probability just found and, therefore, can be gotten from the expression \(\frac{\Delta^l Z_{n-i-l}}{Z_n}\) for particular assumptions about \(i\) and \(l\): \[\mathrm{1)}\  i=0, \ \ \ \ \mathrm{2)}\  i=n-l,\ \ \ \ \mathrm{3)}\  i=0, l=n.\]

Letting \(i = 0\), we get the following expression for the probability of the appearance of \(l\) specific numbers: \[\frac{\Delta^l Z_{n-l}}{Z_n} = 1 - \frac{l}{1}\left(\frac{n-m}{n}\right)^k + \frac{l(l-1)}{1\cdot2} \left(\frac{n-m}{n}\right)^k \left(\frac{n-m-1}{n-1}\right)^k - \cdots \]

Letting \(i = n - l\), we find that the probability of the appearance of \(l\) specific numbers and the nonappearance of the remaining is equal to:

\[\begin{align} \frac{\Delta^l Z_0}{Z_n} &= \frac{Z_l}{Z_n} - \frac{l}{1}\frac{Z_{l-1}}{Z_n} + \frac{l(l-1)}{1 \cdot 2} \frac{Z_{l-2}}{Z_n} - \cdots \\ &= \left(\frac{n-m}{n}\right)^k \left(\frac{n-m-1}{n-1}\right)^k \cdots \left(\frac{l-m+1}{l+1}\right)^k \left\lbrace 1 - \frac{l}{1} \left(\frac{l-m}{l}\right)^k + \cdots \right\rbrace. \end{align}\]
 Finally, the probability of the appearance of all \(n\) numbers is equal to:
\[\frac{\Delta^n Z_0}{Z_n} = 1 - \frac{n}{1}\left(\frac{n-m}{n}\right)^k + \frac{n(n-1)}{1\cdot2} \left(\frac{n-m}{n}\right)^k \left(\frac{n-m-1}{n-1}\right)^k - \cdots.\]

Stopping at the last formula and noting that it requires tedious calculation for large values of \(n\), we derive two approximate formulas for it.

For getting the first approximate formula, assume that all numbers \(\left(\frac{n-m-1}{n-1}\right)^k\), \(\left(\frac{n-m-2}{n-2}\right)^k, \dots\), are equal to \(\left(\frac{n-m}{n}\right)^k\) which, for brevity, we denote by the letter \(t\).

Under this assumption, the indicated formula immediately gives us: \[\frac{\Delta^n Z_0}{Z_n} \approx (1-t)^n,\] where \(t = \left(\frac{n-m}{n}\right)^k\).

For the second approximation, we note that for small values of \(i\), the ratio \(\left(\frac{n-m-i}{n-i}\right)^k : \left(\frac{n-m}{n}\right)^k \), which is equal to \(\left\lbrace 1 - \frac{im}{(n-i)(n-m)}\right\rbrace^k\), differs little from \(1 - \frac{kim}{n(n-m)}\), and the product \[\left(1 - \frac{km}{n(n-m)} \right) \left(1 - \frac{2km}{n(n-m)} \right) \cdots \left(1 - \frac{ikm}{n(n-m)} \right) \] differs little from \[1-\frac{km(1+2+\cdots+i)}{n(n-m)} = 1 - \frac{kmi(i+1)}{2n(n-m)}.\]

On this basis, we accept \(t^{i+1}\left(1 - \frac{kmi(i+1)}{2n(n-m)}\right)\) as the approximate value of each product \[\left(\frac{n-m}{n}\right)^k \left(\frac{n-m-1}{n-1}\right)^k \cdots \left(\frac{n-m-i}{n-i}\right)^k.\]

Substituting this approximate expression in place of the exact one in the formula, we get \[\frac{\Delta^n Z_0}{Z_n} \approx (1-t)^n - \frac{kmt^2(n-1)}{2(n-m)}(1-t)^{n-2}\approx (1-t)^n\left\lbrace1-\frac{kmt^2}{2}\right\rbrace,\] since we assume the numbers \(\frac{n-1}{n-m}\) and \(\frac{1}{(1-t)^2}\) are close to 1.

We apply our approximate formula to finding the number of draws under the hypothesis that the probability of the appearance of all numbers was approximately equal to a given number \(\frac{1}{C}\).

The first approximation formula gives \((1-t)^n \approx \frac{1}{C}\) from which we deduce \(n \log(1-t) \approx -nt \approx -\log C\); but \(t = \left(\frac{n-m}{n}\right)^k\) and, thus, \(\log t = k \log\left(1- \frac{m}{n}\right) \approx - \frac{km}{n}\).11

Comparing the approximate equations \(-nt \approx -\log C\) and \(\log t \approx -\frac{km}{n}\), we find \[k \approx \frac{n(\log n- \log \log C)}{m} .\]

In the subsequent calculation, let \(\frac{\log C}{n} = t_0\) and \(\frac{n(\log n- \log \log C)}{m} = k_0\), so that \(t_0\) and \(k_0\) are the approximate values of the numbers \(t\) and \(k\).

The second approximate expression of the probability gives \((1-t)^n \left(1-\frac{kmt^2}{2}\right) \approx \frac{1}{C}\), from which, by performing the approximate calculations, we deduce
\[\log C \approx nt + \frac{nt^2}{2} + \frac{kmt^2}{2} \approx nt  + \frac{(n+k_0 m) t_0^2}{2},\]
\[t \approx \frac{\log C}{n} \left[ 1-\frac{(n+k_0 m)t_0^2}{2 \log C}\right],\]
and then \[-\log t \approx \log n- \log \log C + \frac{(n + k_0 m) t_0^2}{2 \log C}.\]

On the other hand, we have \[ -\log t = -k \log\left(1-\frac{k}{m}\right) \approx \frac{km}{n} + \frac{k_0 m^2}{2n^2}.\]

Finally, equating one approximate expression for $\log t$ to the other, we arrive at the approximate equation \[ \frac{km}{n} + \frac{k_0 m^2}{2n^2} \approx \log n - \log \log C + \frac{(n + k_0 m) t_0^2}{2 \log C},\] from which we easily deduce
\[\begin{align} k &\approx \frac{n}{m} \left\lbrace \log n - \log \log C + \frac{k_0 m}{2 n^2}(\log C - m) + \frac{1}{2n} \log C\right\rbrace \\
    &\approx \frac{1}{m} \left\lbrace (\log n - \log \log C) \left( n + \frac{1}{2} \log C - \frac{m}{2} \right) + \frac{1}{2} \log C\right\rbrace.
\end{align}\]

For example, let us assume \(n= 90,\ m=5,\ C=2\).
 
Then

\(\log n = 4.4998\dots\), \(\log C = 0.69314\dots\),
\(\log \log C = -0.3665\dots\), \(n + \frac{1}{2} \log C - \frac{m}{2} = 87.84657\dots\). 

and, performing a simple calculation by the previous approximate formula, we get \[k = \frac{4.8663 \times 87.8466 + 0.346}{5} \approx 85.5.\]

According to this result, it can be verified that the probability of the appearance of all 90 numbers in 85 draws is somewhat less than one-half, and for 86 draws is already bigger than one-half.

 

Continue to statement of Problem 4.

 


[11] Here, Markov used “log” to denote natural (base \(e\)) logarithms. Later, he used the same notation for common (base 10) logarithms!

The Taylor series for \(\ln(1-t)\) is \(-t -\frac{t^2}{2} - \frac{t^3}{3} - \cdots \approx -t\), for small values of t. (The series converges for \(-1 \leq t < 1\).)  Later, he included the quadratic term for a better approximation.