Regression to the Mean: A Mini-Primary Source Project for Statistics Students

Author: Dominic Klyve (Central Washington University)

One of the greatest fears in sports performance is that, following a very successful run (of games, shots, field goal kicks, etc.), an individual or team may be “jinxed” and will revert to more average performance. In baseball this has happened so often to second-year players that there is a name for it: the feared “sophomore slump.”

On a seemingly unrelated note ...

Oft-repeated in history books is the lamentation that the progeny of a great man (it does usually seem to be a man in these books) failed to live up to the talent / hard work / brilliance of their parent. The great founding king gives way to a middling prince. Or, in a less regal example, American historian Paul Nagel has described how four generations of the Adams family betrayed an inexorable “Descent from Glory” [Nagel].

Each of these phenomena (from quite unrelated fields) can be explained by the same underlying principle, which is today known as regression to the mean. It states that, when the same quantity is observed repeatedly, very extreme values tend to be followed by less extreme ones: the later values are, on average, closer to the mean. This tendency was first noted by Englishman Francis Galton in 1886. Galton seems not only to have been the first person to pose the question of why regression occurs, but in the same paper, he became the first to give an answer. The mini-Primary Source Project presented here offers students the opportunity to learn about regression to the mean by reading from his pioneering paper on the topic.
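Readers who would like to see the principle at work before turning to Galton may find a small simulation helpful. The Python sketch below is only an illustration under stated assumptions, not anything taken from Galton's paper or from the student project: each hypothetical "player" has a fixed skill, each season's score is that skill plus independent luck, and both are drawn from a standard normal distribution. The function names `simulate_two_seasons` and `top_group_means` are invented for this example.

```python
import random
import statistics


def simulate_two_seasons(n_players=10_000, seed=0):
    """Return (first, second) season scores for hypothetical players.

    Assumption: score = fixed skill + independent luck, both standard normal.
    """
    rng = random.Random(seed)
    seasons = []
    for _ in range(n_players):
        skill = rng.gauss(0.0, 1.0)           # stays the same in both seasons
        first = skill + rng.gauss(0.0, 1.0)   # luck in season one
        second = skill + rng.gauss(0.0, 1.0)  # fresh luck in season two
        seasons.append((first, second))
    return seasons


def top_group_means(seasons, fraction=0.05):
    """Average both seasons for the players who ranked highest in season one."""
    ranked = sorted(seasons, key=lambda pair: pair[0], reverse=True)
    top = ranked[: max(1, int(len(ranked) * fraction))]
    mean_first = statistics.fmean(first for first, _ in top)
    mean_second = statistics.fmean(second for _, second in top)
    return mean_first, mean_second


seasons = simulate_two_seasons()
mean_first, mean_second = top_group_means(seasons)
print(f"Top 5% in season one, mean season-one score: {mean_first:.2f}")
print(f"Same players, mean season-two score:         {mean_second:.2f}")
```

Under these assumptions, the players selected for an outstanding first season still score above the overall average of zero in their second season, but noticeably less far above it. No jinx is needed: the selected group owed part of its first-season success to luck, and that luck does not repeat.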

Francis Galton (1822–1911) himself is remembered for many things, but for our purposes it may be best to think of him as one of history’s all-time-champion measurers. Nothing that could be quantified escaped his interest, and he was unafraid to use his measurements to draw sweeping conclusions about the world. He studied the lifespan of monarchs and concluded, on the assumption that more people pray for them than pray for average people, that prayer is ineffective in prolonging life. He collected weather data and constructed the first weather map, discovering the phenomenon of “anticyclones” in the process. He even created a “beauty map” of Britain by walking through different cities and making secret records of the attractiveness of women he passed using a device that he called, seemingly with no sense of irony, a “pricker” [Holt].

With an inheritance left to him by his father, Galton gave full rein to his desire to measure things, setting up an “anthropometric laboratory” in London in 1884. He advertised widely, and “thousands of people streamed in and obligingly submitted to measurement of their height, weight, reaction time, pulling strength, color perception, and so on” [Holt, p. 57].


Sir Francis Galton, 1890s
Photograph by Eveleen Myers (née Tennant), Public Domain


Galton's 1861 Weather Map
Public Domain

In our age of “big data,” many people believe that the company or individual with the best data will have a competitive advantage. More than a century ago, Francis Galton wanted to take many measurements because he believed they would tell him something interesting. Happily, he was correct; with access to an unprecedented amount of anthropometric data, he began to draw conclusions that would otherwise have been impossible, and he thereby became the first person to note the phenomenon of regression.

In the Primary Source Project Regression to the Mean, students read parts of Galton’s paper, “Regression Towards Mediocrity in Hereditary Stature” [Galton], to explore the phenomenon and the cause of regression. They are given the opportunity to reason from Galton’s original data of parental and offspring heights, and to explore probabilistic thinking in this setting. Via an optional section of the project, students can also explore errors in reasoning that arise in our world today from a lack of understanding of regression.

The complete project Regression to the Mean (pdf file) is ready for student use, and the LaTeX source code is available from the author by request. A set of instructor notes that explain the purpose of the project and guide the instructor through the goals of each of the individual sections is appended at the end of the student project.

This project is the thirteenth in A Series of Mini-projects from TRIUMPHS: TRansforming Instruction in Undergraduate Mathematics via Primary Historical Sources that is planned for publication in Convergence, for use in courses ranging from first-year calculus to analysis, number theory to topology, and more. Links to other mini-PSPs in the series appear below, including the statistics mini-PSP Seeing and Understanding Data. The full TRIUMPHS collection also offers a more extensive “full-length” PSP for use in teaching the p-value.

Acknowledgments

The development of the student project Regression to the Mean has been partially supported by the TRansforming Instruction in Undergraduate Mathematics via Primary Historical Sources (TRIUMPHS) project with funding from the National Science Foundation’s Improving Undergraduate STEM Education Program under Grants No. 1523494, 1523561, 1523747, 1523753, 1523898, 1524065, and 1524098. Any opinions, findings, and conclusions or recommendations expressed in this project are those of the author and do not necessarily reflect the views of the National Science Foundation.

References

Galton, Francis. "Regression towards mediocrity in hereditary stature." The Journal of the Anthropological Institute of Great Britain and Ireland 15 (1886): 246–263.

Holt, Jim. "Sir Francis Galton, the Father of Statistics … and Eugenics." In When Einstein Walked with Gödel: Excursions to the Edge of Thought, pp. 51–68. Farrar, Straus and Giroux, 2018.

Nagel, Paul C. Descent from Glory: Four Generations of the John Adams Family. Harvard University Press, 1999.