I have been afflicted with a chronic condition that ensures that at least once each year, usually in August or September, I will feel miserable. I am a Cubs fan. This year, the annual attack came a month later than usual, but it was much stronger than in past years. As I was recovering, I was asked to review Teaching Statistics Using Baseball, by Jim Albert. Heeding the old saying that you can prove anything with statistics, I sought to cheer myself up by searching for some sort of vindication for my years spent rooting for the lovable losers from Chicago, beyond the affirming regularity with which I get to say, "Wait 'til next year!"
Well, I'm still looking for the vindication, but this is a delightful book. Albert has been applying his love of baseball to his vocation of teaching statistics for some time now, and this text uses baseball as a framework to introduce and explore statistical topics. He has created a window through which the statistically minded baseball fan can explore, explain, and debunk conventional wisdom concerning the national pastime. Albert uses this text for an introductory statistics course focused on baseball, but it's far more valuable as a resource for non-trivial applications and projects for any introductory statistics course, and as a gift for that baseball fan in your department.
Albert has fun with the layout of the book. All but the first of the nine innings, sorry, chapters, begin with a list of "What's on deck," followed by a series of case studies. Each deals with the topic of the chapter, each begins with a list of topics covered in the study, and all are connected in a logical flow. Each exercise set starts with a leadoff exercise involving Rickey Henderson, widely regarded as the greatest leadoff hitter of all time. The chapter topics follow a standard introductory statistics curriculum, moving from analyzing a single batch of data, to comparisons and relationships between data sets, topics in probability, statistical inference, and Markov chains. The case studies deal with questions like, "Is John Olerud Streaky?" and "How Important is a Run?" The years 1927, 1961, 1999, and 2001 are compared for the respective difficulty of the home run records set in those seasons, and the same is done for years of great batting averages. Which was the most impressive seasonal batting average: Tony Gwynn's .394 in 1994, George Brett's .390 in 1980, Rod Carew's .388 in 1977, or Ted Williams' .406 in 1941, the last seasonal average over .400? Albert's answer might surprise you.
Albert uses tools from a bygone era to introduce probability topics: statistically accurate tabletop games, like Strat-O-Matic and All-Star Baseball. For those unfamiliar with such games, they used spinners, dice rolls and printed cards to accurately recreate the performance of actual baseball players. For those with a predilection for these types of games, this is all great fun. I, myself, spent a good bit of time on one exercise which created statistically accurate spinners for several of the all-time great players, based on their lifetime stats. However, anyone who has tried to teach introductory probability to young adults using games like Monopoly, Risk, or Poker has discovered that today's students aren't familiar with these types of games. Even baseball fans now get their gaming electronically, where the statistical information is hidden in the programming.
This is just one instance of a larger issue with using this book as a text. The effectiveness of using these topics to motivate statistical concepts depends a great deal on your students' interest in baseball, or the instructor's ability to motivate it. Some introduction to the game and its most common statistics is included, but it isn't enough to get people interested in the game if they aren't already. If you're intrigued by the batting average example above, you'll find the entire book entertaining. But not everyone cares about these kinds of questions, which does not recommend this book for a general introductory statistics course. Further, there are many items that are glossed over in the text, with the assumption that the reader already has the requisite skill or is getting information from some other source. For example, chapter 9 assumes facility with matrix and vector multiplication, and further complicates the matter by listing the top row of a 24 x 24 matrix as an 8 x 3 grid. More troubling from a statistical standpoint is that the terms "random" and "independent" are thrown around without any definitions, leaving the interpretation of the terms to the reader's instincts, or the instructor's presentation. And struggling students will find no boxed definitions or numbered formulas to help them through the exercises.
But the characteristics that fail to recommend it as a text are positive boons to an instructor or interested fan. It's easy to find what you're looking for, the chapters are relatively independent of one another, and the exercises, which nicely extend the ideas of the case studies, are laid out in the same order as the topics in the chapter. Albert has gone out of his way to help the reader obtain data sets. Several sources are listed and summarized (including http://www.baseball-reference.com and http://www.baseball1.com), and the particular sets used in the case studies and homework are contained on the book's website at http://personal.bgsu.edu/~albert/teachball.htm. His data sets are very easy to download and manipulate. Some of the raw data sites involved a bit more work to get up and running, but Albert includes at least a general outline of what to do.
The biggest payoff for the baseball fan is the last chapter, in which an inning of baseball is modeled as a Markov chain. There are 25 states, with 8 different placements of runners on the bases for each of 0, 1 or 2 outs, and the absorbing state of 3 outs. Transition probabilities are approximated using large amounts of actual data. Then, the values of different types of plays can be calculated and compared. These range from the mildly entertaining (does a home run really kill the rally?) to the convention-busting (are sacrifice bunts worth the sacrifice?). How reliable does a base stealer need to be in order to make an attempt worth the risk? These are new ways to look at an old game, and these fresh insights will have you yelling at your favorite team's manager with renewed vigor and statistical evidence that he's a fool.
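For readers who want a feel for the 25-state model, here is a minimal sketch of my own in Python, not Albert's code. Everything specific in it is an assumption made for illustration: the per-batter probabilities in `EVENT_PROBS` are invented rather than estimated from real data, and the base-running rules are deliberately crude (runners hold on an out, move up one base on a single, two on a double). It computes the expected number of runs scored in the rest of the inning from each state.

```python
from itertools import product

# Hypothetical per-plate-appearance probabilities (illustrative only; the
# book estimates transition probabilities from large amounts of real data).
EVENT_PROBS = {"out": 0.68, "walk": 0.08, "single": 0.16,
               "double": 0.045, "triple": 0.005, "homer": 0.03}

END = "3 outs"                                 # the absorbing state
BASES = list(product((0, 1), repeat=3))        # 8 runner placements
STATES = [(outs, b) for outs in range(3) for b in BASES] + [END]


def advance(bases, event):
    """Return (new_bases, runs_scored) under simplified running rules."""
    first, second, third = bases
    if event == "walk":                        # only forced runners move
        if not first:
            return (1, second, third), 0
        if not second:
            return (1, 1, third), 0
        if not third:
            return (1, 1, 1), 0
        return (1, 1, 1), 1                    # bases loaded: a run is forced in
    if event == "single":                      # every runner moves up one base
        return (1, first, second), third
    if event == "double":                      # every runner moves up two bases
        return (0, 1, first), second + third
    if event == "triple":
        return (0, 0, 1), first + second + third
    if event == "homer":
        return (0, 0, 0), first + second + third + 1
    raise ValueError(event)


def transitions(state):
    """List (probability, next_state, runs) for a non-absorbing state."""
    outs, bases = state
    result = []
    for event, p in EVENT_PROBS.items():
        if event == "out":                     # runners hold on every out
            nxt = END if outs == 2 else (outs + 1, bases)
            result.append((p, nxt, 0))
        else:
            new_bases, runs = advance(bases, event)
            result.append((p, (outs, new_bases), runs))
    return result


def expected_runs(sweeps=500):
    """Iterate E[s] = sum of p * (runs + E[next]) until it settles."""
    value = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        value = {s: 0.0 if s == END else
                 sum(p * (r + value[nxt]) for p, nxt, r in transitions(s))
                 for s in STATES}
    return value


RUNS = expected_runs()
for state in [(0, (0, 0, 0)), (0, (1, 0, 0)), (1, (0, 1, 0))]:
    print(state, round(RUNS[state], 3))
```

The last two states printed are the ones on either side of a successful sacrifice bunt: a runner on first with no outs, and a runner on second with one out. With invented probabilities the comparison proves nothing about real baseball, but it shows the kind of calculation that lets the chapter put a value on a play.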
This book is a wonderful blending of baseball and statistical topics, and anyone interested in both will find hours of enjoyment in its pages. It is highly recommended as a resource (though not as a text) for teaching introductory statistics, and as reading for the statistically inclined baseball fan.
For further information regarding rigorous statistical examination of baseball issues, consult another book by Jim Albert (with Jay Bennett), Curve Ball: Baseball, Statistics, and the Role of Chance in the Game, or contact the Society for American Baseball Research at http://www.sabr.org. They are the proponents of Sabermetrics, the practice of using proper mathematical tools to analyze the national pastime.
Steve Morics (Steven_Morics@redlands.edu) is Associate Professor of Mathematics at the University of Redlands in southern California. His interests include combinatorics, mathematics and politics, music, and cheering his beloved Cubs to the 2004 World Series championship.