Beyond Basic Statistics: Tips, Tricks, and Techniques Every Data Analyst Should Know

Kristin H. Jarman
John Wiley
The Basic Library List Committee suggests that undergraduate mathematics libraries consider this book for acquisition.

William J. Satzer
This book is a sequel of sorts to the author’s The Art of Data Analysis: How to Answer Almost Any Question Using Basic Statistics. That book aims to supplement the teaching of basic statistics by focusing on examples. In so doing, the author provides students the additional context they need to integrate and apply the basic tools and methods of statistics.

The current book explores more advanced topics in data analysis and assumes a modest background in statistics at about the level of the previous book. A fundamental principle in data analysis is that it’s very easy to get it wrong. The author approaches this artfully; she notes at least a couple of occasions when she — a veteran data analyst with several years of experience — went seriously wrong. What can happen? You can ask the wrong question. You can ask the right question, but inadvertently answer the wrong one. You can gather data that is wrong for the question you care about. You can use an inappropriate statistical technique. You can do everything else right but misinterpret the results.

After the introductory material, each chapter concentrates on one data analysis question or tool by looking at just a single application. For example, one chapter uses nutrition and diet and discusses sampling strategies for gathering relevant data. In so doing, it gently but quite effectively introduces ideas about the design of experiments and research methods. Other chapters consider: political polling with an emphasis on sample size calculations and statistical power; normality testing on the distribution of the lengths of Hollywood marriages; robust estimation of attendance at Sumo wrestling events in the US; chi-squared techniques for detecting cheating in a dice game; and nonparametric testing of the hypothesis that Godzilla is more popular than King Kong using fifty-six top-ten classic movie monster lists.

One of my favorites was the chapter on outlier detection that used the News of the Weird website data to identify states with reports of weirdness in the high outlier range. (Florida is prominent on the basis of both total population and per capita weird reports, but North Dakota, New Hampshire and Montana win outlier status on a per capita basis. Just for completeness, it should be said that Alabama and Wyoming were lowest on the per capita weirdness scale, but not outliers.)

The last chapter provides a very instructive warning to aspiring data analysts based on one of the author’s first experiences on the job. She was assigned to find a predictive relationship between three measured biomedical variables and the associated level of toxin in the blood. After examining the data, she found a complicated quadratic relationship that gave a very good fit. Too good, in fact: it was a classic case of overfitting. But her boss, a good mentor, was more cautious, encouraged a follow-up test, and saved her from a major embarrassment.

This is a consistently entertaining and instructive book. The stories are interspersed with serious background material, and the combination works very well. This would be entirely suitable as supplementary reading for a statistics course or for an independent reading project.

Bill Satzer is a senior intellectual property scientist at 3M Company, having previously been a lab manager at 3M for composites and electromagnetic materials. His training is in dynamical systems and particularly celestial mechanics; his current interests are broadly in applied mathematics and the teaching of mathematics.

