Truth or Truthiness

Howard Wainer
Publisher: Cambridge University Press
Publication Date: 2015
Number of Pages: 232
Format: Hardcover
Price: 27.95
ISBN: 9781107130579
Category: General
[Reviewed by Bill Satzer, on 06/14/2020]
The concept of truth has taken rather a hard beating over the last few years. Howard Wainer has some suggestions for restoring it. He is a statistician and an author whose writing over the years has consistently addressed the need to apply statistics and data analysis with care and present the results with clarity and integrity.
 
Wainer takes his definition of truthiness from Stephen Colbert, who describes it as the quality of a "truth" that a person uses in an argument and "claims to know intuitively 'from the gut' or because 'it feels right' without regard to evidence, logic, intellectual examination, or facts". Wainer also notes that ideas that lean on truthiness are sometimes called "rapid ideas"; he says that means "they only make sense if you say them fast".
 
The book has three parts: thinking like a data scientist, communicating like a data scientist, and applying data science tools to education. While this work will not make anyone a data scientist, it might encourage readers to be more attentive to the evidence behind statements that they hear or read, more likely to ask questions, and perhaps more skeptical overall. Wainer’s approach is to provide some general guidance and a lot of examples in the form of loosely connected case studies. 
 
Each chapter is designed to suggest something of the way a data scientist thinks, and how to begin approaching questions that appear very challenging. Underlying the whole book are some critical ideas about evidence and its role in science. Wainer singles out several essential components: making hypotheses explicit, developing sound evidence to test those hypotheses, and ensuring reproducibility.
 
Several of the examples and more detailed case studies that Wainer presents stand out as particularly worthwhile. He discusses how studies with missing data might be handled, and he notes that the choice of method can have a considerable effect on reported results. Careless or even devious treatment of missing information can seriously bias the results. He describes one school district that used a pre-test and a post-test to measure learning across a school year. The superintendent arranged for the best students to miss the pre-test and the worst students to miss the post-test. Unsurprisingly, the measured learning gains looked especially good that year. Wainer offers more nuanced approaches to missing data than simply ignoring it. One of the more important applications is in medical trials, where participants drop out before the trial is completed.
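The superintendent's trick is easy to reproduce in a toy simulation. The following Python sketch is not from the book; the score scale, the 20% absence rate, and the assumed true gain of 5 points are invented for illustration. Every simulated student gains the same amount, yet keeping the strongest students out of the pre-test and the weakest out of the post-test inflates the apparent gain well beyond the true value.

```python
import random
import statistics


def simulate(n=1000, true_gain=5.0, share_missing=0.2, seed=1):
    """Return (gain using all scores, gain after selective absences).

    All parameters are hypothetical illustration values, not data from the book.
    """
    rng = random.Random(seed)
    # Each student's underlying ability on an arbitrary score scale.
    ability = [rng.gauss(50, 10) for _ in range(n)]
    # Test scores = ability plus measurement noise; everyone improves by true_gain.
    pre = [a + rng.gauss(0, 3) for a in ability]
    post = [a + true_gain + rng.gauss(0, 3) for a in ability]

    # Rank students by ability: the weakest skip the post-test,
    # the strongest skip the pre-test.
    order = sorted(range(n), key=lambda i: ability[i])
    k = int(share_missing * n)
    skip_post = set(order[:k])
    skip_pre = set(order[-k:])

    full_gain = statistics.mean(post) - statistics.mean(pre)
    observed_pre = [pre[i] for i in range(n) if i not in skip_pre]
    observed_post = [post[i] for i in range(n) if i not in skip_post]
    biased_gain = statistics.mean(observed_post) - statistics.mean(observed_pre)
    return full_gain, biased_gain


full, biased = simulate()
print(f"gain with complete data:     {full:.1f}")
print(f"gain with selective absence: {biased:.1f}")
```

Because the missing scores are not missing at random, simply averaging whatever was observed builds the selection directly into the reported result.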
 
Wainer also addresses the issues of causal inference in several of his case studies. While controlled and randomized experiments remain the gold standard for efforts to ascribe causality, they are often impractical or unethical. Wainer describes how a strong case for causality can be made from observational studies when they are supported by independent information and basic science. He describes an analysis of observational data on fracking and earthquakes that makes a fairly strong argument that the former causes the latter.
 
The second part of the book, called "Communicating Like a Data Scientist", is somewhat disappointing. Much of it deals with graphical presentation of results, and the examples are of limited scope. Wainer gives good general advice, but one might have wished for more varied examples. One unusual example in this part is a suggestion for how to use empathy when conveying the results of genetic testing.
 
The remainder of the book discusses using the tools of data science in education. One of the case studies here looks at the length of exams (such as those for college or professional school entrance, or licensing exams). Wainer makes the case that many, perhaps most, of these exams are too long and could be shortened considerably without seriously reducing their reliability.
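One standard way to quantify the trade-off between test length and reliability is the Spearman-Brown prophecy formula, which predicts the reliability of a test whose length is multiplied by a factor k, assuming the items removed or added are parallel to the existing ones. The review does not say exactly how Wainer makes his argument, so this Python sketch and the reliability value of 0.95 in it are assumptions chosen for illustration.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after scaling test length by length_factor.

    Assumes the items kept or added are parallel to the originals
    (the standard Spearman-Brown assumption).
    """
    if not 0 < reliability < 1:
        raise ValueError("reliability must be strictly between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical example: an exam with reliability 0.95, shortened to
# 60%, 50%, and 40% of its original length.
for k in (1.0, 0.6, 0.5, 0.4):
    print(f"length x {k:.1f}: predicted reliability {spearman_brown(0.95, k):.3f}")
```

Under these assumptions, halving an exam with reliability 0.95 gives a predicted reliability of about 0.90, which shows how a long, highly reliable test can drop many items while giving up comparatively little reliability.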
 
This is an amusing book with some good general guidance about identifying, using, and presenting evidence in a variety of contexts. It might be best used as a source of examples for a basic data analysis course.

 

Bill Satzer (bsatzer@gmail.com), now retired from 3M Company, spent most of his career as a mathematician working in industry on a variety of applications ranging from speech recognition and network modeling to optical films and ceramic fiber-reinforced composites. Along the way he learned more about ceramics and alloys of aluminum than he could have imagined in graduate school. He did his PhD work in dynamical systems.