R is a programming language for statistics. Julian Faraway appears to have an excellent reputation within the R community, at least if we judge by how often his work is cited approvingly. The work at hand seems to bear this out, and suggests that he offers sage advice, not only on how to code things in R, but also on what to code. To explain that assessment, we need first to explain what this book is about.

Many MAA members will have taught or taken an introductory statistics course. There one might study “regression,” which in that context generally means fitting lines to bivariate data. Students often get the impression that the process is called “linear” because we fit lines, but statisticians generally mean something else by “linear” here. The idea is that the equations we have to solve to find the slope and intercept are linear. A good point of reference is an exercise commonly set when students are learning to solve systems of linear equations in more than two variables. The student might be given the coordinates of three points in the plane and asked to find the equation \(y=ax^2+bx+c\) of a parabola that passes through those points. The student then plugs the coordinates of each point in turn into the quadratic template and obtains three linear equations in \(a\), \(b\), and \(c\).
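In R itself (the subject of the book under review), that classroom exercise amounts to solving a 3×3 linear system. A minimal sketch, using three arbitrary points chosen only for illustration:

```r
# Find y = a*x^2 + b*x + c through three points (hypothetical data):
# (-1, 6), (0, 1), (2, 9)
x <- c(-1, 0, 2)
y <- c(6, 1, 9)
A <- cbind(x^2, x, 1)   # each row is one linear equation in (a, b, c)
abc <- solve(A, y)      # solve the 3x3 linear system
abc                     # a = 3, b = -2, c = 1
```

The point is that although the fitted curve is a parabola, the unknowns \(a\), \(b\), \(c\) enter the equations linearly, so `solve()` on a plain matrix suffices.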

A statistician would consider fitting a quadratic (or any polynomial) to data an instance of linear regression because the equations we solve for the coefficients are linear. (Of course, a student in a first course rarely sees these equations, only a formula for the solution.) The real next step is not fitting curves, but fitting data in higher dimensions. This is much more complicated, and generally takes up an entire course in (linear) multiple regression. This book covers topics beyond that course. In addition to assuming the reader has taken an introductory statistics course and a regression course, the author assumes the reader is comfortable with the vocabulary and notation of matrix algebra, though of course R will handle all the computations. R is a good choice here, as it would be hard to find another program that covers all the many techniques discussed in this book. No prior knowledge of R is assumed, but experience with some programming language will be very helpful.
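The same point can be made with R's workhorse `lm()` function: a quadratic model is still "linear" regression because it is linear in the coefficients. A sketch with simulated data (the true coefficients below are invented for illustration):

```r
# A quadratic fit is linear regression: the model is linear in (a, b, c).
set.seed(1)
x <- runif(30, -2, 2)
y <- 3 * x^2 - 2 * x + 1 + rnorm(30, sd = 0.1)  # true curve plus noise
fit <- lm(y ~ I(x^2) + x)   # design matrix columns: x^2, x, intercept
coef(fit)                   # estimates should be close to 1, 3, -2
```

Internally `lm()` solves the same kind of linear system as the parabola exercise, just with more equations than unknowns, handled by least squares.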

Faraway groups his extensions of multiple regression into three classes. For the sake of brevity, this review will simplify and paraphrase what those classes are, we hope without misrepresenting the author’s intent. The first class relaxes the assumption, implicit in all of the foregoing, that for each fixed combination of values of the independent variables, \(y\) is a random variable following a normal distribution. Perhaps the simplest example that does not meet that assumption is predicting a binary outcome, such as the winner of the World Series or the survival of a patient. Two possible outcomes is a far cry from a normal distribution, whose possible values have the cardinality of the continuum. The book covers many other sorts of failures of the normality assumption as well, and many remedies.
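The standard remedy for a binary response is logistic regression, a generalized linear model fit in R with `glm()`. A sketch with made-up data (the dose and survival values below are invented solely to illustrate the call):

```r
# Hypothetical data: survival (0/1) of eight patients at increasing doses.
dose     <- c(1, 2, 3, 4, 5, 6, 7, 8)
survived <- c(0, 0, 0, 1, 0, 1, 1, 1)

# family = binomial tells glm() the response is binary, not normal.
fit <- glm(survived ~ dose, family = binomial)

# Fitted probability of survival at an intermediate dose.
predict(fit, data.frame(dose = 4.5), type = "response")
```

The output of `predict()` here is a probability between 0 and 1, which is exactly the kind of quantity a normal-theory model cannot deliver directly.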

The second class of extensions is made up of situations where the observations are not independent. Classical examples are time series and repeated measures, such as a patient’s blood pressure taken at various times over the course of a year.

The final category is “nonparametric” regression. Often in statistics “nonparametric” is taken to mean that we make no assumptions about population parameters but here it means we do not even try to estimate parameters. As a simple example of this, fitting a quadratic equation to bivariate data involves assuming the true relation is indeed quadratic, and estimating the coefficients of the quadratic. But in many an engineering lab, a smooth curve may be fit to data by eye, perhaps with the aid of French curves. This means we have no equation for the curve, but we may be content with graphical predictions or interpolations. If that seems like a big disadvantage, bear in mind that fitting a parametric model requires that we know the right form of equation to fit. Nonparametric fitting lets the data determine the curve. This third class includes some data science methods such as regression trees and neural networks.
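Base R offers a simple version of this idea in `smooth.spline()`, which lets the data determine the curve without assuming any parametric form. A sketch on simulated data (the sine curve and noise level are invented for illustration):

```r
# Nonparametric fit: no equation is assumed; the data shape the curve.
set.seed(2)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.2)

fit <- smooth.spline(x, y)          # smoothness chosen from the data
predict(fit, pi / 2)$y              # prediction at x = pi/2; should be
                                    # near the true value sin(pi/2) = 1
```

There is no fitted formula to report, only predictions and a plot, which is precisely the trade-off the review describes: no coefficients to interpret, but no risk of having assumed the wrong functional form.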

Now that we know what the book is about, let us examine what sort of book it is. It is probably not a textbook for the material covered, which is very diverse. Instead, it might profitably be used as a lab manual in conjunction with one or more textbooks on the methods. Numerous references are included in each chapter to classic texts and articles. The book also makes an excellent reference for the non-specialist in these areas. In addition to the references, the reader gets a brief summary of the main results and issues, followed by example R code for applying the methods. There is sufficient detail about R that one could figure out how to do these analyses oneself, but there is no systematic presentation of the language itself — just enough to deal with the situation du jour. For a student, the most useful part of this book may be the exercises, which could serve as a model for all other applied statistics textbooks. Rather than drill exercises, here the reader is typically asked to apply multiple methods to a real data set as a way of learning about both the methods and the data. The examples in the text are similar, with many data sets well chosen to make an important point.

We end with an assortment of minor issues. The paper makes it hard to write notes or highlight without showing through to the other side. The text would benefit greatly from the inclusion of color graphics. There are too many typos. The author does not always separate exploration from inference. (Classical inference is for testing hypotheses generated before the data are gathered. There is nothing wrong with exploring data to generate new hypotheses, but those hypotheses normally need to be tested on new data.) The writing style is generally clear but terse and sometimes choppy.

Despite some minor flaws, this book is highly recommended as a reference, lab manual, or source of examples to extend book learning to real situations.

After a few years in industry, Robert W. Hayden (bob@statland.org) taught mathematics at colleges and universities for 32 years and statistics for 20 years. In 2005 he retired from full-time classroom work. He contributed the chapter on evaluating introductory statistics textbooks to the MAA’s Teaching Statistics.