
Introduction to Linear Models and Statistical Inference

John Wiley

Introductory courses in statistics range from courses that aim at a conceptual understanding of statistics without the use of any formulas or derivations to the most abstract mathematical statistics course. In the first type, students get very little understanding of the basis of statistics, while in the second type they get very little sense of the applications of statistical theory. This book falls between these two extremes.

The book consists of two parts. The first part (217 pages) is a standard introduction to descriptive statistics, random variables and statistical inference. The second part (323 pages) takes up the linear model with one, two and several independent variables. There is much more material than can be comfortably covered in one semester, and had the work been presented in two volumes, students coming from a variety of introductory courses could have used the second part on its own.

The first test of a textbook is to examine the exercises. Here there are many, but several commit the cardinal sin of asking the students to compute something. Period. There is no final question about the meaning of what has been computed. The second test is to see whether the authors are up on current trends in statistical education. The StatEd section of the American Statistical Association is very active, and the international conference on statistical education held every four years has produced excellent proceedings. The book seems to fail the second test. For one thing, students now are much more visually stimulated by MTV and other videos, and the book would have benefited from a much livelier layout, with definitions set off in boxes, a variety of fonts, and maybe even a dash of color here and there.

There are too many irritating moments. For example, why bother with the pdf of the chi-square distribution, since that formula does not seem to be used anywhere? And if it is included, why not the pdf of the F distribution also? The study of observational data is not an experiment. Using state data to regress the number of births on population size does not give a regression coefficient that tells us anything about what would happen if we changed the population of a state. Instead, the coefficient tells us that if we chose two states that differ by one population unit, then their births will on average differ by the size of the regression coefficient. Why use Greek letters to denote the population mean and variance but not the population regression intercept and slope? Why not say that a significance level tells us how often we would get data from a population where the null hypothesis is true, instead of the probability of rejecting a true null hypothesis? “The sample mean is really a proportion…” (p. 159). I thought it was the other way around, with the proportion being a special case of the mean. Why not say that the way to interpret a confidence interval is that if the study were repeated a large number of times, then 95% of the intervals would contain the true parameter value? Whether our particular interval belongs to the large set of intervals that contain the parameter or the small set of intervals that do not is anyone’s guess. That way, students may actually have a chance to understand what is going on, instead of turning the interval around and saying that we are 95% confident that our one interval contains the true population value, whatever that means.
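The repeated-study reading of a confidence interval can be made concrete with a small simulation. The sketch below is only an illustration and is not taken from the book: the function name `coverage`, the normal population, the sample size, and the choice to treat the standard deviation as known (so that the familiar multiplier 1.96 applies) are all assumptions made for the example.

```python
import random
import statistics

def coverage(n_studies=10_000, n=25, mu=10.0, sigma=2.0, z=1.96, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean mu.

    Hypothetical setup: each "study" draws n values from a normal
    population with mean mu and known standard deviation sigma.
    """
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5  # sigma is assumed known
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Does this study's interval contain the true mean?
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / n_studies

if __name__ == "__main__":
    print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should land close to 0.95. The probability statement describes the long run of intervals, while any single interval either contains the true mean or does not.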

The second half is more enjoyable than the first. Others have written better introductions to statistics, while the second half is a very solid introduction to linear models. It includes nice discussions of regression diagnostics and model building, together with optional sections on the use of linear algebra. But why miss the opportunity of representing the three sums of squares by a right triangle, with the length of the hypotenuse equal to the square root of the total sum of squares and the lengths of the other two sides equal to the square roots of the regression and residual sums of squares, and thereby the correlation coefficient as the cosine of an angle?
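The right-triangle picture can be written out in a few lines. The notation here (SST, SSR, SSE for the total, regression and residual sums of squares, and θ for the angle between the hypotenuse and the regression side) is chosen for this sketch and is not necessarily the book's; it assumes a least-squares fit that includes an intercept.

```latex
\[
\underbrace{\sum_{i}(y_i-\bar{y})^2}_{SST}
  \;=\;
\underbrace{\sum_{i}(\hat{y}_i-\bar{y})^2}_{SSR}
  \;+\;
\underbrace{\sum_{i}(y_i-\hat{y}_i)^2}_{SSE}
\]
% Pythagoras: hypotenuse \sqrt{SST}, legs \sqrt{SSR} and \sqrt{SSE}.
\[
\cos\theta \;=\; \frac{\sqrt{SSR}}{\sqrt{SST}} \;=\; \sqrt{R^2} \;=\; |r|
\qquad \text{(one independent variable)}
\]
```

With several independent variables the same cosine equals the multiple correlation coefficient, that is, the correlation between the observed and fitted values.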

Gudmund R. Iversen holds a PhD in statistics from Harvard University and is Professor Emeritus of Statistics at Swarthmore College, where he taught statistics for many years.

Date Received: 
Wednesday, August 17, 2005
Authors: 
Steven J. Janke and Frederick C. Tinsley
Reviewer: 
Gudmund R. Iversen


Introduction: Statistical Questions.

1. Data: Plots and Location.

2. Data: Dispersion and Correlation.

3. Random Variables: Probability and Density.

4. Random Variables: Expectation and Variance.

5. Statistical Inference.

6. Simple Linear Models.

7. Linear Model Diagnostics.

8. Linear Models: Two Independent Variables.

9. Linear Models: Several Independent Variables.

10. Model Building.

11. Extended Linear Models.

Appendix A: Data References.

Appendix B: MINITAB Reference.

Appendix C: Introduction to Linear Algebra.

Appendix D: Statistical Tables.



Modify Date: 
Wednesday, October 19, 2005