
Modern Regression Methods

Thomas P. Ryan
Publisher: John Wiley
Publication Date: 2008
Number of Pages: 642
Format: Hardcover
Edition: 2
Series: Wiley Series in Probability and Statistics
Price: 115.00
ISBN: 9780470081860
Category: Textbook
[Reviewed by Martha K. Smith, on 04/03/2009]

I have taught regression analysis eleven times, and have never found a book I completely like [1]. At first I thought the book under review might be the one I had long hoped for, but I was disappointed. It has many strong points — but some weak ones as well, particularly for an undergraduate course.

Since this review is for a mathematics audience rather than a statistical one, I’ll start with my standard explanation-for-mathematicians of the difference between pure mathematics and (frequentist [2]) statistics. Here goes:

In pure mathematics, we are concerned with statements of the form “A implies B”. Our main goal is to prove them. We sometimes also want to discover them, and, of course, we use statements that are already proved to prove new ones. In statistics, many statements of this form are also relevant, but they are not as big a part of the process of doing statistics as they are of doing pure mathematics. The A parts of those statements that are useful in statistics typically list model assumptions — things like “errors are normally distributed with the same variance for each group”. Doing statistics well requires asking (and answering, usually not definitively, but as best we can with the information available) questions of the following sorts (the sketch after this list makes the first of them concrete):

  • Is A true in this particular context?
  • What does B tell us in this particular context? Does it tell us something useful?
  • How far can we depart from A and still get something close to B?
  • If A is not near enough to true in this particular context, is there an A' that is close enough to true in this context and that implies a B' that tells us something useful in this context?
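To make the first question concrete: here is a minimal Python sketch, my own illustration rather than anything from the book, of probing whether an assumption A such as “errors are normally distributed with the same variance” is plausible before trusting the conclusion B that standard least-squares theory delivers. The data are simulated, and the particular tests and the median split are illustrative choices, not a recipe.

```python
import numpy as np
from scipy import stats

# Simulated data; in real work, x and y come from the problem at hand.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)  # true errors ~ N(0, 1)

# Least-squares estimates of the slope (beta1) and intercept (beta0).
beta1, beta0 = np.polyfit(x, y, 1)
fitted = beta0 + beta1 * x
resid = y - fitted

# "Errors are normally distributed": Shapiro-Wilk test on the residuals.
_, p_normal = stats.shapiro(resid)

# "Same variance" across the range: compare residual spread between low
# and high fitted values, a crude numerical stand-in for a residual plot.
low = resid[fitted <= np.median(fitted)]
high = resid[fitted > np.median(fitted)]
_, p_equal_var = stats.levene(low, high)

print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}")
print(f"normality p = {p_normal:.2f}, equal-variance p = {p_equal_var:.2f}")
# Large p-values fail to contradict A; they do not prove it.
```

In practice one would look at a residual-versus-fitted plot first (the subject of the book's Chapter 2 diagnostics); formal tests like these supplement the plots rather than replace them.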

Unfortunately, statistics is often not done well. Many users of statistics ignore the questions above, typically because they are unaware of their importance. The book under review squarely addresses these important questions from the start. In this respect, it is better than any other regression textbook I have seen. In addition, the author summarizes the techniques available for regression in the various statistical software packages, provides a good collection of exercises illustrating points that are often misunderstood, and gives copious references to details not included in the book. For these reasons alone, I strongly recommend the book as a reference for anyone teaching or using regression.

So what are its weaknesses? Why would I not use it as a textbook? The major weakness is that, as the book progresses through the chapters the author suggests for an undergraduate course, it reads more like a guide to the literature than like an introductory text. That is appropriate given its title, but not for an undergraduate textbook (nor even for a master’s level one).

I am also concerned that the author frequently says “obvious” or “obviously.” This is not a petty complaint, but a serious concern for an undergraduate textbook. First, “obvious” is a subjective term; what seems obvious to the author may not seem obvious to the student, so using “obvious” is not good human relations when teaching. More serious is that students all too often think something is obvious when it warrants extensive justification, or even when it is not true. Indeed, the author often appropriately uses phrases such as “does not have the meaning that would seem to be self-evident” (p. 3) when such situations might arise. However, my experience is that when the instructor or textbook author uses “obvious,” students tend to pick up the word and apply it to things that are false or need substantial explanation.

Thus my recommendation is: If you teach regression, be sure to buy this book, and read at least the first two chapters carefully. Use what you learn from the book to improve, when needed, on whatever textbook you use, and to motivate yourself to emphasize to your students that there is more to regression than can be covered in an undergraduate course. Use the software summaries to help you decide what software to use for your class. Keep the book handy to refer to when you encounter the inevitable limitations of your software, have other questions, or need a good additional example or exercise.

Even if you only teach simple regression as part of an overview class, being familiar with the material in the first two chapters of this book should help you be aware of the cautions that are required in using and interpreting even simple regression. You will also find the book helpful when colleagues from other departments come to you with questions (as will undoubtedly happen if you are a mathematician teaching regression, since if there were enough statisticians at your school, they would be teaching the course).

If students are likely to do independent research projects in statistics at your school, or if your school has a graduate program in any field that uses regression, be sure your library has a copy, too.

Although I don’t recommend the book as an undergraduate textbook, I think the author has done an admirable job of pointing out most of the intricacies of regression, guiding the reader to regression methods beyond the standard ones, reviewing available software (including regression macros for Minitab on the FTP site accompanying the book), and supplying a wide variety of instructive exercises and examples.


Notes:

[1] I finally settled on the third textbook I tried, Cook and Weisberg’s Applied Regression Including Computing and Graphics, Wiley, 1999. However, I have reworked a lot of the exposition and added other material appropriate to the particular course I teach, resulting over the years in lecture notes that are available at http://www.ma.utexas.edu/users/mks/384Gfa08/384G08syl.html. (The course I have taught is primarily a master’s level course, but an undergraduate course with slightly different assignments and exams meets with it.) The software (called arc) developed to accompany Cook and Weisberg’s text has many features that make it well suited to teaching. Unfortunately, the software is no longer supported, so the Macintosh version, and in some cases the Unix version, is no longer readily usable. The Windows version is still viable, however.

[2] Bayesian statistics would involve a slightly more complicated explanation; this review concerns only frequentist statistics.


Martha Smith is a soon-to-be-retired Professor of Mathematics at the University of Texas at Austin. Visit her home page at http://www.ma.utexas.edu/users/mks/ for contact information and links to a variety of things. In a few months, there should be a link to a new website on Common Misteaks in Statistics.

 

Preface.

1. Introduction. 

1.1 Simple Linear Regression Model.

1.2 Uses of Regression Models.

1.3 Graph the Data!

1.4 Estimation of β0 and β1.

1.5 Inferences from Regression Equations.

1.6 Regression Through the Origin.

1.7 Additional Examples.

1.8 Correlation.

1.9 Miscellaneous Uses of Regression.

1.10 Fixed Versus Random Regressors.

1.11 Missing Data.

1.12 Spurious Relationships.

1.13 Software.

1.14 Summary.

Appendix.

References.

Exercises.

2. Diagnostics and Remedial Measures. 

2.1 Assumptions.

2.2 Residual Plots.

2.3 Transformations.

2.4 Influential Observations.

2.5 Outliers.

2.6 Measurement Error.

2.7 Software.

2.8 Summary.

Appendix.

References.

Exercises.

3. Regression with Matrix Algebra. 

3.1 Introduction to Matrix Algebra.

3.2 Matrix Algebra Applied to Regression.

3.3 Summary.

Appendix.

References.

Exercises.

4. Introduction to Multiple Linear Regression. 

4.1 An Example of Multiple Linear Regression.

4.2 Centering and Scaling.

4.3 Interpreting Multiple Regression Coefficients.

4.4 Indicator Variables.

4.5 Separation or Not?

4.6 Alternatives to Multiple Regression.

4.7 Software.

4.8 Summary.

References.

Exercises.

5. Plots in Multiple Regression. 

5.1 Beyond Standardized Residual Plots.

5.2 Some Examples.

5.3 Which Plot?

5.4 Recommendations.

5.5 Partial Regression Plots.

5.6 Other Plots for Detecting Influential Observations.

5.7 Recent Contributions to Plots in Multiple Regression.

5.8 Lurking Variables.

5.9 Explanation of Two Data Sets Relative to R².

5.10 Software.

5.11 Summary.

References.

Exercises.

6. Transformations in Multiple Regression. 

6.1 Transforming Regressors.

6.2 Transforming Y.

6.3 Further Comments on the Normality Issue.

6.4 Box-Cox Transformation.

6.5 Box-Tidwell Revisited.

6.6 Combined Box-Cox and Box-Tidwell Approach.

6.7 Other Transformation Methods.

6.8 Transformation Diagnostics.

6.9 Software.

6.10 Summary.

References.

Exercises.

7. Selection of Regressors. 

7.1 Forward Selection.

7.2 Backward Elimination.

7.3 Stepwise Regression.

7.4 All Possible Regressions.

7.5 Newer Methods.

7.6 Examples.

7.7 Variable Selection for Nonlinear Terms.

7.8 Must We Use a Subset?

7.9 Model Validation.

7.10 Software.

7.11 Summary.

Appendix.

References.

Exercises.

8. Polynomial and Trigonometric Terms. 

8.1 Polynomial Terms.

8.2 Polynomial-Trigonometric Regression.

8.3 Software.

8.4 Summary.

References.

Exercises.

9. Logistic Regression. 

9.1 Introduction.

9.2 One Regressor.

9.3 A Simulated Example.

9.4 Detecting Complete Separation, Quasicomplete Separation and Near Separation.

9.5 Measuring the Worth of the Model.

9.6 Determining the Worth of the Individual Regressors.

9.7 Confidence Intervals.

9.8 Exact Prediction.

9.9 An Example With Real Data.

9.10 An Example of Multiple Logistic Regression.

9.11 Multicollinearity in Multiple Logistic Regression.

9.12 Osteogenic Sarcoma Data Set.

9.13 Missing Data.

9.14 Sample Size Determination.

9.15 Polytomous Logistic Regression.

9.16 Logistic Regression Variations.

9.17 Alternatives to Logistic Regression.

9.18 Software for Logistic Regression.

9.19 Summary.

Appendix.

References.

Exercises.

10. Nonparametric Regression. 

10.1 Relaxing Regression Assumptions.

10.2 Monotone Regression.

10.3 Smoothers.

10.4 Variable Selection.

10.5 Important Considerations in Smoothing.

10.6 Sliced Inverse Regression.

10.7 Projection Pursuit Regression.

10.8 Software.

10.9 Summary.

Appendix.

References.

Exercises.

11. Robust Regression. 

11.1 The Need for Robust Regression.

11.2 Types of Outliers.

11.3 Historical Development of Robust Regression.

11.4 Goals of Robust Regression.

11.5 Proposed High Breakdown Point Estimators.

11.6 Approximating HBP Estimator Solutions.

11.7 Other Methods for Detecting Multiple Outliers.

11.8 Bounded Influence Estimators.

11.9 Multistage Procedures.

11.10 Other Robust Regression Estimators.

11.11 Applications.

11.12 Software for Robust Regression.

11.13 Summary.

References.

Exercises.

12. Ridge Regression. 

12.1 Introduction.

12.2 How Do We Determine K?

12.3 An Example.

12.4 Ridge Regression for Prediction.

12.5 Generalized Ridge Regression.

12.6 Inferences in Ridge Regression.

12.7 Some Practical Considerations.

12.8 Robust Ridge Regression.

12.9 Recent Developments in Ridge Regression.

12.10 Other Biased Estimators.

12.11 Software.

12.12 Summary.

Appendix.

References.

Exercises.

13. Nonlinear Regression. 

13.1 Introduction.

13.2 Linear Versus Nonlinear Regression.

13.3 A Simple Nonlinear Example.

13.4 Relative Offset Convergence Criterion.

13.5 Adequacy of the Estimation Approach.

13.6 Computational Considerations.

13.7 Determining Model Adequacy.

13.7.1 Lack-of-Fit Test.

13.8 Inferences.

13.9 An Application.

13.10 Rational Functions.

13.11 Robust Nonlinear Regression.

13.12 Applications.

13.13 Teaching Tools.

13.14 Recent Developments.

13.15 Software.

13.16 Summary.

Appendix.

References.

Exercises.

14. Experimental Designs for Regression. 

14.1 Objectives for Experimental Designs.

14.2 Equal Leverage Points.

14.3 Other Desirable Properties of Experimental Designs.

14.4 Model Misspecification.

14.5 Range of Regressors.

14.6 Algorithms for Design Construction.

14.7 Designs for Polynomial Regression.

14.8 Designs for Logistic Regression.

14.9 Designs for Nonlinear Regression.

14.10 Software.

14.11 Summary.

References.

Exercises.

15. Miscellaneous Topics in Regression. 

15.1 Piecewise Regression and Alternatives.

15.2 Semiparametric Regression.

15.3 Quantile Regression.

15.4 Poisson Regression.

15.5 Negative Binomial Regression.

15.6 Cox Regression.

15.7 Probit Regression.

15.8 Censored Regression and Truncated Regression.

15.8.1 Tobit Regression.

15.9 Constrained Regression.

15.10 Interval Regression.

15.11 Random Coefficient Regression.

15.12 Partial Least Squares Regression.

15.13 Errors-in-Variables Regression.

15.14 Regression with Life Data.

15.15 Use of Regression in Survey Sampling.

15.16 Bayesian Regression.

15.17 Instrumental Variables Regression.

15.18 Shrinkage Estimators.

15.19 Meta-Regression.

15.20 Classification and Regression Trees (CART).

15.21 Multivariate Regression.

References.

Exercises.

16. Analysis of Real Data Sets. 

16.1 Analyzing Buchanan’s Presidential Vote in Palm Beach County in 2000.

16.2 Water Quality Data.

16.3 Predicting Lifespan?

16.4 Scottish Hill Races Data.

16.5 Leukemia Data.

16.6 Dosage Response Data.

16.7 A Strategy for Analyzing Regression Data.

16.8 Summary.

References.

Exercises.

Index.