
PREP Regression Analysis

MAA PREP Workshop in Statistics for Summer 2003

Regression Analysis: The Heart of Statistical Methodology

Dates: July 23 through 27, 2003

Location: Oberlin College

Presenters:

Richard L. Scheaffer

Department of Statistics

University of Florida

Gainesville, FL 32611

Phone: 352-378-1996

Fax: 352-392-5175

Email: scheaffe@stat.ufl.edu

Jeffrey A. Witmer

Oberlin College

Cox 101

Oberlin, OH 44074

Phone: (440) 775-8410

Fax: (440) 775-6638

Email: jeff.witmer@oberlin.edu

Abstract

Regression, in its many facets, is probably the most widely used statistical methodology in existence. It is the basis of modeling, whether the modeling is directed toward searching for associations among variables in observational studies or toward establishing treatment differences in designed experiments. The workshop will cover the data analytic techniques appropriate to modern use of regression analysis, as well as the inferential procedures most widely used with this methodology. Beginning with the principles and concepts of simple linear regression, the course will build to discussions of multiple regression, including models involving categorical response variables. Regression is an appropriate topic for a second course in statistics, for those who have taken or taught an introductory course. Such a course need not be calculus-based but will rely heavily on statistical software.

Overview:

The goal of the workshop is to provide participants with an understanding of regression principles and a working knowledge of regression techniques. Beginning with simple linear regression, the course will cover classical multiple regression techniques for continuous response variables as well as modern logistic regression methods for categorical responses.

Readings from a variety of textbooks on the subject, along with some lessons on available software and web applets, will be provided to participants in advance so that they can become familiar with the available tools before the workshop begins.

Workshop participants will engage in hands-on activities, individually and in groups, involving regression analyses of real data from observational studies and designed experiments. Participants will then plan lessons on the subject to teach to students at their home colleges during the coming school year. Follow-up on these lessons will be by e-mail and telephone.

Content Outline:

Text: Ramsey, F. and Schafer, D. (2002). The Statistical Sleuth, 2nd ed. Belmont, CA: Duxbury Press. (The numbers in parentheses refer to this text.)

Sessions: Two sessions each morning and two each afternoon (except the last day) yield a total of 18 sessions of about 1.5 hours each. These are numbered consecutively in the following outline.

1. Overview, pre-program evaluation, and opening activity

2. Simple Linear Regression: the basics (7.2, 7.3)

Least squares estimation

Residuals

Sampling distributions of the estimators

Introduction to Data Desk

3. Inference for Simple Linear Regression (7.4)

Inference for slope and intercept

Estimation of a mean response

Prediction of a future value

4. Model Assessment (8.2-8.6)

Graphical tools

Transformations

Analysis of variance for regression

Lack-of-fit

R-squared

Normal probability plots

5, 6. Multiple Regression (9.1-9.6)

Multiple explanatory variables

Constructed explanatory variables: curvature, categories, and interaction

Scatterplot matrix

7, 8. Inference for Multiple Regression (10.2-10.4)

Inference for single coefficients

Inference for linear combinations of coefficients

Estimating a mean response

Predicting a future response

Hierarchical models: testing groups of coefficients

9, 10. Basic theory of regression using matrices

11, 12. Model Checking (11.2-11.6)

Influence and leverage

Partial residual plots

Weighted regression

13. Variable Selection Methods: a brief overview (12.2-12.7)

Multicollinearity

Automated variable selection techniques

14, 15. Models for Two-Way Classifications (13.2-13.5)

Additive and nonadditive models

Randomized block v. completely randomized design

Orthogonal contrasts

Multiple comparisons

16. Adjustment for Serial Correlation: a brief overview (15.2-15.5)

17. Logistic Regression for Binary Responses (20.2-20.5)

The logit transformation

Maximum likelihood estimation

Inference for coefficients: deviance

Projects and lab activities will be mixed into these sessions throughout the week.
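As a rough illustration of two outline topics, the sketch below fits a simple linear regression by least squares (sessions 2 through 4: intercept, slope, residuals, the slope's t-statistic, and R-squared) and a one-predictor logistic regression by maximum likelihood (session 17). It is a minimal, dependency-free Python sketch written for this page, not material from the workshop or from The Statistical Sleuth. The function names and the Newton-Raphson fitting loop are illustrative assumptions; in the workshop itself, these fits would be done with statistical software such as Data Desk.

```python
import math


def simple_linear_regression(x, y):
    """Least squares fit of y = b0 + b1*x, with the slope's t-statistic and R-squared."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # least squares slope
    b0 = ybar - b1 * xbar               # least squares intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)      # residual sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)   # total sum of squares
    sigma_hat = math.sqrt(sse / (n - 2))      # residual standard error on n - 2 df
    se_b1 = sigma_hat / math.sqrt(sxx)        # standard error of the slope
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1,
            "r2": 1 - sse / sst, "residuals": residuals}


def logistic_regression(x, y, max_iter=25, tol=1e-10):
    """Maximum likelihood fit of logit(p) = b0 + b1*x by Newton-Raphson (y coded 0/1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and information matrix [[i00, i01], [i01, i11]].
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (information) * delta = score.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    # Deviance = -2 * log-likelihood for ungrouped binary data.
    deviance = -2.0 * sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
                          for yi, p in zip(y, fitted))
    return {"b0": b0, "b1": b1, "fitted": fitted, "deviance": deviance}
```

For example, `simple_linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])` gives a slope of about 1.99 and an intercept of about 0.05; its t-statistic would be compared with a t distribution on n - 2 degrees of freedom. The logistic fit assumes the two response groups are not perfectly separated by x, since otherwise the maximum likelihood estimate does not exist.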

Technology: Standard regression software (as listed above) will be demonstrated, as will some free software that can be downloaded from the web. Illustrative statistics applets and data sets from various sources on the web will be introduced, including those referenced in the ASA's electronic Journal of Statistics Education.

Facilities and Resources:

Participants will have access to computers running standard statistical software and to the web.

Cost:

Room and board are provided for all selected participants, but participants must pay a $100 registration fee and fund their own transportation to and from Oberlin.

Applying for Participation:

Applications may be made through MAA at /prep.

Applications should be sent in by March 31, 2003.
