MAA PREP Workshop in Statistics for Summer 2003
Regression Analysis: The Heart of Statistical Methodology
Dates: July 23 through 27, 2003
Location: Oberlin College
Presenters:
Richard L. Scheaffer
Department of Statistics
University of Florida
Gainesville, FL 32611
Phone: 352-378-1996
Fax: 352-392-5175
Email: scheaffe@stat.ufl.edu
Jeffrey A. Witmer
Oberlin College
Cox 101
Oberlin, OH 44074
Phone: (440) 775-8410
Fax: (440) 775-6638
Email: jeff.witmer@oberlin.edu
Abstract
Regression, in its many facets, is probably the most widely used statistical methodology in existence. It is the basis of modeling, whether the modeling is directed toward searching for associations among variables in observational studies or establishing treatment differences in designed experiments. The workshop will cover the data analytic techniques appropriate to modern use of regression analysis, as well as the inferential procedures most widely used with this methodology. Beginning with establishing principles and concepts through simple linear regression, the course will build to discussions of multiple regression, including models involving categorical response variables. Regression is an appropriate topic to serve as the basis of a second course in statistics, for those who have taken or taught an introductory course. The course need not be calculus-based but will rely heavily on statistical software.
Overview:
The goal of the workshop is to provide participants with an understanding of regression principles and a working knowledge of regression techniques. Beginning with simple linear regression, the course will cover classical multiple regression techniques for continuous response variables as well as modern logistic regression methods for categorical responses.
Readings from a variety of textbooks on the subject and some lessons on available software and web applets will be provided to participants in advance of the workshop, so that some awareness of the available tools can be established before the workshop commences.
Workshop participants will engage in hands-on activities, individual and group, that involve the regression analysis of real data from observational studies and designed experiments. This will lead to their planning lessons on the subject, to be taught to students at their home colleges during the coming school year. Follow-up on these lessons will be by e-mail and telephone.
Content Outline:
Text: Ramsey, F. and Schafer, D. (2002). The Statistical Sleuth, 2nd ed. Belmont, CA: Duxbury Press. (The numbers in parentheses refer to this text.)
Sessions: Two sessions each morning and two each afternoon (except the last day) yield a total of 18 sessions of about 1.5 hours each. These are numbered consecutively in the following outline.
1. Overview, pre-program evaluation, and opening activity
2. Simple Linear Regression — the basics (7.2, 7.3)
Least squares estimation
Residuals
Sampling distributions of the estimators
Introduction to Data Desk
3. Inference for Simple Linear Regression (7.4)
Inference for slope and intercept
Estimation of a mean response
Prediction of a future value
4. Model Assessment (8.2-8.6)
Graphical tools
Transformations
Analysis of variance for regression
Lack-of-fit
R-squared
Normal probability plots
5, 6. Multiple Regression (9.1-9.6)
Multiple explanatory variables
Constructed explanatory variables — curvature, categories, and interaction
Scatterplot matrix
7, 8. Inference for Multiple Regression (10.2-10.4)
Inference for single coefficients
Inference for linear combinations of coefficients
Estimating a mean response
Predicting a future response
Hierarchical models: testing groups of coefficients
9, 10. Basic theory of regression using matrices
11, 12. Model Checking (11.2-11.6)
Influence and leverage
Partial residual plots
Weighted regression
13. Variable Selection Methods: a brief overview (12.2-12.7)
Multicollinearity
Automated variable selection techniques
14, 15. Models for Two-Way Classifications (13.2-13.5)
Additive and nonadditive models
Randomized block vs. completely randomized design
Orthogonal contrasts
Multiple comparisons
16. Adjustment for Serial Correlation: a brief overview (15.2-15.5)
17. Logistic Regression for Binary Responses (20.2-20.5)
The logit transformation
Maximum likelihood estimation
Inference for coefficients: deviance
Projects and lab activities will be mixed into these sessions throughout the week.
Technology: Standard regression software (as listed above) will be demonstrated, as will some free software that can be downloaded from the web. Illustrative statistics applets and data sets from various sources on the web will be introduced, including those referenced on the ASA’s electronic Journal of Statistics Education.
Facilities and Resources:
Participants will have access to computers running some standard statistical software and will have access to the web.
Cost:
Room and board are provided for all selected participants, but participants must pay a $100 registration fee and fund their own transportation to and from Oberlin.
Applying for Participation:
Applications may be made through MAA at http://www.maa.org/prep.
Applications should be sent in by March 31, 2003.