Statistical Models: Theory and Practice

David Freedman
Cambridge University Press
Reviewed by Ita Cirovic Donev

Statistics is all around us. Just by going to work in the morning we can observe numerous examples of the use of statistics and immediately think of methods and models to analyze the observed data. Freedman's great quality as an author is the ability to provide you with this statistical vision, if you don't possess it already.

This book is truly an eye opener. It provides essential, rigorous insight into statistical modeling. It differs from others in many respects. Most statistics books, especially the more technical ones, are filled with theorems, proofs and examples you will never encounter in practice, either because they are too simple or because they are extremely complex and serve only as counterexamples. By contrast, this book provides real examples taken from real studies. The theorems and the corresponding proofs are presented in an elegant and intelligent manner. The author answers the questions the reader or researcher should ask. Among modeling books, this one is a gem.

The topics covered are the usual ones, such as the MLE, logit/probit modeling, path modeling and the bootstrap, among others. All of these are more or less familiar to every statistician and statistics student. Statistical Models is intended as a second course, so it builds on the standard introductory material but adds something special: a sense of the theory and complexity of the subject. The author is very careful in presenting the theoretical ideas. He strives to explain almost all the bits and pieces. Rather than just presenting a theorem and its proof, he gives the reasoning behind it. This deserves to be highly appreciated. The more you read on, the more this book looks like a step-by-step guide to statistical modeling. The writing is so clear and attractive that you never get confused or lost enough to stop reading.

I think that the most important part of the book, and generally where the most understanding will come from, is the exercises. These are truly teaching exercises. If you take this book on as a means for a second course in statistics and simply read it without doing the exercises, you will not get far. It is hard to describe the form of the exercises. Some concentrate on the basic understanding of the subject with questions (very good for class discussions) like "is MLE biased or unbiased?", some are in the form of a study, some analyze the output of a model, some are proofs, and so on. There are exercises for everyone's taste, so to speak. And, even better, there are solutions at the end of the book!

Who is this book for? It should definitely find its place on the graduate student's bookshelf as well as on the bookshelf of a serious statistical researcher. For a reader who has completed a serious first course in statistics and some linear models, the book could easily be used for self-study.

It is not enough to know just how to plug a model into the software and get its output. We also need the "insider information," and this is exactly what this book offers. In any case, it will definitely raise you to the next level.

Ita Cirovic Donev is a PhD candidate at the University of Zagreb. She holds a master's degree in statistics from Rice University. Her main research areas are in mathematical finance; more precisely, statistical methods of credit and market risk. Apart from her academic work, she does consulting work for financial institutions.

Preface

1 Observational Studies and Experiments
1.1 Introduction
1.2 The HIP trial
1.3 Snow on cholera
1.4 Yule on the causes of poverty
Exercise set A
1.5 End notes

2 The Regression Line
2.1 Introduction
2.2 The regression line
2.3 Hooke’s law
Exercise set A
2.4 Complexities
2.5 Simple vs multiple regression
Exercise set B
2.6 End notes

3 Matrix Algebra
3.1 Introduction
Exercise set A
3.2 Determinants and inverses
Exercise set B
3.3 Random vectors
Exercise set C
3.4 Positive definite matrices
Exercise set D
3.5 The normal distribution
Exercise set E
3.6 If you want a book on matrix algebra

4 Multiple Regression
4.1 Introduction
Exercise set A
4.2 Standard errors
Things we don’t need
Exercise set B
4.3 Explained variance in multiple regression
Association or causation?
4.4 Generalized least squares
4.5 Examples on GLS
Exercise set C
4.6 What happens to OLS if the assumptions break down?
4.7 Normal theory
Statistical significance
Exercise set D
4.8 The F-test
“The” F-test in applied work
Exercise set E
4.9 Data snooping
Exercise set F
4.10 Discussion questions
4.11 End notes

5 Path Models
5.1 Stratification
Exercise set A
5.2 Hooke’s law revisited
Exercise set B
5.3 Political repression during the McCarthy era
Exercise set C
5.4 Inferring causation by regression
Exercise set D
5.5 Response schedules for path diagrams
Selection vs intervention
Structural equations and stable parameters
Ambiguity in notation
Exercise set E
5.6 Dummy variables
Types of variables
5.7 Discussion questions
5.8 End notes

6 Maximum Likelihood
6.1 Introduction
Exercise set A
6.2 Probit models
Why not regression?
The latent-variable formulation
Exercise set B
Identification vs estimation
What if the U_i are N(μ, σ²)?
Exercise set C
6.3 Logit models
Exercise set D
6.4 The effect of Catholic schools
More on table 3
Latent variables
Response schedules
The second equation
Mechanics: bivariate probit
Why a model rather than a cross-tab?
Interactions
More on the second equation
Exercise set E
6.5 Discussion questions
6.6 End notes

7 The Bootstrap
7.1 Introduction
Exercise set A
7.2 Bootstrapping a model for energy demand
Exercise set B
7.3 End notes

8 Simultaneous Equations
8.1 Introduction
Exercise set A
8.2 Instrumental variables
Exercise set B
8.3 Estimating the butter model
Exercise set C
8.4 What are the two stages?
Invariance assumptions
8.5 A social-science example: education and fertility
More on Rindfuss et al
8.6 Covariates
8.7 Linear probability models
The assumptions
The questions
Exercise set D
8.8 More on IVLS
Some technical issues
Exercise set E
Simulations to illustrate IVLS
Further reading on econometric technique
8.9 Issues in statistical modeling
8.10 Critical literature
Response schedules
8.11 Evaluating the models in chapters 6–8
8.12 Summing up

References

Answers to Exercises

The Computer Labs

Appendix: Sample MATLAB Code

Gibson on McCarthy
Evans and Schwab on Catholic Schools
Rindfuss et al on Education and Fertility
Schneider et al on Social Capital

Index