I am a huge fan of Sheldon Ross’s books on probability models and stochastic processes, so I was very interested in reviewing his Introductory Statistics. It is of the “trust me” genre, i.e., there are no proofs and few derivations. No calculus is used, and there are no simulations to attempt to convince the reader of the validity of the methods, i.e., nothing like what is in Introduction to Statistical Thinking (With R, Without Calculus) by Benjamin Yakir, which is available as a free PDF.
Introductory Statistics is designed for a first service course for non-science majors. It has all the standard topics you would expect and a few extras, but unfortunately it stops short on some of them. For example, it very clearly explains what an ANOVA is and how to do it, and then goes no further. The student will know how to test whether several means are different, but will not know what to do next if in fact they are. I guess tough decisions had to be made in order to keep the book to its already large size of about 800 pages.
I question some of the choices made in the production (as opposed to the writing) of the text. There are (the much maligned) pie charts and (horrific) 3D bar charts throughout, and they are not being shown as examples of what not to do. There are sections in small font that are in some cases the most important parts (such as the central limit theorem), but then there are historical asides in the same format. In the few very simple R code samples, symbols that are not easily reproduced on the keyboard are used. For example, how many people, especially in the target audience, would know that to get \(\mu\) on the keyboard you need to use Alt-230? And \(\surd\) is used in the code too, which R does not accept. The output of the R commands is formatted differently from how it actually looks when you run them. None of this is a big deal for somebody already comfortable with the material, but it can waste a lot of time for the beginners who will be using this book, as will the various typos.
The book is supposed to come with a CD of material, which was not supplied with the review copy. I hope that the datasets used in the book are on the disc so that students do not have to type everything in.
There are a gazillion exercises, many pages of appendices and statistical tables at the back of the book, along with answers to odd-numbered exercises, except for those in the last chapter, which brings me to another negative point. The last chapter is entitled “Machine Learning and Big Data”, but there is nothing about big data in it, so the title seems to be a marketing decision.
There are some interesting topics included that are not in most introductory stats texts, such as the Gini index, bandit problems, and quality control. But there is no discussion of Bayesian methods, of the issues arising from NHST (null hypothesis significance testing), or of multiple testing.
In summary, Ross’s writing (as always) is very clear. But there are too many issues with this text to recommend it.
Peter Rabinovitch is a Senior Performance Engineer at Akamai who has been doing data science since long before “data science” was a thing.