For the few readers who may not know, R is a free software environment for statistical computing and graphics. Designed in the 1990s by Ross Ihaka and Robert Gentleman and based on the successful S language developed at Bell Labs, it is maintained by a core team of about thirty volunteers, with the assistance of many other computing and statistics professionals who contribute “packages” that greatly extend the capabilities of the environment. In addition to being free, R is both powerful and state-of-the-art. For some time it has been the tool of choice for statisticians in academia, but recently it has made significant inroads in the life sciences, social sciences, and even in finance and actuarial science. See the R Project website (http://www.r-project.org/index.html) for more information and for links to the Comprehensive R Archive Network (CRAN) download sites.
There are probably over one hundred books in print that aim to teach the reader how to use R, usually in the context of a specific area of application, and often assuming that the reader has minimal interest in exploring the powers of the language. This book takes a somewhat different approach. Part One offers a more-systematic-than-usual introduction to R. Although it assumes no particular computer science prerequisites, it challenges the reader to take advantage of the vector and matrix manipulation capabilities of the R language. In particular, Section 4.2, on the apply family of functions and related functions for matrices, arrays and data frames, is by far the friendliest and most helpful introduction to the subject that I have seen. A good deal of attention is also paid to the complexities of reading data into R and manipulating data tables into a form suitable for statistical analysis. End-of-chapter exercises are provided. Some are rather challenging, but thankfully there are answers in the back of the book.
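As a rough illustration of the kind of vectorized work the apply family makes possible, here is a minimal sketch. It is not taken from the book: the data and every variable name below are invented for the example, and only base R functions are used.

```r
# Hypothetical data: scores for 3 students (rows) on 4 tests (columns).
scores <- matrix(c(70, 85, 90, 60,
                   88, 92, 79, 95,
                   55, 65, 70, 80),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("s1", "s2", "s3"),
                                 c("t1", "t2", "t3", "t4")))

# apply(): run a function over the rows (MARGIN = 1) or columns (MARGIN = 2)
# of a matrix, with no explicit loop.
row_means <- apply(scores, 1, mean)
col_max   <- apply(scores, 2, max)

# sapply(): run a function over each column of a data frame and
# simplify the results to a named vector.
df <- as.data.frame(scores)
col_ranges <- sapply(df, function(x) max(x) - min(x))

# tapply(): summarize a vector within groups defined by a factor.
# The grouping here is made up: s1 and s3 are in group "a", s2 in group "b".
group <- factor(c("a", "b", "a"))
mean_t1_by_group <- tapply(df$t1, group, mean)

print(row_means)
print(col_max)
print(col_ranges)
print(mean_t1_by_group)
```

Each call replaces what would otherwise be a loop over rows, columns, or groups, which is the style of R programming the reviewer credits Part One with encouraging.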
Part Two of the book begins with a quick start to R (for readers lacking the patience to study Part One) and follows with coverage of various statistical procedures: basic hypothesis testing, several types of linear models (broken up as regression, analysis of variance and analysis of covariance), classification (discriminant analysis, etc., but logistic regression is also placed here), exploratory multivariate analysis, done in what the authors call “a French way,” and clustering. The datasets are not skewed toward any particular field of application. All datasets, along with the R code in the book, are available on the website for the text.
R is a command-line interpreted scripting language: it is not wrapped up inside a graphical user interface such as you will find for Excel, Minitab, SPSS or other major “statistical packages.” For those who insist on a GUI there is Rcmdr, a contributed package that provides GUI access to most of the commonly used statistical procedures. At the end of each chapter the authors indicate how to use Rcmdr, whenever it can handle the routine under discussion. This reviewer believes, however, that the advantages of learning the R language directly more than justify the effort required. Recently, that effort has been considerably reduced by the release of RStudio (http://www.rstudio.com/), a free integrated development environment (IDE) for R. The RStudio IDE, which was not available when the authors of R for Statistics were writing their book, offers many options for organizing one’s work, getting help on the syntax of R functions, accessing and searching the history of one’s R sessions, and downloading and updating contributed packages.
If you require more flexibility in statistical practice than what is afforded by a standard commercial statistical package but do not want to pay for SAS or S-Plus, then R is for you. If in addition you are not a trained programmer but you aspire to write code that is efficient and perhaps, from time to time, clever, then this book is a fine place for you to start learning R.
Homer White is Professor of Mathematics at Georgetown College, in Kentucky. A typical Jack-of-All-Trades small-college mathematician, he enjoys the teaching of statistics at all levels, statistical consultation, and even institutional research. His interests and occasional forays into research in the history of mathematics include the geometrical works of Leonhard Euler and the mathematics of classical India.
An Overview of R
Reading Data from File
Concatenating Data Tables
Conventional Graphical Functions
Graphical Functions with lattice
Making Programs with R
Creating a Function
Introduction to the Statistical Methods
A Quick Start with R
Opening and Closing R
The Command Prompt
Attribution, Objects, and Function
Importing (or Inputting) Data
Confidence Intervals for a Mean
Chi-Square Test of Independence
Comparison of Two Means
Testing Conformity of a Proportion
Comparing Several Proportions
The Power of a Test
Simple Linear Regression
Multiple Linear Regression
Partial Least Squares (PLS) Regression
Analysis of Variance and Covariance
One-Way Analysis of Variance
Multi-Way Analysis of Variance with Interaction
Analysis of Covariance
Linear Discriminant Analysis
Exploratory Multivariate Analysis
Principal Component Analysis
Multiple Correspondence Analysis
Ascending Hierarchical Clustering
The k-Means Method
The Most Useful Functions
Writing a Formula for the Models
The Rcmdr Package
The FactoMineR Package
Answers to the Exercises