Exploratory Multivariate Analysis by Example Using R

François Husson, Sébastien Lê, and Jérôme Pagès
Chapman & Hall/CRC
Computer Science and Data Analysis Series

The Basic Library List Committee suggests that undergraduate mathematics libraries consider this book for acquisition.

[Reviewed by Russell Jay Hendel]

This book explores four essential and basic methods for multivariate exploratory data analysis: principal component analysis, correspondence analysis, multiple correspondence analysis, and hierarchical ascendant classification. The principal innovations are an emphasis on geometric intuition and the use of the R software system. It is an excellent book that I strongly recommend as a secondary text, supporting or accompanying the main text for any advanced undergraduate or graduate course in multivariate analysis.

The Examples: The book explores 12 sophisticated examples, three for each of the four exploratory data analysis methods. Both the examples and the code that analyzes them may be freely accessed from the book’s web site. Other useful websites, such as the site for the free download of the R system, are also listed in the introduction.

Because these examples are free, an instructor of a multivariate data course can compare them with other books and other software in order to judge whether the book is suitable as a secondary text. A further bonus is that an instructor may decide to use these examples, which are rich and illustrative of basic concepts, without adopting the book for the course.

Geometry: Quite simply, the book consists of a series of triplets of tables, pictures, and discussions. The math is there, but it is explained heuristically and geometrically, not in terms of algebraic manipulations. This is refreshing, since the reader sees the various clusters and associations rather than becoming aware of them as consequences of formal manipulations. I cite the authors’ stated goals from the introduction.

The book has been designed for scientists whose aim is not to become statisticians but who feel the need to analyze data themselves. It is therefore addressed to practitioners who are confronted with the analysis of data. From this perspective it is application-oriented; formalism and mathematics writing have been reduced as much as possible while examples and intuition have been emphasized. Specifically, an undergraduate level is sufficient to capture all the concepts involved.

Here is another perspective: Most texts spend a lot of time on the underlying algebraic theory and have a few pictures; by contrast, Exploratory Multivariate Analysis spends a lot of time on the pictures and has a few formulae.

Another way to appreciate this book is to compare it with an excellent Mathematics Magazine article with similar goals. “Spectral Analysis of the Supreme Court,” by B. Lawson, M. Orrison, and D. Uminsky (Math Mag. 79(5)(2006), 340–346) is an expository article whose stated goal is "to convince you that spectral analysis is worth looking into." Despite its expository nature, the article has no pictures (to be fair, it has a picture of the Supreme Court justices). Instead it has three tables and a few equations, and it makes its case for spectral analysis by appealing to intuitive linear algebra concepts (such as vectors of means).

One further difference between the article and Exploratory Multivariate Analysis may be worth mentioning. The article studies justice-coalitions as implied by voting patterns. In other words, it studies the objects (that is, the justices) but not the attributes (the voting issues). By contrast, Exploratory Multivariate Analysis studies the individual-variable duality and shows how data can lead to clustering inferences about both the individuals and the attributes. Thus, despite the emphasis on intuition, Exploratory Multivariate Analysis does not sacrifice detail; it explores all aspects of the subject.
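
A minimal sketch of this individual-variable duality, not taken from the book or the article: the book works in R with the FactoMineR package, whereas the sketch below uses Python with NumPy purely as a neutral illustration. The toy data and all names (X, Z, ind_coord, var_coord) are hypothetical, and the sketch assumes a standardized principal component analysis with equal weights 1/n on the individuals. Under those assumptions, a single singular value decomposition yields coordinates for both the individuals and the variables, and each set can be recovered from the other.

```python
import numpy as np

# Hypothetical toy data: 6 individuals (rows) described by 3 variables (columns).
X = np.array([
    [2.0, 1.0, 5.0],
    [3.0, 2.0, 4.0],
    [4.0, 2.5, 3.5],
    [5.0, 4.0, 2.0],
    [6.0, 4.5, 1.5],
    [7.0, 6.0, 1.0],
])
n, p = X.shape

# Standardize each variable (mean 0, variance 1, using the 1/n convention).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# One SVD serves both the cloud of individuals and the cloud of variables.
U, s, Vt = np.linalg.svd(Z / np.sqrt(n), full_matrices=False)
eigenvalues = s ** 2  # inertia carried by each principal axis

# Coordinates of the individuals on the principal axes.
ind_coord = Z @ Vt.T

# Coordinates of the variables: loadings scaled by the singular values.
# For standardized data these equal the variable-component correlations.
var_coord = Vt.T * s

# Duality (transition relation): the variable coordinates can be
# recomputed from the individuals' coordinates alone.
var_from_ind = Z.T @ ind_coord / (n * s)

print("share of inertia per axis:", np.round(eigenvalues / p, 3))
print("variable coordinates:\n", np.round(var_coord, 3))
```

Plotting the first two columns of ind_coord gives the map of individuals, and the first two columns of var_coord give the correlation-circle view of the variables; the two pictures are read together, which is the duality the book emphasizes.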

R software: To teach any advanced data analysis, the instructor must also teach the use of a software package. The principal advantages of R are that it is free and puts emphasis on visualization. The authors detail information about R commands in the appendices. As far as I know, R has the same capabilities as other multivariate software packages.

In summary, this is a compact book with a plethora of visualizations that teaches the subtleties of the major exploratory data analysis methods. It would supplement well any primary textbook in an advanced undergraduate or graduate course in multivariate analysis.

Russell Jay Hendel holds a Ph.D. in theoretical mathematics and an Associateship from the Society of Actuaries. He teaches at Towson University. His interests include discrete number theory, applications of technology to education, problem writing, actuarial science and the interaction between mathematics, art and poetry.

Principal Component Analysis (PCA)
Data — Notation — Examples
Studying Individuals
Studying Variables
Relationships between the Two Representations N_I and N_K
Interpreting the Data
Implementation with FactoMineR
Additional Results
Example: The Decathlon Dataset
Example: The Temperature Dataset
Example of Genomic Data: The Chicken Dataset

Correspondence Analysis (CA)
Data — Notation — Examples
Objectives and the Independence Model
Fitting the Clouds
Interpreting the Data
Supplementary Elements (= Illustrative)
Implementation with FactoMineR
CA and Textual Data Processing
Example: The Olympic Games Dataset
Example: The White Wines Dataset
Example: The Causes of Mortality Dataset

Multiple Correspondence Analysis (MCA)
Data — Notation — Examples
Defining Distances between Individuals and Distances between Categories
CA on the Indicator Matrix
Interpreting the Data
Implementation with FactoMineR
Example: The Survey on the Perception of Genetically Modified Organisms
Example: The Sorting Task Dataset

Clustering
Data — Issues
Formalising the Notion of Similarity
Constructing an Indexed Hierarchy
Ward’s Method
Direct Search for Partitions: K-means Algorithm
Partitioning and Hierarchical Clustering
Clustering and Principal Component Methods
Example: The Temperature Dataset
Example: The Tea Dataset
Dividing Quantitative Variables into Classes

Appendix
Percentage of Inertia Explained by the First Component or by the First Plane
R Software

Bibliography of Software Packages