Publisher:

Chapman & Hall/CRC

Number of Pages:

228

Price:

79.95

ISBN:

9781439835807

This book explores four fundamental methods for multivariate exploratory data analysis: principal component analysis, correspondence analysis, multiple correspondence analysis, and hierarchical ascendant classification (agglomerative hierarchical clustering). The principal innovations are an emphasis on geometric intuition and the use of the **R** statistical software. It is an excellent book, which I would strongly recommend as a secondary text supporting or accompanying the main text for any advanced undergraduate or graduate course in multivariate analysis.

**The Examples:** The book explores 12 sophisticated examples, three for each of the four exploratory data analysis methods. Both the examples and the code that analyzes them may be freely accessed from the book’s web site at http://factominer.free.fr/book. Other useful websites, such as the website for the free download of the **R** system, are also listed in the introduction.

Because these examples are free, an instructor of a multivariate data course can compare them with other books and other software in order to evaluate intelligently whether (s)he wishes to use the book as a secondary text. A further bonus is that an instructor may decide to use these examples, which are rich and illustrative of basic concepts, without adopting the book for his/her course.

**Geometry:** Quite simply, the book consists of a series of triplets of tables, pictures, and discussions. The math is there, but it is explained heuristically and geometrically, not in terms of algebraic manipulations. This is refreshing, since the reader *sees* the various clusters and associations rather than becoming aware of them as consequences of formal manipulations. I cite the authors’ stated goals from the introduction:

The book has been designed for scientists whose aim is not to become statisticians but who feel the need to analyze data themselves. It is therefore addressed to practitioners who are confronted with the analysis of data. From this perspective it is application-oriented; formalism and mathematics writing have been reduced as much as possible while examples and intuition have been emphasized. Specifically, an undergraduate level is sufficient to capture all the concepts involved.

Here is another perspective: Most texts spend a lot of time on the underlying algebraic theory and have a few pictures; by contrast, *Exploratory Multivariate Analysis* spends a lot of time on the pictures and has a few formulae.

Another way to appreciate this book is to compare it with an excellent *Mathematics Magazine* article with similar goals. “Spectral Analysis of the Supreme Court,” by B. Lawson, M. Orrison, and D. Uminsky (*Math Mag.* **79**(5)(2006), 340–346) is an expository article whose goal is "to convince you that spectral analysis is worth looking into." Despite the expository nature of the article, it has no pictures (to be fair, it does have a picture of the Supreme Court justices). Instead it has three tables and a few equations, and it makes its case for spectral analysis by appealing to intuitive linear algebra concepts (such as vectors of means).

One further difference between the article and *Exploratory Multivariate Analysis* may be worth mentioning. The article studies justice-coalitions as implied by voting patterns. In other words, it studies the objects (that is, the justices) but not the attributes (the voting issues). By contrast, *Exploratory Multivariate Analysis* studies the individual-variable duality and shows how data can lead to clustering inferences for both the individuals and the attributes. Thus, despite the emphasis on intuition, *Exploratory Multivariate Analysis* does not sacrifice detail; it explores all aspects of the subject.

**R software:** To teach any advanced data analysis, the instructor must also teach the use of a software package. The principal advantages of **R** are that it is free and that it emphasizes visualization. The authors give details of the **R** commands in the appendices. As far as I know, **R** has the same capabilities as other multivariate software.

In summary, this is a compact book with a plethora of visualizations that teach the subtleties of the major exploratory data analysis methods. It would be a good supplement to any primary textbook in an advanced undergraduate or graduate course in multivariate analysis.

Russell Jay Hendel, RHendel@Towson.Edu, holds a Ph.D. in theoretical mathematics and an Associateship from the Society of Actuaries. He teaches at Towson University. His interests include discrete number theory, applications of technology to education, problem writing, actuarial science, and the interaction between mathematics, art, and poetry.

Date Received:

Thursday, November 25, 2010

Reviewable:

Yes

Series:

Computer Science and Data Analysis Series

Publication Date:

2011

Format:

Hardcover

Audience:

Category:

Textbook

Russell Jay Hendel

07/5/2011

**Principal Component Analysis (PCA)**

Data — Notation — Examples

Objectives

Studying Individuals

Studying Variables

Relationships between the Two Representations *N_I* and

Interpreting the Data

Implementation with FactoMineR

Additional Results

Example: The Decathlon Dataset

Example: The Temperature Dataset

Example of Genomic Data: The Chicken Dataset

**Correspondence Analysis (CA)**

Data — Notation — Examples

Objectives and the Independence Model

Fitting the Clouds

Interpreting the Data

Supplementary Elements (= Illustrative)

Implementation with FactoMineR

CA and Textual Data Processing

Example: The Olympic Games Dataset

Example: The White Wines Dataset

Example: The Causes of Mortality Dataset

**Multiple Correspondence Analysis (MCA)**

Data — Notation — Examples

Objectives

Defining Distances between Individuals and Distances between Categories

CA on the Indicator Matrix

Interpreting the Data

Implementation with FactoMineR

Addendum

Example: The Survey on the Perception of Genetically Modified Organisms

Example: The Sorting Task Dataset

**Clustering**

Data — Issues

Formalising the Notion of Similarity

Constructing an Indexed Hierarchy

Ward’s Method

Direct Search for Partitions: K-means Algorithm

Partitioning and Hierarchical Clustering

Clustering and Principal Component Methods

Example: The Temperature Dataset

Example: The Tea Dataset

Dividing Quantitative Variables into Classes

**Appendix**

Percentage of Inertia Explained by the First Component or by the First Plane

R Software

**Bibliography of Software Packages**

**Bibliography**

**Index**

Modify Date:

Tuesday, August 16, 2011
