Analysis of Multivariate and High-Dimensional Data

Inge Koch
Cambridge University Press
Cambridge Series in Statistical and Probabilistic Mathematics

The Basic Library List Committee suggests that undergraduate mathematics libraries consider this book for acquisition.

Reviewed by Michael Sutherland

This is a delightful book. Inge Koch has written a modern, handsome, beautifully printed and formatted multivariate data analysis textbook. I like everything in it: the typesetting, the Preface (Theory and Data: “No Choose, Do!”), the chapter structure, the modern bibliography and the overall writing philosophy (plenty of references for the academic in me, none of which get in the way of Koch’s graceful, open explanatory style). History and theory interplay with data analysis and scholarship, but none of them drags you down as you catch up with where modern multivariate analysis has gone in the last 30 or so years. All due respect to the classic foundations of PCA, CCA and MDA, but you will be whisked away to the frontiers of non-Gaussian quests for lower-dimensional representations à la Independent Component Analysis (Blind Source Separation), Projection Pursuit and High Dimension Low Sample Size extensions of PCA.

The book is terrific. I’ve taken advantage of my JSTOR access and this book’s excellent modern bibliography to catch up on the modern non-Gaussian action. A quick look at the table of contents will show you that the first 8 chapters are traditional (well, for a multivariate book), but it is the last 5 chapters (180 pages) that I have fallen in love with: Towards Non-Gaussianity; Independent Component Analysis; Projection Pursuit; Kernel and More Independent Component Methods; and Feature Selection and Principal Component Analysis Revisited.

Koch’s book brings me up to date on what has happened to “principal-components-like” thinking, algorithms and data practice with the birth of cheap, very fast, very large computing capabilities. We have become, and are still becoming, capable (theoretically, algorithmically and with real data) of venturing beyond the multivariate normal of classical multivariate statistical practice. This book opens up the algorithmic and mathematical predictive modeling possibilities beyond the first two moments! I’m seeing applications in financial engineering, industrial engineering and the biological and social sciences.

For me personally there has also been a simultaneous fascination with developments in Statistical Learning methods (e.g., trees, forests, bagging and boosting) and the immense growth in anyone’s computing capabilities due to the ubiquity and support of the R computing environment. Indeed, I’d suggest much of the value I’ve found in Koch’s text is directly related to my simultaneous fondness for using R while studying her text’s methods and the related “Big Data” predictive modeling methods in a text such as The Elements of Statistical Learning by Hastie, Tibshirani and Friedman (with a new and exciting introductory version freely available on the internet).

Modern statistical practice and its mathematics are finally open to the simple observation that the multivariate normal is but one “star” in a “constellation” of useful probability distributions… and there has been a burst of algorithmic predictive capability waiting for its companion mathematical understanding to be developed. As Koch heads into her final five non-standard chapters (who else has High Dimension Low Sample Size discussed in their text?) she says, “…the purpose of this section is to give the reader a taste of further developments in the pursuit of interesting structure in multivariate and high dimensional data.”

Koch’s characteristic note is her willingness to use the fuzzy concept of “interesting structure” to guide us into and through what has happened to multivariate analysis over the last few decades. I like “interesting structure”!

Later in the book, having given us an excellent tour of her “interesting structures”, she calls for the continued development of multiple methods and associated algorithms to explore the variety of complexities in high-dimensional data and the variety of needs in the analyses. Koch’s statement is simply, “If possible, I recommend working with more than one approach and comparing the results.” Hooray! Like the whole book, it is a smart, level-headed message, echoing the wisdom of Box’s dictum, “All models are wrong, some models are useful” and of Hand’s concept of the probability lever: small changes in the veracity of assumptions may have tremendous leverage on one’s conclusions.

PCA was a great place to start to think about what we measure, how much we can measure, and how we can wrestle with the dimensionality issue. Koch’s book reminds me that it still is a great place to start; but my, how it has grown!

Mike Sutherland is a semi-retired statistical consultant who works on interesting academic and business problems. He was a founding faculty member of Hampshire College, then moved to the University of Massachusetts to become the Director of the Statistical Consulting Center.

Part I. Classical Methods:
1. Multidimensional data
2. Principal component analysis
3. Canonical correlation analysis
4. Discriminant analysis

Part II. Factors and Groupings:
5. Norms, proximities, features, and dualities
6. Cluster analysis
7. Factor analysis
8. Multidimensional scaling

Part III. Non-Gaussian Analysis:
9. Towards non-Gaussianity
10. Independent component analysis
11. Projection pursuit
12. Kernel and more independent component methods
13. Feature selection and principal component analysis revisited