Compositional Data Analysis in Practice

Michael Greenacre
Publisher: Chapman and Hall/CRC
Publication Date: 2018
Number of Pages: 136
Format: Hardcover
Price: 140.00
ISBN: 9781138316614
Category: Monograph
[Reviewed by Fabio Mainardi, on 10/18/2021]
Compositional data are non-negative data where the sum of the variables is constant, usually 1 (proportions) or 100 (percentages). They are encountered in a variety of contexts. Many examples come from mathematical ecology, where the state of a given population at a given time is characterized by the relative abundance of species. Other typical examples come from geology (e.g., the composition of rocks) or nutrition (e.g., the macro-nutrient composition of foods).
 
Because of the constraint on the sum of the variables, the statistical analysis of this kind of data requires special techniques. For example, it is easy to see that each (non-constant) variable in the dataset must be negatively correlated with at least one of the other variables: since the total is constant, the covariances of any one variable with all the variables, itself included, sum to zero. This is known as the “negative correlation bias”, and it makes the interpretation of pairwise correlations between the variables problematic. One way to avoid this and other pitfalls was proposed by John Aitchison and is based on the simple approach of working with ratios of variables (more precisely, log-ratios, the logarithm being a simple device for turning multiplicative scales into additive ones). However, working with ratios and logarithms excludes the possibility of zero values in the dataset. This is not realistic, since zero measurements do occur in practice in all of the examples mentioned above: a species can very well be absent at a specific site and at a given time point, or a food might totally lack a particular fatty acid. This inability to deal with zeroes is often seen as the Achilles’ heel of Aitchison’s approach. Several workarounds have been proposed in the literature, including many imputation methods that replace zeroes with ‘very small’ positive values. One of the most original contributions of this book is an alternative approach, based on correspondence analysis performed on power-transformed data.
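
To make these points concrete, here is a minimal Python sketch. It is not taken from the book, whose code is in R, and it does not use easyCODA. The simulated data, the helper names closure and clr, and the choice of the centered log-ratio as one common log-ratio transform are all assumptions made for illustration. The sketch closes random positive data to proportions, checks that each row of the covariance matrix sums to (numerically) zero and therefore contains a negative off-diagonal entry, and then shows that a single zero part leaves the log-ratio transform without finite values.

```python
import numpy as np

def closure(x):
    """Rescale each row of non-negative data so that it sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def clr(x):
    """Centered log-ratio: log of each part minus the row mean of the logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

# Hypothetical data: 200 samples of 4 strictly positive "abundances".
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
raw = rng.lognormal(size=(200, 4))
comp = closure(raw)

# Covariance between parts (columns). Because every row of comp sums to 1,
# each row of the covariance matrix sums to zero up to rounding error.
cov = np.cov(comp, rowvar=False)
row_sums = cov.sum(axis=1)

# Zero out the diagonal (the variances) and take the smallest entry per row:
# it is negative, so every part has at least one negative covariance.
off_diag_min = (cov - np.diag(np.diag(cov))).min(axis=1)

# The log-ratio transform is well defined for strictly positive parts;
# each transformed row sums to zero.
clr_values = clr(comp)

# A single zero part breaks the transform: log(0) is -inf.
with_zero = np.array([[0.0, 0.3, 0.7]])
with np.errstate(divide="ignore", invalid="ignore"):
    clr_zero = clr(with_zero)

print("covariance row sums:", row_sums)
print("most negative covariance per part:", off_diag_min)
print("all CLR values finite without zeros:", np.isfinite(clr_values).all())
print("any CLR value finite with a zero:", np.isfinite(clr_zero).any())
```

The zero-sum rows of the covariance matrix are the “negative correlation bias” in numerical form, and the last line illustrates why zeroes must be replaced, or the transform changed, before log-ratios can be used.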
 
Compared to other books on compositional data analysis, Greenacre’s book is more focused on the applications and less on the mathematical foundations. His approach to compositional data analysis is simple and pragmatic, with interpretability as a key objective.
 
The book is beautifully illustrated and contains a wealth of examples and several very useful appendices. In particular, Appendix C contains the R code used to generate most of the results and plots in the book and is based on the author’s package easyCODA (available on CRAN).
 
I think this book is an excellent complement to more theoretical resources, like the notes by Pawlowsky-Glahn, Egozcue, and Tolosana-Delgado. I recommend it to anyone wishing to learn about this relatively unknown field of data analysis.

 

Fabio Mainardi is a mathematician working as a senior data scientist at Nestlé Research, Switzerland. His mathematical interests are number theory, functional analysis, discrete mathematics and probability.