
Scientific Data Analysis

Graham Currell
Oxford University Press
Reviewed by William J. Satzer

This is a guidebook designed to teach scientific data analysis skills step by step to students from a broad range of scientific and engineering fields. No prior experience with statistics is expected and virtually no mathematical background past basic algebra is required. The book provides no formal treatment of probability and relies instead on a general conceptual and intuitive understanding. There is, nonetheless, quite a bit of statistics here.

The author has worked for a number of years to develop data analysis modules and self-study materials for science students. This book is one manifestation of that work. He has organized the book into two distinct parts. The first is a bottom-up approach to develop the basic statistical tools and concepts. The second part starts with experimental data and proceeds top-down to describe the techniques that can be applied to analyze the data.

Part I, “Understanding the Statistics”, begins with the core concepts of statistics: data visualization, distributions and distribution parameters, uncertainty, and an introduction to hypothesis testing. This is followed by long chapters on regression, hypothesis testing, and comparing data (largely correlation and association). While the author assumes a fairly minimal level of mathematical background, this is not a cookbook approach. Instead, the book has a surprising level of sophistication and subtlety. For example, the treatment of hypothesis testing covers both parametric and nonparametric techniques and includes a discussion of Monte Carlo resampling.
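For readers unfamiliar with Monte Carlo resampling, here is a minimal sketch of one common form, a permutation test for a difference in means. It is not taken from the book; the function name `permutation_test`, its parameters, and the measurements are all invented for illustration.

```python
import random
from statistics import mean

def permutation_test(a, b, n_resamples=2000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Hypothetical helper for illustration only. Under the null hypothesis
    the group labels are exchangeable, so we shuffle the pooled values and
    count how often a random split is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value away from exactly zero.
    return (hits + 1) / (n_resamples + 1)

# Invented measurements (not from the book).
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 5.4]
group_b = [5.8, 6.0, 5.7, 6.1, 5.9, 6.2]

p_value = permutation_test(group_a, group_b)
print(f"estimated p-value: {p_value:.4f}")
```

The appeal of this kind of test for the book's audience is that it needs no distributional formula: the reference distribution is built directly from the data by reshuffling.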

The basics are also done very well. For example, early in the treatment of regression there is a case study of a spectrophotometer calibration. The author carries out a linear regression across a range somewhat wider than the primary range of quantities measured by the instrument. He then follows up by computing and plotting residuals and regression statistics for the full range of measurements as well as for the smaller primary range of interest for this particular instrument. This is a wonderful example of teaching good practice by example.
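The flavor of that case study can be sketched in a few lines. The example below is only an illustration under stated assumptions: the calibration data are made up (they are not the book's), the choice of the first five standards as the "primary range" is hypothetical, and `fit_line` is a helper written for this sketch.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, residuals, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, residuals, 1 - ss_res / ss_tot

# Made-up calibration standards (NOT from the book): concentration versus
# absorbance, with the response flattening at high concentration.
conc = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
absorbance = [0.00, 0.10, 0.20, 0.30, 0.40, 0.49, 0.57, 0.63, 0.67]

# Fit once over the full range, once over an assumed primary range.
full = fit_line(conc, absorbance)
primary = fit_line(conc[:5], absorbance[:5])

for label, (a, b, res, r2) in (("full range", full), ("primary range", primary)):
    worst = max(abs(r) for r in res)
    print(f"{label}: slope={b:.4f} intercept={a:.4f} "
          f"R^2={r2:.4f} largest |residual|={worst:.4f}")
```

With data like these, the full-range residuals show a systematic pattern (negative at both ends, positive in the middle) that a single R² value hides, while the restricted fit is close to exactly linear. Comparing the two fits side by side is the practice the case study demonstrates.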

The author uses case studies to good effect. He also shows the use of Excel, Minitab and the SPSS statistics package keystroke-by-keystroke for many examples. An accompanying website includes videos that demonstrate analyses using those software packages. At first this seemed like overkill to me, but I realize that at least some students would appreciate this extra level of attention and would benefit from the additional level of detail.

Part II, “Analyzing Experimental Data”, is intended to address the question: “Now that I have this data, what do I do next?” It starts with basic tasks such as preparing the data, identifying the relevant variables, and understanding the uncertainties in the data. The approach is initially purely exploratory, but then moves on to selection of possible analysis methods, consideration of transformation or weighting of data, and then determination of those data characteristics necessary for further testing.

The author then addresses data analysis techniques appropriate for single and multiple response variables, related variables, and frequency data. He does this in depth using example data and well-chosen case studies.

This is an appealing introduction that would be accessible to a variety of students at the college level. Its strengths are clarity and directness with an abundance of good examples and case studies. I would have wished for a more explicit discussion of the usual statistical bugaboos — misuse of p-values, misunderstanding the results of hypothesis tests, and blindly applying linear regression to everything in sight. But the author prefers to teach good practice by example, and he does that very well.

Bill Satzer is a senior intellectual property scientist at 3M Company, having previously been a lab manager at 3M for composites and electromagnetic materials. His training is in dynamical systems and particularly celestial mechanics; his current interests are broadly in applied mathematics and the teaching of mathematics.

Part I - Understanding the statistics
1. Statistical concepts
2. Regression analysis
3. Hypothesis testing
4. Comparing data
Part II - Analysing experimental data
5. Project data analysis
6. Single response variables
7. Related variables
8. Frequency data
9. Multiple variables