Data mining or knowledge discovery, the process of building empirical statistical models for classification or prediction from large data sets, has received a great deal of attention in recent years. With the huge data sets made possible by the growth of the internet, data mining techniques have become very important in e-commerce. In addition to their applications in business, data mining techniques are increasingly being used in the sciences to analyze data in fields as diverse as molecular biology, remote sensing, astronomy, and high-energy particle physics.
The data sets analyzed in scientific applications of data mining are often very different from those analyzed in commercial applications. In commercial applications the data typically consists of web logs, customer transactions, and other data that can be obtained from relational databases. In scientific applications, the data might consist of astronomical or remote sensing images, DNA sequences, or microarray gene expression data. Scientific applications of data mining often require significant amounts of data preparation. For example, data from different sensors may have to be combined, noise removed by filtering, and images preprocessed to identify specific objects.
Kamath's book provides an introduction to applications of data mining in the sciences. Instead of focusing narrowly on statistical techniques for building classification and prediction models, Kamath looks at the larger process of scientific data mining, including data preparation issues and visualization of results as well as statistical procedures.
The book begins with a chapter of examples of scientific applications of data mining, followed by chapters on types of scientific data and an overview of the data mining process. Later chapters cover specific steps in the scientific data mining process, including image denoising, data fusion, image segmentation, feature extraction, and dimension reduction. A chapter on statistical methods introduces approaches to clustering, classification, and prediction, including neural networks, support vector machines, and decision trees. This is followed by chapters on data visualization and software packages for scientific data mining. The author has provided an extensive bibliography, with references to both original research and surveys on various topics relevant to scientific data mining.
This book will be of interest to graduate students and researchers looking for a broad overview of and introduction to scientific data mining. The material on denoising, image segmentation, and feature extraction will be of particular interest to readers who want to analyze image data. For readers who want a more in-depth discussion of statistical techniques in data mining, the book by Hastie et al. (2009) would be a more appropriate starting point.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition, Springer, 2009.
Brian Borchers is a professor of Mathematics at the New Mexico Institute of Mining and Technology. His interests are in optimization and applications of optimization in parameter estimation and inverse problems.