Foundations of Data Science

Avrim Blum, John Hopcroft, and Ravindran Kannan
Cambridge University Press
Publication Date: 
Number of Pages: 
Reviewed by Brian Borchers
This is a textbook for a course on the foundations of data science at the advanced undergraduate or graduate level.  It provides a very broad overview of the foundations of data science that should be accessible to well-prepared students with backgrounds in computer science, linear algebra, and probability theory.
The authors open the book with an important chapter on high-dimensional geometry, random projections, and the Johnson-Lindenstrauss lemma.  The core of the book then introduces the singular value decomposition, perceptrons, the Vapnik-Chervonenkis dimension, k-means and spectral clustering algorithms, algorithms for the analysis of streaming data, nonnegative matrix factorization, and Bayesian belief networks.  An appendix provides a useful summary of facts from linear algebra, probability, and analysis that are used throughout the book.  The exercises at the end of each chapter range from simple calculations to more challenging proofs and computing projects.
These are all important topics in the theory of machine learning, and it is refreshing to see them introduced together in a textbook at this level.  However, the book also includes material on other topics, including Markov chains, random graph models, linear and semidefinite programming, and wavelets, that are arguably not fundamental to data science.  There is far more material than could be taught in a typical semester-long course.  An instructor using this textbook would need to be selective in choosing which chapters to cover.
This book has clearly been written from the point of view of computer science, for students of computer science rather than statistics or mathematics.  It will be of interest to instructors who want to introduce data science to computer science students, but it might not be suitable for a course that approaches machine learning and data science from the statistical side.  Compare this book with The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, which covers much of the same ground from a very different, statistical perspective.


Brian Borchers is a professor of mathematics at New Mexico Tech and the editor of MAA Reviews.