While I was a graduate student at Georgia Tech, my favorite mathematics professor, Fred Andrew, asked the following question: What is the least squares fit for \(x\) to the two equations \(x=0\) and \(x=1\)?

The question is easy to understand, yet it’s obvious that no single value of \(x\) will satisfy both equations. You have to make a tradeoff. But how? One way, and the way discussed in the book, is to find a value of \(x\), call it \(x'\), that minimizes the squared error. For this example that error function is: \( E = (x-0)^2 + (x-1)^2\).

If we apply the usual matrix-based approach and write the equations as: \[\left[ \begin{array}{c}1 \\1 \end{array} \right]x =\left[ \begin{array}{c}0 \\1 \end{array}\right] \]

Then we see that we have the equations in the form \(Ax = b\), for which the least squares solution satisfies the normal equations \(A^T A x = A^T b\). With the original two equations, we have: \[ A = \left[ \begin{array}{c} 1 \\ 1 \end{array} \right], \quad b = \left[ \begin{array}{c} 0 \\ 1 \end{array} \right], \] so \(A^T A = 2\) and \(A^T b = 1\).
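As a quick sanity check (a sketch of my own in Python with NumPy, not code from the book), the normal equations can be formed and solved numerically:

```python
import numpy as np

# Overdetermined system: x = 0 and x = 1, written as A x = b
A = np.array([[1.0], [1.0]])
b = np.array([0.0, 1.0])

# Solve the normal equations A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)  # [0.5]

# np.linalg.lstsq solves the same least squares problem directly
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_lstsq)  # [0.5]
```

Both routes give the same answer, as they must for a full-rank \(A\).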

So we find \(x = \frac{1}{2}\). In fact, we can write the squared error as \(E\), where \[ E = \left( x - 0 \right)^2 + \left( x - 1 \right)^2, \] and, if we like, plot it to show how the error function behaves.

The optimum value for \(x\) is where this plot is a minimum, at \(x' = 0.5\). (Interestingly, though beside the point: if we change the error norm from the squared error (\(L^2\)) to, say, the \(L^1\) error, then any value of \(x\) in \([0,1]\) will do.) If this, or a little more than this, is what you know of least squares data fitting, then this book will show you how much more there is to know. But it won’t show you in an expository style.
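That \(L^1\) remark is easy to verify numerically: the \(L^1\) error \(|x-0| + |x-1|\) equals 1 for every \(x\) in \([0,1]\) and grows outside that interval, so every point of \([0,1]\) ties for the minimum. A short sketch (again my own, assuming NumPy):

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 101)
l1_error = np.abs(xs - 0.0) + np.abs(xs - 1.0)

# The L^1 error is identically 1 on [0, 1] ...
print(l1_error.min(), l1_error.max())  # 1.0 1.0

# ... and strictly larger outside the interval, e.g. at x = -0.5:
print(abs(-0.5 - 0.0) + abs(-0.5 - 1.0))  # 2.0
```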

The authors dive into the topics without regard for the background of the reader. They begin with the overall idea of data fitting and immediately head into matrices to formulate problems and provide solution methods. They present an excellent, though highly mathematical, discussion of data fitting with various functions, such as exponentials and trigonometric functions. The explanations throughout the book are terse. The reader must have a good background in linear algebra and be proficient with statistics to really grasp the material.

The discussion of singular value decomposition, for example, is good but quite terse. It helps, and this may sound odd, to have a working knowledge of the material before reading the book.

For example, the section titled “Constrained linear least squares problems” is where we find problems that one would usually use Lagrange multipliers to solve. But here the authors present approaches based on orthogonal transformations. Transformations, in general, are wonderful; they lend insight into the structure of the problem and provide a good way to see how variables interact. Still, I would have liked to see the Lagrange multiplier method at this point.

Also, the writers put the burden of knowing outside references on the reader. That is, where references are usually given to credit others, in this book the references are used as pointers to support poorly explained statements. For example, in the same section noted above, we find the sentence: “[A] strategy described in [150], based on interchanges, has been found to give satisfactory results.” The authors could have told the reader the strategy rather than forcing him to find it elsewhere. These externally referenced comments pepper the text, and while they certainly show the authors’ breadth of knowledge, they do little to help the reader who wants to learn from the book. This writing style is reminiscent of a thesis, not a textbook.

Overall, the book is a broad collection of techniques that at first glance seems comprehensive. And there are many ideas here. But when looked at closely, the techniques are only briefly described and leave the reader wanting more. For example, there is a short discussion of neural networks at the end of the book. Neural networks are common today, and the authors would have done well to introduce them with more detail and explanation. As it is, if the reader didn’t understand neural networks before reading the book, he would not understand them any better afterward. (My personal preference is for texts such as *Matrix Computations* by Golub and Van Loan. Those authors discussed similar ideas but presented the material in a more readily understandable manner.)

The authors of this book, it should be said, have a sense of humor. We see it only once (which is more than I’ve seen it in many other books): “It [interior point methods] is, of course, not devoid of its own complications (principle of conservation of difficulty!).” That’s the best a reader will find.

The book has many figures (all black and white, even when color would have conveyed more information), which are appreciated, as they are needed to understand the problems and the behavior of solutions. However, the figures are shrunk, so the labels and legends are hard to read and the plots are difficult to discern.

In sum, this book reads as a study in various least squares techniques with little thought given to explaining to the reader how she might actually implement them. Nor, as I think of it, does the text give a good summary of the techniques so that a practitioner will know when to apply one versus another. If you want to learn or use least squares methods, I recommend you look elsewhere for a textbook.

David Mazel holds a Ph.D. from Georgia Tech and is a practicing engineer in the Washington, DC, area. He welcomes your feedback to mazeld at gmail dot com.