Griewank has been joined by Walther in this update of the primary reference and textbook in the field of Automatic Differentiation (AD), also known as Algorithmic or Computational Differentiation, which is distinct from Numerical or Symbolic Differentiation. This approach to computing derivatives is, at its heart, a numerical method, since computations are done in floating-point arithmetic, although there is no approximation as in difference quotients. The “functions” that are to be differentiated are given by (potentially large) computer programs, where sensitivities of output values are desired with respect to input variables. Roughly, the “derivative” is an augmented program or computing environment, where each arithmetic operation or elementary function evaluation is automatically expanded to also compute the derivative value, using the chain rule. Processing such program code efficiently leads to many issues and techniques in computer science (data structures, algorithms, complexity,…), numerical analysis, graph theory, and other mathematics.
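The chain-rule augmentation described above can be made concrete with a minimal forward-mode sketch in Python (my own illustration, not code from the book): each value carries a derivative component, and overloaded arithmetic propagates both through the program.

```python
import math

class Dual:
    """A value paired with its derivative; arithmetic applies the chain rule."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def sin(x):
    # elementary function expanded to also compute its derivative
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# d/dx of x*x + sin(x) at x = 1.5: seed the input derivative with 1
x = Dual(1.5, 1.0)
y = x * x + sin(x)
# y.dot holds 2*1.5 + cos(1.5), exact to floating-point rounding
```

Note that the derivative is exact up to rounding: there is no step-size truncation as in a difference quotient.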
The audience for this book could be anyone interested in a serious look at the conceptual details of how this is accomplished efficiently. Readers completely unfamiliar with AD may prefer a gentler introduction to the basic ideas in expository articles. For further study, Evaluating Derivatives is the only text that summarizes the field in light of the many developments of the last 25 years.
There are valuable lessons for a designer intending to implement AD in software for nonlinear computational problems. Students may use it as an introductory textbook; exercises are included. Nice expository features include 27 Rules or morals that are highlighted throughout the book, descriptive names for propositions, and returning to a simple “lighthouse” example to introduce or clarify ideas. The principles and techniques are conceptual in the sense that they are language-independent, although computing concerns such as memory access are addressed very explicitly.
The motivation for AD, and for this book, is implementation in specific languages for real-world applications. In particular, optimization can be facilitated by using AD to compute sensitivities of a simulation program with respect to model parameters. Information on language-specific AD software tools can be found at www.autodiff.org, along with a database of over 1000 publications on AD and its applications. Users will often want to consult this text to explain or reference ideas encountered while using a software tool or studying an AD application.
AD researchers can use this reference similarly. For example, this reviewer is particularly interested in higher-order derivatives, enabled by formulas that compute elementary functions of Taylor series. Chapter 13 is a good reference summarizing the topic, at least from the authors' perspective, and is relatively independent of the rest of the book.
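As a small illustration of Taylor arithmetic of the kind Chapter 13 treats (my own sketch, not the book's code): the Taylor coefficients of exp(u(t)) follow from the differential equation e' = u'e, which yields the recurrence k·e_k = Σ_{j=1}^{k} j·u_j·e_{k−j}.

```python
import math

def taylor_exp(u, d):
    """Given Taylor coefficients u[0..d] of u(t), return those of exp(u(t))
    through order d, via the recurrence k*e_k = sum_{j=1}^{k} j*u_j*e_{k-j}."""
    e = [math.exp(u[0])]
    for k in range(1, d + 1):
        e.append(sum(j * u[j] * e[k - j] for j in range(1, k + 1)) / k)
    return e

# Sanity check: u(t) = t gives exp(t), whose coefficients are 1/k!
coeffs = taylor_exp([0.0, 1.0, 0.0, 0.0, 0.0], 4)
```

Each coefficient costs only arithmetic on lower-order coefficients, so derivatives of any order propagate through elementary functions without symbolic expression swell.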
This new edition improves the presentation of where payoffs and challenges arise in AD. The “cheap gradient principle” is introduced with examples and implications before a new Chapter 4 that expands the explanation and analysis of the memory issues and complexity bounds. To understand this roughly, consider computing the gradient of a scalar function of n variables. The relatively simple forward mode of AD accumulates all n partial derivatives at each computational step of the function (or program), a task proportional to n times the work of the function evaluation. In contrast, the reverse mode only computes scalar partials with respect to the immediate arguments of each computational step. The needed products of these partials are then accumulated in a reverse sweep through the function steps. This way, the gradient computation takes at most 4 times the work of the function evaluation, regardless of n. However, enabling this reverse sweep involves recording program structure, either in a “tape” data structure, or by literally writing corresponding program code much as a compiler would do. Conceptually, AD operates on the computational graph (or parse tree) of the function.
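The tape and reverse sweep can be sketched in a few lines of Python (a toy illustration under my own design choices, not the book's implementation): the forward evaluation records each step's value and its local partials, and one backward pass over the tape accumulates the adjoints that form the gradient.

```python
import math

tape = []  # each entry: (value, [(parent index, local partial), ...])

def push(value, parents):
    tape.append((value, parents))
    return len(tape) - 1  # index of the new node on the tape

def add(i, j):
    return push(tape[i][0] + tape[j][0], [(i, 1.0), (j, 1.0)])

def mul(i, j):
    return push(tape[i][0] * tape[j][0],
                [(i, tape[j][0]), (j, tape[i][0])])

def sin(i):
    return push(math.sin(tape[i][0]), [(i, math.cos(tape[i][0]))])

def gradient(out):
    """Reverse sweep: propagate adjoints from the output back to the inputs."""
    adj = [0.0] * len(tape)
    adj[out] = 1.0
    for k in range(out, -1, -1):
        for parent, partial in tape[k][1]:
            adj[parent] += partial * adj[k]
    return adj

# f(x, y) = x*y + sin(x) at (2, 3); the gradient is (y + cos x, x)
x = push(2.0, [])
y = push(3.0, [])
f = add(mul(x, y), sin(x))
g = gradient(f)
# g[x] == 3 + cos(2), g[y] == 2
```

The backward pass touches each recorded operation once, independent of the number of inputs, which is the source of the cheap gradient principle; the price is the memory holding the tape.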
This second edition is updated and reorganized throughout. More is included in Chapter 9 about graph-theoretic methods to reduce the number of arithmetic operations needed to accumulate the derivative values. This includes a brief description of how the problem of finding the minimal number of operations is equivalent to a classical NP-complete problem. A new Chapter 15 is rewritten and expanded from the section on Implicit and Iterative Differentiation. Finally, there is a notational change so that all vectors are consistently column vectors. This helps, but it may still take a while for readers to get used to the notation and terminology, as set out in Chapters 2 and 3. Nevertheless, this is the book to “get up to speed” in Automatic Differentiation.
Richard Neidinger is professor of mathematics at Davidson College in North Carolina. His interests include numerical analysis, dynamical systems, and undergraduate mathematics education. He introduces automatic differentiation using MATLAB in a first numerical methods course for undergraduates. Email: email@example.com.
2: A Framework for Evaluating Functions;
3: Fundamentals of Forward and Reverse;
4: Memory Issues and Complexity Bounds;
5: Repeating and Extending Reverse;
6: Implementation and Software;
7: Sparse Forward and Reverse;
8: Exploiting Sparsity by Compression;
9: Going beyond Forward and Reverse;
10: Jacobian and Hessian Accumulation;
11: Observations on Efficiency;
12: Reversal Schedules and Checkpointing;
13: Taylor and Tensor Coefficients;
14: Differentiation without Differentiability;
15: Implicit and Iterative Differentiation;
List of Figures;
List of Tables;
Assumptions and Definitions;
Propositions, Corollaries, and Lemmas;