You are here

From Shortest Paths to Reinforcement Learning

Paolo Brandimarte
Publication Date: 
Number of Pages: 
EURO Advanced Tutorials on Operational Research
[Reviewed by
Bill Satzer
, on
Richard Bellman developed the method of dynamic programming in the 1950's. It is an optimization technique and -- despite its name -- is not associated with any algorithm, software or computer implementation. Instead, it is a very general and flexible principle that can be applied to optimization problems that breaks a problem into progressively smaller subproblems and links them together recursively. Bellman's principle of optimality says that, in policy decisions, whatever the initial state and decision might be, the remaining decisions should be optimal with respect to the state following the first decision.
The current book is part of an advanced tutorial series. It aims to introduce the concept of dynamic programming and offer an overview of the kinds of modeling and applications where it can be applied. Dynamic programming (DP) has been used in a very broad range of applications from operations research to economics, control theory and machine learning. The author introduces DP by using it to find the shortest path on a directed graph. While this problem has other more efficient solutions using other methods, this simple example gives a good illustration of how DP works.
The idea of system state is central to DP, which requires a mathematical model to describe its evolution over time including state variables, decision variables and external inputs. The state of the system at time t+1 should depend on the state observed at time t, the decision made at time t after observing the state, and the influence of external inputs during the subsequent interval. The author treats problems that are continuous and discrete, deterministic and stochastic, finite horizon (requiring a finite number of decisions) and infinite horizon (where there is no clear termination condition.) The author presents many examples that incorporate these variations. 
Dynamic programming offers the advantage of great versatility, power, and generality but often requires considerable effort to develop an effective implementation. Sometimes DP can computationally expensive when the dimensionality of the problem is large, but the author offers some approximation strategies that can be quite effective. Reinforcement learning (an approach midway between supervised and unsupervised learning) provides one clear example and is described in the last part of the book.
Each chapter includes suggestions for further reading and a thorough bibliography. There are no exercises. The mathematical level is roughly at the upper undergraduate level. MATLAB code is provided throughout to give examples of experimenting with and implementing DP algorithms.
The author describes a lot of variations of dynamic programming. There are so many, each presented in only a few pages, that a reader who is new to the subject might find the author’s treatment very challenging.


Bill Satzer (, now retired from 3M Company, spent most of his career as a mathematician working in industry on a variety of applications. He did his PhD work in dynamical systems and celestial mechanics.