Reinforcement Learning and Optimal Control

Dimitri Bertsekas




This book considers large and challenging multistage decision problems, which can be solved in principle by dynamic programming, but their exact solution is computationally intractable. It can be used as a textbook or for self-study in conjunction with instructional videos and slides, and other supporting material, which are available from the author's website. The book discusses solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are known by several essentially equivalent names: reinforcement learning, approximate dynamic programming, and neuro-dynamic programming. They underlie, among others, the recent impressive successes of self-learning in the context of games such as chess and Go. One of the aims of the book is to explore the common boundary between artificial intelligence and optimal control, and to form a bridge that is accessible by workers with background in either field. Another aim is to organize coherently the broad mosaic of methods that have proved successful in practice while having a solid theoretical and/or logical foundation. This may help researchers and practitioners to find their way through the maze of competing ideas that constitute the current state of the art. The mathematical style of this book is somewhat different than other books by the same author. While we provide a rigorous, albeit short, mathematical account of the theory of finite and infinite horizon dynamic programming, and some fundamental approximation methods, we rely more on intuitive explanations and less on proof-based insights. We also illustrate the methodology with many example algorithms and applications.


Dimitri Bertsekas is McAffee Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, and a member of the National Academy of Engineering. He has researched a broad variety of subjects from optimization theory, control theory, parallel and distributed computation, systems analysis, and data communication networks. He has written numerous papers in each of these areas, and he has authored or coauthored seventeen textbooks. Professor Bertsekas was awarded the INFORMS 1997 Prize for Research Excellence in the Interface Between Operations Research and Computer Science for his book "Neuro-Dynamic Programming", the 2001 ACC John R. Ragazzini Education Award, the 2009 INFORMS Expository Writing Award, the 2014 ACC Richard E. Bellman Control Heritage Award for "contributions to the foundations of deterministic and stochastic optimization-based methods in systems and control," the 2014 Khachiyan Prize for Life-Time Accomplishments in Optimization, and the 2015 George B. Dantzig Prize. In 2018, he was awarded, jointly with his coauthor John Tsitsiklis, the INFORMS John von Neumann Theory Prize, for the contributions of the research monographs "Parallel and Distributed Computation" and "Neuro-Dynamic Programming". In 2001, he was elected to the United States National Academy of Engineering for "pioneering contributions to fundamental research, practice and education of optimization/control theory".