Reinforcement Learning Solution to a Benchmark Time-Optimal Control Problem

November 2, 2007
Sanjay Joshi, Mechanical and Aeronautical Engineering, University of California, Davis

Reinforcement learning (RL) methods originated with reward-punishment studies in psychology and were later extended to machine learning algorithms. Their advantage is that they require no knowledge of a system's dynamics; instead, they use experience gained from interaction with the actual system (or a simulation of it) to obtain control solutions. In this paper, we apply traditional RL, specifically the Sarsa-Lambda method, to a well-known, simply posed minimum-time optimal control problem. Control researchers know well that the true analytic optimal solution to this problem is a bang-bang solution. By contrast, an analytical proof of optimality for Sarsa-Lambda has yet to be achieved for either discrete-state or continuous-state optimal control problems (though it is an active area of research). The current study showed that Sarsa-Lambda did produce nearly optimal bang-bang results for the given benchmark problem, without any explicit a priori knowledge of the system dynamics. However, generalization of the numerical solution from a single initial condition to other initial conditions was not immediate.
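For readers unfamiliar with Sarsa-Lambda, the following is a minimal sketch of one tabular update step, not the implementation used in the study. The abstract does not specify the discretization, reward, or trace type, so everything here is an assumption for illustration: a three-state toy table, two bang-bang actions (u = -1 and u = +1), a reward of -1 per time step (so that maximizing return minimizes elapsed time), accumulating eligibility traces, and the hypothetical function name sarsa_lambda_step.

```python
def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, done,
                      alpha=0.1, gamma=1.0, lam=0.9):
    """Apply one tabular Sarsa(lambda) update in place; return the TD error."""
    # On-policy target: bootstrap from the action actually selected next.
    target = r if done else r + gamma * Q[s_next][a_next]
    delta = target - Q[s][a]
    # Accumulating eligibility trace for the visited state-action pair
    # (an assumption; replacing traces are another common choice).
    E[s][a] += 1.0
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            # Every recently visited pair shares in the TD error.
            Q[i][j] += alpha * delta * E[i][j]
            # Traces decay by gamma * lambda after each step.
            E[i][j] *= gamma * lam
    return delta


# Hypothetical toy setup: 3 discretized states, 2 bang-bang actions.
ACTIONS = (-1.0, +1.0)
Q = [[0.0, 0.0] for _ in range(3)]  # action-value table
E = [[0.0, 0.0] for _ in range(3)]  # eligibility traces

# Two steps of an assumed episode with a reward of -1 per time step.
d1 = sarsa_lambda_step(Q, E, s=0, a=1, r=-1.0, s_next=1, a_next=1, done=False)
d2 = sarsa_lambda_step(Q, E, s=1, a=1, r=-1.0, s_next=2, a_next=0, done=True)
```

In this sketch the second update also lowers the value of the first state-action pair through its decayed trace, which is how Sarsa-Lambda propagates time-to-go information backward along a trajectory faster than one-step Sarsa.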

Sanjay Joshi joined the Mechanical and Aeronautical Engineering Department at UC Davis in 2001. Currently, he directs the Robotics, Autonomous Systems, and Controls Laboratory (RASCAL). He received a B.S. from Cornell University in 1990 and a Ph.D. in Control Systems from UCLA in 1996. From 1991 to 1994 and from 1996 to 2000, he was a Member of the Technical Staff at NASA's Jet Propulsion Laboratory in the Guidance and Control Analysis Group. While at JPL, he participated in several NASA programs, including Deep Space 1 (which tested deep-space ion propulsion and took photographs of a comet's core), the TOPEX/Poseidon mission (which measures the height of the world's oceans for meteorological study of Earth), the NASA Origins Program (next-generation telescopes), and the Mars Robotics Program (cooperating autonomous rovers). From 2000 to 2001, he was a Visiting Assistant Professor of Engineering at Harvey Mudd College in Claremont, California. He is a Senior Member of the AIAA and a Member of the IEEE. From 1999 to 2003, he served on the Conference Editorial Board of the IEEE Control Systems Society. Currently, he is a member of the AIAA Guidance, Navigation, and Controls Technical Committee (-2010).