EEC256 – Stochastic Optimization in Dynamic Systems

4 units – Spring Quarter; alternate years

Lecture: 4 hours

Prerequisite: EEC 260 or equivalent

Grading: Letter; homework, classwork, and exams as determined by instructor.

Catalog Description:

Markov Decision Processes (MDP), dynamic programming, multi-armed bandit and restless bandit, Partially observable MDP, optimal stopping, stochastic scheduling, sequential detection and quickest change detection, competitive MDP and game theory, applications in dynamic systems such as queueing networks, communication networks, and social economic systems.

Expanded Course Description:

  1. Review of Markov Theory
    1. Classification of states: transience vs. recurrence
    2. Stationary distribution and ergodicity
    3. Applications in dynamic systems: stability analysis of queueing networks
  2. Fundamentals of Markov Decision Processes (MDP)
    1. Finite-horizon MDP and dynamic programming
    2. Random-horizon MDP: stochastic shortest path and optimal stopping
    3. Infinite-horizon MDP under discounted and average reward criteria
  3. Special Classes of MDP and Sequential Stochastic Optimization
    1. Multi-armed bandit and restless bandit problems
    2. Partially observable MDP
    3. Sequential detection and quickest change detection
    4. Stochastic scheduling
  4. Introduction to Competitive MDP and Game Theory
    1. Static games and finite dynamic games
    2. Competitive MDP and stochastic games


  1. Markov Decision Processes: Discrete Stochastic Dynamic Programming, by M.L. Puterman, Wiley, 2005.
  2. Introduction to Stochastic Dynamic Programming, by S.M. Ross, Academic Press, 1995.
  3. Markov Chains, by J.R. Norris, Cambridge University Press, 1997

Instructor: Zhao


Last revised: Spring 2012