CS Peer Talks

Last-Iterate Convergence for Learning in Markov Games with Bandit Feedback

  • Weiqiang Zheng, Yale University
  • Time: 2024-01-13 16:00
  • Host: Turing Class Research Committee
  • Venue: Room 204, Courtyard No.5, Jingyuan


Online learning in multi-player games captures many modern machine learning applications, ranging from generative adversarial networks and adversarial training to robust optimization and multi-agent reinforcement learning. Understanding last-iterate convergence in games is crucial since the last iterate characterizes the stability of the learning process and is widely used in practice.

In this talk, we study the problem of learning in two-player zero-sum Markov games, focusing on developing decentralized learning algorithms with non-asymptotic last-iterate convergence rates to a Nash equilibrium. We first present a simple algorithm with an $O(t^{-1/8})$ last-iterate convergence rate in two-player zero-sum matrix games with bandit feedback. To the best of our knowledge, this is the first result that obtains a finite last-iterate convergence rate given access to only bandit feedback. We then extend our result to the setting of two-player zero-sum Markov games, providing the first set of decentralized algorithms with non-asymptotic last-iterate/path convergence rates. This talk is based on joint work with Yang Cai, Haipeng Luo, and Chen-Yu Wei.
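To make the setting concrete, here is a minimal sketch (not the speaker's algorithm) of two players learning a zero-sum matrix game from bandit feedback: each round, a player observes only the payoff of the single action it sampled and builds importance-weighted reward estimates, EXP3-style. The game, step size, exploration rate, and horizon below are all illustrative choices; the run also shows the classical contrast that motivates last-iterate analysis, since for standard no-regret dynamics the *average* iterate approaches a Nash equilibrium while the *last* iterate may keep oscillating.

```python
import numpy as np

# Matching pennies: row player's payoff is A[i, j], column player's is -A[i, j].
# Its unique Nash equilibrium is uniform play by both players.
A = np.array([[1.0, -1.0],
              [-1.0,  1.0]])

rng = np.random.default_rng(0)

def duality_gap(x, y):
    # max_i (A y)_i - min_j (x^T A)_j; equals 0 iff (x, y) is a Nash equilibrium.
    return np.max(A @ y) - np.min(x @ A)

T = 20000       # number of rounds (illustrative)
eta = 0.05      # exponential-weights step size (illustrative)
gamma = 0.05    # uniform exploration rate, keeps sampling probabilities positive

Gx = np.zeros(2)            # row player's cumulative importance-weighted estimates
Gy = np.zeros(2)            # column player's cumulative importance-weighted estimates
avg_x = np.zeros(2)         # running sums for the average iterates
avg_y = np.zeros(2)

for t in range(T):
    # Exponential weights mixed with uniform exploration (EXP3-style).
    px = np.exp(eta * (Gx - Gx.max())); px /= px.sum()
    py = np.exp(eta * (Gy - Gy.max())); py /= py.sum()
    px = (1 - gamma) * px + gamma / 2
    py = (1 - gamma) * py + gamma / 2

    i = rng.choice(2, p=px)  # row player's sampled action
    j = rng.choice(2, p=py)  # column player's sampled action
    r = A[i, j]              # bandit feedback: only this scalar is observed

    # Importance-weighted estimates of the (unobserved) full reward vectors.
    Gx[i] += r / px[i]
    Gy[j] += -r / py[j]

    avg_x += px
    avg_y += py

avg_x /= T
avg_y /= T
print("last-iterate gap:   ", duality_gap(px, py))
print("average-iterate gap:", duality_gap(avg_x, avg_y))
```

The average-iterate gap shrinks with $T$ by the standard regret-to-Nash argument, whereas the last iterate has no such guarantee for this vanilla dynamic; obtaining a finite last-iterate rate under bandit feedback is exactly the gap the talk addresses.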



Weiqiang Zheng is a third-year PhD student in Computer Science at Yale University, advised by Prof. Yang Cai. He received his bachelor's degree in Computer Science from the Turing Class at Peking University. His research interests span game theory, online learning, and optimization.