Improving Policy Optimization: Algorithms and Foundations
- Baoxiang Wang, The Chinese University of Hong Kong
- Time: 2020-04-05 14:00
- Host: Dr. Yuqing Kong
- Venue: Online Talk
Reinforcement learning (RL) studies algorithmic approaches to optimizing the policy in sequential decision processes. The recent success of RL in a variety of applications has demonstrated its usefulness but also leaves room for improvement. In this talk, we discuss variance-reduction methods for high-dimensional action spaces, which aim to prevent the sample complexity from growing exponentially in the number of dimensions. The divide-and-conquer technique we use to achieve this is general enough to be applied to other areas. Beyond these algorithmic studies, we present our first step toward understanding sequential decisions through a classic example, the Gambler's problem.
Baoxiang Wang is a sixth-year PhD student in the Department of Computer Science and Engineering, The Chinese University of Hong Kong. He is advised by Siu On Chan and Andrej Bogdanov. During his PhD, he spent a year in Edmonton visiting a joint lab of the University of Alberta and the RBC Institute of Research. He obtained his bachelor's degree from the School of Information Security, Shanghai Jiao Tong University. His research interests lie in reinforcement learning, online learning, and learning theory.