Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design
- Jiaqi Yang, Yao Class
- Time: 2021-04-08 11:00
- Host: Turing Class Research Committee
- Venue: Room 101, Courtyard No.5, Jingyuan + Online
In large-scale online learning, function approximation over contextual information is often necessary, while frequent policy switching hinders parallelism. Motivated by this dilemma, we studied the minimum adaptivity necessary to achieve minimax-optimal regret in linear bandits. In this talk, I will discuss our paper (arXiv:2007.01980) on this problem. We showed a surprising separation in how the minimum necessary adaptivity scales with the time horizon T: it scales as Θ(log T) for adversarial contexts, and as Θ(log log T) for stochastic contexts. These results improve upon the previous state-of-the-art bounds of [Ω(log log T), O(log T)] for both the stochastic and adversarial cases, and give a complete picture of the adaptivity constraints in linear bandits. Along the way, we proposed the distributional optimal design, a natural extension of optimal experiment design, and provided a statistically and computationally efficient learning algorithm for this problem, which may be of independent interest. This is joint work with Yufei Ruan and Yuan Zhou, both from UIUC; the paper has been accepted to STOC 2021.
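To give intuition for the log T vs. log log T separation, here is a minimal sketch of two batch schedules commonly used in the batched-bandit literature (these are standard illustrative grids, not necessarily the exact schedules of the paper): a doubling grid of endpoints 2, 4, 8, … yields O(log T) policy switches, while a double-exponential grid of endpoints T^(1 - 2^{-i}) yields O(log log T) switches.

```python
import math

def doubling_grid(T):
    """Batch endpoints 2, 4, 8, ..., T — about log2(T) batches."""
    grid, t = [], 2
    while t < T:
        grid.append(t)
        t *= 2
    grid.append(T)  # final batch ends at the horizon
    return grid

def double_exp_grid(T):
    """Batch endpoints T^(1 - 2^-i), capped at T — about log2(log2(T)) batches."""
    grid = []
    for i in range(1, math.ceil(math.log2(math.log2(T))) + 1):
        grid.append(int(T ** (1 - 2 ** (-i))))
    grid.append(T)  # final batch ends at the horizon
    return grid

T = 10**6
print(len(doubling_grid(T)))    # ~log2(T) batches
print(len(double_exp_grid(T)))  # ~log2(log2(T)) batches
```

For T = 10^6 the doubling grid needs 20 batches while the double-exponential grid needs only 6, illustrating why so few rounds of adaptivity can suffice in the stochastic case.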
Jiaqi Yang is currently a senior undergraduate at Tsinghua University. He is broadly interested in the theoretical foundations of machine learning, and his research has been accepted to top conferences such as STOC, ICLR, and AISTATS. Prior to college, he received a silver medal at the International Olympiad in Informatics (IOI) 2017 and a gold medal at the Chinese National Olympiad in Informatics (NOI) 2016.
- Online link: https://meeting.tencent.com/s/jy0DhMvrgCcB