CFCS Youth Talks

Efficient Mining of Big Data Using Coresets

  • Dr. Shaofeng Jiang, Aalto University
  • Time: 2021-03-04 15:00
  • Host: Dr. Kuan Cheng
  • Venue: Online Talk


Big data has been the foundation of many modern applications. However, data sets are growing at a rate that classical algorithms cannot handle, so new computational models, such as distributed and streaming algorithms, are being considered. A common approach to designing algorithms for these models is algorithm-driven: examine each existing algorithm and adapt it to the new model in an ad hoc way.

We consider a different, data-driven approach, in which the big data is reduced to a small summary on which existing classical algorithms run efficiently. This data-reduction idea is often implemented as a coreset, which is a subset of the data that preserves, up to a small error, the answers to a certain type of query.

Among the many applications of coresets, coresets for clustering have been a very fruitful research direction. In this talk, we give a general introduction to coresets and discuss recent advances in coresets for clustering problems. We conclude with future directions.


Shaofeng Jiang is currently an assistant professor at Aalto University. Before joining Aalto, he worked as a postdoctoral researcher with Robert Krauthgamer at the Weizmann Institute of Science from 2017 to 2020, and he obtained his PhD from the University of Hong Kong in 2017. His research interests lie in theoretical computer science, with an emphasis on algorithms for massive data sets, approximation algorithms, and online algorithms. His work has been published regularly in prestigious venues such as FOCS, SODA, ICML and NeurIPS. He received an MSRA Fellowship Nomination Award and an Outstanding Achievements in Postdoctoral Research Prize at the Weizmann Institute of Science.

