Events
Events
CS Peer Talks

Near Linear Time Approximation Schemes for Clustering of Partially Doubling Metrics

  • Di Yue, University of Toronto
  • Time: 2026-05-15 15:00
  • Host: Turing Class Research Committee
  • Venue: Room 204, Courtyard No.5, Jingyuan

Abstract

In the metric $k$-median problem we are given a finite metric space $(X\cup Y, \mathbf{d})$ and the objective is to compute a set of $k$ centers $C\subseteq Y$ that minimizes $\sum_{p\in X} \min_{c\in C} \mathbf{d}(p,c)$. In general metric spaces, the best polynomial time algorithm, which is due to Cohen-Addad, Grandoni, Lee, Schwiegelshohn, and Svensson, computes a $(2+\epsilon)$-approximation for arbitrary constant $\epsilon>0$. However, if the metric space has bounded doubling dimension, a near linear time $(1+\epsilon)$-approximation algorithm is known due to the work of Cohen-Addad, Feldmann, and Saulpic. In this paper, we show that the $(1+\epsilon)$-approximation algorithm can be generalized to the case when either $X$ or $Y$ has bounded doubling dimension (but the other set not). The case when $X$ has bounded doubling dimension is motivated by the assumption that even though $X$ is part of a high-dimensional space, it may be that it is close to a low-dimensional structure. The case when $Y$ has bounded doubling dimension is perhaps more natural. It is motivated by specific clustering problems where the centers are low-dimensional. Specifically, our work in this setting implies the first near linear time approximation algorithm for the $(k,\ell)$-median problem under discrete Fr\'echet distance when $\ell$ is constant. In order to solve the case when $Y$ has a bounded doubling dimension, we introduce a form of dimension reduction that replaces points from $X$ by sets of points in $Y$. To solve the case when $X$ has a bounded doubling dimension, we generalize Talwar's decomposition of doubling metrics to our setting. The running time of our algorithms is $2^{2^t} \tilde O(n+m)$ where $t=O(\mathrm{ddim} \log \frac{\mathrm{ddim}}{\epsilon})$ and where $\mathrm{ddim}$ is the doubling dimension of $X$ (resp.\ $Y$). The results also extend to the metric (uncapacitated) facility location problem. We believe that our techniques are likely applicable to other problems. Based on the joint work with Anne Driemel, Jan Höckendorff, Ioannis Psarros and Christian Sohler.

Biography

59fc8029c0ad61f25239d6a76758e75c.png

Di Yue is an incoming PhD student in the theory group at University of Toronto, advised by Aleksandar Nikolov. Previously, he obtained his BS degree from Peking University in 2025. His research interest lies in theoretical computer science, with current focus on discrepancy theory. He is also interested in computational problems related to metric spaces and high-dimensional geometry.