Near Linear Time Approximation Schemes for Clustering of Partially Doubling Metrics

Events

Home > Events > CS Peer Talks

CS Peer Talks

Near Linear Time Approximation Schemes for Clustering of Partially Doubling Metrics

Di Yue, University of Toronto
Time: 2026-05-15 15:00
Host: Turing Class Research Committee
Venue: Room 204, Courtyard No.5, Jingyuan

Abstract

In the metric $k$-median problem we are given a finite metric space $(X\cup Y, \mathbf{d})$ and the objective is to compute a set of $k$ centers $C\subseteq Y$ that minimizes $\sum_{p\in X} \min_{c\in C} \mathbf{d}(p,c)$. In general metric spaces, the best polynomial time algorithm, which is due to Cohen-Addad, Grandoni, Lee, Schwiegelshohn, and Svensson, computes a $(2+\epsilon)$-approximation for arbitrary constant $\epsilon>0$. However, if the metric space has bounded doubling dimension, a near linear time $(1+\epsilon)$-approximation algorithm is known due to the work of Cohen-Addad, Feldmann, and Saulpic. In this paper, we show that the $(1+\epsilon)$-approximation algorithm can be generalized to the case when either $X$ or $Y$ has bounded doubling dimension (but the other set not). The case when $X$ has bounded doubling dimension is motivated by the assumption that even though $X$ is part of a high-dimensional space, it may be that it is close to a low-dimensional structure. The case when $Y$ has bounded doubling dimension is perhaps more natural. It is motivated by specific clustering problems where the centers are low-dimensional. Specifically, our work in this setting implies the first near linear time approximation algorithm for the $(k,\ell)$-median problem under discrete Fr\'echet distance when $\ell$ is constant. In order to solve the case when $Y$ has a bounded doubling dimension, we introduce a form of dimension reduction that replaces points from $X$ by sets of points in $Y$. To solve the case when $X$ has a bounded doubling dimension, we generalize Talwar's decomposition of doubling metrics to our setting. The running time of our algorithms is $2^{2^t} \tilde O(n+m)$ where $t=O(\mathrm{ddim} \log \frac{\mathrm{ddim}}{\epsilon})$ and where $\mathrm{ddim}$ is the doubling dimension of $X$ (resp.\ $Y$). The results also extend to the metric (uncapacitated) facility location problem. We believe that our techniques are likely applicable to other problems. Based on the joint work with Anne Driemel, Jan Höckendorff, Ioannis Psarros and Christian Sohler.

Biography

Di Yue is an incoming PhD student in the theory group at University of Toronto, advised by Aleksandar Nikolov. Previously, he obtained his BS degree from Peking University in 2025. His research interest lies in theoretical computer science, with current focus on discrepancy theory. He is also interested in computational problems related to metric spaces and high-dimensional geometry.