Nested and outlier embeddings into trees
arXiv:2601.15470v1 Announce Type: new
Abstract: In this paper, we consider outlier embeddings into HSTs and ultrametrics. In particular, for a metric space $(X,d)$ and a target distortion $c$, let $k$ be the size of the smallest subset of $X$ (the “outlier set”) such that all remaining points can be probabilistically embedded into the space of HSTs with expected distortion at most $c$. Our primary result is an efficient algorithm that takes $(X,d)$ and a target distortion $c$ and samples from a probabilistic embedding with at most $O(\frac{k}{\epsilon}\log^2 k)$ outliers and distortion at most $(32+\epsilon)c$, for any $\epsilon>0$. This leads to better instance-specific approximations for certain instances of the buy-at-bulk and dial-a-ride problems, whose current best approximation algorithms go through HST embeddings.
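For reference, the main guarantee can be written in display form; this is a paraphrase of the bound stated above (the constant $32$ and the $\log^2 k$ factor are taken directly from the abstract), not a formal theorem statement from the paper.

```latex
% Paraphrase of the main guarantee (informal; notation as in the abstract).
% k = size of the smallest outlier set whose removal admits a probabilistic
%     HST embedding of the remaining points with expected distortion <= c.
\[
  \#\text{outliers} \;=\; O\!\left(\frac{k}{\epsilon}\,\log^{2} k\right),
  \qquad
  \text{distortion} \;\le\; (32+\epsilon)\,c,
  \qquad \text{for any } \epsilon > 0 .
\]
```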
To facilitate our results, we build on the concept of compositions of nested embeddings introduced by [Chawla and Sheridan 2024]. A nested embedding is a composition of two embeddings of a metric space $(X,d)$: a low-distortion embedding of a subset $S$ of nodes, and a higher-distortion embedding of the entire metric. The composition is a single embedding that preserves the low distortion over $S$ and does not increase the distortion over the remaining points by much. In this paper, we extend this concept from deterministic embeddings to probabilistic embeddings. We show how to find good nested compositions of embeddings into HSTs, and combine this with an approximation algorithm of [Munagala et al. 2023] to obtain our results.
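As a concrete illustration of the objects involved, here is a minimal sketch (not code from the paper; the toy metric, the particular tree, and all names are hypothetical) that builds a single dominating tree embedding of a four-point metric in the style of a 2-HST and measures its distortion. In the probabilistic setting of the abstract, one would instead sample such a tree from a distribution and bound the expected stretch; a nested composition would additionally keep a designated subset $S$ at lower distortion.

```python
# Illustrative sketch only: one dominating tree embedding of a toy metric
# and its (worst-case) distortion. An HST is modeled here as a rooted tree
# whose edge lengths decrease geometrically with depth; leaves are the
# points of X. "Dominating" means tree distances never shrink original ones.

import itertools

# Toy 4-point metric on X = {a, b, c, d} (symmetric, zero diagonal).
d = {
    ("a", "b"): 1.0, ("a", "c"): 4.0, ("a", "d"): 4.0,
    ("b", "c"): 4.0, ("b", "d"): 4.0, ("c", "d"): 1.0,
}

def dist(x, y):
    return 0.0 if x == y else d.get((x, y), d.get((y, x)))

# A 2-HST-style tree over X: the root r splits {a, b} from {c, d};
# edge lengths halve with depth (root edges = 2, leaf edges = 1).
parent = {"a": "u", "b": "u", "c": "v", "d": "v", "u": "r", "v": "r"}
edge_len = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "u": 2.0, "v": 2.0}

def path_to_root(x):
    path = [x]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def tree_dist(x, y):
    # Sum of edge lengths from x and y up to their lowest common ancestor.
    px, py = path_to_root(x), path_to_root(y)
    common = set(px) & set(py)
    i = next(i for i, n in enumerate(px) if n in common)
    j = next(j for j, n in enumerate(py) if n in common)
    return sum(edge_len[n] for n in px[:i]) + sum(edge_len[n] for n in py[:j])

pairs = list(itertools.combinations("abcd", 2))
assert all(tree_dist(x, y) >= dist(x, y) for x, y in pairs)  # dominating
distortion = max(tree_dist(x, y) / dist(x, y) for x, y in pairs)
print(f"distortion of this single tree embedding: {distortion:.2f}")
```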