Fitting distances to tree metrics and ultrametrics are two widely used
methods in hierarchical clustering, primarily explored within the context of
numerical taxonomy. Given a positive distance function
$D:\binom{V}{2}\rightarrow\mathbb{R}_{>0}$, the goal is to find a tree (ou
ultrametric) $T$ including all elements of set $V$ such that the difference
between the distances among vertices in $T$ and those specified by $D$ is
minimized. In this paper, we initiate the study of ultrametric and tree metric
fitting problems in the semi-streaming model, where the distances between pairs
of elements from $V$ (with $|V|=n$), defined by the function $D$, can arrive in
an arbitrary order. We study these problems under various distance norms:
For the $\ell_0$ objective, we provide a single-pass polynomial-time
$\tilde{O}(n)$-space $O(1)$ approximation algorithm for ultrametrics and prove
that no single-pass exact algorithm exists, even with exponential time.
Next, we show that the algorithm for $\ell_0$ implies an $O(\Delta/\delta)$
approximation for the $\ell_1$ objective, where $\Delta$ is the maximum and
$\delta$ is the minimum absolute difference between distances in the input.
This bound matches the best-known approximation for the RAM model using a
combinatorial algorithm when $\Delta/\delta=O(n)$.
For the $\ell_\infty$ objective, we provide a complete characterization of
the ultrametric fitting problem. We present a single-pass polynomial-time
$\tilde{O}(n)$-space 2-approximation algorithm and show that no better than
2-approximation is possible, even with exponential time. We also show that,
with an additional pass, it is possible to achieve a polynomial-time exact
algorithm for ultrametrics.
Finally, we extend the results for all these objectives to tree metrics by
using only one additional pass through the stream and without asymptotically
increasing the approximation factor.
Cet article explore les excursions dans le temps et leurs implications.
Télécharger PDF:
2504.17776v1