We show that both clustering and subspace embeddings can be performed in the
streaming model with the same asymptotic efficiency as in the central/offline
setting.
For $(k, z)$-clustering in the streaming model, we achieve a number of words
of memory which is independent of the number $n$ of input points and the aspect
ratio $\Delta$, yielding an optimal bound of
$\tilde{\mathcal{O}}\left(\frac{dk}{\min(\varepsilon^4,\varepsilon^{z+2})}\right)$
words for accuracy parameter $\varepsilon$ on $d$-dimensional points.
Additionally, we obtain amortized update time of
$d\,\log(k)\cdot\text{polylog}(\log(n\Delta))$, which is an exponential
improvement over the previous $d\,\text{poly}(k,\log(n\Delta))$. Our method
also gives the fastest runtime for $(k,z)$-clustering even in the offline
setting.
For subspace embeddings in the streaming model, we achieve $\mathcal{O}(d)$
update time and space-optimal constructions, using
$\tilde{\mathcal{O}}\left(\frac{d^2}{\varepsilon^2}\right)$ words for $p\le 2$
and $\tilde{\mathcal{O}}\left(\frac{d^{p/2+1}}{\varepsilon^2}\right)$ words for
$p>2$, showing that streaming algorithms can match offline algorithms in both
space and time complexity.
Dieser Artikel untersucht Zeitreisen und deren Auswirkungen.
PDF herunterladen:
2504.16229v1