Hierarchical clustering is a fundamental unsupervised machine learning task
with the aim of organizing data into a hierarchy of clusters. Many applications
of hierarchical clustering involve sensitive user information, thereby
motivating recent studies on differentially private hierarchical clustering
under the rigorous framework of Dasgupta's objective. However, it has been
shown that any privacy-preserving algorithm under edge-level differential
privacy necessarily suffers a large error. To capture practical applications of
this problem, we focus on the weight privacy model, where each edge of the
input graph has at least unit weight. We present a novel algorithm in the weight
privacy model that achieves significantly better approximation guarantees than
the known impossibility results in the edge-level DP setting permit. In particular, our
algorithm achieves $O(\log^{1.5}n/\varepsilon)$ multiplicative error for
$\varepsilon$-DP and runs in polynomial time, where $n$ is the size of the
input graph, and its additive error is never worse than the optimal additive
error in existing work. We complement our algorithm by showing that if the unit-weight
constraint does not apply, the lower bound for weight-level DP hierarchical
clustering is essentially the same as in the edge-level DP setting, i.e.,
$\Omega(n^2/\varepsilon)$ additive error. As a result, we also obtain a new
lower bound of $\tilde{\Omega}(1/\varepsilon)$ additive error for balanced
sparsest cuts in the weight-level DP model, which may be of independent
interest. Finally, we evaluate our algorithm on synthetic and real-world
datasets. Our experimental results show that our algorithm incurs low extra
cost and scales well to large graphs.
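
For reference, Dasgupta's objective can be stated as follows (this is the
standard definition, recalled here rather than taken from the abstract itself):
given a weighted graph $G = (V, E, w)$ and a hierarchy tree $T$ whose leaves
are the vertices of $G$, the cost of $T$ is
$\mathrm{cost}_G(T) = \sum_{\{i,j\} \in E} w(i,j) \cdot |\mathrm{leaves}(T[i \vee j])|$,
where $T[i \vee j]$ denotes the subtree of $T$ rooted at the least common
ancestor of $i$ and $j$. The multiplicative and additive errors above are
measured against the minimum of this cost over all hierarchy trees.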