Federated learning (FL) is a machine learning paradigm that facilitates
massively distributed model training with end-user data on edge devices
directed by a central server. However, the large number of heterogeneous
clients in FL deployments leads to a communication bottleneck between the
server and the clients. This bottleneck is made worse by straggling clients,
any one of which will further slow down training. To tackle these challenges,
researchers have proposed techniques like client sampling and update
compression. These techniques work well in isolation but combine poorly in the
downstream, server-to-client direction. This is because unselected clients have
outdated local model states and need to synchronize these states with the
server first.
We introduce FedFetch, a strategy to mitigate the download time overhead
caused by combining client sampling and compression techniques. FedFetch
achieves this with an efficient prefetch schedule that lets clients prefetch model
states multiple rounds before their scheduled training round. We empirically show that
adding FedFetch to communication-efficient FL techniques reduces end-to-end
training time by 1.26$\times$ and download time by 4.49$\times$ across
compression techniques in heterogeneous client settings. Our implementation
is available at https://github.com/DistributedML/FedFetch.
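
To illustrate the prefetch-scheduling idea described above, the sketch below presamples the participants of each round and has every selected client begin fetching the model state a fixed number of rounds ahead of its training round. This is only a minimal approximation under assumed round, client, and window structures; build_prefetch_schedule, window, and clients_per_round are hypothetical names, not the FedFetch scheduler itself.

    # Minimal sketch: presample participants and schedule prefetching `window`
    # rounds ahead of each client's training round. Illustrative only.
    import random
    from collections import defaultdict

    def build_prefetch_schedule(num_rounds, clients, clients_per_round, window):
        """Presample participants per round and map each round to the clients
        that should start prefetching the model state in that round."""
        participants = {
            r: random.sample(clients, clients_per_round) for r in range(num_rounds)
        }
        prefetch = defaultdict(list)  # round -> [(client, its training round), ...]
        for r, chosen in participants.items():
            start = max(0, r - window)  # begin downloading `window` rounds early
            for c in chosen:
                prefetch[start].append((c, r))
        return participants, prefetch

    if __name__ == "__main__":
        parts, sched = build_prefetch_schedule(
            num_rounds=10, clients=list(range(100)), clients_per_round=5, window=3
        )
        for rnd in range(10):
            print(f"round {rnd}: prefetch starts for {sched.get(rnd, [])}")

In this toy version, overlapping the download with earlier rounds is what hides the synchronization cost that unselected clients would otherwise pay when sampling is combined with compression.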