Model serving systems have become popular for deploying deep learning models
for various latency-sensitive inference tasks. While traditional
replication-based methods have been used for failure-resilient model serving in
the cloud, such methods are often infeasible in edge environments due to
significant resource constraints that preclude full replication. To address
this problem, this paper presents FailLite, a failure-resilient model serving
system that employs (i) heterogeneous replication, where failover models are
smaller variants of the original model, (ii) an intelligent approach that uses
warm replicas to ensure quick failover for critical applications while using
cold replicas, and (iii) progressive failover to provide low mean time to
recovery (MTTR) for the remaining applications. We implement a full prototype
of our system and demonstrate its efficacy on an experimental edge testbed. Our
results using 27 models show that FailLite can recover all failed applications
with 175.5 ms MTTR and only a 0.6% reduction in accuracy.
arXiv:2504.15856v1