FailLite: Failure-Resilient Model Serving for Resource-Constrained Edge Environments
Model serving systems have become popular for deploying deep learning models for various latency-sensitive inference tasks. While traditional replication-based methods have been used for failure-resilient model serving in the cloud, such methods are often infeasible in edge environments, where significant resource constraints preclude full replication. To address this problem, this paper presents FailLite, a failure-resilient model serving system that employs (i) heterogeneous replication, where failover models are smaller variants of the original model, (ii) an intelligent approach that uses warm replicas to ensure quick failover for critical applications while relying on cold replicas for the others, and (iii) progressive failover to provide a low mean time to recovery (MTTR) for those remaining applications. We implement a full prototype of our system and demonstrate its efficacy on an experimental edge testbed. Our results using 27 models show that FailLite can recover all failed applications with a 175.5 ms MTTR and only a 0.6% reduction in accuracy.
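To make the warm/cold split concrete, here is a minimal sketch of one possible failover plan. It is not FailLite's actual placement algorithm; it assumes a simple greedy policy. Critical applications receive a warm replica, meaning the most accurate smaller variant that fits in the spare memory of a surviving node. Every other application gets a cold "progressive" ladder: load the smallest variant first for fast recovery, then upgrade to larger ones. The names `Variant`, `App`, `plan_failover`, the memory figures, and the accuracy figures are all hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    memory_mb: int   # hypothetical memory footprint of this model variant
    accuracy: float  # hypothetical accuracy of this model variant


@dataclass
class App:
    name: str
    critical: bool
    variants: list[Variant]  # the original model plus its smaller variants


def pick_largest_fitting(variants, budget_mb):
    """Return the most accurate variant that fits in budget_mb, or None."""
    fitting = [v for v in variants if v.memory_mb <= budget_mb]
    return max(fitting, key=lambda v: v.accuracy, default=None)


def plan_failover(apps, spare_mb):
    """Greedy sketch (an assumption, not the paper's method).

    Returns (warm, cold, remaining_mb): warm maps critical apps to a
    pre-loaded variant; cold maps every other app to a load order from
    smallest to largest variant (progressive failover).
    """
    warm = {}
    remaining = spare_mb
    # Critical apps reserve memory now so failover is immediate.
    for app in apps:
        if app.critical:
            choice = pick_largest_fitting(app.variants, remaining)
            if choice is not None:
                warm[app.name] = choice
                remaining -= choice.memory_mb
    # Apps without a warm replica reserve nothing; on failure they load the
    # smallest variant first, then progressively upgrade to larger ones.
    cold = {}
    for app in apps:
        if app.name not in warm:
            ladder = sorted(app.variants, key=lambda v: v.memory_mb)
            cold[app.name] = [v.name for v in ladder]
    return warm, cold, remaining


# Illustrative inputs only (made-up numbers).
detector = App("detector", True, [
    Variant("det-L", 400, 0.80), Variant("det-M", 120, 0.78), Variant("det-S", 40, 0.72)])
classifier = App("classifier", False, [
    Variant("cls-L", 300, 0.79), Variant("cls-M", 90, 0.76), Variant("cls-S", 30, 0.70)])

warm, cold, remaining = plan_failover([detector, classifier], spare_mb=150)
print({k: v.name for k, v in warm.items()}, cold, remaining)
```

With 150 MB of spare memory, the critical detector is pre-loaded as `det-M`, the largest variant that fits. The classifier is left to a cold, progressive ladder `cls-S`, `cls-M`, `cls-L`. A real system would also weigh load latency and accuracy loss jointly, which this greedy sketch ignores.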
Li Wu, Walid A. Hanafy, Tarek Abdelzaher, David Irwin, Jesse Milzman, Prashant Shenoy
Computing Technology, Computer Technology
Li Wu, Walid A. Hanafy, Tarek Abdelzaher, David Irwin, Jesse Milzman, Prashant Shenoy. FailLite: Failure-Resilient Model Serving for Resource-Constrained Edge Environments[EB/OL]. (2025-04-22)[2025-05-21]. https://arxiv.org/abs/2504.15856.