Enhancing Network Failure Mitigation with Performance-Aware Ranking

Abstract

Cloud providers install mitigations to reduce the impact of network failures within their datacenters. Existing network mitigation systems rely on simple local criteria or global proxy metrics to determine the best action. In this paper, we show that we can support a broader range of actions and select more effective mitigations by directly optimizing end-to-end flow-level metrics and analyzing actions holistically. To achieve this, we develop novel techniques to quickly estimate the impact of different mitigations and rank them with high fidelity. Our results on incidents from a large cloud provider show orders of magnitude improvements in flow completion time and throughput. We also show our approach scales to large datacenters.

Authors: Pooria Namyar, Arvin Ghavidel, Daniel Crankshaw, Daniel S. Berger, Kevin Hsieh, Srikanth Kandula, Ramesh Govindan, Behnaz Arzani

Subject classification: Communications; Wireless Communications

Recommended citation: Pooria Namyar, Arvin Ghavidel, Daniel Crankshaw, Daniel S. Berger, Kevin Hsieh, Srikanth Kandula, Ramesh Govindan, Behnaz Arzani. Enhancing Network Failure Mitigation with Performance-Aware Ranking [EB/OL]. (2025-06-23) [2025-07-16]. https://arxiv.org/abs/2305.13792.
