|国家预印本平台
首页|Correct Black-Box Monitors for Distributed Deadlock Detection: Formalisation and Implementation (Technical Report)

Correct Black-Box Monitors for Distributed Deadlock Detection: Formalisation and Implementation (Technical Report)

Correct Black-Box Monitors for Distributed Deadlock Detection: Formalisation and Implementation (Technical Report)

来源:Arxiv_logoArxiv
英文摘要

Many software applications rely on concurrent and distributed (micro)services that interact via message-passing and various forms of remote procedure calls (RPC). As these systems organically evolve and grow in scale and complexity, the risk of introducing deadlocks increases and their impact may worsen: even if only a few services deadlock, many other services may block while awaiting responses from the deadlocked ones. As a result, the "core" of the deadlock can be obfuscated by its consequences on the rest of the system, and diagnosing and fixing the problem can be challenging. In this work we tackle the challenge by proposing distributed black-box monitors that are deployed alongside each service and detect deadlocks by only observing the incoming and outgoing messages, and exchanging probes with other monitors. We present a formal model that captures popular RPC-based application styles (e.g., gen_servers in Erlang/OTP), and a distributed black-box monitoring algorithm that we prove sound and complete (i.e., identifies deadlocked services with neither false positives nor false negatives). We implement our results in a tool called DDMon for the monitoring of Erlang/OTP applications, and we evaluate its performance. This is the first work that formalises, proves the correctness, and implements distributed black-box monitors for deadlock detection. Our results are mechanised in Coq. DDMon is the companion artifact of this paper.

Radosław Jan Rowicki、Adrian Francalanza、Alceste Scalas

10.1145/3763069

计算技术、计算机技术

Radosław Jan Rowicki,Adrian Francalanza,Alceste Scalas.Correct Black-Box Monitors for Distributed Deadlock Detection: Formalisation and Implementation (Technical Report)[EB/OL].(2025-08-26)[2025-09-02].https://arxiv.org/abs/2508.14851.点此复制

评论