
GALA: Can Graph-Augmented Large Language Model Agentic Workflows Elevate Root Cause Analysis?

Source: arXiv
Abstract

Root cause analysis (RCA) in microservice systems is challenging: on-call engineers must rapidly diagnose failures across heterogeneous telemetry such as metrics, logs, and traces. Traditional RCA methods often focus on a single modality or merely rank suspect services, and so fall short of providing actionable diagnostic insights with remediation guidance. This paper introduces GALA, a novel multi-modal framework that combines statistical causal inference with LLM-driven iterative reasoning for enhanced RCA. Evaluated on an open-source benchmark, GALA improves accuracy by up to 42.22% over state-of-the-art methods. Our novel human-guided LLM evaluation score shows that GALA generates significantly more causally sound and actionable diagnostic outputs than existing methods. Through comprehensive experiments and a case study, we show that GALA bridges the gap between automated failure diagnosis and practical incident resolution by providing both accurate root cause identification and human-interpretable remediation guidance.

Yifang Tian, Yaming Liu, Zichun Chong, Zihang Huang, Hans-Arno Jacobsen

Subject: Computing technology, computer technology

Yifang Tian, Yaming Liu, Zichun Chong, Zihang Huang, Hans-Arno Jacobsen. GALA: Can Graph-Augmented Large Language Model Agentic Workflows Elevate Root Cause Analysis? [EB/OL]. (2025-08-17) [2025-09-05]. https://arxiv.org/abs/2508.12472.