首页|Multi-Timescale Gradient Sliding for Distributed Optimization

Multi-Timescale Gradient Sliding for Distributed Optimization

来源：

英文摘要

We propose two first-order methods for convex, non-smooth, distributed optimization problems, hereafter called Multi-Timescale Gradient Sliding (MT-GS) and its accelerated variant (AMT-GS). Our MT-GS and AMT-GS can take advantage of similarities between (local) objectives to reduce the communication rounds, are flexible so that different subsets (of agents) can communicate at different, user-picked rates, and are fully deterministic. These three desirable features are achieved through a block-decomposable primal-dual formulation, and a multi-timescale variant of the sliding method introduced in Lan et al. (2020), Lan (2016), where different dual blocks are updated at potentially different rates. To find an $\epsilon$-suboptimal solution, the complexities of our algorithms achieve optimal dependency on $\epsilon$: MT-GS needs $O(\overline{r}A/\epsilon)$ communication rounds and $O(\overline{r}/\epsilon^2)$ subgradient steps for Lipchitz objectives, and AMT-GS needs $O(\overline{r}A/\sqrt{\epsilon\mu})$ communication rounds and $O(\overline{r}/(\epsilon\mu))$ subgradient steps if the objectives are also $\mu$-strongly convex. Here, $\overline{r}$ measures the ``average rate of updates'' for dual blocks, and $A$ measures similarities between (subgradients of) local functions. In addition, the linear dependency of communication rounds on $A$ is optimal (Arjevani and Shamir 2015), thereby providing a positive answer to the open question whether such dependency is achievable for non-smooth objectives (Arjevani and Shamir 2015).

作者：Junhui Zhang、Patrick Jaillet

作者单位：

学科分类：计算技术、计算机技术

推荐引用：Junhui Zhang,Patrick Jaillet.Multi-Timescale Gradient Sliding for Distributed Optimization[EB/OL].(2025-06-18)[2025-07-25].https://arxiv.org/abs/2506.15387.点此复制

Multi-Timescale Gradient Sliding for Distributed Optimization

Multi-Timescale Gradient Sliding for Distributed Optimization

评论