
Convergence Analysis of the Last Iterate in Distributed Stochastic Gradient Descent with Momentum

Source: arXiv
English Abstract

Distributed stochastic gradient methods are widely used to preserve data privacy and ensure scalability in large-scale learning tasks. While existing theory on distributed momentum Stochastic Gradient Descent (mSGD) mainly focuses on time-averaged convergence, the more practical last-iterate convergence remains underexplored. In this work, we analyze the last-iterate convergence behavior of distributed mSGD in non-convex settings under the classical Robbins-Monro step-size schedule. We prove both almost sure convergence and $L_2$ convergence of the last iterate, and derive convergence rates. We further show that momentum can accelerate early-stage convergence, and provide experiments to support our theory.
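For concreteness, below is a minimal, hypothetical sketch of the algorithm class the abstract analyzes: distributed mSGD with heavy-ball momentum and Robbins-Monro step sizes $a_t = a_0/(t+1)$, which satisfy $\sum_t a_t = \infty$ and $\sum_t a_t^2 < \infty$. The toy quadratic objective, the Gaussian gradient noise, the worker count, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch of distributed mSGD on a toy quadratic objective
# (all settings below are assumptions for illustration, not the paper's).
rng = np.random.default_rng(0)
dim, n_workers, beta, a0 = 10, 4, 0.9, 0.5
A = np.diag(np.linspace(1.0, 5.0, dim))  # f(x) = 0.5 * x^T A x

def stochastic_grad(x):
    # Each worker's oracle: true gradient plus zero-mean Gaussian noise.
    return A @ x + rng.normal(scale=0.1, size=dim)

x = rng.normal(size=dim)   # shared iterate
m = np.zeros(dim)          # momentum buffer

for t in range(10_000):
    # Workers compute local stochastic gradients; the server averages them.
    g = np.mean([stochastic_grad(x) for _ in range(n_workers)], axis=0)
    m = beta * m + g                 # heavy-ball momentum accumulation
    x = x - (a0 / (t + 1)) * m       # Robbins-Monro step size a_t = a0/(t+1)

print("final gradient norm:", np.linalg.norm(A @ x))
```

Note that the final print inspects the last iterate $x_T$ rather than a time average of the iterates, which is the convergence mode the paper studies.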

Difei Cheng, Ruinan Jin, Hong Qiao, Bo Zhang

Computing Technology, Computer Technology

Difei Cheng, Ruinan Jin, Hong Qiao, Bo Zhang. Convergence Analysis of the Last Iterate in Distributed Stochastic Gradient Descent with Momentum [EB/OL]. (2025-05-16) [2025-07-16]. https://arxiv.org/abs/2505.10889.
