Convergence Analysis of the Last Iterate in Distributed Stochastic Gradient Descent with Momentum
Distributed stochastic gradient methods are widely used to preserve data privacy and ensure scalability in large-scale learning tasks. While existing theory on distributed momentum stochastic gradient descent (mSGD) focuses mainly on time-averaged convergence, the more practically relevant last-iterate convergence remains underexplored. In this work, we analyze the last-iterate convergence behavior of distributed mSGD in non-convex settings under the classical Robbins-Monro step-size schedule. We prove both almost sure and $L_2$ convergence of the last iterate and derive the corresponding convergence rates. We further show that momentum can accelerate early-stage convergence, and we provide experiments supporting our theory.
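For context, a minimal sketch of the distributed mSGD update in its standard formulation (the notation here, with $n$ nodes holding local objectives $f_i$, momentum parameter $\beta \in [0,1)$, and step sizes $\epsilon_k$, is illustrative and need not match the paper's exact scheme):
$$m_{k+1}^{(i)} = \beta\, m_k^{(i)} + \nabla f_i\big(x_k;\, \xi_k^{(i)}\big), \qquad x_{k+1} = x_k - \frac{\epsilon_k}{n} \sum_{i=1}^{n} m_{k+1}^{(i)},$$
where $\xi_k^{(i)}$ is the random sample drawn at node $i$ in iteration $k$. The Robbins-Monro schedule requires $\sum_k \epsilon_k = \infty$ and $\sum_k \epsilon_k^2 < \infty$, satisfied, e.g., by $\epsilon_k = c/k$.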
Difei Cheng, Ruinan Jin, Hong Qiao, Bo Zhang
Computing technology; computer technology
Difei Cheng, Ruinan Jin, Hong Qiao, Bo Zhang. Convergence Analysis of the Last Iterate in Distributed Stochastic Gradient Descent with Momentum [EB/OL]. (2025-05-16) [2025-07-16]. https://arxiv.org/abs/2505.10889.