Online Learning-guided Learning Rate Adaptation via Gradient Alignment

Source: arXiv
Abstract

The performance of an optimizer on large-scale deep learning models depends critically on fine-tuning the learning rate, often requiring an extensive grid search over base learning rates, schedules, and other hyperparameters. In this paper, we propose a principled framework called GALA (Gradient Alignment-based Learning rate Adaptation), which dynamically adjusts the learning rate by tracking the alignment between consecutive gradients and using a local curvature estimate. Guided by the convergence analysis, we formulate the problem of selecting the learning rate as a one-dimensional online learning problem. When paired with an online learning algorithm such as Follow-the-Regularized-Leader, our method produces a flexible, adaptive learning rate schedule that tends to increase when consecutive gradients are aligned and decrease otherwise. We establish a data-adaptive convergence rate for normalized SGD equipped with GALA in the smooth, nonconvex setting. Empirically, common optimizers such as SGD and Adam, when augmented with GALA, demonstrate robust performance across a wide range of initial learning rates and perform competitively without the need for tuning.
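To make the mechanism described above concrete, below is a minimal Python sketch of the qualitative idea only: grow the learning rate when consecutive gradients point in a similar direction and shrink it otherwise, applied to a normalized SGD step. This is not the paper's actual GALA update, which is derived from an FTRL online learner and a local curvature estimate; the toy objective, initial rate, and multiplicative factors here are assumptions for illustration.

```python
import numpy as np

def grad(x):
    # Gradient of a toy quadratic objective f(x) = 0.5 * ||x||^2.
    return x

# Hypothetical sketch only: GALA derives its step-size update from an
# FTRL-based online learning formulation; this loop mimics just the
# qualitative behavior stated in the abstract -- the learning rate grows
# when consecutive gradients are aligned and shrinks otherwise. The
# initial rate and the multiplicative factors are assumed values.
x = np.array([5.0, -3.0])
eta = 0.1                 # assumed initial learning rate
grow, shrink = 1.1, 0.5  # assumed multiplicative adjustment factors
prev_g = None

for t in range(50):
    g = grad(x)
    if prev_g is not None:
        # Alignment signal: inner product of consecutive gradients.
        if np.dot(g, prev_g) > 0:
            eta *= grow
        else:
            eta *= shrink
    # Normalized SGD step, matching the normalized-SGD setting
    # analyzed in the paper.
    x = x - eta * g / (np.linalg.norm(g) + 1e-12)
    prev_g = g

print(f"final learning rate: {eta:.4f}, ||x||: {np.linalg.norm(x):.4f}")
```

On this well-conditioned quadratic, consecutive gradients stay aligned near the start, so the sketch steadily raises the step size until overshooting flips the alignment sign, at which point it backs off, which is the adaptive schedule the abstract describes.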

Ruichen Jiang, Ali Kavis, Aryan Mokhtari

Subjects: Computing technology; Computer technology

Ruichen Jiang, Ali Kavis, Aryan Mokhtari. Online Learning-guided Learning Rate Adaptation via Gradient Alignment [EB/OL]. (2025-06-09) [2025-06-17]. https://arxiv.org/abs/2506.08419.
