Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness
This work introduces a hybrid non-Euclidean optimization method that generalizes gradient norm clipping by combining steepest-descent and conditional-gradient approaches. The method achieves the best of both worlds by establishing a descent property under a generalized notion of $(L_0,L_1)$-smoothness. Weight decay is incorporated in a principled manner by identifying a connection to the Frank-Wolfe short step. In the stochastic case, we show an order-optimal $O(n^{-1/4})$ convergence rate by leveraging a momentum-based gradient estimator. We discuss how to instantiate the algorithms for deep learning and demonstrate their properties on image classification and language modeling.
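The hybrid update is easiest to see in the Euclidean special case, where taking the shorter of a steepest-descent step and a conditional-gradient step (the linear minimization oracle over a norm ball, which for the Euclidean norm is the normalized negative gradient) recovers classical gradient norm clipping. Below is a minimal sketch of that special case only; the function name and the `lr` and `radius` parameters are illustrative assumptions, not the authors' reference implementation, and the paper's non-Euclidean variants would swap the normalized step for a norm-specific oracle.

```python
import torch

def clipped_step(param: torch.Tensor, grad: torch.Tensor,
                 lr: float = 0.1, radius: float = 1.0) -> None:
    """One generalized-clipping update, Euclidean special case (sketch)."""
    gnorm = grad.norm().item()
    # Steepest descent proposes -lr * grad; conditional gradient over a
    # ball of the given radius proposes -radius * grad / ||grad||.
    # Taking whichever step is shorter recovers gradient norm clipping:
    #   x <- x - min(lr, radius / ||grad||) * grad
    scale = min(lr, radius / (gnorm + 1e-12))
    param.data.add_(grad, alpha=-scale)
```

When `||grad|| <= radius / lr` this is a plain gradient step; otherwise it is the normalized conditional-gradient step, which is what controls the update size when gradients blow up, the regime that $(L_0,L_1)$-smoothness is designed to model.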
Thomas Pethick, Wanyun Xie, Mete Erdogan, Kimon Antonakopoulos, Tony Silveti-Falls, Volkan Cevher
Subject: Computing Technology; Computer Technology
Thomas Pethick, Wanyun Xie, Mete Erdogan, Kimon Antonakopoulos, Tony Silveti-Falls, Volkan Cevher. Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness [EB/OL]. (2025-06-02) [2025-06-22]. https://arxiv.org/abs/2506.01913.