
Orthogonal Gradient Descent Improves Neural Calibration

Source: arXiv

Abstract

We provide evidence that orthogonalizing gradients during training improves model calibration without sacrificing accuracy. On CIFAR-10 with 10% labeled data, $\perp$Grad matches SGD in accuracy but yields consistently improved calibration metrics such as lower test loss, reduced softmax overconfidence, and higher predictive entropy. These benefits persist under input corruption (CIFAR-10C) and extended training, where $\perp$Grad models degrade more gracefully than SGD-trained counterparts. $\perp$Grad is optimizer-agnostic, incurs minimal overhead, and works well with post-hoc calibration techniques like temperature scaling. Theoretically, we prove convergence of a simplified version of $\perp$Grad under mild assumptions and characterize its stationary points in positive homogeneous networks: $\perp$Grad converges to solutions where further loss reduction requires confidence scaling rather than decision boundary improvement.
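
The abstract does not spell out how the gradients are orthogonalized. One plausible reading, consistent with the stationary-point characterization (further loss reduction would require confidence scaling rather than decision boundary improvement), is that each parameter's gradient is projected onto the subspace orthogonal to the current weights, removing the radial component that only rescales confidence in positive homogeneous networks. The sketch below illustrates that reading in PyTorch; the function name and the per-tensor projection are illustrative assumptions, not the paper's verbatim algorithm.

```python
import torch


def orthogonalize_gradients(model: torch.nn.Module, eps: float = 1e-12) -> None:
    """Remove, for each parameter tensor, the gradient component parallel to the
    current weights, keeping only the orthogonal part.

    Assumption: the projection is applied per parameter tensor; the paper may
    define it globally or per layer group.
    """
    for p in model.parameters():
        if p.grad is None:
            continue
        w = p.detach().reshape(-1)
        g = p.grad.reshape(-1)
        # Projection coefficient <g, w> / <w, w>; eps guards against zero weights.
        coeff = torch.dot(g, w) / (torch.dot(w, w) + eps)
        # Subtract the radial (confidence-scaling) component in place.
        p.grad.sub_(coeff * p.detach())


# Usage: call between backward() and the optimizer step of any optimizer,
# matching the abstract's claim that the method is optimizer-agnostic:
#   loss.backward()
#   orthogonalize_gradients(model)
#   optimizer.step()
```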

C. Evans Hedges

Subject: Computing Technology, Computer Technology

C. Evans Hedges. Orthogonal Gradient Descent Improves Neural Calibration [EB/OL]. (2025-06-04) [2025-06-13]. https://arxiv.org/abs/2506.04487.
