Orthogonal Gradient Descent Improves Neural Calibration
We provide evidence that orthogonalizing gradients during training improves model calibration without sacrificing accuracy. On CIFAR-10 with 10% labeled data, $\perp$Grad matches SGD in accuracy but yields consistently better calibration, including lower test loss, reduced softmax overconfidence, and higher predictive entropy. These benefits persist under input corruption (CIFAR-10C) and extended training, where $\perp$Grad models degrade more gracefully than their SGD-trained counterparts. $\perp$Grad is optimizer-agnostic, incurs minimal overhead, and composes well with post-hoc calibration techniques such as temperature scaling. Theoretically, we prove convergence of a simplified version of $\perp$Grad under mild assumptions and characterize its stationary points in positive homogeneous networks: $\perp$Grad converges to solutions where further loss reduction requires confidence scaling rather than decision boundary improvement.
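The abstract does not spell out the exact projection $\perp$Grad uses. A minimal sketch of one plausible reading, in which each parameter's gradient is projected orthogonal to the parameter vector itself so the update cannot rescale weight norms (the "confidence scaling" direction in positive homogeneous networks), is given below; the function name `perp_grad_step`, the per-parameter (rather than global or per-layer) projection, and the plain SGD update are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def perp_grad_step(model: nn.Module, lr: float = 0.1) -> None:
    """Hypothetical perpendicular-gradient SGD step (sketch only).

    Each parameter's gradient is projected onto the subspace orthogonal
    to the parameter itself, so the step does not change the parameter's
    norm; in positive homogeneous networks, the removed radial direction
    only rescales confidence rather than moving the decision boundary.
    """
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            g, w = p.grad, p
            # Subtract the component of g parallel to w (radial direction).
            coeff = (g * w).sum() / ((w * w).sum() + 1e-12)
            p.add_(g - coeff * w, alpha=-lr)

# Usage (assuming `model`, `loss_fn`, and a batch `(x, y)` exist):
#   model.zero_grad()
#   loss_fn(model(x), y).backward()
#   perp_grad_step(model, lr=0.05)
```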
C. Evans Hedges
Computing Technology, Computer Technology
C. Evans Hedges. Orthogonal Gradient Descent Improves Neural Calibration [EB/OL]. (2025-06-04) [2025-06-13]. https://arxiv.org/abs/2506.04487.