
BCE vs. CE in Deep Feature Learning


Source: arXiv
Abstract

When training classification models, the learned features are expected to be compact within classes and well separated across classes. Cross-entropy (CE) is the dominant loss function for training classification models, and minimizing CE loss maximizes intra-class compactness and inter-class distinctiveness, i.e., it leads to neural collapse (NC). Recent works show that binary CE (BCE) also performs well in multi-class tasks. In this paper, we compare BCE and CE in deep feature learning. For the first time, we prove that BCE can also maximize intra-class compactness and inter-class distinctiveness when it reaches its minimum, i.e., it also leads to NC. We point out that CE measures the relative values of the decision scores during model training, implicitly enhancing the feature properties by classifying samples one by one. In contrast, BCE measures the absolute values of the decision scores and adjusts the positive/negative decision scores across all samples to uniformly high/low levels. Meanwhile, the classifier biases in BCE impose a substantial constraint on the decision scores, explicitly enhancing the feature properties during training. The experimental results align with the above analysis and show that BCE can improve classification and lead to better compactness and distinctiveness among sample features. The code will be released.
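To make the contrast concrete, the following is a minimal sketch, assuming PyTorch (the batch size, class count, and tensor shapes are illustrative, not taken from the paper): CE normalizes the decision scores with a softmax, so only their relative values matter, whereas BCE applies an independent sigmoid per class against one-hot targets, so the absolute values of the positive/negative scores are pushed toward uniformly high/low levels.

```python
# Minimal sketch (not the authors' exact training code) contrasting CE and BCE
# on the same multi-class decision scores. Shapes and class count are illustrative.
import torch
import torch.nn.functional as F

num_classes = 10
logits = torch.randn(32, num_classes, requires_grad=True)  # decision scores for a batch
labels = torch.randint(0, num_classes, (32,))               # integer class labels

# CE: softmax over classes, so only the *relative* values of the decision
# scores matter; each sample is classified against the other classes one by one.
ce_loss = F.cross_entropy(logits, labels)

# BCE: an independent sigmoid per class with one-hot targets, so the *absolute*
# values of the scores matter; positive scores are pushed uniformly high and
# negative scores uniformly low across all samples.
one_hot = F.one_hot(labels, num_classes).float()
bce_loss = F.binary_cross_entropy_with_logits(logits, one_hot)

print(f"CE loss: {ce_loss.item():.4f}, BCE loss: {bce_loss.item():.4f}")
```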

Qiufu Li, Huibin Xiao, Linlin Shen

Computing Technology; Computer Technology

Qiufu Li, Huibin Xiao, Linlin Shen. BCE vs. CE in Deep Feature Learning [EB/OL]. (2025-05-09) [2025-06-08]. https://arxiv.org/abs/2505.05813.
