
Generative Distribution Distillation

Source: arXiv
Abstract

In this paper, we formulate knowledge distillation (KD) as a conditional generative problem and propose the \textit{Generative Distribution Distillation (GenDD)} framework. A naive \textit{GenDD} baseline faces two major challenges: the curse of high-dimensional optimization and the lack of semantic supervision from labels. To address these issues, we introduce a \textit{Split Tokenization} strategy that achieves stable and effective unsupervised KD. Additionally, we develop the \textit{Distribution Contraction} technique to integrate label supervision into the reconstruction objective. We prove that \textit{GenDD} with \textit{Distribution Contraction} serves as a gradient-level surrogate for multi-task learning, realizing efficient supervised training without an explicit classification loss on multi-step sampled image representations. To evaluate the effectiveness of our method, we conduct experiments on balanced, imbalanced, and unlabeled data. Experimental results show that \textit{GenDD} performs competitively in the unsupervised setting, significantly surpassing the KL baseline by \textbf{16.29\%} on the ImageNet validation set. With label supervision, our ResNet-50 achieves \textbf{82.28\%} top-1 accuracy on ImageNet with 600 epochs of training, establishing a new state of the art.
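As a hedged illustration only (the abstract does not give the exact objectives, so the symbols and the factorized form below are assumptions, not the paper's formulation): the conditional generative view of KD can be read as training a student generator $p_\theta$ to reconstruct the teacher representation $z = f_T(x)$ conditioned on the input $x$,
\[
\mathcal{L}_{\text{GenDD}}(\theta) = \mathbb{E}_{x}\left[-\log p_\theta(z \mid x)\right], \qquad z = f_T(x),
\]
where \textit{Split Tokenization} could correspond to splitting the high-dimensional $z$ into $K$ lower-dimensional tokens and modeling $p_\theta(z \mid x) = \prod_{k=1}^{K} p_\theta\big(z^{(k)} \mid x\big)$, so each token is optimized in a space of dimension $d/K$ rather than $d$. One plausible (unconfirmed) reading of \textit{Distribution Contraction} is that the reconstruction target is contracted toward its class statistics, e.g. $\tilde{z} = \alpha\,\mu_{y} + (1-\alpha)\,z$ for label $y$, class mean $\mu_y$, and contraction factor $\alpha \in [0,1]$, so that label supervision enters the reconstruction objective without a separate classification loss; the exact formulation in the paper may differ.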

Jiequan Cui, Beier Zhu, Qingshan Xu, Xiaogang Xu, Pengguang Chen, Xiaojuan Qi, Bei Yu, Hanwang Zhang, Richang Hong

Computing Technology, Computer Technology

Jiequan Cui, Beier Zhu, Qingshan Xu, Xiaogang Xu, Pengguang Chen, Xiaojuan Qi, Bei Yu, Hanwang Zhang, Richang Hong. Generative Distribution Distillation [EB/OL]. (2025-07-19) [2025-08-10]. https://arxiv.org/abs/2507.14503.
