Liquid and solid layers in a thermal deep learning machine
Based on deep neural networks (DNNs), deep learning has been successfully applied to many problems, but its mechanism is still not well understood, especially why over-parametrized DNNs can generalize. A recent statistical mechanics theory of supervised learning by a prototypical multi-layer perceptron (MLP) in artificial learning scenarios predicts that the adjustable parameters of over-parametrized MLPs become strongly constrained by the training data close to the input/output boundaries, while the parameters in the center remain largely free, giving rise to a solid-liquid-solid structure. Here we establish this picture through numerical experiments on benchmark real-world data, using a thermal deep learning machine that explores the phase space of synaptic weights and neurons. The supervised training is implemented by a GPU-accelerated molecular dynamics algorithm operating at very low temperatures, and the trained machine exhibits good generalization ability on the test set. Global and layer-specific dynamics, with complex non-equilibrium aging behavior, are characterized by time-dependent auto-correlation and replica-correlation functions. Our analyses reveal that the design spaces of the parameters in the liquid and solid layers are structureless and hierarchical, respectively. Our main results are summarized by a phase diagram of data storage ratio versus network depth, with liquid and solid phases. The proposed thermal machine, a physical model with a well-defined Hamiltonian that reduces to the MLP in the zero-temperature limit, can serve as a starting point for physically interpretable deep learning.
Gang Huang, Lai Shun Chan, Hajime Yoshino, Ge Zhang, Yuliang Jin
Computing technology; computer technology
Gang Huang, Lai Shun Chan, Hajime Yoshino, Ge Zhang, Yuliang Jin. Liquid and solid layers in a thermal deep learning machine [EB/OL]. (2025-06-07) [2025-07-01]. https://arxiv.org/abs/2506.06789.
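The abstract above describes training as thermal (molecular dynamics) sampling of MLP weights at very low temperature, with replica-correlation functions distinguishing strongly constrained "solid" layers from weakly constrained "liquid" ones. The sketch below is a hypothetical toy illustration, not the authors' GPU code: it assumes an overdamped Langevin update on the weights of a small tanh MLP with a mean-squared-error loss playing the role of the Hamiltonian, crude finite-difference gradients, and a per-layer overlap between two independently initialized replicas as a rough stand-in for the paper's replica-correlation functions.

```python
# Hypothetical sketch (not the authors' implementation): low-temperature Langevin
# dynamics on the weights of a small MLP, with a simple per-layer replica overlap.
# The training loss acts as the Hamiltonian; at temperature T -> 0 the update
# reduces to plain gradient descent, mirroring the zero-temperature MLP limit.
import numpy as np

rng = np.random.default_rng(0)

def init_weights(sizes):
    """Random Gaussian weights for a fully connected MLP (assumed architecture)."""
    return [rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_out, n_in))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(weights, x):
    """Forward pass with tanh activations (an assumption, not from the paper)."""
    for W in weights[:-1]:
        x = np.tanh(W @ x)
    return weights[-1] @ x

def loss(weights, X, Y):
    """Mean squared training error, playing the role of the Hamiltonian."""
    pred = np.stack([forward(weights, x) for x in X.T], axis=1)
    return 0.5 * np.mean((pred - Y) ** 2)

def grad_numeric(weights, X, Y, eps=1e-5):
    """Crude finite-difference gradient; a real implementation would use autodiff."""
    grads = []
    for W in weights:
        g = np.zeros_like(W)
        for idx in np.ndindex(W.shape):
            W[idx] += eps
            lp = loss(weights, X, Y)
            W[idx] -= 2 * eps
            lm = loss(weights, X, Y)
            W[idx] += eps
            g[idx] = (lp - lm) / (2 * eps)
        grads.append(g)
    return grads

def langevin_step(weights, X, Y, dt=1e-2, T=1e-4):
    """One overdamped Langevin (thermal MD) update at temperature T."""
    grads = grad_numeric(weights, X, Y)
    return [W - dt * g + np.sqrt(2 * T * dt) * rng.normal(size=W.shape)
            for W, g in zip(weights, grads)]

def layer_overlap(wa, wb):
    """Normalized overlap between two replicas, layer by layer: values near 1
    suggest a strongly constrained ('solid') layer, small values a 'liquid' one."""
    return [float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))
            for a, b in zip(wa, wb)]

# Tiny toy run: two replicas trained on the same random data.
sizes = [4, 8, 8, 1]
X = rng.normal(size=(4, 8))
Y = rng.normal(size=(1, 8))
replica_a = init_weights(sizes)
replica_b = init_weights(sizes)
for _ in range(50):
    replica_a = langevin_step(replica_a, X, Y)
    replica_b = langevin_step(replica_b, X, Y)
print("per-layer replica overlaps:", layer_overlap(replica_a, replica_b))
```

In this toy setting the overlap is only a rough proxy: the paper's replica-correlation functions are measured with a specific cloning and aging protocol, and the layer-resolved liquid/solid distinction emerges for deep, over-parametrized networks rather than the tiny example used here.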