
国家预印本平台 (National Preprint Platform)

First released in China, known worldwide

Hot Papers
A Brief Tutorial on Reinforcement Learning: From MDP to DDPG

This tutorial presents a coherent overview of reinforcement learning (RL), tracing its evolution from theoretical foundations to advanced deep learning algorithms. We begin with the mathematical formalization of sequential decision-making via Markov Decision Processes (MDPs). Central to RL theory is the Bellman equation for policy evaluation and its extension, the Bellman optimality equation, which provides the fundamental condition for optimal behavior. The journey from these equations to practical algorithms is explored, starting with model-based dynamic programming and progressing to model-free temporal-difference learning. We highlight Q-learning as a pivotal model-free algorithm that directly implements the Bellman optimality equation through sampling. To handle high-dimensional state spaces, the paradigm shifts to function approximation and deep reinforcement learning, exemplified by Deep Q-Networks (DQN). A significant challenge arises in continuous action spaces, addressed by actor-critic methods. We examine the Deep Deterministic Policy Gradient (DDPG) algorithm in detail, explaining how it adapts the principles of optimality to continuous control by maintaining separate actor and critic networks. The tutorial concludes with a unified perspective, framing RL's development as a logical progression from defining optimality conditions to developing scalable solution algorithms, and briefly surveys subsequent improvements and future directions, all underpinned by the enduring framework of the Bellman equations.
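The abstract describes Q-learning as a sampling-based implementation of the Bellman optimality equation. As a minimal illustration of that update rule (a hedged sketch, not code from the tutorial: the toy chain environment, hyperparameters, and episode count below are illustrative assumptions):

```python
import random

# Toy deterministic chain MDP: states 0..3, actions 0 (left) / 1 (right);
# reaching state 3 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # illustrative hyperparameters

def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

random.seed(0)
for _ in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy behavior policy for exploration.
        if random.random() < epsilon:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: Q[s][i])
        s_next, r, done = step(s, a)
        # Sampled Bellman optimality update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = r if done else r + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next

# The greedy policy recovered from Q should move right toward the goal.
policy = [max(range(N_ACTIONS), key=lambda i: Q[s][i]) for s in range(GOAL)]
print(policy)
```

DQN replaces the table `Q` with a neural network trained on the same sampled target, and DDPG replaces the `max` over actions, which is intractable in continuous action spaces, with a separately learned actor network.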

张田 | Published: 2026-01-06
A long 3He proportional counter array for the study of β-delayed neutron emission probability at BRIF

The β-delayed neutron emission probability (Pn) is an indispensable quantity for describing the decay strength of very neutron-rich nuclei and the rapid neutron-capture process in nuclear astrophysics. A Long HElium-3 Neutron Array (LHENA) has been developed at the Beijing Rare Isotope Facility (BRIF) to initiate Pn measurements using Isotope Separation On-Line (ISOL) pulsed beams. LHENA is designed to work in conjunction with a tape transport system and various detectors, so that β particles, β-delayed neutrons, and γ rays emitted from the implanted nuclei can be measured simultaneously in a periodic mode. LHENA consists of 21 long 3He proportional counters embedded in a polyethylene moderator in a two-ring structure, which allows for a flat neutron detection efficiency up to 3 MeV according to our Geant4 simulations. The detection efficiency has been experimentally determined to be 16.4(±0.4)% using the 51V(p,n)51Cr reaction for neutron energies in the 120-700 keV range. The good flatness of the neutron detection efficiency and the very low background of LHENA have been verified, laying a solid foundation for the first Pn measurements of very neutron-rich Rb isotopes at BRIF.

Yang-Ping, Dr. Shen;Guo, Dr. Bing 郭冰;Liu, Mr. Wei-Ping;Jin-Long, Mr. Ma;Tian, Mr. Jun-Wen;Xie, Mr. DongLin;Huang, Mr. HongWei;Lin, Dr. Weiping;Xie, Mr. DeHao;Tang, Mr. Yi;Wen, Mr. Cun;Su, Prof. Jun;Qin, Mr. ZhiWei;Ma, Mr. Junrui;Nan, Dr. Weike;Tu, Miss Wanqin;Nan, Dr. Wei;Yan, Dr. Shengquan;Yun-Ju, Prof. Li;Wang, Dr. Qiang;Wang, Mr. You-Bao;Yu-Qiang, Mr. Zhang;Zhu, Mr. Ming-Hao | Published: 2026-01-05
Dynamic updating of facial trustworthiness impressions based on nonverbal cues and the moderating role of social distance

In interpersonal interaction, nonverbal cues such as facial features and contextual factors are crucial to person perception. However, previous research on face-impression updating has overlooked nonverbal cues and has rarely examined how psychosocial factors affect it. Using the classic impression-updating paradigm and an imagination paradigm, combined with eye-tracking, this study provides new empirical evidence on the pattern of face-impression updating driven by nonverbal cues, as well as on the moderating role and cognitive mechanisms of evaluator-cue and target-cue social distance. The results show that: (1) updating of facial trustworthiness impressions based on nonverbal cues follows a dynamic assimilation-contrast pattern; (2) both evaluator-cue and target-cue social distance exert an overall cross-dimensional assimilation effect on facial trustworthiness impressions; (3) both evaluator-cue and target-cue social distance influence the updating of facial trustworthiness impressions only indirectly, through the relative attention allocated to the target versus the cue ("target-cue relative attention value"), and both mainly affect the degree to which trustworthiness impressions improve.

何婷婷;吴天朗;高晓岚;季琭妍;陈文锋 | Published: 2026-01-05
The effect of autonomous sensory meridian response on cognitive control and its cognitive-neural basis

Autonomous sensory meridian response (ASMR), a distinctive experience induced by audiovisual stimuli, offers a unique window into how the human brain processes and responds to affective stimuli. Existing research suggests that ASMR temporarily suppresses cognitive control, but it remains unclear whether this effect is state-like or trait-like, which stage of cognitive control it acts on, and what its cognitive-neural basis is. This project therefore aims to reveal the mechanisms underlying the occurrence and development of ASMR from two perspectives: adolescent psychological development and the dynamics of neural processing. This work will not only provide new evidence on the cognitive-control processes involved in the brain's handling of affective stimuli and deepen our understanding of the development of cognitive control in adolescence, but also offer theoretical guidance for promoting the mental health of adolescents with atypical affective responses and for clinical applications of ASMR.

王协顺;张伊晓;李响;苏彦捷 | Published: 2026-01-05
Heterogeneous Low-Bandwidth Pre-Training of LLMs

Pre-training large language models (LLMs) increasingly requires distributed compute, yet bandwidth constraints make it difficult to scale beyond well-provisioned datacenters, especially when model parallelism forces frequent, large inter-device communication. We study whether SparseLoCo, a low-communication data-parallel method based on infrequent synchronization and sparse pseudo-gradient exchange, can be combined with low-bandwidth pipeline model parallelism via activation and activation-gradient compression. We introduce a heterogeneous distributed training framework in which some participants host full replicas on high-bandwidth interconnects, while resource-limited participants are grouped to jointly instantiate a replica using pipeline parallelism with subspace-projected inter-stage communication. To make the recently introduced subspace pipeline compression compatible with SparseLoCo, we study a number of adaptations. Across large-scale language modeling experiments (178M-1B parameters) on standard pre-training corpora, we find that activation compression composes with SparseLoCo at modest cost, while selective (heterogeneous) compression consistently improves the loss-communication tradeoff relative to compressing all replicas, especially at aggressive compression ratios. These results suggest a practical path to incorporating low-bandwidth model parallelism and heterogeneous participants into LLM pre-training.
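The core of the sparse pseudo-gradient exchange the abstract refers to can be sketched as follows (a hedged illustration of SparseLoCo-style communication, not the paper's implementation: the top-k rule, plain averaging as the outer step, and all shapes and values below are assumptions for clarity):

```python
import numpy as np

def top_k_sparsify(delta, k):
    """Keep only the k largest-magnitude entries of a flat pseudo-gradient."""
    idx = np.argpartition(np.abs(delta), -k)[-k:]
    sparse = np.zeros_like(delta)
    sparse[idx] = delta[idx]
    return sparse

def outer_step(global_params, worker_params_list, k):
    """After several local steps, each worker forms a pseudo-gradient
    (global params minus its local params), sparsifies it, and the server
    averages the sparse deltas. Plain averaging stands in for the outer
    optimizer here."""
    deltas = [top_k_sparsify(global_params - w, k) for w in worker_params_list]
    mean_delta = np.mean(deltas, axis=0)
    return global_params - mean_delta

rng = np.random.default_rng(0)
g = rng.normal(size=1000)
# Four workers that have drifted slightly after local training.
workers = [g - 0.01 * rng.normal(size=1000) for _ in range(4)]
# Exchange only 10% of the entries per worker.
new_g = outer_step(g, workers, k=100)
print(new_g.shape)
```

Because each worker transmits only `k` index-value pairs instead of the full parameter vector, and does so only at infrequent synchronization points, the data-parallel communication volume drops sharply; the paper's contribution is composing this with compressed inter-stage pipeline traffic.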

Yazan Obeidi;Amir Sarfi;Joel Lidin;Paul Janson;Eugene Belilovsky | Published: 2026-01-05