National Preprint Platform (国家预印本平台)

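FragmentedMemoryMoE: Architecture-Agnostic Anti-Memorization and Learned Weaving Density in Noisy Multi-Source Classification
Spacetime Manifold Routing for Mixture-of-Experts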

Zhang Qingjun (张庆君)





Author information

  • 1. Wuxi Taihu University (无锡太湖学院)

Abstract (Chinese version)

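[Objective] Current AI architectures are converging toward homogeneity: larger Transformers, identical scaling laws, and purely spatial feature processing. This paper proposes the Fragmented Memory Mixture-of-Experts (FragmentedMemoryMoE) architecture, which counters memorization under label noise through three continuous perturbation mechanisms, and introduces the concept of spacetime manifold routing. [Methods] FragmentedMemoryMoE builds a spacetime manifold from three perturbation mechanisms (routing noise, output fragmentation, and expert suppression), so that each epoch offers a different "view" of the data. We ran 85 experiments in 6 phases, tested 4 N-tier configurations (3t3e, 3t4e, 4t3e, 4t4e), and introduced bilevel noise optimization, in which the noise parameters are learned from the validation loss rather than preset. [Results] (1) The fragmentation advantage is architecture-agnostic, with 𝐾=3–4 as the optimal range, and the previously reported even-odd effect is overturned. (2) A sharp phase transition occurs between SEQ=48 and SEQ=64, where fragmentation switches from an exploration mode to a regularization mode. (3) Bilevel optimization reaches 78.9% at 𝐾=2 (12.8% above fixed fragmentation), and the model spontaneously discovers that the weaving density varies with task dimension. (4) A crossover experiment between temporal schedule and SEQ reveals the optimal schedule path: an early-heavy schedule dominates at SEQ=32, while an early-light schedule dominates at SEQ=64. [Limitations] The current implementation of the Minkowski manifold distance is limited to fragment-level routing in classification tasks and has not yet been tested on generative models or heterogeneous data sources. [Conclusions] The axis trajectory is not a fixed path but a family of paths parameterized by the full (𝐾, noise, SEQ) manifold. Spacetime manifold routing gives MoE systems a structural anti-Gaussian mechanism in addition to output-space isolation.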
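The abstract names three perturbation mechanisms and a temporal schedule but does not specify how they are computed. The sketch below is a hypothetical NumPy rendering under our own assumptions, not the paper's implementation: routing noise is Gaussian noise added to the router logits, expert suppression masks randomly chosen experts out of the softmax, output fragmentation splits the combined output into `k_frag` feature fragments and drops each one per example with inverted-dropout rescaling, and a single scalar `strength` taken from `schedule` scales all three. The names `perturbed_moe_forward`, `schedule`, `route_sigma`, `keep_prob` and `p_suppress` are illustrative.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax; -inf entries receive zero probability."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def schedule(epoch, total_epochs, kind="constant"):
    """Hypothetical perturbation-strength multiplier in [0, 1] per epoch."""
    t = epoch / max(total_epochs - 1, 1)
    if kind == "constant":
        return 1.0
    if kind == "growth":  # early-light: strength rises over training
        return t
    if kind == "decay":  # early-heavy: strength falls over training
        return 1.0 - t
    raise ValueError(f"unknown schedule kind: {kind}")


def perturbed_moe_forward(x, w_router, experts, rng, k_frag=3,
                          route_sigma=0.5, keep_prob=0.8, p_suppress=0.1,
                          strength=1.0):
    """Hypothetical MoE forward pass with three perturbations.

    x: (batch, d) inputs; w_router: (d, n_experts); experts: list of (d, d)
    linear experts. Returns the combined output (batch, d) and the gates.
    """
    logits = x @ w_router

    # Routing noise: Gaussian noise on the router logits, scaled by strength.
    logits = logits + strength * route_sigma * rng.standard_normal(logits.shape)

    # Expert suppression: mask random experts out of the softmax, but never
    # all experts for the same example.
    suppress = rng.random(logits.shape) < strength * p_suppress
    suppress[suppress.all(axis=1)] = False
    gates = softmax(np.where(suppress, -np.inf, logits))

    # Gate-weighted combination of the expert outputs.
    outs = np.stack([x @ w for w in experts], axis=1)  # (batch, n_experts, d)
    y = (gates[..., None] * outs).sum(axis=1)          # (batch, d)

    # Output fragmentation: split the features into k_frag contiguous fragments
    # and drop each fragment per example, rescaling survivors (inverted dropout).
    p_keep = 1.0 - strength * (1.0 - keep_prob)
    for idx in np.array_split(np.arange(y.shape[1]), k_frag):
        keep = rng.random(y.shape[0]) < p_keep
        y[:, idx] = np.where(keep[:, None], y[:, idx] / p_keep, 0.0)
    return y, gates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, d, n_experts = 5, 12, 4
    x = rng.standard_normal((batch, d))
    w_router = rng.standard_normal((d, n_experts))
    experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
    for epoch in range(4):
        s = schedule(epoch, 4, kind="growth")
        y, gates = perturbed_moe_forward(x, w_router, experts, rng, strength=s)
        print(epoch, s, y.shape, gates.shape)
```

With `strength=0` the pass reduces to an ordinary soft-gated MoE layer, which makes it easy to compare a perturbed run against an unperturbed baseline. Passing `kind="decay"` or `kind="growth"` gives an early-heavy or early-light schedule, the two regimes the abstract contrasts across SEQ.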

Abstract

[Objective] Current AI architectures are converging toward homogeneity: larger Transformers, identical scaling laws, and spatial-only feature processing. We propose FragmentedMemoryMoE, a discriminative MoE architecture that prevents memorization through three synergistic perturbation mechanisms, and introduce the concept of spacetime manifold routing. [Methods] We conduct a comprehensive experimental program (85 experiments across 6 phases) testing 4 N-tier configurations. Three continuous perturbation mechanisms operate simultaneously: routing noise, output fragmentation, and expert suppression. Together they create a spacetime manifold in which each epoch provides a different view of the same data. We introduce bilevel noise optimization, in which noise parameters are learned from the validation loss rather than preset. [Results] Key findings include: (1) The fragmentation advantage is architecture-agnostic, with a consistent sweet spot at 𝐾=3–4; the previously claimed even-odd 𝐾 parity effect is refuted as a confounding artifact of sequence length. (2) A sharp phase transition occurs between SEQ=48 and SEQ=64, where fragmentation switches from exploration mode to regularization mode. (3) Bilevel optimization achieves 78.9% at 𝐾=2 (+12.8% over fixed fragmentation), and the model spontaneously discovers a weaving density that varies with task dimension. (4) A schedule × SEQ crossover experiment reveals that the optimal temporal schedule reverses across regimes, with a growth schedule achieving 67.6% (+5.8% over a constant schedule). [Limitations] Current validation is restricted to classification tasks on TinyImageNet-200 and WikiText-2. The Minkowski light-cone distance for fragment routing has not been tested on generative tasks or on heterogeneous multi-modal data sources. [Conclusions] The axis trajectory is not a fixed path but a family of paths parameterized by the full (𝐾, noise, SEQ) manifold. The spacetime manifold principle, established here through a fragment-schedule tensor, extends naturally to Minkowski spacetime and provides a structural anti-Gaussian mechanism complementary to output-space isolation. The independent validation of similarity-based routing without positional encoding by CPiRi (ICLR 2026) supports the generality of this approach across domains.
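The abstract refers to a Minkowski light-cone distance for fragment routing without defining it. Assuming the standard squared Minkowski interval s² = −c²Δt² + ‖Δx‖², and treating each fragment and each expert prototype as an event with a scalar time coordinate and a spatial embedding, one possible routing rule restricts each fragment to prototypes in its past light cone. The functions `minkowski_interval` and `light_cone_route`, the expert prototypes, and the speed parameter `c` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def minkowski_interval(t1, x1, t2, x2, c=1.0):
    """Squared Minkowski interval with signature (-, +, ..., +):
    s2 = -(c * dt)**2 + ||dx||**2.
    s2 < 0: timelike, s2 == 0: lightlike, s2 > 0: spacelike."""
    dt = np.asarray(t1, dtype=float) - np.asarray(t2, dtype=float)
    dx = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return -(c * dt) ** 2 + np.sum(dx ** 2, axis=-1)


def light_cone_route(frag_t, frag_x, proto_t, proto_x, c=1.0):
    """Hypothetical routing rule.

    Each fragment (an event at time frag_t[i], position frag_x[i]) goes to the
    expert prototype inside its past light cone (s2 <= 0 and proto_t <= frag_t)
    with the smallest spatial distance. Fragments with no such prototype fall
    back to the prototype with the smallest |s2|.
    """
    frag_t = np.asarray(frag_t, dtype=float)
    frag_x = np.asarray(frag_x, dtype=float)
    proto_t = np.asarray(proto_t, dtype=float)
    proto_x = np.asarray(proto_x, dtype=float)

    # Pairwise intervals, shape (n_fragments, n_prototypes).
    s2 = minkowski_interval(frag_t[:, None], frag_x[:, None, :],
                            proto_t[None, :], proto_x[None, :, :], c)
    causal = (s2 <= 0) & (proto_t[None, :] <= frag_t[:, None])
    spatial = np.linalg.norm(frag_x[:, None, :] - proto_x[None, :, :], axis=-1)

    # Nearest causal prototype; non-causal pairs are excluded with +inf.
    choice = np.where(causal, spatial, np.inf).argmin(axis=1)
    no_causal = ~causal.any(axis=1)
    choice[no_causal] = np.abs(s2[no_causal]).argmin(axis=1)
    return choice, s2


if __name__ == "__main__":
    choice, s2 = light_cone_route(
        frag_t=[2.0, 0.0], frag_x=[[0.0, 0.0], [5.0, 0.0]],
        proto_t=[0.0, 1.0, 3.0], proto_x=[[1.0, 0.0], [0.0, 0.5], [0.0, 0.0]],
    )
    print(choice)
    print(s2)
```

In this sketch `c` sets how much spatial distance a fragment may cover per unit of time: a larger `c` widens the light cone and admits more prototypes, while `c = 0` collapses it so that only prototypes at exactly the same position count as causal.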


Key words

Mixture-of-Experts/ fragmented memory/ anti-memorization/ weaving density/ spacetime manifold/ bilevel optimization/ phase transition/ architecture-agnostic/ CPiRi

Cite this article

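Zhang Qingjun. FragmentedMemoryMoE: Architecture-Agnostic Anti-Memorization and Learned Weaving Density in Noisy Multi-Source Classification: Spacetime Manifold Routing for Mixture-of-Experts [EB/OL]. (2026-06-26) [2026-06-28]. https://sinoxiv.napstic.cn/article/26011664.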

Subject classification

Computing technology; computer technology
First published: 2026-06-26 17:10:41