National Preprint Platform

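Fragmented-Memory Mixture-of-Experts: Architecture-Agnostic Anti-Memorization and Autonomous Discovery of Weaving Density, Based on Spacetime Manifold Routing for Noisy Multi-Source Classification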

Zhang Qingjun (张庆君)

DOI: 10.12383/202606120002V1

CSTR:20001.14.202606120002V1

Source: National Preprint Platform

FragmentedMemoryMoE: Architecture-Agnostic Anti-Memorization and Learned Weaving Density in Noisy Multi-Source Classification Spacetime Manifold Routing for Mixture-of-Experts

Zhang Qingjun¹

Author information

1. Wuxi Taihu University

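Abstract (translated from the Chinese)

[Objective] Current AI architectures are converging toward homogeneity: larger Transformers, the same scaling laws, and spatial-only feature processing. This paper proposes the FragmentedMemoryMoE (fragmented-memory mixture-of-experts) architecture, which counters memorization under label noise through three continuous perturbation mechanisms and introduces the concept of spacetime manifold routing.

[Methods] FragmentedMemoryMoE builds a spacetime manifold from three perturbation mechanisms (routing noise, output fragmentation, and expert suppression), so that each epoch presents a different "view" of the data. We ran 85 experiments across 6 phases, tested 4 N-tier configurations (3t3e, 3t4e, 4t3e, 4t4e), and introduced bilevel noise optimization, in which the noise parameters are learned from the validation loss rather than preset.

[Results] (1) The fragmentation advantage is architecture-agnostic, with 𝐾 = 3 to 4 as the optimal range; the previously reported even-odd parity effect is overturned. (2) A sharp phase transition occurs between SEQ=48 and SEQ=64, where fragmentation switches from an exploration mode to a regularization mode. (3) Bilevel optimization reaches 78.9% at 𝐾=2 (+12.8% over fixed fragmentation), and the model spontaneously discovers that weaving density varies with task dimension. (4) The schedule × SEQ crossover experiment reveals the best scheduling path: at SEQ=32 an early-heavy schedule dominates, while at SEQ=64 an early-light schedule dominates.

[Limitations] The current implementation of the Minkowski manifold distance is limited to fragment-level routing in classification tasks and has not yet been tested on generative models or heterogeneous data sources.

[Conclusions] The axis trajectory is not a fixed path but a family of paths parameterized by the full (𝐾, noise, SEQ) manifold. Spacetime manifold routing gives MoE systems another structural anti-Gaussian mechanism besides output-space isolation.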

Abstract

[Objective] Current AI architectures are converging toward homogeneity: larger Transformers, identical scaling laws, and spatial-only feature processing. We propose FragmentedMemoryMoE, a discriminative MoE architecture that prevents memorization through three synergistic perturbation mechanisms, and introduce the concept of spacetime manifold routing.

[Methods] We conduct an experimental program of 85 experiments across 6 phases, testing 4 N-tier configurations. Three continuous perturbation mechanisms operate simultaneously (routing noise, output fragmentation, and expert suppression), together creating a spacetime manifold in which each epoch provides a different view of the same data. We introduce bilevel noise optimization, in which the noise parameters are learned from the validation loss rather than preset.

[Results] (1) The fragmentation advantage is architecture-agnostic, with a consistent sweet spot at 𝐾 = 3 to 4; the previously claimed even-odd 𝐾 parity effect is refuted as a confounding artifact of sequence length. (2) A sharp phase transition occurs between SEQ=48 and SEQ=64, where fragmentation switches from exploration mode to regularization mode. (3) Bilevel optimization achieves 78.9% at 𝐾=2 (+12.8% over fixed fragmentation); the model spontaneously discovers a weaving density that varies with task dimension. (4) A schedule × SEQ crossover reveals that the optimal temporal schedule reverses across regimes, with the growth schedule achieving 67.6% (+5.8% over the constant schedule).

[Limitations] Current validation is restricted to classification tasks on TinyImageNet-200 and WikiText-2. The Minkowski light-cone distance for fragment routing has not been tested on generative tasks or heterogeneous multi-modal data sources.

[Conclusions] The axis trajectory is not a fixed path but a family of paths parameterized by the full (𝐾, noise, SEQ) manifold. The spacetime manifold principle, established here through a fragment-schedule tensor, extends naturally to Minkowski spacetime, providing an alternative structural anti-Gaussian mechanism complementary to output-space isolation. The independent validation of similarity-based routing without positional encoding by CPiRi (ICLR 2026) supports the generality of this approach across domains.
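The abstract names three perturbations (routing noise, output fragmentation, expert suppression) without defining them, so the following is only one plausible reading, written as a toy NumPy sketch and not as the author's implementation. The function name `perturbed_moe_forward`, the parameters `noise_std`, `k_fragments`, `frag_p`, and `suppress_p`, the Gaussian form of the routing noise, and the choice to zero out fragments of each expert's output are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive weight 0.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def perturbed_moe_forward(x, gate_w, expert_ws, rng,
                          noise_std=0.5, k_fragments=4,
                          frag_p=0.25, suppress_p=0.25):
    """Toy dense MoE forward pass with three hypothetical perturbations.

    x:         (batch, d_in) inputs
    gate_w:    (d_in, n_experts) linear gate
    expert_ws: (n_experts, d_in, d_out) one linear map per expert
    """
    batch, n_experts = x.shape[0], gate_w.shape[1]
    d_out = expert_ws.shape[-1]

    # (1) Routing noise (assumed Gaussian): perturb the gate logits.
    logits = x @ gate_w + noise_std * rng.standard_normal((batch, n_experts))

    # (2) Expert suppression (assumed): mask random experts for this call,
    #     always keeping at least one so the softmax stays well defined.
    keep = rng.random(n_experts) >= suppress_p
    if not keep.any():
        keep[rng.integers(n_experts)] = True
    gates = softmax(np.where(keep, logits, -np.inf))   # (batch, n_experts)

    # Expert outputs: (batch, n_experts, d_out).
    outs = np.einsum("bi,eio->beo", x, expert_ws)

    # (3) Output fragmentation (assumed): split the output dimension into
    #     k_fragments contiguous pieces and zero each piece of each expert
    #     independently with probability frag_p.
    frag_id = np.arange(d_out) * k_fragments // d_out          # (d_out,)
    frag_keep = rng.random((n_experts, k_fragments)) >= frag_p  # (n_experts, K)
    outs = outs * frag_keep[:, frag_id][None, :, :]

    # Gate-weighted mixture of the (fragmented) expert outputs.
    return np.einsum("be,beo->bo", gates, outs)

# Usage: two calls on the same batch draw different perturbations, which is
# one way to read "each epoch provides a different view of the same data".
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 6))
gate_w = rng.standard_normal((6, 3))
expert_ws = rng.standard_normal((3, 6, 8))
view_a = perturbed_moe_forward(x, gate_w, expert_ws, rng)
view_b = perturbed_moe_forward(x, gate_w, expert_ws, rng)
```

With `noise_std=0`, `frag_p=0`, and `suppress_p=0` the function reduces to an ordinary softmax-gated mixture, which makes it easy to check each perturbation in isolation. The bilevel noise optimization described in the abstract would, under this reading, treat quantities such as `noise_std` as learnable from validation loss; that step is not sketched here.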

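Keywords (translated from the Chinese)

Mixture-of-Experts models, fragmented memory, anti-memorization, weaving density, spacetime manifold, bilevel optimization, phase transition, architecture-agnostic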

Key words

Mixture-of-Experts/ fragmented memory/ anti-memorization/ weaving density/ spacetime manifold/ bilevel optimization/ phase transition/ architecture-agnostic/ CPiRi

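Cite this article

Zhang Qingjun. Fragmented-Memory Mixture-of-Experts: Architecture-Agnostic Anti-Memorization and Autonomous Discovery of Weaving Density, Based on Spacetime Manifold Routing for Noisy Multi-Source Classification [EB/OL]. (2026-06-26) [2026-06-28]. https://sinoxiv.napstic.cn/article/26011664.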

Subject classification

Computing technology, computer technology

First published: 2026-06-26 17:10:41
