FragmentedMemoryMoE: Architecture-Agnostic Anti-Memorization and Learned Weaving Density in Noisy Multi-Source Classification. Spacetime Manifold Routing for Mixture-of-Experts
[Objective] Current AI architectures are converging toward homogeneity: larger Transformers, identical scaling laws, and spatial-only feature processing. FragmentedMemoryMoE is a discriminative MoE architecture that prevents memorization through three synergistic perturbation mechanisms and introduces the concept of spacetime manifold routing. [Methods] We conduct a comprehensive experimental program (85 experiments across 6 phases) testing 4 N-tier configurations. Three continuous perturbation mechanisms operate simultaneously: routing noise, output fragmentation, and expert suppression. Together they create a spacetime manifold in which each epoch provides a different view of the same data. We introduce bilevel noise optimization, in which noise parameters are learned from validation loss rather than preset. [Results] Key findings include: (1) The fragmentation advantage is architecture-agnostic, with a consistent sweet spot at 𝐾=3–4; the previously claimed even-odd 𝐾 parity effect is refuted as a confounding artifact of sequence length. (2) A sharp phase transition occurs between SEQ=48 and SEQ=64, where fragmentation switches from exploration mode to regularization mode. (3) Bilevel optimization achieves 78.9% at 𝐾=2 (+12.8% over fixed fragmentation); the model spontaneously discovers a weaving density that varies with task dimension. (4) A schedule × SEQ crossover reveals that the optimal temporal schedule reverses across regimes, with the growth schedule achieving 67.6% (+5.8% over the constant schedule). [Limitations] Current validation is restricted to classification tasks on TinyImageNet-200 and WikiText-2. The Minkowski light-cone distance for fragment routing has not been tested on generative tasks or heterogeneous multi-modal data sources. [Conclusions] The axis trajectory is not a fixed path but a family of paths parameterized by the full (𝐾, noise, SEQ) manifold.
The spacetime manifold principle, established here through a fragment-schedule tensor, extends naturally to Minkowski spacetime, providing an alternative structural anti-Gaussian mechanism that complements output-space isolation. CPiRi (ICLR 2026) independently validates similarity-based routing without positional encoding, which supports the generality of this approach across domains.
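[Illustration] To make the three perturbation mechanisms named in the Methods concrete, the following is a minimal NumPy sketch of a single perturbed forward pass through a toy MoE layer. It is not the authors' implementation. The function name `perturbed_moe_forward` and the parameters `noise_std`, `num_fragments`, and `suppress_prob` are hypothetical. The specific form of each perturbation is an assumption: routing noise is taken to be Gaussian noise on the gate logits, expert suppression is random masking of experts, and output fragmentation is splitting the output into 𝐾 contiguous chunks and zeroing one chunk per sample.

```python
import numpy as np


def perturbed_moe_forward(x, gate_w, expert_ws, rng,
                          noise_std=0.5, num_fragments=4, suppress_prob=0.2):
    """Toy MoE forward pass with three simultaneous perturbations.

    x:         (batch, d_in) inputs
    gate_w:    (d_in, n_experts) gating weights
    expert_ws: (n_experts, d_in, d_out) one linear map per expert
    All perturbation forms below are illustrative assumptions.
    """
    batch, n_experts = x.shape[0], gate_w.shape[1]

    # 1) Routing noise (assumed Gaussian): perturb the gate logits.
    logits = x @ gate_w + noise_std * rng.standard_normal((batch, n_experts))

    # 2) Expert suppression (assumed random masking): drop experts for this
    #    pass, but always keep at least one so the softmax stays defined.
    keep = rng.random(n_experts) >= suppress_prob
    if not keep.any():
        keep[rng.integers(n_experts)] = True
    logits = np.where(keep, logits, -np.inf)

    # Softmax over the surviving experts.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Dense mixture of expert outputs: (batch, n_experts, d_out) -> (batch, d_out).
    expert_out = np.einsum("bi,eio->beo", x, expert_ws)
    mixed = np.einsum("be,beo->bo", probs, expert_out)

    # 3) Output fragmentation (assumed form): split the output features into
    #    K contiguous fragments and zero one random fragment per sample.
    fragments = np.array_split(np.arange(mixed.shape[1]), num_fragments)
    dropped = rng.integers(num_fragments, size=batch)
    out = mixed.copy()
    for b, k in enumerate(dropped):
        out[b, fragments[k]] = 0.0

    return out, probs, keep


# Usage: because the generator advances on every call, repeated passes over
# the same batch draw fresh noise, masks, and dropped fragments.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
gate_w = rng.standard_normal((8, 4))
expert_ws = rng.standard_normal((4, 8, 12))
out, probs, keep = perturbed_moe_forward(x, gate_w, expert_ws, rng)
print(out.shape)  # (5, 12)
```

Under these assumptions, `noise_std` is the kind of parameter that bilevel noise optimization would learn from validation loss instead of fixing in advance; the sketch leaves it as a constant.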