HSG-12M: A Large-Scale Spatial Multigraph Dataset

Source: arXiv
Abstract

Existing graph benchmarks assume non-spatial, simple edges, collapsing physically distinct paths into a single link. We introduce HSG-12M, the first large-scale dataset of spatial multigraphs: graphs embedded in a metric space in which multiple geometrically distinct trajectories between two nodes are retained as separate edges. HSG-12M contains 11.6 million static and 5.1 million dynamic Hamiltonian spectral graphs across 1401 characteristic-polynomial classes, derived from 177 TB of spectral potential data. Each graph encodes the full geometry of a 1-D crystal's energy spectrum on the complex plane, producing diverse, physics-grounded topologies that transcend conventional node-coordinate datasets. To enable future extensions, we release Poly2Graph, a high-performance, open-source pipeline that maps arbitrary 1-D crystal Hamiltonians to spectral graphs. Benchmarks with popular GNNs expose new challenges in learning from multi-edge geometry at scale. Beyond its practical utility, we show that spectral graphs serve as universal topological fingerprints of polynomials, vectors, and matrices, forging a new algebra-to-graph link. HSG-12M lays the groundwork for geometry-aware graph learning and opens new opportunities for data-driven scientific discovery in condensed matter physics and beyond.
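To make the distinction between a simple graph and a spatial multigraph concrete, the following is a minimal sketch in plain Python. It is not taken from HSG-12M or Poly2Graph: the node names, coordinates, and helper functions (`build_multigraph`, `collapse_to_simple`, `polyline_length`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import hypot

# Hypothetical toy data (not from HSG-12M): node positions in the plane.
nodes = {"a": (0.0, 0.0), "b": (2.0, 0.0)}

# Each edge is (u, v, polyline). Two geometrically distinct paths join a and b.
edges = [
    ("a", "b", [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]),   # upper arc
    ("a", "b", [(0.0, 0.0), (1.0, -1.0), (2.0, 0.0)]),  # lower arc
]


def polyline_length(points):
    """Euclidean length of a piecewise-linear path."""
    return sum(
        hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def build_multigraph(nodes, edges):
    """Keep every trajectory between a node pair as its own edge."""
    adjacency = defaultdict(list)
    for u, v, path in edges:
        # Each path must start and end at the coordinates of its endpoints.
        assert path[0] == nodes[u] and path[-1] == nodes[v]
        adjacency[frozenset((u, v))].append(path)
    return adjacency


def collapse_to_simple(edges):
    """What a simple-graph benchmark keeps: one link per node pair."""
    return {frozenset((u, v)) for u, v, _ in edges}


multi = build_multigraph(nodes, edges)
simple = collapse_to_simple(edges)
pair = frozenset(("a", "b"))
print(len(multi[pair]), "spatial edges vs", len(simple), "simple edge")
```

In this toy case the simple graph records only that a and b are connected, while the multigraph retains both arcs together with their geometry, so per-edge quantities such as path length remain available.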

Xianquan Yan, Hakan Akgün, Kenji Kawaguchi, N. Duane Loh, Ching Hua Lee

Physics; Crystallography; Mathematics; Computing Technology and Computer Technology

Xianquan Yan, Hakan Akgün, Kenji Kawaguchi, N. Duane Loh, Ching Hua Lee. HSG-12M: A Large-Scale Spatial Multigraph Dataset[EB/OL]. (2025-06-10)[2025-06-24]. https://arxiv.org/abs/2506.08618.
