|国家预印本平台
首页|SAKURAONE: Empowering Transparent and Open AI Platforms through Private-Sector HPC Investment in Japan

SAKURAONE: Empowering Transparent and Open AI Platforms through Private-Sector HPC Investment in Japan

SAKURAONE: Empowering Transparent and Open AI Platforms through Private-Sector HPC Investment in Japan

来源:Arxiv_logoArxiv
英文摘要

SAKURAONE is a managed high performance computing (HPC) cluster developed and operated by the SAKURA Internet Research Center. It reinforces the ``KOKARYOKU PHY'' configuration of bare-metal GPU servers and is designed as a cluster computing resource optimized for advanced workloads, including large language model (LLM) training. In the ISC 2025 edition of the TOP500 list, SAKURAONE was ranked \textbf{49th} in the world based on its High Performance Linpack (HPL) score, demonstrating its global competitiveness. In particular, it is the \textbf{only system within the top 100} that employs a fully open networking stack based on \textbf{800~GbE (Gigabit Ethernet)} and the \textbf{SONiC (Software for Open Networking in the Cloud)} operating system, highlighting the viability of open and vendor-neutral technologies in large-scale HPC infrastructure. SAKURAONE achieved a sustained performance of 33.95~PFLOP/s on the HPL benchmark (Rmax), and 396.295~TFLOP/s on the High Performance Conjugate Gradient (HPCG) benchmark. For the HPL-MxP benchmark, which targets low-precision workloads representative of AI applications, SAKURAONE delivered an impressive 339.86~PFLOP/s using FP8 precision. The system comprises 100 compute nodes, each equipped with eight NVIDIA H100 GPUs. It is supported by an all-flash Lustre storage subsystem with a total physical capacity of 2~petabytes, providing high-throughput and low-latency data access. Internode communication is enabled by a full-bisection bandwidth interconnect based on a Rail-Optimized topology, where the Leaf and Spine layers are interconnected via 800~GbE links. This topology, in combination with RoCEv2 (RDMA over Converged Ethernet version 2), enables high-speed, lossless data transfers and mitigates communication bottlenecks in large-scale parallel workloads.

Fumikazu Konishi

计算技术、计算机技术

Fumikazu Konishi.SAKURAONE: Empowering Transparent and Open AI Platforms through Private-Sector HPC Investment in Japan[EB/OL].(2025-07-02)[2025-07-16].https://arxiv.org/abs/2507.02124.点此复制

评论