Efficient vectorized evaluation of Gaussian AO integrals on modern central processing units

Source: arXiv
Abstract

We report an implementation of the McMurchie-Davidson (MD) evaluation scheme for 1- and 2-particle Gaussian AO integrals designed for efficient execution on modern central processing units (CPUs) with Single Instruction Multiple Data (SIMD) instruction sets. As in our recent MD implementation for graphics processing units (GPUs) [J. Chem. Phys. 160, 244109 (2024)], variable-sized batches of shellsets of integrals are evaluated at a time. By optimizing for floating-point instruction throughput rather than minimizing the number of operations, this approach achieves up to 50% of the theoretical hardware peak FP64 performance on many common SIMD-equipped platforms (AVX2, AVX512, NEON), which translates to speedups of up to 30x over the state-of-the-art one-shellset-at-a-time implementation of Obara-Saika-type schemes in Libint for a variety of primitive and contracted integrals. As in our previous work, we rely on standard C++ features, such as the std::simd standard library feature to be included in the 2026 ISO C++ standard, without any explicit code generation, which keeps the code base small and portable. The implementation is part of the open source LibintX library, freely available at https://github.com/ValeevGroup/libintx.
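To illustrate the idea of evaluating a batch of integrals at a time, here is a minimal sketch in plain C++. It is not taken from LibintX and does not implement the McMurchie-Davidson scheme; it only shows the simplest possible case, the overlap of two unnormalized s-type primitive Gaussians, $S = (\pi/(a+b))^{3/2} \exp(-\frac{ab}{a+b}|A-B|^2)$, computed for several pairs in one pass. The names `kLanes`, `Lane`, `PairBatch`, and `overlap_ss` are hypothetical, and a fixed-size `std::array` is used as a stand-in for a `std::simd` vector, on the assumption that the compiler can vectorize the lane loop.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical lane count: four doubles fill one 256-bit (AVX2) register.
constexpr std::size_t kLanes = 4;

// Stand-in for a SIMD vector type such as std::simd<double>.
using Lane = std::array<double, kLanes>;

// Structure-of-arrays batch of primitive s-type Gaussian pairs.
// This layout is an assumption made for illustration only.
struct PairBatch {
  Lane a;   // exponents of the first Gaussian in each pair
  Lane b;   // exponents of the second Gaussian in each pair
  Lane r2;  // squared distance |A-B|^2 between the two centers
};

// Overlap of two unnormalized s Gaussians, one pair per lane:
//   S = (pi / (a + b))^{3/2} * exp(-a * b / (a + b) * r2)
inline Lane overlap_ss(const PairBatch& p) {
  const double pi = 3.14159265358979323846;
  Lane s{};
  // Every lane performs the same operations on different data,
  // which is the pattern SIMD hardware executes efficiently.
  for (std::size_t i = 0; i < kLanes; ++i) {
    const double g = p.a[i] + p.b[i];
    assert(g > 0.0);  // Gaussian exponents must be positive
    const double x = pi / g;
    s[i] = x * std::sqrt(x) * std::exp(-p.a[i] * p.b[i] / g * p.r2[i]);
  }
  return s;
}
```

Storing each quantity contiguously across the batch keeps the loop body free of branches and gathers, so all lanes do identical work. A real implementation would also have to handle higher angular momenta, contraction, and batches whose size is not a multiple of the lane count.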

Andrey Asadchev, Edward F. Valeev

Chemistry

Andrey Asadchev, Edward F. Valeev. Efficient vectorized evaluation of Gaussian AO integrals on modern central processing units [EB/OL]. (2025-06-14) [2025-06-25]. https://arxiv.org/abs/2506.12501.
