Scalability of 3D deterministic particle transport on the Intel MIC architecture
The key to large-scale parallel solutions of deterministic particle transport problems is single-node computation performance. Hence, single-node computation is often parallelized on multi-core or many-core computer architectures. However, the number of on-chip cores grows quickly as semiconductor feature sizes shrink. In this paper, we present a scalability investigation of one-energy-group, time-independent, deterministic discrete ordinates neutron transport in 3D Cartesian geometry (Sweep3D) on Intel's Many Integrated Core (MIC) architecture, which currently provides up to 62 cores with four hardware threads per core and is expected to provide up to 72 cores in the future. The OpenMP parallel programming model and vector intrinsic functions are used to exploit thread parallelism and vector parallelism, respectively, for the discrete ordinates method. The results on a 57-core MIC coprocessor show that the implementation of Sweep3D on MIC scales well in performance. In addition, we report an assessment of the implementation with the Roofline model and a performance comparison between the MIC and a Tesla K20C Graphics Processing Unit (GPU).
LIU Jie, WANG Qing-Lin, XING Zuo-Cheng, GONG Chun-Ye
dx.doi.org/10.13538/j.1001-8042/nst.26.050502
Microelectronics and integrated circuits; computing technology and computer technology
Particle transport; Discrete ordinates method; Sweep3D; Many Integrated Core (MIC); Scalability; Roofline model; Graphics Processing Unit (GPU)
LIU Jie, WANG Qing-Lin, XING Zuo-Cheng, GONG Chun-Ye. Scalability of 3D deterministic particle transport on the Intel MIC architecture [EB/OL]. (2023-06-18) [2025-08-02]. https://chinaxiv.org/abs/202306.00232.