National Preprint Platform

Resource Optimization with MPI Process Malleability for Dynamic Workloads in HPC Clusters


Source: arXiv
Abstract

Dynamic resource management is essential for optimizing computational efficiency in modern high-performance computing (HPC) environments, particularly as systems scale. While research has demonstrated the benefits of malleability in resource management systems (RMS), the adoption of such techniques in production environments remains limited due to challenges in standardization, interoperability, and usability. Addressing these gaps, this paper extends our prior work on the Dynamic Management of Resources (DMR) framework, which provides a modular and user-friendly approach to dynamic resource allocation. Building upon the original DMRlib reconfiguration runtime, this work integrates new methodology from the Malleability Module (MaM) of the Proteo framework, further enhancing reconfiguration capabilities with new spawning strategies and data redistribution methods. In this paper, we explore new malleability strategies in HPC dynamic workloads, such as merging MPI communicators and asynchronous reconfigurations, which offer new opportunities for dramatically reducing memory overhead. The proposed enhancements are rigorously evaluated on a world-class supercomputer, demonstrating improved resource utilization and workload efficiency. Results show that dynamic resource management can reduce the workload completion time by 40% and increase the resource utilization by over 20%, compared to static resource allocation.
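To make the idea of data redistribution during a malleable reconfiguration more concrete, here is a minimal sketch, not the DMRlib or MaM implementation. It assumes a 1-D array of `n_items` elements held in a contiguous block distribution, and computes which index ranges each old process must send to each new process when the job is resized from `old_procs` to `new_procs` processes. The function names `block_range` and `redistribution_plan` are hypothetical and used only for illustration.

```python
def block_range(n_items: int, n_procs: int, rank: int) -> tuple[int, int]:
    """Half-open [start, end) range owned by `rank` when n_items are split as
    evenly as possible over n_procs (the first n_items % n_procs ranks get one extra)."""
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def redistribution_plan(n_items: int, old_procs: int, new_procs: int):
    """Return (src_rank, dst_rank, start, end) transfers that move the data from a
    block distribution over old_procs to a block distribution over new_procs."""
    plan = []
    for src in range(old_procs):
        s0, s1 = block_range(n_items, old_procs, src)
        for dst in range(new_procs):
            d0, d1 = block_range(n_items, new_procs, dst)
            # A transfer is needed only where the old and new blocks overlap.
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:
                plan.append((src, dst, lo, hi))
    return plan


if __name__ == "__main__":
    # Expand a 10-element array from 2 to 4 processes.
    for src, dst, lo, hi in redistribution_plan(10, 2, 4):
        print(f"rank {src} -> rank {dst}: items [{lo}, {hi})")
```

In an MPI setting, each `(src, dst)` entry would translate into send/receive counts for a collective such as `MPI_Alltoallv`. If a merge-style expansion keeps the original processes alive with their original ranks (an assumption of this sketch), entries with `src == dst` become local copies rather than messages, which is one way reusing processes can cut communication and memory cost.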

Sergio Iserte, Iker Martín-Álvarez, Krzysztof Rojek, José I. Aliaga, Maribel Castillo, Weronika Folwarska, Antonio J. Peña

DOI: 10.1016/j.future.2025.107949

Subject: Computing technology; computer technology

Sergio Iserte, Iker Martín-Álvarez, Krzysztof Rojek, José I. Aliaga, Maribel Castillo, Weronika Folwarska, Antonio J. Peña. Resource Optimization with MPI Process Malleability for Dynamic Workloads in HPC Clusters [EB/OL]. (2025-06-17) [2025-07-21]. https://arxiv.org/abs/2506.14743.
