Data Version Management and Machine-Actionable Reproducibility for HPC based on git and DataLad
We present the adaptation of an existing data versioning and machine-actionable reproducibility solution to HPC. Both aspects are important for research data management, and the DataLad tool provides both on top of the widely used git version control system. However, DataLad itself is incompatible with HPC batch processing. The presented extension enables DataLad's versioning and reproducibility features in conjunction with the HPC batch scheduling system Slurm. It resolves a fundamental incompatibility and addresses inefficient behavior patterns on parallel file systems.
Andreas Knüpfer, Timothy J. Callow
Computing technology, computer technology
Andreas Knüpfer, Timothy J. Callow. Data Version Management and Machine-Actionable Reproducibility for HPC based on git and DataLad [EB/OL]. (2025-05-10) [2025-06-05]. https://arxiv.org/abs/2505.06558.