Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework

Source: arXiv
Abstract

Most materials science datasets are limited to atomic geometries (e.g., XYZ files), restricting their utility for multimodal learning and comprehensive data-centric analysis. These constraints have historically impeded the adoption of advanced machine learning techniques in the field. This work introduces MultiCrystalSpectrumSet (MCS-Set), a curated framework that expands materials datasets by integrating atomic structures with 2D projections and structured textual annotations, including lattice parameters and coordination metrics. MCS-Set enables two key tasks: (1) multimodal property and summary prediction, and (2) constrained crystal generation with partial cluster supervision. Leveraging a human-in-the-loop pipeline, MCS-Set combines domain expertise with standardized descriptors for high-quality annotation. Evaluations using state-of-the-art language and vision-language models reveal substantial modality-specific performance gaps and highlight the importance of annotation quality for generalization. MCS-Set offers a foundation for benchmarking multimodal models, advancing annotation practices, and promoting accessible, versatile materials science datasets. The dataset and implementations are available at https://github.com/KurbanIntelligenceLab/MultiCrystalSpectrumSet.
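As a rough illustration of what pairing an atomic geometry with a 2D projection and a textual annotation can look like, the sketch below parses an XYZ-format string and builds one combined record. It is not the MCS-Set pipeline: the names `MultimodalRecord`, `parse_xyz`, and `build_record`, the choice of a plain xy-plane projection, and the composition-only annotation are all assumptions made for this example. The lattice parameters and coordination metrics mentioned in the abstract are not computed here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MultimodalRecord:
    """Hypothetical container for one structure in three modalities."""
    symbols: list        # element symbols, one per atom
    coords: list         # (x, y, z) tuples, in the units used by the file
    projection_xy: list  # (x, y) tuples: the geometry projected onto the xy-plane
    annotation: str      # short structured text summary


def parse_xyz(text):
    """Parse XYZ text: atom count, comment line, then 'symbol x y z' rows."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    symbols, coords = [], []
    # Skip the count line and the comment line, then read n_atoms rows.
    for line in lines[2:2 + n_atoms]:
        parts = line.split()
        symbols.append(parts[0])
        coords.append(tuple(float(v) for v in parts[1:4]))
    return symbols, coords


def build_record(xyz_text):
    """Combine geometry, a 2D projection, and a text annotation (assumed layout)."""
    symbols, coords = parse_xyz(xyz_text)
    # Orthographic projection: drop the z coordinate.
    projection_xy = [(x, y) for x, y, _ in coords]
    # Annotation lists element counts in alphabetical order of symbol.
    composition = " ".join(
        f"{sym}{count}" for sym, count in sorted(Counter(symbols).items())
    )
    annotation = f"{len(symbols)} atoms; composition: {composition}"
    return MultimodalRecord(symbols, coords, projection_xy, annotation)
```

Calling `build_record` on the contents of an XYZ file returns a single object holding all three views, which is one simple way to keep the modalities aligned per structure.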

Can Polat, Hasan Kurban, Erchin Serpedin, Mustafa Kurban

Subject: computing technology, computer technology

Can Polat, Hasan Kurban, Erchin Serpedin, Mustafa Kurban. Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework[EB/OL]. (2025-05-30)[2025-06-19]. https://arxiv.org/abs/2506.00302.
