MatWheel: Addressing Data Scarcity in Materials Science Through Synthetic Data
Data scarcity and the high cost of annotation are long-standing challenges in materials science. Inspired by the potential of synthetic data in other fields such as computer vision, we propose the MatWheel framework, which trains a material property prediction model on synthetic data generated by a conditional generative model. We explore two scenarios: fully-supervised and semi-supervised learning. Using CGCNN for property prediction and Con-CDVAE as the conditional generative model, we conduct experiments on two data-scarce material property datasets from the Matminer database. Results show that synthetic data has potential in extremely data-scarce scenarios, achieving performance close to or exceeding that of real samples on both tasks. We also find that pseudo-labels have little impact on the quality of the generated data. Future work will integrate more advanced models and optimize the generation conditions to boost the effectiveness of the materials data flywheel.
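As a rough illustration of the flywheel idea, the sketch below runs one semi-supervised round on toy one-dimensional data. It is not the authors' implementation: a least-squares line stands in for CGCNN, a linear-Gaussian sampler stands in for Con-CDVAE, and the ordering of the steps, the function names (`fit_line`, `flywheel_round`) and all numeric settings are assumptions made for this example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def flywheel_round(labeled, unlabeled, n_synthetic, rng):
    """One hypothetical flywheel round on (feature, property) pairs."""
    xs, ys = zip(*labeled)

    # 1. Train the predictor (toy stand-in for CGCNN) on the scarce labels.
    a, b = fit_line(xs, ys)

    # 2. Pseudo-label the unlabeled samples with that predictor.
    pseudo = [(x, a * x + b) for x in unlabeled]

    # 3. Fit a conditional generator x | y (toy stand-in for Con-CDVAE)
    #    on the labeled plus pseudo-labeled samples.
    gx, gy = zip(*(list(labeled) + pseudo))
    c, d = fit_line(gy, gx)  # x is modeled as c*y + d plus Gaussian noise
    sigma = statistics.pstdev([x - (c * y + d) for x, y in zip(gx, gy)])

    # 4. Sample target property values and generate synthetic samples
    #    conditioned on them.
    targets = [rng.uniform(min(gy), max(gy)) for _ in range(n_synthetic)]
    synthetic = [(c * y + d + rng.gauss(0, sigma), y) for y in targets]

    # 5. Retrain the predictor on real plus synthetic samples.
    tx, ty = zip(*(list(labeled) + synthetic))
    return fit_line(tx, ty), synthetic


def true_property(x):
    """Ground-truth relation used only to build the toy dataset."""
    return 2.0 * x + 1.0


rng = random.Random(0)
labeled = []
for _ in range(10):  # deliberately small labeled set
    x = rng.uniform(0, 5)
    labeled.append((x, true_property(x) + rng.gauss(0, 0.1)))
unlabeled = [rng.uniform(0, 5) for _ in range(50)]

(slope, intercept), synthetic = flywheel_round(labeled, unlabeled, 100, rng)
print(f"retrained predictor: y = {slope:.2f} * x + {intercept:.2f}")
```

In the fully-supervised scenario the pseudo-labeling step would be skipped and the generator fitted on labeled samples only. The toy models are chosen so the loop runs with the standard library alone; they say nothing about how well the approach works on real crystal structures.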
Wentao Li, Yizhe Chen, Jiangjie Qiu, Xiaonan Wang
Natural science research methods; information science and information technology
Wentao Li, Yizhe Chen, Jiangjie Qiu, Xiaonan Wang. MatWheel: Addressing Data Scarcity in Materials Science Through Synthetic Data[EB/OL]. (2025-04-12)[2025-06-13]. https://arxiv.org/abs/2504.09152.