Efficient dataset generation for machine learning perovskite alloys
Lead-based perovskite solar cells have reached high efficiencies, but toxicity and limited stability hinder their wide-scale adoption. These issues have been partially addressed through compositional engineering of perovskite materials, but the vast complexity of the perovskite materials space poses a significant obstacle to exploration. We previously demonstrated how machine learning (ML) can accelerate property predictions for the CsPb(Cl/Br)$_3$ perovskite alloy. However, the substantial computational cost of the density functional theory (DFT) calculations required for model training prevents applications to more complex materials. Here, we introduce a data-efficient scheme to facilitate model training, validated initially on CsPb(Cl/Br)$_3$ data and then extended to the ternary alloy CsSn(Cl/Br/I)$_3$. Our approach employs clustering to construct a compact yet diverse initial dataset of atomic structures. We then apply a two-stage active learning approach: the first stage improves the reliability of the ML-based structure relaxations, and the second refines accuracy near equilibrium structures. Tests for CsPb(Cl/Br)$_3$ demonstrate that our scheme reduces the number of DFT calculations required in different parts of the proposed model training method by up to 20% and 50%. The fitted model for CsSn(Cl/Br/I)$_3$ is robust and highly accurate, as evidenced by the convergence of all ML-based structure relaxations in our tests and an average relaxation error of only 0.5 meV/atom.
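The clustering step described above can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm or the structural descriptors, so everything here is an assumption made for illustration: plain k-means, random vectors standing in for per-structure descriptors, and the hypothetical helper names `kmeans` and `select_representatives`. The idea shown is only that picking one structure per cluster (the one nearest each centroid) yields a small but diverse set to label with DFT.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means (assumed here; the paper's clustering method may differ)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)


def select_representatives(X, k, seed=0):
    """Return the index of the structure closest to each cluster centroid."""
    centroids, labels = kmeans(X, k, seed=seed)
    chosen = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue  # skip empty clusters
        d2 = ((X[idx] - centroids[j]) ** 2).sum(axis=1)
        chosen.append(int(idx[d2.argmin()]))
    return sorted(chosen)


# Hypothetical stand-in data: 200 candidate structures, 8-dimensional descriptors.
rng = np.random.default_rng(42)
descriptors = rng.normal(size=(200, 8))

# Indices of the structures that would be sent to DFT as the initial dataset.
selected = select_representatives(descriptors, k=10)
print(selected)
```

In a real workflow the random descriptors would be replaced by a structural representation of each candidate alloy configuration, and the selected structures would seed the subsequent active learning stages.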
Henrietta Homm, Jarno Laakso, Patrick Rinke
10.1103/PhysRevMaterials.9.053802
Physics; Computing Technology, Computer Technology
Henrietta Homm, Jarno Laakso, Patrick Rinke. Efficient dataset generation for machine learning perovskite alloys [EB/OL]. (2025-06-06) [2025-06-21]. https://arxiv.org/abs/2506.05777.