TerraMind: Large-Scale Generative Multimodality for Earth Observation
We present TerraMind, the first any-to-any generative, multimodal foundation model for Earth observation (EO). Unlike other multimodal models, TerraMind is pretrained on dual-scale representations combining both token-level and pixel-level data across modalities. On the token level, TerraMind encodes high-level contextual information to learn cross-modal relationships, while on the pixel level, TerraMind leverages fine-grained representations to capture critical spatial nuances. We pretrained TerraMind on nine geospatial modalities of a global, large-scale dataset. In this paper, we demonstrate that (i) TerraMind's dual-scale early fusion approach unlocks a range of zero-shot and few-shot applications for Earth observation, (ii) TerraMind introduces "Thinking-in-Modalities" (TiM) -- the capability of generating additional artificial data during finetuning and inference to improve the model output -- and (iii) TerraMind achieves beyond state-of-the-art performance on community-standard EO benchmarks such as PANGAEA. The pretraining dataset, the model weights, and our code are open-sourced under a permissive license.
Johannes Jakubik, Felix Yang, Benedikt Blumenstiel, Erik Scheurer, Rocco Sedona, Stefano Maurogiovanni, Jente Bosmans, Nikolaos Dionelis, Valerio Marsocci, Niklas Kopp, Rahul Ramachandran, Paolo Fraccaro, Thomas Brunschwiler, Gabriele Cavallaro, Juan Bernabe-Moreno, Nicolas Longépé
Remote Sensing Technology
Johannes Jakubik, Felix Yang, Benedikt Blumenstiel, Erik Scheurer, Rocco Sedona, Stefano Maurogiovanni, Jente Bosmans, Nikolaos Dionelis, Valerio Marsocci, Niklas Kopp, Rahul Ramachandran, Paolo Fraccaro, Thomas Brunschwiler, Gabriele Cavallaro, Juan Bernabe-Moreno, Nicolas Longépé. TerraMind: Large-Scale Generative Multimodality for Earth Observation [EB/OL]. (2025-04-15) [2025-04-24]. https://arxiv.org/abs/2504.11171.