Less is More: Multimodal Region Representation via Pairwise Inter-view Learning
With the increasing availability of geospatial datasets, researchers have explored region representation learning (RRL) to analyze complex region characteristics. Recent RRL methods use contrastive learning (CL) to capture shared information between two modalities but often overlook task-relevant unique information specific to each modality. Such modality-specific details can explain region characteristics that shared information alone cannot capture. Bringing information factorization to RRL can address this by factorizing multimodal data into shared and unique information. However, existing factorization approaches focus on two modalities, whereas RRL can benefit from various geospatial data. Extending factorization beyond two modalities is non-trivial because modeling high-order relationships introduces a combinatorial number of learning objectives, increasing model complexity. We introduce Cross-modal Knowledge Injected Embedding (CooKIE), an information factorization approach for RRL that captures both shared and unique representations. CooKIE uses a pairwise inter-view learning approach that captures high-order information without modeling high-order dependency, avoiding exhaustive combinations. We evaluate CooKIE on three regression tasks and a land use classification task in New York City and Delhi, India. Results show that CooKIE outperforms existing RRL methods and a factorized RRL model, capturing multimodal information with fewer training parameters and floating-point operations (FLOPs). We release the code: https://github.com/MinNamgung/CooKIE.
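To illustrate why pairwise inter-view learning keeps complexity manageable, a minimal sketch follows. It is not the paper's actual objective; it assumes a standard symmetric InfoNCE loss summed over all unordered pairs of modality views, which scales as O(M²) in the number of modalities M rather than over the exponentially many modality subsets a high-order factorization would require. The function names (`info_nce`, `pairwise_interview_loss`) are hypothetical.

```python
import torch
import torch.nn.functional as F


def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE between two views; batch-aligned rows are positives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature            # (B, B) cosine-similarity matrix
    targets = torch.arange(z_a.size(0))             # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


def pairwise_interview_loss(views: list[torch.Tensor]) -> torch.Tensor:
    """Average InfoNCE over all unordered pairs of modality views.

    With M modalities this sums M*(M-1)/2 pairwise terms, avoiding the
    combinatorial blow-up of modeling every high-order modality subset.
    """
    loss = torch.zeros(())
    pairs = 0
    for i in range(len(views)):
        for j in range(i + 1, len(views)):
            loss = loss + info_nce(views[i], views[j])
            pairs += 1
    return loss / pairs
```

For example, with four modalities (e.g., POI, satellite imagery, road network, mobility), this objective has only six pairwise terms, whereas subset-level high-order objectives would grow exponentially with M.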
Min Namgung, Yijun Lin, JangHyeon Lee, Yao-Yi Chiang
Min Namgung, Yijun Lin, JangHyeon Lee, Yao-Yi Chiang. Less is More: Multimodal Region Representation via Pairwise Inter-view Learning [EB/OL]. (2025-05-14) [2025-07-02]. https://arxiv.org/abs/2505.18178.