Multi-level Compositional Feature Augmentation for Unbiased Scene Graph Generation
Scene Graph Generation (SGG) aims to detect all the visual relation triplets <sub, pred, obj> in a given image. With the emergence of various advanced techniques for better utilizing both the intrinsic and extrinsic information in each relation triplet, SGG has made great progress in recent years. However, due to the ubiquitous long-tailed predicate distributions, today's SGG models are still easily biased toward the head predicates. Currently, the most prevalent debiasing solutions for SGG are re-balancing methods, e.g., changing the distributions of the original training samples. In this paper, we argue that all existing re-balancing strategies fail to increase the diversity of the relation triplet features of each predicate, which is critical for robust SGG. To this end, we propose a novel Multi-level Compositional Feature Augmentation (MCFA) strategy, which aims to mitigate the bias issue by increasing the diversity of triplet features. Specifically, we enhance relationship diversity not only at the feature level, i.e., replacing the intrinsic or extrinsic visual features of triplets with those of other correlated samples to create novel feature compositions for tail predicates, but also at the image level, i.e., manipulating the image to generate brand-new visual appearances for triplets. Due to its model-agnostic nature, MCFA can be seamlessly incorporated into various SGG frameworks. Extensive ablations have shown that MCFA achieves a new state-of-the-art performance on the trade-off between different metrics.
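The feature-level idea described above can be illustrated with a minimal sketch. It is not the MCFA implementation: the `Triplet` container, the `augment_tail_triplets` helper, and the choice to treat "correlated samples" as triplets sharing the same object category are all assumptions made here for illustration. The sketch creates new compositions for tail predicates by swapping in the object feature of another sample of the same category.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, List, Sequence, Set


@dataclass(frozen=True)
class Triplet:
    """Hypothetical container for one relation triplet and its entity features."""
    sub_label: str
    pred: str
    obj_label: str
    sub_feat: Sequence[float]
    obj_feat: Sequence[float]


def augment_tail_triplets(
    triplets: List[Triplet], tail_preds: Set[str], rng: random.Random
) -> List[Triplet]:
    """Return new triplets for tail predicates with a swapped object feature.

    Assumption: a "correlated sample" is any other triplet whose object
    belongs to the same category.
    """
    # Index every object feature by its category label.
    pool: Dict[str, List[Sequence[float]]] = {}
    for t in triplets:
        pool.setdefault(t.obj_label, []).append(t.obj_feat)

    augmented = []
    for t in triplets:
        if t.pred not in tail_preds:
            continue  # only tail predicates are augmented
        # Exclude the triplet's own feature so the composition is new.
        candidates = [f for f in pool[t.obj_label] if f is not t.obj_feat]
        if candidates:
            augmented.append(replace(t, obj_feat=rng.choice(candidates)))
    return augmented


if __name__ == "__main__":
    samples = [
        Triplet("man", "on", "horse", (1.0,), (2.0,)),      # head predicate
        Triplet("man", "riding", "horse", (3.0,), (4.0,)),  # tail predicate
    ]
    for t in augment_tail_triplets(samples, {"riding"}, random.Random(0)):
        print(t)
```

The augmented triplets keep their predicate label and subject feature, so they would be added to the training set alongside the originals rather than replacing them. Image-level augmentation, which manipulates pixels, is not covered by this sketch.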
Xingchen Li, Chong Sun, Chen Li, Long Chen, Lin Li
Computing technology, computer technology
Xingchen Li, Chong Sun, Chen Li, Long Chen, Lin Li. Multi-level Compositional Feature Augmentation for Unbiased Scene Graph Generation [EB/OL]. (2025-06-23) [2025-07-09]. https://arxiv.org/abs/2308.06712.