MogaNet: Multi-order Gated Aggregation Network
Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, Zhiyuan Chen, Jiangbin Zheng, Stan Z. Li
Abstract
By contextualizing the kernel as globally as possible, modern ConvNets have
shown great potential in computer vision tasks. However, recent progress on
multi-order game-theoretic interaction within deep neural networks (DNNs)
reveals the representation bottleneck of modern ConvNets, where the expressive
interactions have not been effectively encoded with the increased kernel size.
To tackle this challenge, we propose a new family of modern ConvNets, dubbed
MogaNet, for discriminative visual representation learning in pure
ConvNet-based models with favorable complexity-performance trade-offs. MogaNet
encapsulates conceptually simple yet effective convolutions and gated
aggregation into a compact module, where discriminative features are
efficiently gathered and contextualized adaptively. MogaNet exhibits great
scalability, impressive efficiency of parameters, and competitive performance
compared to state-of-the-art ViTs and ConvNets on ImageNet and various
downstream vision benchmarks, including COCO object detection, ADE20K semantic
segmentation, 2D and 3D human pose estimation, and video prediction. Notably,
MogaNet hits 80.0% and 87.8% accuracy with 5.2M and 181M parameters on
ImageNet-1K, outperforming ParC-Net and ConvNeXt-L, while saving 59% FLOPs and
17M parameters, respectively. The source code is available at
https://github.com/Westlake-AI/MogaNet.
Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, Zhiyuan Chen, Jiangbin Zheng, Stan Z. Li. MogaNet: Multi-order Gated Aggregation Network [EB/OL]. (2022-11-06) [2026-04-05]. https://arxiv.org/abs/2211.03295.
Subject classification: computing technology, computer technology