Hulk: A Universal Knowledge Translator for Human-Centric Tasks
Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as the metaverse and sports analysis. There has been a recent surge in developing human-centric foundation models that benefit a broad range of human-centric perception tasks. While many human-centric foundation models have achieved success, they did not explore 3D vision or vision-language tasks for human-centric perception and required task-specific finetuning. These limitations restrict their application to more downstream tasks and situations. To tackle these problems, we present Hulk, the first multimodal human-centric generalist model, capable of addressing 2D vision, 3D vision, skeleton-based, and vision-language tasks without task-specific finetuning. The key to achieving this is condensing various task-specific heads into two general heads, one for discrete representations, \emph{e.g.,} languages, and the other for continuous representations, \emph{e.g.,} location coordinates. The outputs of the two heads can be further stacked into four distinct input and output modalities. This uniform representation enables Hulk to treat diverse human-centric tasks as modality translation, integrating knowledge across a wide range of tasks. Comprehensive evaluations of Hulk on 12 benchmarks covering 8 human-centric tasks demonstrate the superiority of our proposed method, achieving state-of-the-art performance on 11 benchmarks. The code is available at https://github.com/OpenGVLab/Hulk.
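The two-general-heads idea in the abstract (one head emitting discrete tokens such as words, one emitting continuous values such as coordinates, over a shared backbone) can be sketched as follows. This is a minimal illustration assuming PyTorch; the class name, layer sizes, and module choices are hypothetical and do not reflect Hulk's actual architecture.

```python
import torch
import torch.nn as nn

class TwoHeadTranslator(nn.Module):
    """Illustrative sketch: a shared encoder feeding two general output
    heads, one for discrete representations (e.g., language tokens) and
    one for continuous representations (e.g., location coordinates).
    All dimensions below are hypothetical."""

    def __init__(self, d_model=256, vocab_size=1000, coord_dim=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.discrete_head = nn.Linear(d_model, vocab_size)   # token logits
        self.continuous_head = nn.Linear(d_model, coord_dim)  # e.g., 3D coords

    def forward(self, x):
        h = self.encoder(x)                      # shared representation
        return self.discrete_head(h), self.continuous_head(h)

model = TwoHeadTranslator()
features = torch.randn(2, 16, 256)               # (batch, sequence, feature)
logits, coords = model(features)
print(logits.shape, coords.shape)
```

A downstream task would then be framed as translating an input modality into whichever of the two output types it requires, rather than attaching a bespoke head per task.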
Rui Zhao, Xun Guo, Feng Zhu, Lei Bai, Jian Wu, Tong He, Wanli Ouyang, Shixiang Tang, Yizhou Wang, Yixuan Wu, Weizhen He
Subject categories: information dissemination; science of knowledge dissemination; computing technology for scientific research; computer technology
Rui Zhao, Xun Guo, Feng Zhu, Lei Bai, Jian Wu, Tong He, Wanli Ouyang, Shixiang Tang, Yizhou Wang, Yixuan Wu, Weizhen He. Hulk: A Universal Knowledge Translator for Human-Centric Tasks [EB/OL]. (2025-08-06) [2025-08-16]. https://arxiv.org/abs/2312.01697.