National Preprint Platform (国家预印本平台)

Feature Aggregation Hash (特征聚集 Hash 网络)

Abstract


In an era of information explosion, image retrieval plays a crucial role across many fields. To address the challenges of fine-grained image retrieval, this paper proposes a fine-grained hashing method for image retrieval based on a Vision Transformer (ViT) and a Large Language Model (LLM). By combining visual features with the output of the language model, the method generates hash codes that are more robust and more semantically meaningful. Experiments on the CUB-200-2011 dataset show that the method significantly outperforms existing methods and maintains high retrieval accuracy even at shorter hash code lengths. The approach offers an efficient and generalizable solution for fine-grained image retrieval.
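For readers unfamiliar with hashing-based retrieval, the general pipeline can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the fusion by concatenation, the random projection (standing in for a learned hash layer), and every function name here are assumptions made for the example. It shows two feature vectors being fused, binarized into a short hash code, and compared by Hamming distance.

```python
import random


def fuse(visual, text):
    """Concatenate a visual feature vector with a language-model feature vector.
    (Assumption: simple concatenation; the paper may fuse features differently.)"""
    return list(visual) + list(text)


def make_projection(in_dim, n_bits, seed=0):
    """Build a random projection matrix with n_bits rows.
    (Assumption: a stand-in for a learned hash layer.)"""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(n_bits)]


def hash_code(features, projection):
    """Emit one bit per projection row: 1 if the dot product is non-negative."""
    return [
        1 if sum(w * x for w, x in zip(row, features)) >= 0 else 0
        for row in projection
    ]


def hamming(a, b):
    """Count the positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code, database_codes, top_k=3):
    """Return the indices of the top_k database codes closest in Hamming distance."""
    order = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
    return order[:top_k]
```

Shorter codes reduce storage and comparison cost, which is why retrieval accuracy at short code lengths is the property the abstract emphasizes.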

Xiao Bo (肖波), Zhou Weidong (周卫东)

Computing technology, computer technology

Artificial Intelligence; Fine-grained image analysis; Hash retrieval

Xiao Bo, Zhou Weidong. Feature Aggregation Hash (特征聚集 Hash 网络) [EB/OL]. (2024-10-29) [2025-07-16]. http://www.paper.edu.cn/releasepaper/content/202410-28.
