Level-Navi Agent: A Framework and benchmark for Chinese Web Search Agents

Shichong Xie Chuanrui Hu Jun Zhang Xiaofeng Cong Baoxin Wang Bin Chen

Abstract

Large language models (LLMs), which are adopted to understand human language, are driving the development of artificial intelligence (AI) web search agents. Compared with traditional search engines, LLM-powered AI search agents can understand and respond to complex queries in greater depth, enabling more accurate operations and better context recognition. However, little attention and effort have been paid to Chinese web search, so the capabilities of open-source models have not been uniformly and fairly evaluated. The difficulty lies in three missing pieces: a unified agent framework, an accurately labeled dataset, and a suitable evaluation metric. To address these issues, we propose Level-Navi Agent, a general-purpose, training-free web search agent based on level-aware navigation, accompanied by a well-annotated dataset (Web24) and a suitable evaluation metric. Level-Navi Agent can think through complex user questions and conduct searches at various levels on the internet to gather the information needed to answer them. Meanwhile, we provide a comprehensive evaluation of state-of-the-art LLMs under fair settings. To facilitate future research, the source code is available on GitHub.

Keywords

Web Search Agent/Benchmarking/Evaluation Metrics/Large Language Model

Cite this article

Shichong Xie, Chuanrui Hu, Jun Zhang, Xiaofeng Cong, Baoxin Wang, Bin Chen. Level-Navi Agent: A Framework and benchmark for Chinese Web Search Agents [EB/OL]. (2024-12-25) [2026-04-03]. https://chinaxiv.org/abs/202412.00330.

Subject classification

Computing technology, computer technology / Automation technology, automation equipment

First published: 2024-12-25