Research on Topic Web Crawlers and Their Implementation in C#
Starting from a comparison of the requirements and implementation mechanisms of general-purpose web crawlers and topic web crawlers, this paper studies the performance of several page-crawling strategies and discusses which strategies and algorithms are better suited to topic web crawlers, chiefly the Fish-Search and Shark-Search algorithms. By examining how web crawlers are implemented, the techniques involved, and the efficiency of different crawling schemes, it proposes an implementation structure and method for a topic web crawler and describes how to implement this crawler in C#. The crawler can run as multiple processes or across multiple machines that cooperate to fetch pages, and it achieves relatively high crawling efficiency while taking web server load and robots.txt into account. It can be used in a variety of data and information systems, including vertical (focused) search engines and topic-specific data collection systems.
吴峰
Computing technology, computer technology
Web crawler#Topic information crawling
Topic Crawler#Web mining
吴峰.主题网络爬虫研究与C#实现[EB/OL].(2008-11-19)[2025-08-11].http://www.paper.edu.cn/releasepaper/content/200811-550.