The Analysis and Design of Theme Crawler System for Topics in BBS
BBS forums are an important platform where Internet users post comments and exchange views freely, and they have become a gathering place for valuable information such as user needs and commercial opportunities. A focused crawler is a topic-oriented information collection system that automatically gathers topic-relevant information from the Internet according to user needs; it is increasingly used in topic-specific search engines and in site structure analysis. This paper first describes the working principle, main modules, and key implementation technologies of focused crawlers. Then, by analyzing the directory-style structure of dynamic web pages and the text structure of BBS, it designs a broadly applicable crawling scheme for BBS and describes the design of the focused crawler in detail. Finally, the scheme is compared with a general-purpose web crawler scheme.
Xin Yang (辛阳), Zhao Xiaoyang (赵晓阳)
Computing technology; computer technology
BBS; Focused Crawler; Search Algorithm; Public Opinion Monitoring
Xin Yang, Zhao Xiaoyang. The Analysis and Design of Theme Crawler System for Topics in BBS [EB/OL]. (2011-10-14)[2025-08-11]. http://www.paper.edu.cn/releasepaper/content/201110-113.