A Two-Stage Filtering Crawler Tool for Android Application Markets
The Android operating system currently holds a dominant position in the mobile operating system market. Its openness makes it convenient for users but also provides fertile ground for malicious Android applications. Monitoring and inspecting the hundreds of application markets that currently offer Android application downloads requires a complete crawling pipeline to collect these applications. This paper addresses the problem of removing noise URLs during crawling and proposes a two-stage filtering scheme. The scheme combines filtering based on URL path length analysis with rule-matching filtering, and it can significantly reduce the overall time consumed compared with conventional regular-expression filtering. Experiments confirm that the scheme achieves the expected results.
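The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define how path length is measured or which rules are matched, so this sketch assumes that "path length" means the number of path segments, that the acceptable depths are known in advance, and that the rules are regular expressions applied to the URL path. The names path_depth, two_stage_filter, allowed_depths, and the market.example.com URLs are all hypothetical.

```python
import re
from urllib.parse import urlparse


def path_depth(url: str) -> int:
    """Count the non-empty segments in the URL path (assumed meaning of 'path length')."""
    return sum(1 for segment in urlparse(url).path.split("/") if segment)


def two_stage_filter(urls, allowed_depths, rules):
    """Keep URLs that pass a cheap depth check and then match at least one rule."""
    # Stage 1: cheap structural check; discard URLs whose path depth is not expected.
    survivors = [u for u in urls if path_depth(u) in allowed_depths]
    # Stage 2: the more expensive regex rules run only on the stage-1 survivors.
    return [u for u in survivors if any(r.search(urlparse(u).path) for r in rules)]


# Hypothetical sample data; real markets will use different URL layouts.
urls = [
    "https://market.example.com/app/detail/12345",
    "https://market.example.com/about",
    "https://market.example.com/news/2021/03/16/post",
    "https://market.example.com/app/category/games",
]
rules = [re.compile(r"^/app/detail/\d+$")]  # assumed pattern for app detail pages
kept = two_stage_filter(urls, allowed_depths={3}, rules=rules)
print(kept)
```

In this sketch the depth check discards two of the four URLs before any regular expression runs, which is the kind of saving the abstract attributes to the two-stage design; the actual gain depends on how much noise the first stage can remove.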
Zhu Qi (朱旗), Jin Zhengping (金正平)
Computing technology, computer technology
computer application technology; web crawler; information extraction; noise removal
Zhu Qi, Jin Zhengping. A two-stage filtering crawler tool for Android application markets [EB/OL]. (2021-03-16)[2025-08-18]. http://www.paper.edu.cn/releasepaper/content/202103-154.