Bandit on the Hunt: Dynamic Crawling for Cyber Threat Intelligence
Public information contains valuable Cyber Threat Intelligence (CTI) that is used to prevent future attacks. While standards exist for sharing this information, much of it appears in non-standardized news articles or blogs. Monitoring online sources for threats is time-consuming, and choosing which sources to monitor is uncertain. Current research focuses on extracting Indicators of Compromise from known sources and rarely addresses the identification of new sources. This paper proposes a CTI-focused crawler that uses a multi-armed bandit (MAB) and various crawling strategies. It employs SBERT to identify relevant documents while dynamically adapting its crawling path. Our system, ThreatCrawl, achieves a harvest rate exceeding 25% and expands its seed by over 300% while maintaining topical focus. Additionally, the crawler identifies previously unknown but highly relevant overview pages, datasets, and domains.
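To make the idea of a bandit choosing among crawling strategies concrete, the following is a minimal sketch and not ThreatCrawl's implementation. The abstract does not say which bandit algorithm is used, so epsilon-greedy is an assumption made here for illustration. The `pull` callback and the arm names are hypothetical, and the reward is a stand-in for a relevance signal such as an SBERT similarity score. `harvest_rate` uses the usual focused-crawling definition (relevant pages divided by crawled pages), which is assumed to match the paper's metric.

```python
import random


def run_bandit(arms, pull, rounds, epsilon=0.1, seed=0):
    """Epsilon-greedy choice among crawling strategies (illustrative only).

    arms:  list of strategy names (hypothetical).
    pull:  callable(arm, rng) -> reward in [0, 1]; a stand-in for the
           relevance of the page fetched by that strategy.
    """
    rng = random.Random(seed)
    counts = {arm: 0 for arm in arms}
    values = {arm: 0.0 for arm in arms}  # running mean reward per arm
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.choice(arms)  # explore: try a random strategy
        else:
            arm = max(arms, key=lambda a: values[a])  # exploit the best so far
        reward = pull(arm, rng)
        counts[arm] += 1
        # Incremental update of the mean reward for the chosen arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


def harvest_rate(relevant_pages, crawled_pages):
    """Fraction of crawled pages judged relevant (0.0 if nothing was crawled)."""
    return relevant_pages / crawled_pages if crawled_pages else 0.0


if __name__ == "__main__":
    # Hypothetical per-strategy probabilities of fetching a relevant page.
    relevance = {"follow_links": 0.30, "search_seed_terms": 0.15}

    def simulated_pull(arm, rng):
        return 1.0 if rng.random() < relevance[arm] else 0.0

    counts, values = run_bandit(list(relevance), simulated_pull, rounds=500)
    print(counts, values)
```

In a real crawler the reward would come from scoring each fetched page against the topic rather than from a simulated probability, and the bandit would shift its crawl budget toward whichever strategy keeps yielding relevant pages.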
Computing technology, computer technology
Bandit on the Hunt: Dynamic Crawling for Cyber Threat Intelligence [EB/OL]. (2025-04-25) [2025-05-13]. https://arxiv.org/abs/2504.18375.