The Construction and Application of a Digital Humanities Knowledge Base of Ancient Books Based on Word and Entity Annotation: A Case Study on the Zhou Qin Han Annals of Zizhitongjian
[Purpose/significance] To explore a method for constructing a humanities knowledge base that supports word- and entity-based retrieval and knowledge mining. [Method/process] Taking the Zhou Qin Han Annals of the Zizhitongjian as a case study, the 68-volume, 600,000-character text was automatically segmented and part-of-speech tagged; entities such as persons, locations (with GIS information), and times were then manually annotated. On this basis, a full-text retrieval and map retrieval system based on words and entities was built. Co-occurrence information was used to derive relationships between persons and their travel records. TF-IDF combined with time series analysis was then used to identify eventful periods, prominent figures, and key locations. [Result/conclusion] Deep annotation of words and entities addresses the retrieval difficulties caused by missing word boundaries, identical names with different referents, and different names for the same referent, and it provides a foundation for multi-perspective knowledge discovery and knowledge services for ancient books.
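The abstract does not give the exact formulas used, so the following is only a minimal sketch of the two mining steps it names: counting person co-occurrences, and scoring persons per time period with TF-IDF. The toy `passages` data, the field names `period` and `persons`, and the functions `cooccurrence` and `tfidf_by_period` are all hypothetical, and the TF-IDF variant (relative frequency times the natural log of inverse period frequency) is an assumption rather than the authors' definition.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each annotated passage carries a period label
# and the person entities tagged in it (not taken from the paper).
passages = [
    {"period": "P1", "persons": ["A", "B"]},
    {"period": "P1", "persons": ["A", "B", "C"]},
    {"period": "P2", "persons": ["A", "C"]},
    {"period": "P2", "persons": ["D"]},
    {"period": "P3", "persons": ["A", "D"]},
]

def cooccurrence(passages):
    """Count how often each unordered pair of persons shares a passage."""
    counts = Counter()
    for p in passages:
        for a, b in combinations(sorted(set(p["persons"])), 2):
            counts[(a, b)] += 1
    return counts

def tfidf_by_period(passages):
    """Score each person per period, treating a period as one 'document'."""
    tf = {}
    for p in passages:
        tf.setdefault(p["period"], Counter()).update(p["persons"])
    n_periods = len(tf)
    # Document frequency: number of periods in which a person appears.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())
    scores = {}
    for period, counts in tf.items():
        total = sum(counts.values())
        scores[period] = {
            person: (c / total) * math.log(n_periods / df[person])
            for person, c in counts.items()
        }
    return scores

pairs = cooccurrence(passages)
scores = tfidf_by_period(passages)
```

In this toy data, person `A` appears in every period and therefore scores zero everywhere, while `B`, who appears only in `P1`, is the top-scoring person for that period. This is the sense in which TF-IDF over a time series can surface figures who are distinctive for one period rather than merely frequent overall.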
常博林、王东波、李斌、陈欣雨、万晨、冯敏萱
Information dissemination, knowledge dissemination; Chinese history; computing technology, computer technology
Zizhitongjian; digital humanities; knowledge mining; ancient book retrieval; ancient Chinese language processing
常博林,王东波,李斌,陈欣雨,万晨,冯敏萱.基于词和实体标注的古籍数字人文知识库的构建与应用——以《资治通鉴·周秦汉纪》为例[EB/OL].(2023-04-01)[2025-08-18].https://chinaxiv.org/abs/202304.00421.