National Preprint Platform

Structured Information Extraction for Web Products

Abstract

With the growth of e-commerce and vertical search engines, structured information extraction for online products has become an active research topic in data mining, information retrieval, and natural language processing. Taking product named entity recognition in the clothing and accessories domain as an example, this paper analyzes the difficulties and characteristics of structured information extraction in the product domain and adopts an active learning method based on conditional random fields (CRFs) to extract product entities from Chinese web page text. On experimental data drawn from a group-buying e-commerce platform, the method achieves an overall F1 score of 0.80, with F1 scores of 0.70, 0.89, and 0.72 for the brand, color, and material entity types, respectively. The experiments indicate that the method reaches relatively high performance while effectively reducing the cost of data annotation.
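The abstract does not say which query strategy the active learning step uses. As a rough illustration only, the sketch below assumes least-confidence sampling, a common choice with sequence labelers: the sentences the current model is least sure about are sent to annotators first, which is how such a loop can cut labeling cost. The function `select_for_labeling`, the toy scorer `toy_confidence`, and the sample sentences are all hypothetical; with a real CRF the confidence would typically be the probability of the best label sequence.

```python
from typing import Callable, List, Sequence, Tuple

Sentence = Sequence[str]


def select_for_labeling(
    pool: List[Sentence],
    confidence: Callable[[Sentence], float],
    batch_size: int,
) -> Tuple[List[Sentence], List[Sentence]]:
    """Split an unlabeled pool into (to_label, remaining), least confident first."""
    ranked = sorted(pool, key=confidence)  # ascending: lowest confidence first
    return ranked[:batch_size], ranked[batch_size:]


# Hypothetical stand-in for a trained CRF (an assumption, not the paper's model):
# confidence is the fraction of tokens found in a small known vocabulary.
KNOWN = {"red", "cotton", "shirt"}


def toy_confidence(sentence: Sentence) -> float:
    return sum(tok in KNOWN for tok in sentence) / len(sentence)


# Made-up product-title tokens, for illustration only.
pool = [
    ["red", "cotton", "shirt"],
    ["acme", "linen", "dress"],
    ["red", "silk", "scarf"],
]

to_label, remaining = select_for_labeling(pool, toy_confidence, batch_size=1)
```

In a full loop, the selected sentences would be annotated, added to the training set, and the model retrained before the next round of selection.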

Yuan Caixia (袁彩霞), Ji Chenghui (季成晖), Wang Xiaojie (王小捷)

Computing technology; computer technology

natural language processing; information extraction; product named entity recognition; conditional random fields; active learning

Yuan Caixia, Ji Chenghui, Wang Xiaojie. Structured Information Extraction for Web Products [EB/OL]. (2012-12-17) [2025-08-02]. http://www.paper.edu.cn/releasepaper/content/201212-372.
