National Preprint Platform (国家预印本平台)

Article Plagiarism Judgment Based on Topic Prediction and Information Fingerprint

Abstract

Given a set of articles, our goal is to determine which of them are plagiarized from one another and how likely the plagiarism is. This paper describes the ideas and techniques in three steps. First, each article is processed to capture its topic, that is, to extract its keywords. The technique used is TF-IDF, which quantifies the weight of each word, i.e., how much that word contributes to predicting the article's topic: the stronger a word's ability to predict the topic, the larger its weight, and vice versa. Next, the articles are classified by topic similarity; the more similar two articles' topics are, the more likely they are to be placed in the same class. The technique used is cosine similarity, followed by a bottom-up procedure that repeatedly merges classes, so that articles with similar topics end up in the same class. Finally, we use information fingerprints to judge the probability that two articles on similar topics plagiarize each other: the articles within each topic are processed to obtain each article's information fingerprint, and the fingerprints are then compared pairwise.
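The three steps named in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify a linkage rule, a stopping criterion, or a fingerprint scheme, so the sketch assumes average-linkage merging with a hypothetical similarity threshold of 0.2 and a SimHash-style 64-bit fingerprint over term counts. It also assumes the documents are already tokenized (Chinese text would first need word segmentation). All function names and the toy corpus are invented for the example.

```python
import hashlib
import math
from collections import Counter


def tfidf(docs):
    """Map each tokenized document to a {term: TF-IDF weight} dict."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(vectors, threshold=0.2):
    """Bottom-up merging. Average linkage and the threshold are assumptions."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        pairs = [cosine(vectors[i], vectors[j]) for i in a for j in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        score, x, y = max(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if score < threshold:
            break  # no remaining pair is similar enough to merge
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x].extend(merged)
    return clusters


def simhash(weights, bits=64):
    """SimHash-style fingerprint (an assumed scheme) of a {term: weight} dict."""
    totals = [0.0] * bits
    for term, w in weights.items():
        digest = hashlib.md5(term.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash of the term
        for b in range(bits):
            totals[b] += w if (h >> b) & 1 else -w
    return sum(1 << b for b in range(bits) if totals[b] > 0)


def fingerprint_similarity(a, b, bits=64):
    """Fraction of fingerprint bits on which a and b agree (1.0 = identical)."""
    return 1.0 - bin(a ^ b).count("1") / bits


# Toy corpus (hypothetical data): documents 0 and 1 are identical copies.
DOCS = [
    "the cat sat on the mat".split(),
    "the cat sat on the mat".split(),
    "stock markets fell sharply today".split(),
    "stock markets rose sharply today".split(),
]

clusters = cluster(tfidf(DOCS))
for group in clusters:
    marks = {i: simhash(Counter(DOCS[i])) for i in group}
    for i in group:
        for j in group:
            if i < j:
                print(i, j, fingerprint_similarity(marks[i], marks[j]))
```

Comparing fingerprints only within a topic cluster keeps the number of pairwise comparisons small, which matches the order of steps in the abstract. The bit-agreement score is a similarity measure rather than a calibrated probability; turning it into a plagiarism probability would need a mapping that the abstract does not describe.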

Yin Sixing (尹斯星), Wang Kaiyue (王凯悦)

Computing technology, computer technology

Artificial intelligence; TF-IDF algorithm; cosine similarity; information fingerprint

Yin Sixing, Wang Kaiyue. Article Plagiarism Judgment Based on Topic Prediction and Information Fingerprint [EB/OL]. (2021-04-19) [2025-08-16]. http://www.paper.edu.cn/releasepaper/content/202104-154.
