A Name Embedding Vector Method
Name2vec: Name Embedding for Recommender System
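Before entering a recommender system, entity names such as product names and personal names need to be embedded into low-dimensional vectors. Popular embedding algorithms such as word2vec start from the premise that words in the same syntactic position have similar vectors, but name sequences have no syntactic structure, so the resulting name vectors are of low quality. Starting from the premise that neighbouring names have similar vectors, this paper proposes a new method called name embedding. Name embedding uses several new techniques: its formulas are simpler than those of word2vec, vector norms are fixed at 1, relative weights are used to handle low-frequency names, and the optimization objective is a simple mean squared error. Measured by name similarity, name embedding significantly outperforms word2vec on a synthetic NBA team-name dataset, a Weibo team-name dataset, and a Weibo likes dataset.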
Before entering a recommender system, an entity name must be embedded into a vector. Some popular models, such as word2vec, are based on the principle that words in the same syntactic position should be embedded into similar vectors. However, a sequence of entity names has no syntactic structure, which leads to low-quality name vectors. Based on the principle that neighbouring names should be embedded into similar vectors, this paper proposes a novel algorithm named name2vec. Name2vec has several new features: the vector length is fixed at 1, a relative weight addresses the low-frequency problem, and the optimization objective is mean squared error rather than cross entropy. The quality of an embedding is measured by the similarity of entity names. On three datasets from WEIBO.COM, name2vec performs better than word2vec.
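The abstract names three ingredients (unit-length vectors, a relative weight for low-frequency names, and a mean-squared-error objective) but gives no formulas. The following is a minimal Python sketch of how those ingredients could fit together, not the paper's actual algorithm. Everything specific here is an assumption made for illustration: the function names (`train_name_vectors`, `step`), the target similarities of 1 for neighbouring names and 0 for sampled non-neighbours, the particular relative-weight formula, and the use of one sampled non-neighbour per update.

```python
import math
import random
from collections import Counter, defaultdict


def normalize(v):
    """Rescale v to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def step(u, v, target, weight, lr):
    """One gradient step on 0.5 * weight * (u.v - target)^2 with respect to u,
    followed by renormalization so the vector length stays 1."""
    err = dot(u, v) - target
    return normalize([x - lr * weight * err * y for x, y in zip(u, v)])


def train_name_vectors(sequences, dim=8, window=1, epochs=300, lr=0.2, seed=0):
    """Hypothetical trainer: neighbouring names are pulled toward similarity 1,
    sampled non-neighbours toward similarity 0, using a squared-error loss."""
    rng = random.Random(seed)
    names = sorted({n for seq in sequences for n in seq})
    vecs = {n: normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]) for n in names}

    # Count unordered pairs of names that appear within `window` positions.
    pair_counts = Counter()
    neighbours = defaultdict(set)
    for seq in sequences:
        for i, a in enumerate(seq):
            for b in seq[i + 1 : i + 1 + window]:
                if a != b:
                    pair_counts[tuple(sorted((a, b)))] += 1
                    neighbours[a].add(b)
                    neighbours[b].add(a)

    # Total pair count per name, used to normalize the pair weights.
    totals = Counter()
    for (a, b), c in pair_counts.items():
        totals[a] += c
        totals[b] += c

    for _ in range(epochs):
        for (a, b), c in pair_counts.items():
            # Assumed "relative weight": the pair count relative to how often
            # both names occur in pairs at all, so it lies in (0, 1] and a
            # rare name's pairs are not dwarfed by raw counts.
            w = c / math.sqrt(totals[a] * totals[b])
            va, vb = vecs[a], vecs[b]
            vecs[a] = step(va, vb, 1.0, w, lr)
            vecs[b] = step(vb, va, 1.0, w, lr)
            # Assumed negative sample: a name never seen next to `a`.
            candidates = [n for n in names if n != a and n not in neighbours[a]]
            if candidates:
                neg = rng.choice(candidates)
                vecs[a] = step(vecs[a], vecs[neg], 0.0, w, lr)
    return vecs
```

Under these assumptions, calling `train_name_vectors` on a list of name sequences returns a dict that maps each name to a unit-length list of floats, so `dot` on two entries gives their cosine similarity directly. Renormalizing after every step is one simple way to keep the length fixed at 1; the paper may enforce the constraint differently.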
Computing technology, computer technology
name embedding, name2vec, word2vec
.A Name Embedding Vector Method[EB/OL].(2020-10-19)[2025-08-02].https://chinaxiv.org/abs/202010.00007.