National Preprint Platform

A Text-to-Image Generation Method Based on Transformer and Contrastive Learning


To address the problem in text-to-image generation that synthesized images lose some image attributes and are semantically inconsistent with the given text description, a contrastive learning method is introduced to optimize and improve the deep fusion generative adversarial network (DF-GAN). A Transformer encoder is used to extract global semantic information, and contrastive learning is applied to learn multiple text representations of the same image, alleviating the attribute loss caused by incomplete extraction of semantic information. During network training, contrastive learning is further used to enhance the consistency among images synthesized from different texts describing the same image. Experimental results on the CUB dataset show that the optimized DF-GAN achieves an Inception Score of 4.97 and an FID of 15.51, improvements of 2.2% and 19.38%, respectively, over the original DF-GAN, enabling the model to generate high-quality images and strengthening the semantic consistency between the synthesized images and the text descriptions.
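Two quantitative ingredients named in the abstract — a contrastive objective over paired embeddings and the FID metric — can be illustrated as follows. This is a minimal NumPy sketch, not the authors' implementation: the NT-Xent form of the contrastive loss, the temperature value, and all function names are assumptions.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.1):
    """Symmetric InfoNCE/NT-Xent contrastive loss.

    Row i of z1 and z2 is a positive pair (e.g. embeddings of two
    captions of the same image); every other row serves as a negative.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # L2-normalize
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau   # (N, N) cosine similarities / temperature

    def xent(l):
        # cross-entropy with the diagonal (matching pair) as the target class
        l = l - l.max(axis=1, keepdims=True)              # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_p))

    return 0.5 * (xent(logits) + xent(logits.T))          # both directions

def frechet_distance(mu1, s1, mu2, s2):
    """Frechet distance between two Gaussians (the formula behind FID):
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2})."""
    def sqrtm_psd(a):
        # matrix square root of a symmetric PSD matrix via eigendecomposition
        w, v = np.linalg.eigh(a)
        return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

    root_s1 = sqrtm_psd(s1)
    covmean = sqrtm_psd(root_s1 @ s2 @ root_s1)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))
```

In actual FID evaluation, the means and covariances would be computed from Inception-v3 features of real versus generated images; lower is better, which is why the drop to 15.51 is reported as an improvement.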

代婷婷、曲金帅、诸林云、范菁

Computing technology; computer technology

generative adversarial networks; text-to-image generation; contrastive learning; Transformer

代婷婷, 曲金帅, 诸林云, 范菁. 基于Transformer和对比学习的文本生成图像方法[EB/OL]. (2023-07-14)[2025-08-18]. http://www.paper.edu.cn/releasepaper/content/202307-26.
