
Text-to-Image Models and Their Representation of People from Different Nationalities Engaging in Activities

Source: arXiv
Abstract

This paper investigates how a popular Text-to-Image (T2I) model represents people from 208 different nationalities when prompted to generate images of individuals engaging in typical activities. Two scenarios were developed, and 644 images were generated based on input prompts that specified nationalities. The results show that in one scenario, 52.88% of images, and in the other, 27.4%, depict individuals wearing traditional attire. A statistically significant relationship was observed between this representation pattern and regions, indicating that the issue disproportionately affects certain areas, particularly the Middle East & North Africa and Sub-Saharan Africa. A notable association with income groups was also found. CLIP, ALIGN, and GPT-4.1 mini were used to measure alignment scores between generated images and 3320 prompts and captions; in one scenario, scores were statistically significantly higher for images featuring individuals in traditional attire. The study also examined revised prompts, finding that the model added the word "traditional" to 88.46% of prompts for one scenario. These findings provide insight into how T2I models represent individuals across different countries, demonstrating that the examined model prioritizes traditional characteristics despite their impracticality for the given activities.

Abdulkareem Alsudais

Subject: Computing technology, computer technology

Abdulkareem Alsudais. Text-to-Image Models and Their Representation of People from Different Nationalities Engaging in Activities[EB/OL]. (2025-06-25)[2025-06-29]. https://arxiv.org/abs/2504.06313.
