CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Source: arXiv

Abstract

Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages and cultures, where we engage native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from across 30 countries on four continents, covering 31 languages with 13 scripts, providing a total of 10k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models and hopefully encourage more research efforts toward increasing cultural awareness and linguistic diversity in this field.

Authors: Emilio Villa-Cueva, Samuel Cahyawijaya, Ryandito Diandaru, Bontu Fufa Balcha, David Romero, Kumaranage Ravindu Yasas Nagasinghe, Yova Kementchedjhieva, Gisela Vallejo, Toqeer Ehsan, Tiago Timponi Torrent, Guido Ivetta, Fajri Koto, Artem Abzaliev, David Le Meur, Ruochen Zhang, Luis Fernando D'Haro, Aditya Nanda Kishore, Mohamed Fazli Mohamed Imam, Christian Salamea, Muhammad Farid Adilazuarda, Mihail Mihaylov, Joan Nwatu, Chenyang Lyu, Dan John Velasco, Marcos Estecha-Garitagoitia, Rendi Chevi, Maria Camila Buitrago Cabrera, Henok Biadglign Ademtew, Hernán Maina, Jesus-German Ortiz-Barajas, Zheng Wei Lim, Atnafu Lambebo Tonja, Zheng Xin Yong, Teresa Lynn, Israel Abebe Azime, Aishik Mandal, Vladimir Araujo, Jiahui Geng, Haiyue Song, Ganzorig Batnasan, Jan Christian Blaise Cruz, Paula Mónica Silva, Frederico Belcavello, Munkh-Erdene Otgonbold, Oana Ignat, Mario Rodríguez-Cantelar, Grainne Caulfield, Thanmay Jayakumar, Laura Alonso Alemany, Olivier Niyomugisha, Jay Gala, Soyeong Jeong, Teresa Clifford, Munkhjargal Gochoo, Chenxi Whitehouse, Fauzan Farooqui, Alham Fikri Aji, Pranjal Chitale, Naome Etori, Jocelyn Dunstan, Thamar Solorio, Holy Lovenia, Raj Dabre, Rada Mihalcea, Sukannya Purkayastha, Jinheon Baek, Mélanie Jouitteau, David Ifeoluwa Adelani, Injy Hamed, Marcelo Viridiano, Haryo Akbarianto Wibowo, Tatsuki Kuribayashi, Luciana Benotti, Zara Burzo, Santiago Góngora, Alina Dragonetti

Subject classification: Information and Knowledge Dissemination; Science and Scientific Research; Cultural Theory

Recommended citation: Emilio Villa-Cueva, Samuel Cahyawijaya, Ryandito Diandaru, Bontu Fufa Balcha, David Romero, Kumaranage Ravindu Yasas Nagasinghe, Yova Kementchedjhieva, Gisela Vallejo, Toqeer Ehsan, Tiago Timponi Torrent, Guido Ivetta, Fajri Koto, Artem Abzaliev, David Le Meur, Ruochen Zhang, Luis Fernando D'Haro, Aditya Nanda Kishore, Mohamed Fazli Mohamed Imam, Christian Salamea, Muhammad Farid Adilazuarda, Mihail Mihaylov, Joan Nwatu, Chenyang Lyu, Dan John Velasco, Marcos Estecha-Garitagoitia, Rendi Chevi, Maria Camila Buitrago Cabrera, Henok Biadglign Ademtew, Hernán Maina, Jesus-German Ortiz-Barajas, Zheng Wei Lim, Atnafu Lambebo Tonja, Zheng Xin Yong, Teresa Lynn, Israel Abebe Azime, Aishik Mandal, Vladimir Araujo, Jiahui Geng, Haiyue Song, Ganzorig Batnasan, Jan Christian Blaise Cruz, Paula Mónica Silva, Frederico Belcavello, Munkh-Erdene Otgonbold, Oana Ignat, Mario Rodríguez-Cantelar, Grainne Caulfield, Thanmay Jayakumar, Laura Alonso Alemany, Olivier Niyomugisha, Jay Gala, Soyeong Jeong, Teresa Clifford, Munkhjargal Gochoo, Chenxi Whitehouse, Fauzan Farooqui, Alham Fikri Aji, Pranjal Chitale, Naome Etori, Jocelyn Dunstan, Thamar Solorio, Holy Lovenia, Raj Dabre, Rada Mihalcea, Sukannya Purkayastha, Jinheon Baek, Mélanie Jouitteau, David Ifeoluwa Adelani, Injy Hamed, Marcelo Viridiano, Haryo Akbarianto Wibowo, Tatsuki Kuribayashi, Luciana Benotti, Zara Burzo, Santiago Góngora, Alina Dragonetti. CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark[EB/OL]. (2024-06-09)[2025-07-01]. https://arxiv.org/abs/2406.05967.
