Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/4011
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Nguyen, Toan | - |
dc.contributor.author | Quan, Tho | - |
dc.date.accessioned | 2024-07-30T08:43:55Z | - |
dc.date.available | 2024-07-30T08:43:55Z | - |
dc.date.issued | 2024-07 | - |
dc.identifier.isbn | 978-604-80-9774-5 | - |
dc.identifier.uri | https://elib.vku.udn.vn/handle/123456789/4011 | - |
dc.description | Proceedings of the 13th International Conference on Information Technology and Its Applications (CITA 2024); pp. 58-69. | vi_VN |
dc.description.abstract | In this paper, we propose a lightweight transformer-based approach to address the challenges of Visual Question Answering (VQA). While many Vision Language Models (VLMs) are based on Large Language Models (LLMs) with billions of parameters and require significant training resources, we optimize the GPT-2 language model following a fusion architecture. To achieve this goal, we modify GPT-2 by incorporating a cross-attention block that aligns image and text features from two frozen encoders. During training, we apply the LoRA fine-tuning technique to minimize training costs while maintaining effectiveness. Our research focuses on three main aspects: training cost, natural output, and support for the Vietnamese language. We evaluated our approach on datasets for VQA and image captioning, achieving results comparable to existing methods while consuming far fewer resources. Our code and trained weights are available at https://github.com/naot97/VLF-VQA (illustrative sketches of the cross-attention fusion block and of a typical LoRA setup appear below the metadata table). | vi_VN |
dc.language.iso | en | vi_VN |
dc.publisher | Vietnam-Korea University of Information and Communication Technology | vi_VN |
dc.relation.ispartofseries | CITA; | - |
dc.subject | Vision Language Models | vi_VN |
dc.subject | Visual Question Answering | vi_VN |
dc.subject | LoRA | vi_VN |
dc.title | VLF-VQA: Vietnamese Lightweight Fusion Architecture for Visual Question Answering | vi_VN |
dc.type | Working Paper | vi_VN |
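
The core idea in the abstract, a cross-attention block that lets a frozen GPT-2's hidden states attend to features from a frozen image encoder, can be illustrated with a minimal PyTorch sketch. This is not the authors' released code (see the linked GitHub repository for that); the class name, dimensions, and head count below are illustrative assumptions.

```python
# Minimal sketch of the fusion idea described in the abstract: text hidden
# states (queries) attend to projected image features (keys/values), with a
# residual connection so the frozen language model's signal is preserved.
# All names and sizes are assumptions, not the paper's actual settings.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Injects image features into text hidden states via cross-attention."""

    def __init__(self, text_dim: int = 768, image_dim: int = 1024, n_heads: int = 8):
        super().__init__()
        # Project image features into the text model's hidden size.
        self.img_proj = nn.Linear(image_dim, text_dim)
        # Text states act as queries; projected image features as keys/values.
        self.cross_attn = nn.MultiheadAttention(text_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(text_dim)

    def forward(self, text_states: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        img = self.img_proj(image_feats)
        attended, _ = self.cross_attn(query=text_states, key=img, value=img)
        # Residual connection plus normalization, as in standard transformer blocks.
        return self.norm(text_states + attended)

# Toy usage: batch of 2, 16 text tokens (GPT-2 width 768), 50 image patches (width 1024).
fusion = CrossAttentionFusion()
text = torch.randn(2, 16, 768)
image = torch.randn(2, 50, 1024)
out = fusion(text, image)  # -> shape (2, 16, 768)
```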
Appears in Collections: CITA 2024 (Proceeding - Vol 2)
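
The abstract also credits LoRA fine-tuning for the low training cost. The following is a hedged sketch of how LoRA is typically attached to GPT-2 with Hugging Face's PEFT library; the rank, scaling factor, and target modules are assumptions, not values taken from the paper.

```python
# Sketch of LoRA fine-tuning on GPT-2 via Hugging Face PEFT. Only the small
# low-rank adapter matrices are trained; the base model stays frozen, which
# is what keeps training cost low. Hyperparameters below are assumed.
from transformers import GPT2LMHeadModel
from peft import LoraConfig, get_peft_model

model = GPT2LMHeadModel.from_pretrained("gpt2")
lora_cfg = LoraConfig(
    r=8,                        # low-rank dimension (assumed)
    lora_alpha=16,              # scaling factor (assumed)
    lora_dropout=0.05,          # adapter dropout (assumed)
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # reports that only the LoRA adapters are trainable
```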