Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/4011
Full metadata record
DC Field | Value | Language
dc.contributor.author | Nguyen, Toan | -
dc.contributor.author | Quan, Tho | -
dc.date.accessioned | 2024-07-30T08:43:55Z | -
dc.date.available | 2024-07-30T08:43:55Z | -
dc.date.issued | 2024-07 | -
dc.identifier.isbn | 978-604-80-9774-5 | -
dc.identifier.uri | https://elib.vku.udn.vn/handle/123456789/4011 | -
dc.description | Proceedings of the 13th International Conference on Information Technology and Its Applications (CITA 2024); pp: 58-69. | vi_VN
dc.description.abstract | In this paper, we propose a lightweight transformer-based approach to address the challenges of Visual Question Answering (VQA). While many Vision Language Models (VLMs) are based on Large Language Models (LLMs) with billions of parameters and require significant training resources, we adapt the GPT-2 language model by following the fusion architecture. To achieve this goal, we modify GPT-2 by incorporating a cross-attention block to align image and text features from two frozen encoders. During training, we apply the LoRA fine-tuning technique to minimize training costs while maintaining effectiveness. Our research focuses on three main aspects: training cost, natural output, and support for the Vietnamese language. We evaluated our approach on datasets for VQA and image captioning, achieving results comparable to existing methods while consuming far fewer resources. Our code and training weights are available at https://github.com/naot97/VLF-VQA | vi_VN
dc.language.iso | en | vi_VN
dc.publisher | Vietnam-Korea University of Information and Communication Technology | vi_VN
dc.relation.ispartofseries | CITA; | -
dc.subject | Vision Language Models | vi_VN
dc.subject | Visual Question Answering | vi_VN
dc.subject | LoRA | vi_VN
dc.title | VLF-VQA: Vietnamese Lightweight Fusion Architecture for Visual Question Answering | vi_VN
dc.type | Working Paper | vi_VN
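
The abstract credits LoRA with keeping training costs low. The sketch below illustrates the general LoRA idea only; it is not taken from the authors' repository. A frozen weight W is left untouched and only two small matrices A and B are trained, so the layer computes W x + (alpha / rank) * B (A x). The hidden size 768 (that of the smallest GPT-2 variant), the rank 8, and all function names are assumptions chosen for illustration; the record does not state which GPT-2 size or rank the paper uses.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, rank: int) -> Vector:
    """Frozen path W x plus the scaled low-rank update B (A x)."""
    base = matvec(W, x)               # frozen pretrained weight
    update = matvec(B, matvec(A, x))  # A is rank x d_in, B is d_out x rank
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_counts(d_in: int, d_out: int, rank: int) -> Tuple[int, int]:
    """Trainable parameters for one weight: (full fine-tuning, LoRA)."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


if __name__ == "__main__":
    # Assumed values: hidden size 768, illustrative rank 8.
    full, lora = lora_param_counts(768, 768, 8)
    print(full, lora, f"{lora / full:.2%}")  # 589824 12288 2.08%
```

Under these assumed sizes, one 768 x 768 projection needs about 2% of the trainable parameters that full fine-tuning would. When B starts at zero, as is conventional for LoRA, the layer initially reproduces the frozen model's output exactly.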
Collection: CITA 2024 (Proceeding - Vol 2)




Use of materials in the Digital Library must comply with copyright law.