Please use this identifier to cite or link to this item:
https://elib.vku.udn.vn/handle/123456789/3995
Title: | DaNangVMD: Vietnamese Speech Mispronunciation Detection |
Other Titles: | DaNangVMD: Nhận diện phát âm sai tiếng Việt |
Authors: | Nguyen, Ket Doan; Tran, Nguyen Anh; Vo, Van Nam; Nguyen, Tran Tien; Le, Pham Tuyen; Nguyen, Quoc Vuong; Nguyen, Huu Nhat Minh |
Keywords: | Mispronunciation Detection; Multimodal Embedding; Vietnamese Speech Recognition |
Issue Date: | Jun-2024 |
Publisher: | Journal of Information & Communications |
Abstract: | Automatic Speech Recognition (ASR) has grown rapidly over the past decade and is used to recognize and transcribe human speech into readable text automatically. However, Vietnamese speech recognition faces critical challenges, such as frequent mispronunciations and wide variation in Vietnamese speech. In this work, we tackle the difficult problem of Mispronunciation Detection (MD) in the Vietnamese language. As a tonal language, Vietnamese relies not only on consonants and vowels but also on variations in pitch, or tone, during pronunciation. In this paper, we propose the DaNangVMD model for detecting mispronunciations in Vietnamese speech from the audio signal and its canonical transcript. By leveraging a multi-head attention-based multimodal representation built from the embeddings of a phonetic encoder and a linguistic encoder, DaNangVMD aims to provide a robust solution for accurate mispronunciation detection and diagnosis. In extensive evaluations, the proposed DaNangVMD outperforms the PAPL baseline models by 15% in F1 score and 13% in accuracy. |
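The record does not include the paper's code. As a rough illustration of the fusion step the abstract describes — a multi-head attention-based multimodal representation combining phonetic-encoder and linguistic-encoder embeddings — the NumPy sketch below uses cross-attention with random matrices standing in for learned weights. All dimensions, sequence lengths, and names here are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(queries, keys_values, num_heads, rng):
    """Fuse two embedding sequences with multi-head cross-attention.
    queries:     (T_q, d) e.g. linguistic (canonical-transcript) embeddings
    keys_values: (T_k, d) e.g. phonetic (acoustic-frame) embeddings
    Returns a (T_q, d) multimodal representation, one vector per query."""
    T_q, d = queries.shape
    assert d % num_heads == 0
    d_h = d // num_heads
    # Random projections stand in for learned weight matrices in this sketch.
    W_q, W_k, W_v, W_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    Q, K, V = queries @ W_q, keys_values @ W_k, keys_values @ W_v
    heads = []
    for h in range(num_heads):
        s = slice(h * d_h, (h + 1) * d_h)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_h)  # (T_q, T_k) alignment scores
        heads.append(softmax(scores) @ V[:, s])      # (T_q, d_h) per-head context
    return np.concatenate(heads, axis=1) @ W_o       # (T_q, d) fused output

rng = np.random.default_rng(0)
linguistic = rng.standard_normal((6, 64))   # 6 canonical phonemes (hypothetical)
phonetic = rng.standard_normal((50, 64))    # 50 acoustic frames (hypothetical)
fused = multi_head_cross_attention(linguistic, phonetic, num_heads=4, rng=rng)
print(fused.shape)  # (6, 64): one multimodal vector per canonical phoneme
```

In a real MD system, the fused per-phoneme vectors would feed a classifier that flags each canonical phoneme as correctly pronounced or mispronounced; that head, and the trained encoders themselves, are omitted here.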
Description: | Research and Development on Information and Communication Technology; pp: 49-55. |
URI: | https://doi.org/10.32913/mic-ict-research-vn.v2024.n1.1271 https://ictmag.vn/cntt-tt/article/view/1271/566 https://elib.vku.udn.vn/handle/123456789/3995 |
ISSN: | 1859-3526 |
Appears in Collections: | NĂM 2024 |