DaNangVMD: Vietnamese Speech Mispronunciation Detection

Nguyen, Ket Doan; Tran, Nguyen Anh; Vo, Van Nam; Nguyen, Tran Tien; Le, Pham Tuyen; Nguyen, Quoc Vuong; Nguyen, Huu Nhat Minh

Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/3995

Full metadata record

DC Field	Value	Language
dc.contributor.author	Nguyen, Ket Doan	-
dc.contributor.author	Tran, Nguyen Anh	-
dc.contributor.author	Vo, Van Nam	-
dc.contributor.author	Nguyen, Tran Tien	-
dc.contributor.author	Le, Pham Tuyen	-
dc.contributor.author	Nguyen, Quoc Vuong	-
dc.contributor.author	Nguyen, Huu Nhat Minh	-
dc.date.accessioned	2024-07-30T01:28:19Z	-
dc.date.available	2024-07-30T01:28:19Z	-
dc.date.issued	2024-06	-
dc.identifier.issn	1859-3526	-
dc.identifier.uri	https://doi.org/10.32913/mic-ict-research-vn.v2024.n1.1271	-
dc.identifier.uri	https://ictmag.vn/cntt-tt/article/view/1271/566	-
dc.identifier.uri	https://elib.vku.udn.vn/handle/123456789/3995	-
dc.description	Research and Development on Information and Communication Technology; pp: 49-55.	vi_VN
dc.description.abstract	Automatic Speech Recognition (ASR), which automatically converts human speech into readable text, has grown rapidly over the past decade. However, Vietnamese speech recognition faces critical challenges, such as frequent mispronunciations and wide variation in Vietnamese speech. In this work, we address the difficult problem of Mispronunciation Detection (MD) in the Vietnamese language. Because Vietnamese is a tonal language, pronunciation depends not only on consonants and vowels but also on variations in pitch or tone. In this paper, we propose the DaNangVMD model for detecting mispronunciations in Vietnamese speech based on the speech audio and the canonical transcript. By leveraging a multi-head attention-based multimodal representation built from the embeddings of a phonetic encoder and a linguistic encoder, DaNangVMD aims to provide a robust solution for accurate mispronunciation detection and diagnosis. In an extensive evaluation, the proposed DaNangVMD outperforms the PAPL baseline models by 15% in F1 score and 13% in accuracy.	vi_VN
dc.language.iso	en	vi_VN
dc.publisher	Journal of Information & Communications	vi_VN
dc.subject	Mispronunciation Detection	vi_VN
dc.subject	Multimodal Embedding	vi_VN
dc.subject	Vietnamese Speech Recognition	vi_VN
dc.title	DaNangVMD: Vietnamese Speech Mispronunciation Detection	vi_VN
dc.title.alternative	DaNangVMD: Nhận diện phát âm sai tiếng Việt	vi_VN
dc.type	Working Paper	vi_VN
Appears in Collections:	NĂM 2024
