Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/6234
Title: ViSWAP: Vietnamese Voice Conversion System with Diffusion Model
Authors: Ma, Thanh
Tran, Viet Chau
Tran, Nguyen Minh Thu
Pham, Xuan Hien
Nguyen, Van Nguyen
Do, Thanh Nghi
Keywords: Diffusion model
Voice conversion
Vietnamese speech
HiFi-GAN
Real-time voice cloning
Issue Date: Jan-2026
Publisher: Springer Nature
Abstract: This paper presents an advanced Vietnamese voice conversion system, called ViSWAP, that utilizes a diffusion model to achieve highly natural and intelligible speech synthesis. By incorporating cutting-edge techniques such as HiFi-GAN, Real-Time Voice Cloning, and speaker diarization, ViSWAP effectively converts voices in both single- and multi-speaker contexts with precision and speed. The system processes audio through a structured pipeline, from pre-processing with mel-spectrogram generation and TextGrid alignment in Vietnamese, to encoding and decoding within the diffusion framework. The adoption of the diffusion model is crucial, as it excels in maintaining high-quality voice conversion by handling complex transformations with superior fidelity. Experimental evaluations across multiple audio frequencies demonstrate the system’s strength in minimizing key metrics such as DTW, Euclidean, and cosine distances and MSE, showcasing significant improvements in timbre accuracy and harmonic preservation. We have also published the dataset and implementation on GitHub (https://github.com/Nguyen-Van-Nguyen-github/DiffusionVoiceVietNam).
Description: Lecture Notes in Networks and Systems (LNNS, volume 1581); The 14th Conference on Information Technology and Its Applications (CITA 2025); pp:
URI: https://doi.org/10.1007/978-3-032-00972-2_8
https://elib.vku.udn.vn/handle/123456789/6234
ISBN: 978-3-032-00971-5 (p)
978-3-032-00972-2 (e)
Appears in Collections: CITA 2025 (International)
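
The abstract names four evaluation metrics (DTW, Euclidean, and cosine distances, and MSE) without saying how they are computed. The following is a minimal sketch of one common way to define them, assuming each utterance is represented as a matrix of feature frames (for example mel-spectrogram frames, shape `frames x bins`). The function names and the choice of Euclidean frame cost inside DTW are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two equally shaped arrays."""
    return float(np.mean((a - b) ** 2))


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean (L2) distance between two equally shaped arrays."""
    return float(np.linalg.norm(a.ravel() - b.ravel()))


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity of the flattened arrays."""
    a, b = a.ravel(), b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return float(1.0 - np.dot(a, b) / denom)


def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Dynamic time warping cost between two frame sequences.

    x has shape (Tx, D) and y has shape (Ty, D); the sequences may differ
    in length. Frame-to-frame cost is Euclidean distance (an assumption).
    """
    tx, ty = len(x), len(y)
    cost = np.full((tx + 1, ty + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, tx + 1):
        for j in range(1, ty + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[tx, ty])
```

Under these definitions, lower values mean the converted audio's features are closer to the reference. MSE, Euclidean, and cosine distance require equal shapes, while DTW tolerates differences in duration by aligning frames before summing their costs.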

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.