Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/6234
Full metadata record
DC Field | Value | Language
dc.contributor.author | Ma, Thanh | -
dc.contributor.author | Tran, Viet Chau | -
dc.contributor.author | Tran, Nguyen Minh Thu | -
dc.contributor.author | Pham, Xuan Hien | -
dc.contributor.author | Nguyen, Van Nguyen | -
dc.contributor.author | Do, Thanh Nghi | -
dc.date.accessioned | 2026-01-20T07:26:42Z | -
dc.date.available | 2026-01-20T07:26:42Z | -
dc.date.issued | 2026-01 | -
dc.identifier.isbn | 978-3-032-00971-5 (p) | -
dc.identifier.isbn | 978-3-032-00972-2 (e) | -
dc.identifier.uri | https://doi.org/10.1007/978-3-032-00972-2_8 | -
dc.identifier.uri | https://elib.vku.udn.vn/handle/123456789/6234 | -
dc.description | Lecture Notes in Networks and Systems (LNNS, volume 1581); The 14th Conference on Information Technology and Its Applications (CITA 2025); pp: | vi_VN
dc.description.abstract | This paper presents an advanced Vietnamese voice conversion system, called ViSWAP, that utilizes a diffusion model to achieve highly natural and intelligible speech synthesis. By incorporating cutting-edge techniques such as HiFi-GAN, Real-Time Voice Cloning, and speaker diarization, ViSWAP effectively converts voices in both single and multi-speaker contexts with precision and speed. The system processes audio through a structured pipeline, from pre-processing with mel-spectrogram generation and TextGrid alignment in Vietnamese, to encoding and decoding within the diffusion framework. The adoption of the diffusion model is crucial, as it excels in maintaining high-quality voice conversion by handling complex transformations with superior fidelity. Experimental evaluations across multiple audio frequencies demonstrate the system’s strength in minimizing key metrics such as DTW, Euclidean, and cosine distances, as well as MSE, showcasing significant improvements in timbre accuracy and harmonic preservation. We have also published the dataset and implementation on GitHub (https://github.com/Nguyen-Van-Nguyen-github/DiffusionVoiceVietNam). | vi_VN
dc.language.iso | en | vi_VN
dc.publisher | Springer Nature | vi_VN
dc.subject | Diffusion model | vi_VN
dc.subject | Voice conversion | vi_VN
dc.subject | Vietnamese speech | vi_VN
dc.subject | HiFi-GAN | vi_VN
dc.subject | Real-time voice cloning | vi_VN
dc.title | ViSWAP: Vietnamese Voice Conversion System with Diffusion Model | vi_VN
dc.type | Working Paper | vi_VN
Collection: CITA 2025 (International)

Use of materials in the Digital Library must comply with copyright law.