Using Deep Learning for Obscene Language Detection in Vietnamese Social Media

Dang, Dai Tho; Tran, Xuan Thang; Huynh, Cong Phap; Nguyen, Ngoc Thanh

Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/2722

Title:	Using Deep Learning for Obscene Language Detection in Vietnamese Social Media
Authors:	Dang, Dai Tho; Tran, Xuan Thang; Huynh, Cong Phap; Nguyen, Ngoc Thanh
Keywords:	Obscene language; Deep Learning; Vietnamese Social Media
Year of publication:	2023
Publisher:	Springer Nature
Abstract:	Nowadays, Vietnamese people generate a vast volume of text data on social media platforms every day. Besides its enormous benefits, this situation creates many challenges. One of them is that a tremendous amount of this text contains obscene language, which negatively affects readers, especially young people. Detecting this kind of text is therefore an important problem. In this paper, we investigate the problem using Deep Learning (DL) models such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BiLSTM). We also combine LSTM and CNN in both sequential (sequential LSTM-CNN) and parallel (parallel LSTM-CNN) forms, and evaluate a sequential BiLSTM-CNN. For the word embedding phase, we use Word2vec and PhoBERT. Experimental results show that the BiLSTM model with PhoBERT achieves the best results on the obscene language detection task, with an accuracy of 81.4% and an F1-score of 81.5%.
Description:	Lecture Notes in Networks and Systems (LNNS, volume 734); CITA: Conference on Information Technology and its Applications; pp. 306-317.
Identifiers:	https://link.springer.com/chapter/10.1007/978-3-031-36886-8_26; http://elib.vku.udn.vn/handle/123456789/2722
ISBN:	978-3-031-36886-8
Collection:	CITA 2023 (International)


Use of materials in the Digital Library must comply with copyright law.