Using Deep Learning for Obscene Language Detection in Vietnamese Social Media

Dang, Dai Tho; Tran, Xuan Thang; Huynh, Cong Phap; Nguyen, Ngoc Thanh

Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/2722

Title:	Using Deep Learning for Obscene Language Detection in Vietnamese Social Media
Authors:	Dang, Dai Tho; Tran, Xuan Thang; Huynh, Cong Phap; Nguyen, Ngoc Thanh
Keywords:	Obscene language; Deep Learning; Vietnamese Social Media
Issue Date:	Jul-2023
Publisher:	Springer Nature
Abstract:	Vietnamese users generate a vast volume of text on social media platforms every day. Alongside its benefits, this creates many challenges, one of which is that a large share of the text contains obscene language. Such content negatively affects readers, especially young people, so detecting it is an important problem. In this paper, we investigate this problem using Deep Learning (DL) models: Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BiLSTM). We also combine LSTM and CNN in both sequential (sequential LSTM-CNN) and parallel (parallel LSTM-CNN) forms, and evaluate a sequential BiLSTM-CNN. For the word embedding phase, we use Word2vec and PhoBERT. Experimental results show that the BiLSTM model with PhoBERT achieves the best results on the obscene language detection task, with an accuracy of 81.4% and an F1-score of 81.5%.
Description:	Lecture Notes in Networks and Systems (LNNS, volume 734); CITA: Conference on Information Technology and its Applications; pp: 306-317.
URI:	https://link.springer.com/chapter/10.1007/978-3-031-36886-8_26 http://elib.vku.udn.vn/handle/123456789/2722
ISBN:	978-3-031-36886-8
Appears in Collections:	CITA 2023 (International)
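
Note on the reported metrics: the abstract gives accuracy and F1-score but does not say how F1 is averaged. The sketch below is only an illustration of how the two numbers are commonly computed for a two-class task, assuming label 1 means "obscene" and F1 is taken for that positive class. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for two equal-length label sequences.

    F1 is computed for the `positive` class only; the paper's own
    averaging scheme is not stated in the abstract (assumption).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no
    # positive labels and no positive predictions.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels for illustration only (not data from the paper).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```

Accuracy and F1 can land close together, as in the reported 81.4% and 81.5%, when the classes are roughly balanced and the errors are split fairly evenly between false positives and false negatives.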
