Title: Using Deep Learning for Obscene Language Detection in Vietnamese Social Media
Authors: Dang, Dai Tho
Tran, Xuan Thang
Huynh, Cong Phap
Nguyen, Ngoc Thanh
Keywords: Obscene language
Deep Learning
Vietnamese Social Media
Issue Date: Jul-2023
Publisher: Springer Nature
Abstract: Nowadays, Vietnamese people generate a vast volume of text data on social media platforms every day. Besides its enormous benefits, this situation creates many challenges. One of them is that a tremendous amount of this text contains obscene language. Such content negatively affects readers, especially young people, so detecting it is an important problem. In this paper, we investigate this problem using Deep Learning (DL) models such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BiLSTM). We also combine LSTM and CNN in both sequential (sequential LSTM-CNN) and parallel (parallel LSTM-CNN) forms, and BiLSTM and CNN in sequential form (sequential BiLSTM-CNN). For the word embedding phase, we use Word2vec and PhoBERT. Experimental results show that the BiLSTM model with PhoBERT achieves the best results on the obscene language discrimination task, with 81.4% accuracy and an 81.5% F1-score.
Description: Lecture Notes in Networks and Systems (LNNS, volume 734); CITA: Conference on Information Technology and its Applications; pp. 306-317.
ISBN: 978-3-031-36886-8
Appears in Collections: CITA 2023 (International)

