Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/5067
Title: A Hybrid Learning of Lexical and Language Processing for Domain Credibility Classification
Authors: Nguyen, Huu Nhat Minh
Nguyen, D. Bao
Ton, That Ron
Truong, The Quoc Dung
Truong, Dinh Dung
Phung, Anh Sang
Pham, Van Nam
Tran, The Son
Keywords: Domain credibility
Hybrid learning
Natural language processing
Issue Date: Oct-2024
Publisher: IEEE
Abstract: Malicious domains and websites pose a significant threat to ordinary users, and their increasing prevalence demands early detection methods. Domains registered with malicious intent are becoming increasingly difficult to prevent and detect. Leveraging recent powerful BERT-based language representations and conventional lexical feature representations, we introduce a novel hybrid learning model that uses both the lexical characteristics and the semantic language features of inspected domains for domain credibility classification. The proposed model combines a lexical encoder and a language encoder: the lexical encoder processes features such as length, special-character count, domain type, domain entropy, and digits in the domain, while pre-trained language models such as the Vietnamese PhoBERT and the multilingual XLM-RoBERTa are fine-tuned to capture semantic information from the domain. Experimental results show that the hybrid learning models outperform baselines that use only a lexical encoder or only a language encoder in differentiating between High- and Low-credibility domains.
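The lexical side of the hybrid model can be illustrated with a small feature extractor. The sketch below is a hypothetical illustration, not the authors' implementation: the abstract does not define the features precisely, so the helper names (`shannon_entropy`, `lexical_features`, `hybrid_vector`) are invented here. It assumes three things. "Special characters" means non-alphanumeric characters other than the dot separator. "Domain type" is approximated by the top-level label. Entropy is the Shannon entropy of the character distribution. The fusion step assumes simple concatenation with an embedding that a fine-tuned PhoBERT or XLM-RoBERTa encoder would produce; the abstract does not say how the two encoders' outputs are combined.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy, in bits per character, of the characters in s."""
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def lexical_features(domain: str) -> dict:
    """Hand-crafted lexical features of a domain name (illustrative definitions only)."""
    d = domain.strip().lower().rstrip(".")
    labels = d.split(".")
    return {
        "length": len(d),
        # Assumption: "special" = any non-alphanumeric character except the dot separator.
        "special_char_count": sum(1 for ch in d if not ch.isalnum() and ch != "."),
        "digit_count": sum(1 for ch in d if ch.isdigit()),
        # Assumption: entropy is taken over the whole normalized domain string.
        "entropy": shannon_entropy(d),
        # Assumption: "domain type" is approximated by the top-level label (e.g. "vn", "com").
        "domain_type": labels[-1] if len(labels) > 1 else "",
    }


def hybrid_vector(domain: str, language_embedding: list[float]) -> list[float]:
    """Concatenate numeric lexical features with a language-encoder embedding.

    Hypothetical fusion: language_embedding stands in for a pooled vector from a
    fine-tuned language model (e.g. PhoBERT or XLM-RoBERTa); it is passed in, not computed here.
    """
    f = lexical_features(domain)
    numeric = [float(f["length"]), float(f["special_char_count"]),
               float(f["digit_count"]), f["entropy"]]
    return numeric + list(language_embedding)


if __name__ == "__main__":
    feats = lexical_features("tin-tuc24h.vn")
    print(feats["length"], feats["special_char_count"], feats["digit_count"], feats["domain_type"])
    # → 13 1 2 vn
```

In a full model, the categorical `domain_type` would typically be one-hot encoded or embedded before fusion. A classifier head for the High/Low credibility labels would then be trained on the combined vector.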
Description: 2024 International Conference on Advanced Technologies for Communications (ATC 2024);
URI: https://doi.org/10.1109/ATC63255.2024.10908153
https://elib.vku.udn.vn/handle/123456789/5067
ISBN: 979-8-3503-5397-6
ISSN: 2162-1020
Appears in Collections: Year 2024
