A Comparative Study on Domain and Content-Based Approaches for Abusive Website Detection

Nguyen, Quoc Vuong; Le, Tang Phu Quy; Pham, Van Nam; Ton, That Ron; Phung, Anh Sang; Truong, The Quoc Dung; Nguyen, Ngoc Xuan Quynh; Nguyen, Huu Nhat Minh

Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/6214

Full metadata record

DC Field	Value	Language
dc.contributor.author	Nguyen, Quoc Vuong	-
dc.contributor.author	Le, Tang Phu Quy	-
dc.contributor.author	Pham, Van Nam	-
dc.contributor.author	Ton, That Ron	-
dc.contributor.author	Phung, Anh Sang	-
dc.contributor.author	Truong, The Quoc Dung	-
dc.contributor.author	Nguyen, Ngoc Xuan Quynh	-
dc.contributor.author	Nguyen, Huu Nhat Minh	-
dc.date.accessioned	2026-01-20T02:23:01Z	-
dc.date.available	2026-01-20T02:23:01Z	-
dc.date.issued	2026-01	-
dc.identifier.isbn	978-3-032-00971-5 (p)	-
dc.identifier.isbn	978-3-032-00972-2 (e)	-
dc.identifier.uri	https://doi.org/10.1007/978-3-032-00972-2_21	-
dc.identifier.uri	https://elib.vku.udn.vn/handle/123456789/6214	-
dc.description	Lecture Notes in Networks and Systems (LNNS, volume 1581); The 14th Conference on Information Technology and Its Applications (CITA 2025); pp. 273-285	vi_VN
dc.description.abstract	The proliferation of abusive websites, particularly those facilitating phishing and fraud, has emerged as a critical cybersecurity threat. Detecting these websites efficiently remains a crucial challenge, necessitating sophisticated feature engineering and advanced machine learning techniques. In this paper, we present a comprehensive comparative study of domain-based and content-based approaches for abusive website detection on two datasets: a Vietnamese abusive website dataset and an international phishing dataset. Through extensive evaluation, we demonstrate that integrating multiple feature types significantly enhances detection accuracy. In particular, hosting-related features exhibit strong independent predictive capability, and machine learning models that take advantage of these features continue to achieve robust performance. Although extracted features contribute substantially to high-accuracy detection, our findings indicate that source code analysis is the most effective method for identifying abusive websites. Notably, language models such as Phishlang excel at capturing the textual patterns within website source code, achieving outstanding performance with an accuracy of 0.98 and an F1-score of 0.97.	vi_VN
dc.language.iso	en	vi_VN
dc.publisher	Springer Nature	vi_VN
dc.subject	Abusive website detection	vi_VN
dc.subject	Machine learning	vi_VN
dc.subject	Language model	vi_VN
dc.subject	Feature engineering	vi_VN
dc.title	A Comparative Study on Domain and Content-Based Approaches for Abusive Website Detection	vi_VN
dc.type	Working Paper	vi_VN
Appears in Collections:	CITA 2025 (International)
