Performance Analysis of Deep Learning Models for Software Fault Prediction Using the BugHunter Dataset

Dang, Thi Kim Ngan; Dao, Khanh Duy; Ha, Thi Minh Phuong; Nguyen, Thanh Binh

Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/5786

Full metadata record

DC Field	Value	Language
dc.contributor.author	Dang, Thi Kim Ngan	-
dc.contributor.author	Dao, Khanh Duy	-
dc.contributor.author	Ha, Thi Minh Phuong	-
dc.contributor.author	Nguyen, Thanh Binh	-
dc.date.accessioned	2025-11-11T04:13:35Z	-
dc.date.available	2025-11-11T04:13:35Z	-
dc.date.issued	2025-06	-
dc.identifier.issn	1859-3526	-
dc.identifier.uri	https://ictmag.ictvietnam.vn/cntt-tt/article/view/1374	-
dc.identifier.uri	https://elib.vku.udn.vn/handle/123456789/5786	-
dc.description	Research, Development and Application on Information and Communication Technology; Vol. 2025, No. 1, June	vi_VN
dc.description.abstract	Software fault prediction (SFP) identifies potentially fault-prone modules before the testing phase of the software development lifecycle. By predicting faults early in the development process, SFP lets developers focus their effort on components that are likely to contain faults, thereby improving the overall quality and reliability of the software. Machine learning and deep learning techniques have been widely applied to train SFP models. However, these approaches face several challenges, including irrelevant or redundant features, imbalanced datasets, overfitting, and complex model structures. The NASA dataset from the PROMISE repository is the most commonly used dataset for fault prediction. Recently, the BugHunter dataset, which has a substantially larger number of instances, has been explored for training SFP models. In this study, we present a comparative study of three deep learning models, namely Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM), and four machine learning models, namely K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGB), to investigate the performance of SFP models on the BugHunter dataset. We employ the Lasso method for feature selection and apply the Synthetic Minority Oversampling Technique (SMOTE) to address the issue of imbalanced data, aiming to enhance the accuracy of the results. The experimental findings reveal that CNN and RNN outperformed the other models, achieving the best overall performance.	vi_VN
dc.language.iso	en	vi_VN
dc.publisher	The Journal of Information and Communication	vi_VN
dc.subject	Software fault prediction	vi_VN
dc.subject	machine learning	vi_VN
dc.subject	BugHunter dataset	vi_VN
dc.title	Performance Analysis of Deep Learning Models for Software Fault Prediction Using the BugHunter Dataset	vi_VN
dc.title.alternative	Đánh giá các mô hình học sâu cho bài toán dự đoán lỗi phần mềm trên bộ dữ liệu BugHunter	vi_VN
dc.type	Working Paper	vi_VN
Appears in Collections:	Year 2025 (NĂM 2025)
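
Illustrative note on the oversampling step named in the abstract: this record does not include the authors' code, so the following is only a minimal sketch of the basic SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours), written with the Python standard library. The function name `smote`, its parameters, and the toy metric vectors are assumptions made for illustration; they are not taken from the paper or from the BugHunter dataset.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # For each minority sample, find the indices of its k nearest
    # neighbours within the minority class (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random minority sample
        j = rng.choice(neighbours[i])      # one of its nearest neighbours
        gap = rng.random()                 # position on the segment, in [0, 1)
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy data (hypothetical): rows are metric vectors; 1 = faulty, 0 = clean.
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [5.0, 5.1], [5.2, 4.9],
     [4.8, 5.3], [5.1, 5.0], [4.9, 4.8], [5.3, 5.2]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]

minority = [row for row, label in zip(X, y) if label == 1]
n_needed = y.count(0) - y.count(1)  # synthetic rows needed to balance classes
new_rows = smote(minority, n_needed, k=2)

X_balanced = X + new_rows
y_balanced = y + [1] * len(new_rows)
```

In a pipeline like the one the abstract describes, oversampling of this kind is normally applied to the training split only, so that synthetic rows do not leak into the evaluation data; whether the authors did so is not stated in this record.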
