Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/3956
Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Ha, Thi Minh Phuong | - |
| dc.contributor.author | Nguyen, Thanh Long | - |
| dc.contributor.author | Nguyen, Thanh Binh | - |
| dc.date.accessioned | 2024-07-29T03:04:35Z | - |
| dc.date.available | 2024-07-29T03:04:35Z | - |
| dc.date.issued | 2023-09 | - |
| dc.identifier.isbn | 978-604-357-201-8 | - |
| dc.identifier.uri | http://vap.ac.vn/Portals/0/TuyenTap/2024/2/21/64e13532907845ed9f5a2547dfec276f/33B_FAIR2023_paper_6739.pdf | - |
| dc.identifier.uri | https://elib.vku.udn.vn/handle/123456789/3956 | - |
| dc.description | Proceedings of the 16th National Scientific Conference on Fundamental and Applied IT Research (FAIR-2023); pp. 258-265. | vi_VN |
| dc.description.abstract | Software fault prediction (SFP) is the process of building models that predict faults in the early stages of software development. Predicting fault-prone modules helps developers allocate testing effort more effectively and optimize maintenance cost. However, the performance of SFP models depends on the quality of the software fault datasets. Irrelevant and redundant features can reduce both the speed and the accuracy of the trained models. Data imbalance, in which faulty modules are significantly outnumbered by non-faulty modules, is a further challenge in SFP. This study applies three generative adversarial network (GAN) models (VanillaGAN, CTGAN and WGANGP) together with four feature selection ranking methods (Chi-Squared, Information Gain, Fisher and Relief) to four software fault datasets. The comparative analysis uses four different classifiers to predict software faults. Precision, recall, F1-score and area under the receiver operating characteristic curve (AUC) serve as the performance evaluation metrics. The experimental results show that combinations of CTGAN and VanillaGAN with feature selection approaches outperformed SFP models built without data sampling and feature selection. The pairing of CTGAN and Relief performed best among all combinations, with the highest average precision, recall, F1-score and AUC values of 0.857, 0.873, 0.856 and 0.767, respectively, on Extra Tree. | vi_VN |
| dc.language.iso | en | vi_VN |
| dc.publisher | Publishing House for Science and Technology | vi_VN |
| dc.subject | Software fault prediction | vi_VN |
| dc.subject | Feature selection | vi_VN |
| dc.subject | Data sampling | vi_VN |
| dc.subject | Promise | vi_VN |
| dc.title | A combination of feature selection and data sampling techniques for software fault prediction | vi_VN |
| dc.type | Working Paper | vi_VN |
Appears in Collections: NĂM 2023 (Year 2023)
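
The abstract names Chi-Squared as one of the feature selection ranking methods and precision, recall and F1-score as evaluation metrics. As a rough illustration only, the sketch below ranks features by a chi-squared statistic and computes those three metrics on a tiny, made-up imbalanced dataset. The feature names, the data and the helper functions (`chi_squared_score`, `rank_features`, `precision_recall_f1`) are assumptions for this example and are not taken from the paper; the GAN-based sampling, the other ranking methods and the classifiers used in the study are not reproduced here.

```python
from collections import Counter


def chi_squared_score(feature, labels):
    """Chi-squared statistic between one categorical feature and the class label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_val, f_count in feature_totals.items():
        for l_val, l_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * l_count / n
            diff = observed.get((f_val, l_val), 0) - expected
            score += diff * diff / expected
    return score


def rank_features(rows, labels, names):
    """Return feature names ordered from highest to lowest chi-squared score."""
    columns = list(zip(*rows))
    scores = {name: chi_squared_score(col, labels)
              for name, col in zip(names, columns)}
    return sorted(scores, key=scores.get, reverse=True)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1-score for the positive (faulty) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical binary features for 8 modules; only 2 are faulty (label 1),
# mimicking the class imbalance described in the abstract.
names = ["high_complexity", "many_lines", "has_comments"]
rows = [
    (1, 1, 0), (1, 0, 1),              # faulty modules
    (0, 1, 0), (0, 0, 1), (0, 1, 1),   # non-faulty modules
    (0, 0, 0), (0, 1, 0), (0, 0, 1),
]
labels = [1, 1, 0, 0, 0, 0, 0, 0]

ranking = rank_features(rows, labels, names)
print("Feature ranking:", ranking)

# Made-up predictions from some classifier, used only to exercise the metrics.
predictions = [1, 0, 1, 0, 0, 0, 0, 0]
print("Precision, recall, F1:", precision_recall_f1(labels, predictions))
```

In this toy data `high_complexity` matches the fault label exactly, so it ranks first; a real pipeline would keep only the top-ranked features before training a classifier.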

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.