Please use this identifier to cite or link to this item:
https://elib.vku.udn.vn/handle/123456789/3956
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ha, Thi Minh Phuong | - |
dc.contributor.author | Nguyen, Thanh Long | - |
dc.contributor.author | Nguyen, Thanh Binh | - |
dc.date.accessioned | 2024-07-29T03:04:35Z | - |
dc.date.available | 2024-07-29T03:04:35Z | - |
dc.date.issued | 2023-09 | - |
dc.identifier.isbn | 978-604-357-201-8 | - |
dc.identifier.uri | http://vap.ac.vn/Portals/0/TuyenTap/2024/2/21/64e13532907845ed9f5a2547dfec276f/33B_FAIR2023_paper_6739.pdf | - |
dc.identifier.uri | https://elib.vku.udn.vn/handle/123456789/3956 | - |
dc.description | Proceedings of the 16th National Scientific Conference on Fundamental and Applied IT Research (FAIR-2023); pp. 258-265. | vi_VN |
dc.description.abstract | Software fault prediction (SFP) is the process of building models to predict faults in the early stages of software development. Predicting fault-prone software modules can help developers allocate testing effort more effectively and reduce maintenance cost. However, the performance of SFP models depends on the quality of the software fault datasets. Irrelevant and redundant features can degrade both the speed and the accuracy of the trained models. In addition, data imbalance, where faulty modules are significantly outnumbered by non-faulty modules, is a challenge in SFP. This study applies three generative adversarial network (GAN) models (VanillaGAN, CTGAN and WGANGP) together with four feature selection ranking methods (Chi-Squared, Information Gain, Fisher and Relief) to four software fault datasets. A comparative analysis is performed using four different classifiers to predict software faults. Precision, recall, F1-score and area under the receiver operating characteristic curve (AUC) are used as performance evaluation metrics. The experimental results show that combinations of CTGAN or VanillaGAN with feature selection approaches outperform SFP models built without data sampling and feature selection. The combination of CTGAN and Relief performed best among all combinations, with the highest average precision, recall, F1-score and AUC values of 0.857, 0.873, 0.856 and 0.767, respectively, on Extra Tree. | vi_VN |
dc.language.iso | en | vi_VN |
dc.publisher | Publishing House for Science and Technology | vi_VN |
dc.subject | Software fault prediction | vi_VN |
dc.subject | Feature selection | vi_VN |
dc.subject | Data sampling | vi_VN |
dc.subject | Promise | vi_VN |
dc.title | A combination of feature selection and data sampling techniques for software fault prediction | vi_VN |
dc.type | Working Paper | vi_VN |
Appears in Collections: | YEAR 2023 |
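
The abstract in this record names precision, recall and F1-score among its evaluation metrics. As a reading aid, the minimal sketch below shows how these three metrics are commonly computed for the faulty (positive) class on an imbalanced label set. The function name `binary_prf` and the toy labels are hypothetical and are not taken from the paper; the record does not state which averaging scheme produced the reported "average" values, so none is assumed here.

```python
def binary_prf(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # faulty, predicted faulty
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # non-faulty, predicted faulty
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # faulty, predicted non-faulty

    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Toy imbalanced example (hypothetical data): 2 faulty modules out of 10.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

precision, recall, f1 = binary_prf(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

On this toy data, eight of the ten predictions are correct, yet only one of the two faulty modules is found. This is why class-specific metrics are informative when faulty modules are rare.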