An Investigation on Vietnamese Credit Scoring Based on Big Data Platform and Ensemble Learning

Tran, Quang Linh; Duong, Van Binh; Lam, Gia Huy; Vuong, Cong Dat; Do, Trong Hop

Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/3194

Full metadata record

DC Field	Value	Language
dc.contributor.author	Tran, Quang Linh	-
dc.contributor.author	Duong, Van Binh	-
dc.contributor.author	Lam, Gia Huy	-
dc.contributor.author	Vuong, Cong Dat	-
dc.contributor.author	Do, Trong Hop	-
dc.date.accessioned	2023-10-05T09:23:20Z	-
dc.date.available	2023-10-05T09:23:20Z	-
dc.date.issued	2022-08	-
dc.identifier.isbn	978-3-031-15063-0 (e)	-
dc.identifier.uri	https://doi.org/10.1007/978-3-031-15063-0_27	-
dc.identifier.uri	http://elib.vku.udn.vn/handle/123456789/3194	-
dc.description	International Conference on Intelligence of Things (ICIT 2022); Lecture Notes on Data Engineering and Communications Technologies, Vol. 148; pp. 289-298	vi_VN
dc.description.abstract	The credit score is a vital indicator that can affect many aspects of people’s lives. However, evaluating credit scores is done manually, so it costs a large amount of money and time. This paper learns from the shortcomings of previous research and presents insights and empirical experiments to demonstrate the advantages of distributed solutions for the credit scoring problem in the future. The research compares several feature engineering techniques, using a big data platform and ensemble learning methods, to find the best solution for predicting the credit score. Since data related to customers’ financial activities is growing enormously, a big data platform is necessary to handle this amount of data. In this paper, Spark, a distributed data processing framework, is used to store and process data. Experiments are carried out to compare the effectiveness of feature engineering for this problem. Moreover, a comparative study of the performance of ensemble learning models is also given. A real-world Vietnamese credit scoring data set is used to develop and evaluate the models. Four metrics are used to evaluate the performance of the credit scoring models, namely F1-score, recall, precision, and accuracy. The results are promising: the highest accuracy, 72.9%, is achieved by combining a Gradient-boosted Tree model with the cleaned data set from which categorical features have been removed. This paper is a foundation for using big data platforms to handle financial data, and much future research can be carried out to improve on the performance reported here.	vi_VN
dc.language.iso	en	vi_VN
dc.publisher	Springer Nature	vi_VN
dc.subject	Credit scoring	vi_VN
dc.subject	Big data	vi_VN
dc.subject	Ensemble learning	vi_VN
dc.subject	Feature engineering	vi_VN
dc.title	An Investigation on Vietnamese Credit Scoring Based on Big Data Platform and Ensemble Learning	vi_VN
dc.type	Working Paper	vi_VN
Appears in Collections:	NĂM 2022 (Year 2022)
