Proceedings of the 12th Scientific Conference on Information Technology and Its Applications in Various Fields (CITA 2023), p. 28
Evolutionary Generative Adversarial Network for
Missing Data Imputation
Bao Ngoc Vi¹, Cao Truong Tran², Chi Cong Nguyen³
Institute of Information and Communication Technology,
Le Quy Don Technical University, Hanoi, Vietnam
¹ ngocvb@lqdtu.edu.vn, ² truongct@lqdtu.edu.vn, ³ congnc@lqdtu.edu.vn
Abstract. Generative adversarial networks (GANs) have become a compelling
method for generating new data in data science, and this generative model has
been adopted for data imputation in specific areas. However, existing GANs
(the original GAN and its variants) are prone to training problems such as
instability and mode collapse. This paper proposes a novel method for imputing
missing data that combines a GAN with an evolutionary computation framework.
The new method is named Evolutionary Generative Adversarial for Imputation
Data (EGAIN). EGAIN utilises different training observations with mutation,
selection, and evolution among a population of generators. In the experiments,
three different loss functions are used to evaluate the output of the
generators and the training process of the discriminator. EGAIN is also tested
on various datasets and compared with state-of-the-art imputation methods to
illustrate its performance.
Keywords: Missing Data, Imputation, Generative Adversarial Network,
Evolutionary Computation
1 Introduction
Missing values, where the values of some features are unknown, are one of the
common issues in many real-world datasets. For instance, about 45% of the
datasets in the UCI Machine Learning Repository [4] contain missing values [6].
Missing values have different causes. For example, in social surveys, when
respondents decline to answer specific questions, the collected datasets are
incomplete [19]. Furthermore, medical datasets usually contain a large number of
missing values, since it is extremely rare to achieve 100% task completion for
every patient [8]. To address this problem, a number of studies have been
conducted on missing data imputation, which fills the missing portion of the
data with plausible values. A number of different machine learning and deep
learning techniques have been applied to this task. Among them, the most
frequently presented models are GAN (generative adversarial network)-based
models, which are currently being studied in various ways. A generative
adversarial network is a machine learning model in which two neural networks
compete with each other in a zero-sum game. GANs have been successfully
CITA 2023 ISBN: 978-604-80-8083-9