Proceedings of the 12th Scientific Conference on Information Technology and its Applications in Various Fields (CITA 2023)


              Evolutionary Generative Adversarial Network for Missing Data Imputation




                                    Bao Ngoc Vi¹, Cao Truong Tran², Chi Cong Nguyen³
                                      Institute of Information and Communication Technology
                                        Le Quy Don Technical University, Hanoi, Vietnam
                        ¹ ngocvb@lqdtu.edu.vn, ² truongct@lqdtu.edu.vn, ³ congnc@lqdtu.edu.vn




                           Abstract. Generative adversarial networks (GANs) have become a compelling
                           method for generating new data in data science. This generative model has
                           also been adopted for data imputation in specific areas. However, existing
                           GANs (the original GAN and its variants) are prone to training problems such
                           as instability and mode collapse. This paper proposes a novel method for
                           imputing missing data that combines GAN with an evolutionary computation
                           framework. The new method is named Evolutionary Generative Adversarial for
                           Imputation Data (EGAIN). EGAIN uses different training observations with
                           mutation, selection, and evolution among a population of generators G. In
                           the experiments, three different loss functions are used to validate the
                           output of G and the training process of the discriminator D. EGAIN is also
                           tested on various datasets and compared with state-of-the-art imputation
                           methods to illustrate its performance.

                           Keywords: Missing Data, Imputation, Generative Adversarial Network, Evolu-
                           tionary Computation
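The mutate, evaluate, and select cycle mentioned in the abstract can be illustrated with a toy sketch. This is not the EGAIN implementation: here the "generator" is a single number, the three mutation operators (`step_squared`, `step_absolute`, `step_noise`) are hypothetical stand-ins for training under different loss functions, and a fixed `fitness` function plays the role that a discriminator would play in an adversarial setting. All names and parameters are assumptions made for illustration.

```python
import random


def evolve(observed, n_parents=2, n_generations=50, lr=0.1, seed=0):
    """Toy evolutionary loop: each parent yields one child per mutation operator."""
    rng = random.Random(seed)
    mean = sum(observed) / len(observed)

    # Hypothetical mutation 1: one gradient step on the mean squared error.
    def step_squared(theta):
        return theta - lr * 2.0 * (theta - mean)

    # Hypothetical mutation 2: one subgradient step on the mean absolute error.
    def step_absolute(theta):
        grad = sum((theta > x) - (theta < x) for x in observed) / len(observed)
        return theta - lr * grad

    # Hypothetical mutation 3: a random Gaussian perturbation.
    def step_noise(theta):
        return theta + rng.gauss(0.0, lr)

    mutations = [step_squared, step_absolute, step_noise]

    # Stand-in fitness (assumption): negative mean squared error on observed values.
    def fitness(theta):
        return -sum((theta - x) ** 2 for x in observed) / len(observed)

    population = [rng.uniform(-10.0, 10.0) for _ in range(n_parents)]
    for _ in range(n_generations):
        # Mutation: every parent produces one child per operator.
        children = [mutate(parent) for parent in population for mutate in mutations]
        # Selection: keep only the fittest children as the next parents.
        children.sort(key=fitness, reverse=True)
        population = children[:n_parents]
    return population[0]


best = evolve([1.0, 2.0, 3.0])
print(round(best, 3))
```

In this sketch the surviving candidate drifts toward the mean of the observed values, because the squared-error step contracts the distance to the mean in every generation and selection never discards the best child.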


                     1     Introduction


                     Missing values, where the values of some features are unknown, are one of the most
                     common issues in real-world datasets. For instance, about 45% of the datasets in
                     the UCI Machine Learning Repository [4] contain missing values [6]. Missing values
                     have different causes. For example, in social surveys, when respondents decline to
                     answer specific questions, the collected datasets are incomplete [19]. Furthermore,
                     medical datasets usually contain a large number of missing values, since it is
                     extremely rare to complete 100% of the tasks for every patient [8]. To address this
                     problem, a number of studies have been conducted on missing data imputation, a
                     method that fills the empty portions of the data with plausible values. A number of
                     different machine learning and deep learning techniques have been applied to this
                     task. Among them, the most frequently presented models are based on the generative
                     adversarial network (GAN), which is currently being studied in various ways.
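To make the idea of imputation concrete, the following minimal sketch fills each missing entry with the mean of the observed values in the same column. This is a simple baseline chosen for illustration only, not the GAN-based approach discussed in this paper; the function name `mean_impute` and the use of `None` to mark missing entries are assumptions.

```python
def mean_impute(rows):
    """Replace None entries with the mean of the observed values in each column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 when a column has no observed value at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


data = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
]
imputed = mean_impute(data)
print(imputed)
```

Observed entries are left untouched and only the `None` positions are filled, which is the same contract that more sophisticated imputers follow.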
                       A generative adversarial network is a machine learning model in which two neural
                     networks compete with each other in a zero-sum game. GANs have been successfully



                     CITA 2023                                                   ISBN: 978-604-80-8083-9