Page 29 - Proceedings of the 12th Scientific Conference - Information Technology and its Applications in Various Fields (CITA 2023)

Bao Ngoc Vi, Cao Truong Tran, Chi Cong Nguyen                                    13


                     applied in many applications such as human image synthesis, improving astronomical
                     images, and inpainting photographs [7]. Recently, GANs have also been applied to
                     imputing missing values, as in GAIN [21], MisGAN [15], and GAMIN [22].
                       GANs usually suffer from vanishing gradients and mode collapse, which leave the
                     generator able to produce only a limited diversity of samples that diverge from the
                     real-data distribution it is meant to imitate. Thus, a large number of recent works have

                     functions. In [5] the authors proposed the Evolutionary GAN (EGAN) model. In that
                     work, a population of generators is evolved using different training objectives to


                     population of generators is reduced using a fitness function which takes into account
                     quality and diversity of the solutions.
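
The evolve-then-reduce loop described above can be illustrated with a small, framework-free sketch. It is not the EGAN implementation: the function name `evolve_step`, the parameters `gamma` and `survivors`, and the weighted-sum form of the fitness are assumptions made for this illustration, and the "generators" may be any Python objects.

```python
def evolve_step(parents, mutations, quality, diversity, gamma=0.5, survivors=None):
    """One generation of a toy evolutionary loop (hypothetical helper).

    parents   -- current population of generators (any objects)
    mutations -- callables; each stands in for training a copy of a parent
                 with one particular objective and returns the child
    quality   -- callable scoring how good a child's samples are
    diversity -- callable scoring how varied a child's samples are
    gamma     -- assumed weight that balances diversity against quality
    survivors -- number of children to keep (defaults to len(parents))
    """
    if survivors is None:
        survivors = len(parents)

    # Variation: every parent produces one child per mutation (objective).
    children = [mutate(parent) for parent in parents for mutate in mutations]

    # Evaluation: the fitness combines quality and diversity (assumed weighted sum).
    def fitness(child):
        return quality(child) + gamma * diversity(child)

    # Selection: keep only the fittest children as the next population.
    return sorted(children, key=fitness, reverse=True)[:survivors]
```

In a real GAN setting each mutation would be a gradient update under a different loss, and both scores would typically be derived from the discriminator; here they are left as user-supplied callables so that the selection logic stays visible.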
                       In this work, we propose a model named Evolutionary Generative Adversarial for
                     Imputation Data (EGAIN), which applies different training observations together with
                     mutation, selection, and an evolving process among a population of generators. In our
                     experiments, three different loss functions are used to validate the output. EGAIN
                     is also tested on various datasets, and the experimental results show that the proposed
                     method is promising.



                     2     Related work



                     2.1   Missing data

                     Dealing with missing data is a common issue in empirical research. Data scientists
                     encounter several types of missing data, classified by the mechanism that causes the
                     missingness: Missing Completely At Random (MCAR), Missing At Random (MAR), and
                     Missing Not At Random (MNAR). MCAR occurs when the missingness is unrelated to the
                     hypothetical (unobserved) value, to the values of other variables, or to observed
                     records. In MAR, on the other hand, the missingness is unrelated to the specific
                     missing values but may depend on a subset of the observed data. Lastly, MNAR occurs
                     when the missingness depends on both the hypothetical values themselves and
                     specific variable values.
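
To make the three mechanisms concrete, the following sketch generates a missingness mask for a two-column dataset in which the first column `x0` is always observed and the second column `x1` may go missing. The function names, the `rate` parameter, and the median-threshold rule are assumptions of this illustration, not part of any cited method; in particular, the MNAR branch is simplified to depend only on the value that goes missing.

```python
import random


def upper_median(values):
    """Return the middle element of the sorted values (upper median for even sizes)."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def missing_mask(rows, mechanism, rate=0.3, seed=0):
    """Return one boolean per row: True means x1 is marked missing in that row.

    rows      -- list of (x0, x1) pairs; x0 is always observed
    mechanism -- "MCAR", "MAR" or "MNAR"
    rate      -- approximate overall missing rate (hypothetical parameter)
    """
    rng = random.Random(seed)
    x0_cut = upper_median(r[0] for r in rows)
    x1_cut = upper_median(r[1] for r in rows)
    high = min(1.0, 2 * rate)  # probability used in the "at risk" half of the rows

    mask = []
    for x0, x1 in rows:
        if mechanism == "MCAR":
            p = rate                            # ignores all data values
        elif mechanism == "MAR":
            p = high if x0 > x0_cut else 0.0    # depends only on the observed x0
        elif mechanism == "MNAR":
            p = high if x1 > x1_cut else 0.0    # depends on the value being hidden
        else:
            raise ValueError(f"unknown mechanism: {mechanism!r}")
        mask.append(rng.random() < p)
    return mask
```

Under MAR the mask can be explained entirely from the observed column, whereas under MNAR the information needed to explain it is exactly what has been removed, which is why MNAR data are harder to impute without bias.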
                       To ensure the validity of experiments, researchers have developed methods to handle
                     missing data. One common approach is deletion, which discards incomplete features or
                     samples before analysis. However, this method is not suitable when the missing rate is
                     high [17]. In contrast, imputation methods, which substitute missing values with
                     plausible values, are more effective [14]. These methods leverage statistical analysis
                     and mathematical models, such as maximum likelihood, expectation maximization,
                     regression imputation, multiple imputation, and sensitivity analysis [14, 16].
                     Machine learning algorithms such as decision trees and k-nearest neighbors are also
                     often used to impute missing values, but their effectiveness depends on the quality
                     and size of the dataset [13, 17].
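
The contrast between deletion and imputation can be seen in a minimal sketch. It uses plain Python lists with `None` marking a missing entry; the function names and the choice of complete rows as donors are assumptions of this illustration, and it presumes that at least one complete row exists.

```python
def listwise_delete(rows):
    """Deletion: keep only the rows that have no missing entry."""
    return [row for row in rows if None not in row]


def knn_impute(rows, k=2):
    """Fill each missing entry with the mean of that column over the k nearest
    complete rows, where distance is measured on the observed columns only."""
    donors = listwise_delete(rows)  # assumption: complete rows act as donors
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [j for j, value in enumerate(row) if value is not None]

        def distance(donor):
            # Squared Euclidean distance restricted to the observed columns.
            return sum((donor[j] - row[j]) ** 2 for j in observed)

        nearest = sorted(donors, key=distance)[:k]
        imputed.append([
            sum(d[j] for d in nearest) / len(nearest) if value is None else value
            for j, value in enumerate(row)
        ])
    return imputed
```

For example, on `[[1.0, 2.0], [1.5, None], [3.0, 6.0], [None, 5.0]]` deletion keeps only two of the four rows, while `knn_impute(..., k=1)` keeps all four and fills each gap from the closest complete row. This also shows why the quality and size of the dataset matter: with few complete rows, the nearest donor may still be far away.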








                     ISBN: 978-604-80-8083-9                                                  CITA 2023