Page 30 - Kỷ yếu hội thảo khoa học lần thứ 12 - Công nghệ thông tin và Ứng dụng trong các lĩnh vực (CITA 2023)



                       Imputation methods can be classified into two categories: single imputation [17] and
                     multiple imputation [18]. Single imputation creates one complete dataset from an
                     incomplete one, relying on specific assumptions about the missing values rather than
                     on the type of missing data. These assumptions are not always applicable or accurate
                     and can lead to biased results. Multiple imputation, by contrast, is a robust approach
                     for reducing imputation bias. It generates several complete datasets from one
                     incomplete dataset, introducing random values to restore the randomness lost through
                     imputation. Because this randomness is restored, statistical analyses based on the
                     distribution of the imputed values become more reliable. Multiple imputation methods
                     also offer greater flexibility and can be applied in a wide range of scenarios.
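
                     The contrast between the two categories can be illustrated with a minimal sketch.
                     It assumes a one-dimensional feature with missing entries encoded as NaN, uses mean
                     filling as the single-imputation rule, and draws from a normal distribution fitted
                     to the observed values as a toy multiple-imputation rule. The function names and
                     the choice of distribution are illustrative assumptions, not methods taken from
                     [17] or [18].

```python
import numpy as np

def single_mean_impute(x):
    """Single imputation: fill every missing entry (NaN) with the observed mean."""
    filled = x.copy()
    filled[np.isnan(x)] = np.nanmean(x)
    return filled

def multiple_impute(x, n_datasets=5, seed=0):
    """Toy multiple imputation: for each of n_datasets copies, draw the missing
    entries from a normal distribution fitted to the observed values (assumption)."""
    rng = np.random.default_rng(seed)
    missing = np.isnan(x)
    mu, sigma = np.nanmean(x), np.nanstd(x)
    datasets = []
    for _ in range(n_datasets):
        filled = x.copy()
        filled[missing] = rng.normal(mu, sigma, size=missing.sum())
        datasets.append(filled)
    return datasets

x = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 6.0])
single = single_mean_impute(x)      # one completed dataset
multiple = multiple_impute(x)       # several completed datasets

# Pool an analysis (here, the variance) across the imputed datasets.
pooled_var = np.mean([d.var() for d in multiple])
```

                     In this sketch, mean filling yields a completed dataset whose variance is lower
                     than that of the observed values, which is the lost randomness the paragraph above
                     refers to; the random draws are one simple way of reintroducing it.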


                     2.2   Generative Adversarial Networks for Data Imputation

                     Generative adversarial networks (GANs) have attracted considerable attention in recent
                     years due to their potential in data synthesis. A GAN comprises a generator and a
                     discriminator, both implemented as neural networks and trained in an adversarial
                     manner. GANs have been successfully applied in various fields, such as image
                     processing and computer vision, natural language processing, and medicine [7].
                       While GANs have primarily been utilized for data synthesis, researchers have also
                     explored their application in data imputation. Initially, GAN-based data imputation
                     methods were proposed for image completion tasks [23, 24], and these models were
                     applicable to image inpainting only. More recent publications have focused on data
                     imputation in general, such as GAIN [21], MisGAN [15], and GAMIN [22].
                       The GAIN (Generative Adversarial Imputation Nets) model [21] treats the generator
                     as an imputer and employs the discriminator to determine whether each component of
                     an input has been imputed or not. This algorithm performs well on low-dimensional
                     datasets with a low missing rate and even shows promise on the MNIST dataset with a
                     50% missing rate. However, its performance diminishes at higher missing rates, where
                     it tends to converge towards zero or mean imputation.
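
                     The data flow described above can be sketched in a few lines. This is a forward
                     pass only, under stated assumptions: the single-layer "networks" with random,
                     untrained weights (W_g, W_d) are hypothetical stand-ins, training is not shown,
                     and other details of the published model [21] are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3                                     # samples, features

x = rng.random((n, d))                          # underlying data (not fully visible in practice)
m = (rng.random((n, d)) > 0.3).astype(float)    # mask: 1 = observed, 0 = missing
z = rng.random((n, d)) * 0.01                   # noise placed in the missing slots

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical untrained single-layer networks standing in for the real models.
W_g = rng.normal(size=(2 * d, d))
W_d = rng.normal(size=(d, d))

def generator(x_obs, mask):
    """Imputer: proposes a value for every component."""
    return sigmoid(np.concatenate([x_obs, mask], axis=1) @ W_g)

def discriminator(x_hat):
    """Per-component probability that the component was observed (not imputed)."""
    return sigmoid(x_hat @ W_d)

x_obs = m * x + (1 - m) * z                     # observed values, noise elsewhere
g_out = generator(x_obs, m)
x_hat = m * x + (1 - m) * g_out                 # keep observed values, impute the rest

# The discriminator is scored against the mask itself, component by component.
d_prob = discriminator(x_hat)
eps = 1e-8
d_loss = -np.mean(m * np.log(d_prob + eps) + (1 - m) * np.log(1 - d_prob + eps))
```

                     The point of the sketch is that the discriminator's target is the mask m rather
                     than a single real/fake label, which is what "determine whether each component has
                     been imputed" amounts to.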
                       The MisGAN model [15], on the other hand, demonstrates better performance on
                     datasets with a high missing rate. This approach involves a GAN architecture
                     specifically designed for incomplete datasets, consisting of two generator-discriminator
                     pairs. One generator produces a mask that indicates the missing components, while the
                     other produces synthetic complete data. The synthetic complete data is then combined
                     with the mask to create synthetic incomplete data, and a data discriminator is used to
                     distinguish between real and synthetic incomplete data. Additionally, another
                     generator-discriminator pair is used for data imputation. The imputation generator
                     aims to fool the corresponding discriminator by generating imputed data that is
                     indistinguishable from real complete data.
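
                     The step in which synthetic complete data is combined with a generated mask can be
                     sketched as follows. The function name apply_mask, the filler value tau, and the
                     random arrays standing in for the two generators' outputs are assumptions made for
                     illustration; the networks themselves are not shown.

```python
import numpy as np

def apply_mask(x, m, tau=0.0):
    """Keep x where m == 1 and write the filler value tau where m == 0."""
    return x * m + tau * (1.0 - m)

rng = np.random.default_rng(0)

# Random stand-ins for the outputs of the data generator and the mask generator.
fake_data = rng.random((2, 4))
fake_mask = (rng.random((2, 4)) > 0.5).astype(float)

# Synthetic incomplete data, which the data discriminator would compare
# against real incomplete data masked in the same way.
fake_incomplete = apply_mask(fake_data, fake_mask)

# Small deterministic example with a visible filler value.
demo = apply_mask(np.array([[1.0, 2.0, 3.0]]), np.array([[1.0, 0.0, 1.0]]), tau=-1.0)
```

                     Because real and synthetic samples pass through the same masking step, the data
                     discriminator only ever sees incomplete data, so no fully observed training set is
                     needed in this sketch.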
                       GAMIN (Generative Adversarial Multiple Imputation Network) [22] is proposed as a
                     solution for multiple imputation in highly missing data scenarios. Inspired by
                     MisGAN, GAMIN introduces several modifications. First, the imputation architecture
                     is altered to incorporate the data generator directly into the imputation process.
                     Second, a novel confidence prediction and top-k imputation strategy is introduced.
                     Last, GAMIN employs new loss functions that take confidence into account during
                     training.
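
                     The text does not spell out how the top-k strategy operates. One plausible reading,
                     sketched here purely as an assumption, is that several candidate imputations are
                     generated, each receives a predicted confidence score, and the k most confident
                     candidates are kept as the multiple imputations. The function name and the
                     hand-written candidates and scores are hypothetical and are not taken from [22].

```python
import numpy as np

def top_k_imputations(candidates, confidences, k):
    """Return the k candidates with the highest confidence, best first."""
    order = np.argsort(confidences)[::-1][:k]
    return candidates[order], confidences[order]

# Four hypothetical candidate imputations for two missing components,
# with hypothetical confidence scores (one score per candidate).
candidates = np.array([[0.1, 0.9],
                       [0.4, 0.6],
                       [0.5, 0.5],
                       [0.2, 0.8]])
confidences = np.array([0.30, 0.95, 0.10, 0.70])

best, scores = top_k_imputations(candidates, confidences, k=2)
```

                     Keeping several high-confidence candidates rather than a single one is what makes
                     such a scheme a form of multiple imputation.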




                     CITA 2023                                                   ISBN: 978-604-80-8083-9