Proceedings of the 12th Scientific Conference on Information Technology and Applications in Various Fields (CITA 2023)
Bao Ngoc Vi, Cao Truong Tran, Chi Cong Nguyen
applied in many domains, such as human image synthesis, astronomical image enhancement,
and photograph inpainting [7]. Recently, GANs have also been applied to missing-value
imputation, for example GAIN [21], MisGAN [15], and GAMIN [22].
GANs often suffer from vanishing gradients and mode collapse. As a result, the trained
generator produces samples of limited diversity and diverges from the real data
distribution it is meant to imitate. Thus, a large number of recent works have
functions. In [5], the authors proposed the Evolutionary GAN (EGAN) model. In that
work, a population of generators is evolved using different training objectives to
population of generators is reduced using a fitness function that takes into account
both the quality and the diversity of the solutions.
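For reference, the EGAN paper [5] defines the fitness of a candidate generator as a weighted sum of a quality term and a diversity term. The restatement below reflects our reading of that paper and does not come from the present text. $D$ is the current discriminator, $G$ a candidate (mutated) generator, $z$ noise samples, $x$ real samples, and $\gamma \ge 0$ a weighting hyperparameter.

```latex
\begin{aligned}
\mathcal{F}_q &= \mathbb{E}_{z}\left[ D(G(z)) \right], \\
\mathcal{F}_d &= -\log \left\lVert \nabla_{D} \left( -\mathbb{E}_{x}\left[ \log D(x) \right]
  - \mathbb{E}_{z}\left[ \log\left(1 - D(G(z))\right) \right] \right) \right\rVert, \\
\mathcal{F}   &= \mathcal{F}_q + \gamma \, \mathcal{F}_d .
\end{aligned}
```

A larger $\mathcal{F}_q$ means the discriminator more often judges the generated samples as real. A larger $\mathcal{F}_d$, which corresponds to a smaller discriminator gradient, is intended to reward generators that do not collapse onto a few modes. The generators with the highest $\mathcal{F}$ are kept for the next generation.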
In this work, we propose a model named Evolutionary Generative Adversarial for
Imputation Data (EGAIN), in which different training objectives are combined with
mutation, selection, and an evolutionary process over a population of generators. In
our experiments, three different loss functions are used to validate the output of . EGAIN
is also evaluated on various datasets, and the experimental results show that the
proposed method is promising.
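This section does not name the three loss functions. As an assumption, the sketch below uses the three generator objectives that EGAN [5] employs as mutations: minimax, heuristic, and least-squares. Each is written in terms of the discriminator outputs $D(G(z))$ on a batch of generated samples. The sketch also includes a hypothetical `select` helper that keeps the fittest children. It is a pure-Python illustration and not the EGAIN implementation.

```python
import math
from typing import Callable, Dict, List, Sequence

# Assumption: these are EGAN's three mutation objectives; the paper does not
# confirm that EGAIN uses exactly these. d_fake holds D(G(z)) values in (0, 1).


def minimax_loss(d_fake: Sequence[float]) -> float:
    """Minimax mutation: 1/2 * E_z[log(1 - D(G(z)))]."""
    return 0.5 * sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)


def heuristic_loss(d_fake: Sequence[float]) -> float:
    """Heuristic (non-saturating) mutation: -1/2 * E_z[log D(G(z))]."""
    return -0.5 * sum(math.log(d) for d in d_fake) / len(d_fake)


def least_squares_loss(d_fake: Sequence[float]) -> float:
    """Least-squares mutation: E_z[(D(G(z)) - 1)^2]."""
    return sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)


# Each parent generator would be updated once per objective, producing one child each.
MUTATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "minimax": minimax_loss,
    "heuristic": heuristic_loss,
    "least_squares": least_squares_loss,
}


def select(children: Dict[str, float], k: int) -> List[str]:
    """Hypothetical selection step: keep the k children with the highest fitness."""
    return sorted(children, key=children.get, reverse=True)[:k]
```

The objectives are mixed because they behave differently. When the discriminator confidently rejects fakes ($D(G(z))$ near 0), the derivative of the minimax loss with respect to $D(G(z))$ is $-0.5/(1 - D(G(z)))$, which stays small. The heuristic loss has derivative $-0.5/D(G(z))$, which grows large. This saturation of the minimax objective is one motivation for evolving generators under several objectives instead of one.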
2 Related work
2.1 Missing data
Dealing with missing data is a common issue in empirical research. Data scientists
encounter various types of missing data, such as Missing Completely At Random (MCAR),
Missing At Random (MAR), and Missing Not At Random (MNAR), which are classified
according to the mechanism that causes the data to be missing. MCAR occurs when the
missingness is unrelated to the missing value itself, to the values of other variables,
and to any observed records. In MAR, by contrast, the missingness is unrelated to the
specific missing values but may depend on a subset of the observed data. Lastly, MNAR
is the most challenging type of missing data: it occurs when the missingness depends on
the unobserved values themselves, possibly in addition to the values of other variables.
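The three mechanisms are easiest to tell apart by what the probability of a value going missing is allowed to depend on. The sketch below is a toy illustration and is not taken from the paper. It assumes two-column rows $(x, y)$ where only $y$ can go missing, and uses a hypothetical "above the median" rule to make MAR depend on the observed $x$ and MNAR depend on $y$ itself.

```python
import random
import statistics
from typing import List, Tuple

Row = Tuple[float, float]  # (x, y): x is always observed, y may be missing


def missing_mask(rows: List[Row], mechanism: str, rate: float = 0.2,
                 seed: int = 0) -> List[bool]:
    """Return a mask where True means y is missing, under a toy MCAR/MAR/MNAR rule."""
    if not 0.0 <= rate <= 0.5:
        raise ValueError("rate must be in [0, 0.5] so that 2 * rate is a probability")
    rng = random.Random(seed)
    x_med = statistics.median(x for x, _ in rows)
    y_med = statistics.median(y for _, y in rows)
    mask = []
    for x, y in rows:
        if mechanism == "MCAR":
            p = rate                              # independent of x and y
        elif mechanism == "MAR":
            p = 2 * rate if x > x_med else 0.0    # depends only on the observed x
        elif mechanism == "MNAR":
            p = 2 * rate if y > y_med else 0.0    # depends on the (hidden) value of y
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        mask.append(rng.random() < p)
    return mask
```

Under MAR and MNAR, the factor of 2 keeps the expected missing rate close to `rate`, because roughly half of the rows lie above the median. An imputer can in principle correct for MAR using $x$. Under MNAR, the observed $y$ values are a biased sample that the observed data alone cannot reveal.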
To ensure the validity of experiments, researchers have developed methods to handle
missing data. One common approach is the deletion method, which removes incomplete
features or samples before analysis. However, this method is not suitable when the
missing rate is high [17]. Imputation methods, which substitute missing values with
plausible values, are more effective [14]. These methods rely on statistical analysis
and mathematical models such as maximum likelihood, expectation maximization,
regression imputation, multiple imputation, and sensitivity analysis [14, 16]. Machine
learning algorithms such as decision trees and k-nearest neighbors are also often used
to impute missing values, but their effectiveness depends on the quality and size of
the dataset [13, 17].
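The following minimal pure-Python sketch contrasts deletion with imputation. It covers listwise deletion, column-mean imputation, and a simplified k-nearest-neighbour imputation. The function names, and the choice to draw neighbours only from complete rows, are our own simplifications for illustration. Library implementations such as scikit-learn's `KNNImputer` differ in how they choose donors and compute distances.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def is_complete(row: Row) -> bool:
    return all(v is not None for v in row)


def listwise_delete(rows: List[Row]) -> List[Row]:
    """Deletion method: drop every sample that has at least one missing value."""
    return [list(r) for r in rows if is_complete(r)]


def mean_impute(rows: List[Row]) -> List[Row]:
    """Replace each missing entry with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Simplified k-NN imputation: fill each gap with the column average over the
    k complete rows closest to the incomplete row, using Euclidean distance on
    the columns that the incomplete row actually observes."""
    donors = listwise_delete(rows)
    result = []
    for r in rows:
        observed_cols = [j for j, v in enumerate(r) if v is not None]

        def distance(d: Row) -> float:
            return math.sqrt(sum((r[j] - d[j]) ** 2 for j in observed_cols))

        neighbours = sorted(donors, key=distance)[:k]
        result.append([
            v if v is not None else sum(d[j] for d in neighbours) / len(neighbours)
            for j, v in enumerate(r)
        ])
    return result
```

Consider the toy table `[[1.0, 2.0], [2.0, 4.0], [3.0, None], [10.0, 20.0]]`. Deletion discards the third sample entirely. Mean imputation fills its gap with 26/3 ≈ 8.67, a value pulled up by the last row. With `k=2`, the nearest-neighbour version fills it with 3.0, the average of the two closest complete samples. The three results differ, which reflects how strongly each method's output depends on the data at hand.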
ISBN: 978-604-80-8083-9 CITA 2023