Page 200 - Kỷ yếu hội thảo khoa học lần thứ 12 - Công nghệ thông tin và Ứng dụng trong các lĩnh vực (CITA 2023)



become flexible and can learn image features at multiple scales. The Inception-V3 model also uses common techniques such as regularization, dropout, and batch normalization to reduce overfitting during training. Inception-V3 has 48 convolutional layers and 3 fully connected layers, arranged as follows. The input layer receives an image whose pixel values are normalized to the range [-1, 1]. This is followed by convolutional layers with kernels of size 3x3 or 5x5 and stride 2. The max pooling layers apply a 3x3 kernel with stride 2 to reduce the input size while retaining the important features of the image. The distinctive feature of the model is the Inception Module, which contains several parallel branches. Each Inception Module has four branches: (1) a 1x1 convolution branch, which reduces the input depth before the other convolution layers are applied; (2) 3x3 convolution branches whose padding ("same" or "valid") is chosen so that the output sizes of the branches are equal; (3) a branch with a 5x5 kernel, which learns features at a larger scale than the 3x3 convolutional layers; and (4) a max pooling branch with a 3x3 kernel followed by a 1x1 convolution to reduce the depth of the output. The outputs of these branches are then joined in depth using concatenation. After several Inception layers, an average pooling layer reduces the size of the output while retaining the important features. Finally, there are the fully connected, softmax, and dropout layers. On the dataset in this paper, the Inception-V3 model achieves a prediction accuracy of 94.21%.
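The four-branch structure described above can be sketched as a small PyTorch module. This is a minimal illustration, assuming PyTorch is available; the branch widths (16, 24, 8, 8 channels) are illustrative choices, not the exact Inception-V3 configuration.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Four parallel branches whose outputs are concatenated along the
    channel (depth) dimension. Channel widths here are illustrative."""
    def __init__(self, in_ch):
        super().__init__()
        # (1) 1x1 convolution branch: cheap depth reduction
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        # (2) 1x1 reduction followed by a 3x3 convolution ("same" padding)
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 24, kernel_size=3, padding=1))
        # (3) 1x1 reduction followed by a 5x5 convolution for larger-scale features
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),
            nn.Conv2d(8, 8, kernel_size=5, padding=2))
        # (4) 3x3 max pooling followed by a 1x1 convolution to set the output depth
        self.b4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 8, kernel_size=1))

    def forward(self, x):
        # join the branch outputs in depth via concatenation
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 32, 28, 28)
y = InceptionModule(32)(x)
print(tuple(y.shape))  # spatial size preserved; channels = 16 + 24 + 8 + 8 = 56
```

Because every branch preserves the spatial size, only the channel counts need to match for concatenation, which is exactly why the branch paddings are chosen to keep the output sizes equal.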


                     2.3   VGG-19 model

The VGG-19 model is a convolutional neural network (CNN) developed by the Visual Geometry Group (VGG) at the University of Oxford. VGG-19 is an upgraded version of the VGG-16 model, the main difference being the number of layers. The VGG-19 architecture is built from convolution and pooling layers, comprising 16 convolutional layers and 3 fully connected layers. The convolutional layers are divided into five groups, where each group has layers with the same input and output dimensions. After each group of convolutional layers, a max pooling layer reduces the size of the feature maps. The convolutional layers in VGG-19 use the ReLU activation function with small 3x3 filters, stride 1, and padding 1. After the convolution and max pooling layers, the features extracted from the image are passed to the fully connected layers. Finally, the last layer uses the softmax function to classify the images into different labels. On the dataset in this paper, the VGG-19 model achieves a prediction accuracy of 94.73%.
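The layer arithmetic above can be checked with a short pure-Python trace. A 3x3 convolution with stride 1 and padding 1 preserves the spatial size, and each 2x2 max pooling halves it; the group widths below follow the standard VGG-19 plan and are stated here as an assumption, since the paper does not list them.

```python
# Standard VGG-19 convolutional plan: five groups of 3x3/stride-1/pad-1
# conv layers, each group followed by a max pooling that halves the size.
VGG19_GROUPS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]  # (convs, channels)

def vgg19_feature_shape(size=224):
    """Trace the spatial size through the conv/pool stack and count convs."""
    n_conv = 0
    for n_layers, _channels in VGG19_GROUPS:
        n_conv += n_layers  # 3x3, stride 1, padding 1: size unchanged
        size //= 2          # max pooling after the group: size halved
    return n_conv, size

n_conv, size = vgg19_feature_shape(224)
print(n_conv, size)  # 16 convolutional layers; 224 -> 7 after five poolings
```

This confirms the count of 16 convolutional layers, with the resulting 7x7 feature maps feeding the three fully connected layers.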


                     2.4   ResNet152-V2 model

The ResNet152-V2 model was developed by the Microsoft Research team and published in 2017 [23]. ResNet152-V2 is an improved version of the ResNet152 model that improves the learning of the model and reduces computational complexity. The ResNet152-V2 model consists of 152 neural network layers, including convolutional layers, activation layers, pooling layers, and fully connected layers.
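The improvement over the original ResNet is the pre-activation residual block, in which batch normalization and ReLU are applied before each convolution rather than after, which eases gradient flow through the identity shortcut. A minimal sketch of one such block, assuming PyTorch and a shape-preserving shortcut:

```python
import torch
import torch.nn as nn

class PreActBlock(nn.Module):
    """Pre-activation residual block in the ResNet-V2 style:
    BatchNorm and ReLU precede each convolution, and the input is
    added back to the branch output via an identity shortcut."""
    def __init__(self, ch):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return x + out  # identity shortcut: gradients flow directly to x

x = torch.randn(2, 64, 14, 14)
y = PreActBlock(64)(x)
print(tuple(y.shape))  # shortcut preserves the shape: (2, 64, 14, 14)
```

Stacking many such blocks is what makes a 152-layer network trainable: the shortcut gives every block a direct path for the gradient.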




                     CITA 2023                                                   ISBN: 978-604-80-8083-9