Proceedings of the 12th Scientific Conference on Information Technology and Its Applications in Various Fields (CITA 2023)
become flexible and can learn image features at various scales. The Inception-V3 model also uses common techniques such as regularization, dropout, and batch normalization to reduce overfitting during training. Inception-V3 has 48 convolutional layers and 3 fully connected layers, organized as follows. The input layer receives an image whose pixel values are normalized to the range -1 to 1. This is followed by convolutional layers with kernels of size 3x3 or 5x5 and stride 2. The max pooling layers apply a 3x3 kernel with stride 2 to reduce the input size while retaining the important features of the image. The peculiarity of the model lies in the Inception Module, which contains several parallel branches. Each Inception Module has four branches: (1) a 1x1 convolution branch, which uses a 1x1 kernel to reduce the input depth before other convolution layers are applied; (2) 3x3 convolution branches with "same" or "valid" padding chosen so that the outputs of these branches have equal size; (3) a branch with a 5x5 kernel, which learns features at a larger scale than the 3x3 convolutional layers; and (4) a max pooling branch (3x3 kernel) followed by a 1x1 convolution to reduce the dimension of the output. The outputs of these branches are then joined depth-wise using concatenation. After several Inception modules, an average pooling layer reduces the size of the output while retaining important features. Finally, there are the fully connected, dropout, and softmax layers. On the dataset used in this paper, the Inception-V3 model achieves a prediction accuracy of 94.21%.
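The depth-wise concatenation at the heart of the Inception Module can be sketched as follows. This is an illustrative example only, not the actual Inception-V3 implementation: the branch channel counts and the 35x35 spatial size are hypothetical, chosen to show that the branches share spatial dimensions while the output channel count is the sum of the branch channels.

```python
import numpy as np

# Hypothetical feature maps from the four parallel branches of one
# Inception module; all branches share the same H x W spatial size.
H, W = 35, 35
branch_1x1 = np.zeros((H, W, 64))   # 1x1 convolution branch
branch_3x3 = np.zeros((H, W, 96))   # 3x3 convolution branch
branch_5x5 = np.zeros((H, W, 48))   # 5x5 convolution branch
branch_pool = np.zeros((H, W, 32))  # max pooling + 1x1 convolution branch

# Depth-wise concatenation: join along the channel axis, so the module
# output has 64 + 96 + 48 + 32 = 240 channels.
module_output = np.concatenate(
    [branch_1x1, branch_3x3, branch_5x5, branch_pool], axis=-1)
print(module_output.shape)  # (35, 35, 240)
```

Because concatenation requires matching spatial dimensions, each branch must pad or stride its convolutions so that all outputs share the same H x W, which is exactly why the paper notes the padding of the 3x3 branches.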
2.3 VGG-19 model
The VGG-19 model is a convolutional neural network (CNN) developed by the
Visual Geometry Group (VGG) research group at the University of Oxford. VGG-19
is an upgraded version of the VGG-16 model, with the main difference being the
number of layers. The architecture of VGG-19 is built from convolution layers and
pooling layers, including 16 convolutional layers and 3 fully connected layers. The
convolutional layers are divided into five groups, where the layers in each group have the same input and output dimensions. After each group of convolutional layers, a max pooling layer reduces the size of the input. The convolutional layers in VGG-19 use the ReLU activation function with small 3x3 filters, stride 1, and padding 1. After the convolution and max pooling layers, the features extracted from the image are passed to the fully connected layers. Finally, the last layer uses the softmax function to classify the images into different labels. On the dataset used in this paper, the VGG-19 model achieves a prediction accuracy of 94.73%.
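The spatial arithmetic of these layers can be checked with the standard convolution output-size formula. The sketch below assumes the common 224x224 VGG input resolution (not stated in this excerpt): a 3x3 convolution with stride 1 and padding 1 preserves the spatial size, while each max pooling layer halves it, so five groups reduce 224 down to 7.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output spatial size of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output spatial size of a max pooling layer."""
    return (size - kernel) // stride + 1

size = 224  # assumed input resolution
for group in range(5):
    size = conv_out(size)  # 3x3 conv, stride 1, padding 1: size unchanged
    size = pool_out(size)  # 2x2 max pool, stride 2: size halved
print(size)  # 224 -> 112 -> 56 -> 28 -> 14 -> 7
```

The choice of stride 1 and padding 1 for 3x3 filters is what lets VGG stack many convolutions in a group without shrinking the feature map; only the pooling layers change the spatial size.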
2.4 ResNet152-V2 model
The ResNet152-V2 model was developed by the Microsoft Research team and
published in 2017 [23]. ResNet152-V2 is an improved version of the ResNet152 model that improves the model's learning and reduces computational complexity. The ResNet152-V2 model consists of 152 neural network layers, including convolutional, activation, pooling, and fully connected layers.
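The defining feature of the ResNet family is the residual (skip) connection: each block learns a residual function F(x) and adds the input back, so the block output is F(x) + x. The V2 variant moves batch normalization and activation before the weight layers ("pre-activation"), but the identity shortcut itself is unchanged. The sketch below uses a stand-in residual branch rather than a real convolution stack:

```python
import numpy as np

def residual_block(x, F):
    """Residual block: learn F(x), then add the identity shortcut."""
    return F(x) + x

# Hypothetical input and residual branch (a real block would use
# convolutions with batch normalization and ReLU pre-activation).
x = np.ones((4, 4))
y = residual_block(x, lambda t: 0.5 * t)
print(y[0, 0])  # 1.5: residual 0.5 plus the identity input 1.0
```

Because the shortcut passes the input through unchanged, gradients can flow directly to earlier layers, which is what makes training a 152-layer network feasible.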
CITA 2023 ISBN: 978-604-80-8083-9