Proceedings of the 12th Scientific Conference - Information Technology and Its Applications in Various Fields (CITA 2023)

Cong Tung Dinh, Thu Huong Nguyen, Huyen Do Thi, Nam Anh Bui

It is difficult to evaluate and compare these methods with each other because they
are not tested on the same dataset and their authors often do not publish source code
details. To compare and evaluate deep learning models, and to propose suitable ones
for fire detection applications based on camera images, in this paper we study four
popular deep learning models: Xception, Inception-V3, VGG-19 and ResNet152-V2.
Forest fire detection methods based on these models were implemented and then
tested on the same large dataset of images collected from cameras. The test results
show that all of these methods detect fire well, and that the method based on
ResNet152-V2 achieves the highest accuracy, over 95%.
In the next section, we present deep learning models applied to fire detection based
on surveillance cameras. The subsequent sections describe the data processing steps,
present the experiments and the results achieved, and conclude the paper.

2 Some deep learning models for forest fire detection

2.1 Xception model

The Xception network is a deep neural network architecture introduced in the 2016
paper "Xception: Deep Learning with Depthwise Separable Convolutions" [21].
Xception was developed from the Inception network architecture to improve its
performance. Its distinguishing feature is that it applies a separate convolution to
each feature channel before a convolution that combines all channels. This helps the
network learn the correlations between features. The approach reduces the number of
parameters and computations in the network, helps avoid overfitting, and improves
model performance and accuracy. The layers in Xception can be described as follows.
The input layer receives a fixed-size image, with the image size given as a
parameter. The next two CONV layers have 32 3x3 filters and 64 3x3 filters,
respectively. Next comes the most important layer in Xception, the Depthwise
Separable Convolution Block, which consists of two stages. The first stage
(depthwise) convolves each image channel with its own filter; it resembles
traditional convolution, except that each channel is handled independently of the
others. In the second stage (pointwise), Xception combines the features from
different channels with a 1x1 convolution to create new features. The next layer is
Average Pooling with a size of 10x10, which helps reduce the number of parameters
and avoid overfitting. Finally, there is the fully connected layer, whose output size
depends on the number of classes in the classification problem. On the dataset used
in this paper, Xception achieves an accuracy of up to 94.26%.
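
To make the two-stage block and its parameter savings concrete, the following is a minimal pure-Python sketch, not the implementation used in this paper. It assumes stride 1, no padding and no bias terms, and all function names are hypothetical. It applies one filter per channel (depthwise stage), then mixes the channels with a 1x1 convolution (pointwise stage), and counts the weights of both variants.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k stage plus a 1x1 pointwise stage."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes the channels
    return depthwise + pointwise


def depthwise_conv(image, kernels):
    """Depthwise stage: kernel c is applied to channel c only.

    image:   list of C channels, each an H x W list of lists
    kernels: list of C kernels, each k x k
    Uses cross-correlation (no kernel flip), as deep learning libraries do.
    """
    out = []
    for channel, kernel in zip(image, kernels):
        k = len(kernel)
        h, w = len(channel), len(channel[0])
        out.append([
            [
                sum(channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(k) for dj in range(k))
                for j in range(w - k + 1)
            ]
            for i in range(h - k + 1)
        ])
    return out


def pointwise_conv(image, weights):
    """Pointwise stage: each output channel is a weighted sum of input channels.

    weights: list of C_out rows, each holding C_in coefficients
    """
    h, w = len(image[0]), len(image[0][0])
    return [
        [[sum(wc * image[c][i][j] for c, wc in enumerate(row))
          for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


def depthwise_separable_conv(image, kernels, weights):
    """Depthwise stage followed by pointwise stage."""
    return pointwise_conv(depthwise_conv(image, kernels), weights)
```

As an illustration with assumed sizes, a 3x3 layer mapping 64 channels to 128 channels needs 73,728 weights as a standard convolution but only 8,768 as a depthwise separable one, which is where the reduction in parameters and computations comes from.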

2.2 Inception-V3 model

The Inception-V3 model is a deep neural network architecture developed and
published in 2015 [22]. Inception-V3 improves on earlier versions such as
Inception and Inception-V2. The distinguishing feature of Inception-V3 is the use of the
Inception module, a module with many parallel branches that allows the model to

ISBN: 978-604-80-8083-9 CITA 2023
