Secondly, the experimental process removes the last image-classification layer of the simple CNN described above (which used the Softmax function), keeping only the feature-extraction layers. We then use XGBoost to classify the weather images. In the XGBoost algorithm, we experimentally set the maximum tree depth to 3, the minimum child weight to 1, and the number of estimators to 100. In addition, the initial prediction score of all instances (the global bias) equals 0.5, and the booster is gbtree. The subsample ratio of columns for each level is 1, and the subsample ratio of columns for each split is also set at 1. The learning rate for boosting is set at 0.1, and the regularization value is 1.
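The following is a minimal sketch of this configuration, assuming a Keras CNN named cnn_model whose Softmax head is dropped, and hypothetical arrays X_train, y_train, X_test holding the weather images and labels; it is illustrative rather than the exact implementation.

```python
# Minimal sketch: reuse the CNN (minus its Softmax layer) as a feature
# extractor and classify the extracted features with XGBoost.
# `cnn_model`, `X_train`, `y_train`, `X_test` are assumed/hypothetical names.
from tensorflow.keras.models import Model
from xgboost import XGBClassifier

# Keep only the feature-extraction layers (everything before the Softmax layer).
feature_extractor = Model(inputs=cnn_model.input,
                          outputs=cnn_model.layers[-2].output)

train_features = feature_extractor.predict(X_train)
test_features = feature_extractor.predict(X_test)

# Hyper-parameters reported in the text.
xgb_clf = XGBClassifier(
    max_depth=3,            # maximum tree depth
    min_child_weight=1,     # minimum child weight
    n_estimators=100,       # number of estimators
    base_score=0.5,         # initial prediction score (global bias)
    booster='gbtree',       # tree-based booster
    colsample_bylevel=1,    # column subsample ratio for each level
    colsample_bynode=1,     # column subsample ratio for each split
    learning_rate=0.1,      # boosting learning rate
    reg_lambda=1,           # regularization value
)
xgb_clf.fit(train_features, y_train)
predictions = xgb_clf.predict(test_features)
```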
Thirdly, for the other comparative models, we kept the same feature-extraction layers as in the XGBoost experiment above, but at this stage we test them with Support Vector Machine Classification, namely the SVC classifier. The hyper-parameters specify the kernel type to be used in the algorithm; the degree is set at 3, the regularization parameter equals 1.0, and gamma is set at scale. In addition, the shrinking parameter is set to true, the size of the kernel cache is 200, the class weight is none, and the decision function shape is one-vs-rest.
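A minimal sketch of this SVC configuration follows, reusing the hypothetical CNN features from the earlier sketch; the kernel type is not legible in the source, so the scikit-learn default 'rbf' is assumed here.

```python
from sklearn.svm import SVC

svc_clf = SVC(
    kernel='rbf',                   # assumed; the kernel name is not legible in the source
    degree=3,                       # polynomial degree (unused by the 'rbf' kernel)
    C=1.0,                          # regularization parameter
    gamma='scale',                  # kernel coefficient
    shrinking=True,                 # use the shrinking heuristic
    cache_size=200,                 # kernel cache size (MB)
    class_weight=None,              # all classes weighted equally
    decision_function_shape='ovr',  # one-vs-rest decision function
)
svc_clf.fit(train_features, y_train)
svc_predictions = svc_clf.predict(test_features)
```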
In the next step, we test with the Decision Tree Classifier. The feature-extraction layers of the previous simple CNN model with 3 convolutional layers mentioned above were kept. The hyper-parameters for the Decision Tree Classifier take the following values. The criterion is gini, using the Gini impurity as the function to measure the quality of a split. The minimum number of samples required to split an internal node is 2, and the maximum depth is unlimited, so the tree is expanded until all leaves are pure or until all leaves contain fewer than 2 samples. The minimum number of samples required at a leaf node is set at 1, so any split must leave at least 1 training sample in each of the left and right branches. The minimum impurity decrease is 0, so a node is split if the split induces a decrease of the impurity greater than or equal to 0. The class weight is set at none, so all classes are assumed to have weight one.
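A minimal sketch of this Decision Tree configuration is shown below, again applied to the hypothetical CNN features from the first sketch.

```python
from sklearn.tree import DecisionTreeClassifier

dt_clf = DecisionTreeClassifier(
    criterion='gini',           # Gini impurity measures split quality
    max_depth=None,             # expand until leaves are pure or too small
    min_samples_split=2,        # minimum samples to split an internal node
    min_samples_leaf=1,         # minimum samples required at a leaf node
    min_impurity_decrease=0.0,  # split if impurity decreases by at least 0
    class_weight=None,          # all classes weighted equally
)
dt_clf.fit(train_features, y_train)
dt_predictions = dt_clf.predict(test_features)
```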
Next, we test with the AdaBoost classifier in the same way as XGBoost and the other classifiers above. The hyper-parameters for the AdaBoost classifier are as follows. The base estimator is a Decision Tree Classifier initialized with a maximum depth of 1. The maximum number of estimators is 50, and we use the SAMME.R real boosting algorithm. Since there is a trade-off between the learning rate and the maximum number of estimators, we set the learning rate at 1.0.
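A minimal sketch of this AdaBoost configuration follows, with a depth-1 decision tree (a decision stump) as the base estimator; older scikit-learn versions name this parameter base_estimator rather than estimator.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

ada_clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stump base learner
    n_estimators=50,       # maximum number of estimators
    algorithm='SAMME.R',   # real boosting algorithm
    learning_rate=1.0,     # trades off against the number of estimators
)
ada_clf.fit(train_features, y_train)
ada_predictions = ada_clf.predict(test_features)
```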
In the next step, we test with the Multi-layer Perceptron classifier in the same way as XGBoost and the other classifiers above. The hyper-parameters for the Multi-layer Perceptron classifier are as follows: the hidden layer size is 100, and an activation function for the hidden layer and a weight-optimization solver are specified for the model. The initial learning rate is 0.001, and the momentum for the gradient descent update is 0.9. The maximum number of iterations is 200, and we do not use early stopping to terminate training when the validation score is not improving. The maximum number of epochs without improvement is set at 10.
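A minimal sketch of this MLP configuration is given below; the activation function and weight optimizer are not legible in the source, so the scikit-learn defaults ('relu' and 'adam') are assumed, and the momentum parameter is only used when the solver is 'sgd'.

```python
from sklearn.neural_network import MLPClassifier

mlp_clf = MLPClassifier(
    hidden_layer_sizes=(100,),  # one hidden layer with 100 units
    activation='relu',          # assumed; not named in the source
    solver='adam',              # assumed; not named in the source
    learning_rate_init=0.001,   # initial learning rate
    momentum=0.9,               # momentum for gradient descent (used only with solver='sgd')
    max_iter=200,               # maximum number of iterations
    early_stopping=False,       # do not stop early on validation score
    n_iter_no_change=10,        # maximum epochs without improvement
)
mlp_clf.fit(train_features, y_train)
mlp_predictions = mlp_clf.predict(test_features)
```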