Secondly, the experimental process removes the last image-classification layer of the simple CNN above (which used the Softmax function), keeping only the feature-extraction layers, and then uses XGBoost to classify the weather images. We experimentally set the XGBoost hyper-parameters as follows: the maximum tree depth is 3, the minimum child weight is 1, and the number of estimators is 100. In addition, the initial prediction score of all instances (the global bias) equals 0.5, and the booster used is gbtree. The subsample ratio of columns for each level is 1, and the subsample ratio of columns for each split is also set at 1. The learning rate for boosting is set at 0.1, and the regularization parameter takes the value of 1.
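As a minimal sketch of this step, the snippet below strips the final Softmax layer of an assumed trained Keras model named simple_cnn (whose penultimate layer is assumed to output a flat feature vector) and fits an XGBoost classifier with the settings listed above; the variable names, and the choice of reg_lambda as the unnamed regularization parameter, are illustrative assumptions rather than part of the original setup.

    # Sketch only: simple_cnn, X_train, y_train and X_test are assumed to exist.
    import tensorflow as tf
    from xgboost import XGBClassifier

    # Drop the final Softmax classification layer, keep the feature-extraction layers.
    feature_extractor = tf.keras.Model(inputs=simple_cnn.input,
                                       outputs=simple_cnn.layers[-2].output)
    train_features = feature_extractor.predict(X_train)
    test_features = feature_extractor.predict(X_test)

    # XGBoost classifier with the reported hyper-parameters.
    xgb_clf = XGBClassifier(
        max_depth=3,            # maximum tree depth
        min_child_weight=1,     # minimum child weight
        n_estimators=100,       # number of estimators
        base_score=0.5,         # initial prediction score of all instances (global bias)
        booster='gbtree',       # booster type
        colsample_bylevel=1,    # subsample ratio of columns for each level
        colsample_bynode=1,     # subsample ratio of columns for each split
        learning_rate=0.1,      # boosting learning rate
        reg_lambda=1,           # assumed mapping of the regularization value of 1
    )
    xgb_clf.fit(train_features, y_train)
    predictions = xgb_clf.predict(test_features)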
Thirdly, for the other comparative models, we keep the same feature-extraction layers as in the XGBoost experiment above, but at this stage we test them with a Support Vector Machine classifier, namely the SVC classifier. The hyper-parameters specify the kernel type used by the algorithm; the polynomial degree is 3, gamma is set to scale, and the regularization parameter equals 1.0. In addition, the shrinking parameter is set to true, the size of the kernel cache is 200, the class weight is none, and the decision function shape is one-vs-rest.
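A minimal sketch of this configuration, reusing the extracted features from the previous step, might look as follows; the rbf kernel is an assumption (the kernel name is not legible here), while the remaining values follow the reported settings.

    from sklearn.svm import SVC

    svc_clf = SVC(
        kernel='rbf',                    # assumed kernel type (not stated explicitly)
        degree=3,                        # polynomial degree (only used by the poly kernel)
        gamma='scale',                   # gamma set to scale
        C=1.0,                           # regularization parameter
        shrinking=True,                  # shrinking heuristic enabled
        cache_size=200,                  # kernel cache size in MB
        class_weight=None,               # no class re-weighting
        decision_function_shape='ovr',   # one-vs-rest decision function
    )
    svc_clf.fit(train_features, y_train)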
In the next step, we test with the Decision Tree classifier. The feature-extraction layers of the previous simple CNN model with 3 convolutional layers mentioned above are kept unchanged. The hyper-parameters for the Decision Tree classifier take the following values. The criterion is gini, so the Gini impurity is the function used to measure the quality of a split. The minimum number of samples required to split an internal node is 2, and the maximum depth is unlimited, so the tree is expanded until all leaves are pure or contain fewer than 2 samples. The minimum number of samples required at a leaf node is 1, so every split leaves at least 1 training sample in each of the left and right branches. The minimum impurity decrease is 0, so a node is split whenever the split induces a decrease of the impurity greater than or equal to 0. The class weight is set to none, so all classes are assumed to have weight one.
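These values correspond to a configuration along the following lines, again sketched on the assumed train_features and y_train from the feature-extraction step:

    from sklearn.tree import DecisionTreeClassifier

    dt_clf = DecisionTreeClassifier(
        criterion='gini',            # Gini impurity as the split-quality measure
        max_depth=None,              # expand until leaves are pure or too small to split
        min_samples_split=2,         # minimum samples required to split an internal node
        min_samples_leaf=1,          # minimum samples required at a leaf node
        min_impurity_decrease=0.0,   # split whenever impurity does not increase
        class_weight=None,           # all classes weighted equally
    )
    dt_clf.fit(train_features, y_train)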
Next, we test with the AdaBoost classifier in the same way as XGBoost and the other classifiers above. The hyper-parameters for the AdaBoost classifier are as follows. The base estimator is a Decision Tree classifier initialized with a maximum depth of 1. The maximum number of estimators is 50, and we use the SAMME.R real boosting algorithm. Since there is a trade-off between the learning rate and the maximum number of estimators, we set the learning rate to 1.0.
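A sketch of this setup, under the same assumed features, could read as follows (the base-estimator argument is named estimator in recent scikit-learn releases and base_estimator in older ones):

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    ada_clf = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # depth-1 decision tree as base estimator
        n_estimators=50,          # maximum number of estimators
        learning_rate=1.0,        # traded off against the number of estimators
        algorithm='SAMME.R',      # real boosting algorithm
    )
    ada_clf.fit(train_features, y_train)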
In the next step, we test with the Multi-layer Perceptron classifier in the same way as XGBoost and the other classifiers above. The hyper-parameters for the Multi-layer Perceptron classifier are as follows: the hidden layer size equals 100, an activation function is applied at the hidden layer, and a weight-optimization solver is used for training; the initial learning rate is 0.001, and the momentum for the gradient descent update is 0.9. The maximum number of iterations is 200, and we do not use early stopping to terminate training when the validation score is not improving; the maximum number of epochs without improvement is set at 10.
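The configuration could be sketched as below; the activation function and the weight-optimization solver are not legible in the text, so the sketch assumes the scikit-learn defaults relu and adam, while the remaining values follow the reported settings.

    from sklearn.neural_network import MLPClassifier

    mlp_clf = MLPClassifier(
        hidden_layer_sizes=(100,),   # hidden layer size of 100
        activation='relu',           # assumed: activation not stated (library default)
        solver='adam',               # assumed: optimizer not stated (library default)
        learning_rate_init=0.001,    # initial learning rate
        momentum=0.9,                # momentum for gradient descent updates (used by the sgd solver)
        max_iter=200,                # maximum number of iterations
        early_stopping=False,        # do not stop early on validation score
        n_iter_no_change=10,         # epochs tolerated without improvement
    )
    mlp_clf.fit(train_features, y_train)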





