Proceedings of the 12th Scientific Conference on Information Technology and Applications in Various Fields (CITA 2023)

Duy Tran, Thang Le, Khoa Tran, Hoang Le, Cuong Do, Thanh Ha                      49


                     stacked with self-attention layers, enables the model to capture long-range
                     dependencies among various aspects of the face image. The output feature vector from
                     the transformer encoder, representing the holistic face image, feeds into the MLP head,
                     which produces the predicted beauty score for evaluation and backpropagation.
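The encoder-plus-head pipeline described above can be illustrated with a deliberately tiny sketch in plain Python. It assumes a ViT-style class token whose final state is taken as the holistic image representation, uses single-head self-attention with residual connections, and omits layer normalization and the per-token feed-forward blocks. All function names, sizes, and the random initialization are hypothetical and are not the authors' configuration.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n); matrices are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention: every token attends to every token."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    d = len(q[0])
    scores = [[sum(qi[t] * kj[t] for t in range(d)) / math.sqrt(d) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def mlp_head(x, w1, b1, w2, b2):
    """Two-layer MLP with ReLU mapping a feature vector to a single scalar score."""
    hidden = [max(0.0, sum(xi * w1[i][j] for i, xi in enumerate(x)) + b1[j])
              for j in range(len(b1))]
    return sum(h * w2[j] for j, h in enumerate(hidden)) + b2


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(dim, hidden, num_layers, seed=0):
    """Hypothetical random parameters; a real model would learn these by backpropagation."""
    rng = random.Random(seed)
    return {
        "cls": [rng.gauss(0.0, 0.1) for _ in range(dim)],
        "layers": [tuple(rand_matrix(dim, dim, rng) for _ in range(3)) for _ in range(num_layers)],
        "head": (rand_matrix(dim, hidden, rng), [0.0] * hidden,
                 [rng.gauss(0.0, 0.1) for _ in range(hidden)], 0.0),
    }


def predict_beauty_score(patch_embeddings, params):
    # Assumption: prepend a class token whose final state summarizes the whole face image.
    tokens = [params["cls"]] + patch_embeddings
    for wq, wk, wv in params["layers"]:
        attended = self_attention(tokens, wq, wk, wv)
        # Residual connection; layer norm and the per-token MLP are omitted for brevity.
        tokens = [[a + b for a, b in zip(t, u)] for t, u in zip(tokens, attended)]
    # The class-token feature vector feeds the MLP head, which outputs one beauty score.
    return mlp_head(tokens[0], *params["head"])


# Usage with hypothetical sizes: 9 patch embeddings of dimension 8.
patches = rand_matrix(9, 8, random.Random(1))
params = init_params(dim=8, hidden=16, num_layers=2)
score = predict_beauty_score(patches, params)
print(score)
```

Because every token's attention weights span all other tokens, a patch near the eyes can directly influence the representation of a patch near the mouth, which is the long-range dependency the paragraph refers to.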


                     3.2.3   Loss function

                     The loss function employed to assess the performance of the models on the SCUT-FBP5500 dataset integrates two metrics: Mean Squared Error (MSE) and Mean Absolute Error (MAE). These metrics quantify the disparity between the predicted facial beauty score $\hat{y}_i$ produced by the MLP head and the corresponding ground-truth score $y_i$ for each image $i$ in the dataset.
                       The MSE is computed by averaging the squared differences between the predicted
                     and ground-truth beauty scores:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}$$

                     Here, $n$ is the total number of images in the dataset. Because each error is squared, MSE amplifies large prediction errors, making it more sensitive to significant discrepancies between predicted and ground-truth scores.
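A minimal sketch of MSE in plain Python illustrates the amplification effect. The beauty scores below are made up for illustration (they are not SCUT-FBP5500 data), and the function name `mse` is hypothetical. Both prediction sets have the same total absolute error of 2.0, but concentrating it in one image quadruples the MSE.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences between ground truth and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 3.0, 3.0, 3.0]            # hypothetical ground-truth scores
spread_errors = [3.5, 2.5, 3.5, 2.5]     # four errors of 0.5 each
one_large_error = [3.0, 3.0, 3.0, 5.0]   # a single error of 2.0

print(mse(y_true, spread_errors))    # 0.25
print(mse(y_true, one_large_error))  # 1.0
```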
                       Conversely, the MAE is the average of the absolute differences between the
                     predicted and ground-truth beauty scores:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

                     Unlike MSE, MAE penalizes each error in linear proportion to its size, so a few large deviations do not dominate the metric.
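A matching plain-Python sketch of MAE, again on made-up scores with a hypothetical function name `mae`, shows the linear penalty: spreading an error of 2.0 across four images or placing all of it on a single image yields the same MAE.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences between ground truth and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 3.0, 3.0, 3.0]            # hypothetical ground-truth scores
spread_errors = [3.5, 2.5, 3.5, 2.5]     # four errors of 0.5 each
one_large_error = [3.0, 3.0, 3.0, 5.0]   # a single error of 2.0

print(mae(y_true, spread_errors))    # 0.5
print(mae(y_true, one_large_error))  # 0.5
```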
                       Employing both MSE and MAE in the loss function evaluates the model's performance from complementary perspectives. Smaller MSE and MAE values indicate better performance, that is, smaller deviations between the predicted and actual beauty scores. In addition, the Pearson correlation coefficient is reported to measure how closely the predicted scores follow the linear trend of the ground-truth scores.
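For ground-truth scores $y_i$ and predictions $\hat{y}_i$, the Pearson correlation coefficient is $r = \frac{\sum_i (y_i-\bar{y})(\hat{y}_i-\bar{\hat{y}})}{\sqrt{\sum_i (y_i-\bar{y})^2}\,\sqrt{\sum_i (\hat{y}_i-\bar{\hat{y}})^2}}$. The plain-Python sketch below uses made-up scores and a hypothetical function name `pearson`. It illustrates why the coefficient complements MSE and MAE: it measures linear agreement rather than absolute error, so predictions that are all off by the same 0.5 still correlate almost perfectly with the ground truth.

```python
import math


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between ground-truth and predicted scores."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least two elements")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_t * var_p)


y_true = [2.1, 3.4, 2.8, 4.0]              # hypothetical ground-truth scores
shifted = [t + 0.5 for t in y_true]        # every prediction off by the same 0.5
r = pearson(y_true, shifted)
print(r)  # very close to 1.0 despite the constant offset
```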


                     4      Experiments and Results


                     Although the ViT requires more computational resources to train, its superior performance and faster convergence make it an effective and efficient choice. Table 1 reports the results of our experiments with the three models.




                     ISBN: 978-604-80-8083-9                                                  CITA 2023