


                          Facial Beauty Prediction with Vision Transformer




Duy Tran1*, Thang Le1*, Khoa Tran1*, Hoang Le1*, Cuong Do1*, Thanh Ha1*
                                              * These authors contributed equally to this work
                            1  The George Washington Institute of Data Science and Artificial Intelligence,
                                      The International Society of Data Scientists, MA, USA
                              {duy.tran, thang.le, khoa.tran, hoang.le, cuong.do,
                                                  thanh.ha}@isods.org



Abstract. Beyond its use in plastic surgery and aesthetics, Facial Beauty Prediction technology also has applications in areas such as advertising and social media, where it can be used to optimize marketing strategies and help individuals enhance their online presence. This study introduces an effective approach to evaluating human facial beauty using a transformer-based architecture. While Convolutional Neural Networks (CNNs) are the conventional method for this task, our experimental results demonstrate that our Vision Transformer (ViT) based model outperforms two strong baselines, VGG16 and ResNet-50, on the widely used benchmark dataset SCUT-FBP5500. Our ViT-based model achieves lower Mean Absolute Error (MAE) and Mean Squared Error (MSE) than VGG16 and ResNet-50, despite employing a simple data pipeline without any data augmentation. Our study suggests that transformer-based architectures offer a more effective means of evaluating human facial beauty and open new avenues for further research in this field.


Keywords: Artificial Intelligence, Vision Transformer, Facial feature extraction, VGG-16, ResNet-50, ViT-based-16-2k.


                     1      Introduction


Facial Beauty Prediction is a Computer Vision task that utilizes machine learning methods to process and analyze facial features, attributes, and landmarks. These extracted features are then used as input to machine learning algorithms, such as deep neural networks, to learn the relationship between facial beauty and facial patterns. The output of beauty prediction algorithms is usually a numerical score representing facial attractiveness, as sketched below.
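To make this pipeline concrete, the following is a minimal sketch of such a score-regression setup; the backbone, head, and placeholder data are illustrative assumptions, not the implementation evaluated in this paper.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Minimal sketch (not the paper's implementation): a pretrained CNN
    # backbone whose classifier is replaced by a single-output regression
    # head mapping a face crop to a scalar attractiveness score.
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone.fc = nn.Linear(backbone.fc.in_features, 1)

    criterion = nn.MSELoss()              # regress against human ratings
    faces = torch.randn(8, 3, 224, 224)   # placeholder batch of face crops
    ratings = torch.rand(8) * 4 + 1       # SCUT-FBP5500 scores lie in [1, 5]
    loss = criterion(backbone(faces).squeeze(1), ratings)

Training then minimizes this regression loss over rated face images, so the network learns the mapping from facial patterns to attractiveness scores.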
At present, most state-of-the-art results for facial beauty prediction are achieved by Convolutional Neural Networks (CNNs). Xu et al. (2018) combined VGG16 feature extraction with Bayesian Ridge regression to achieve relatively high performance on SCUT-FBP5500, with a Mean Absolute Error of 0.2595 [1]; the sketch below illustrates this recipe.
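The following is a rough illustration of that recipe (features from a frozen VGG16 feeding a Bayesian Ridge regressor); the feature layer, preprocessing, and placeholder data are assumptions and may differ from the exact setup in [1].

    import torch
    import torch.nn as nn
    from torchvision import models
    from sklearn.linear_model import BayesianRidge

    # Sketch of the recipe in [1]: features from a frozen VGG16 are fed
    # to a Bayesian Ridge regressor. The exact layer tapped may differ.
    vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
    extractor = nn.Sequential(vgg.features, vgg.avgpool, nn.Flatten())

    faces = torch.randn(16, 3, 224, 224)       # placeholder face crops
    scores = (torch.rand(16) * 4 + 1).numpy()  # placeholder ratings in [1, 5]

    with torch.no_grad():
        feats = extractor(faces).numpy()       # 25088-dim VGG16 features

    reg = BayesianRidge().fit(feats, scores)
    predicted = reg.predict(feats)             # scalar beauty scores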
Following the success of CNNs, Bougourzi et al. (2021) experimented with a variety of



