Proceedings of the 12th Scientific Conference - Information Technology and Its Applications in Various Fields (CITA 2023)

Duy Tran, Thang Le, Khoa Tran, Hoang Le, Cuong Do, Thanh Ha                      47


                     architecture is currently restricted due to difficulties in implementation, and the 3D
                     dataset for facial beauty prediction is limited because of constraints in 3D data
                     collection.
                       An alternative approach is to use a convolutional neural network (CNN). A CNN is a
                     data-driven model that automatically extracts features and learns the associations
                     between rating scores and

                     Christopher Pramerdorfer and Martin Kampel discussed a limitation of using CNNs
                     for this sort of task: CNNs tend to focus on local regions of an image and may not
                     capture global spatial information effectively [9]. This limitation is particularly
                     relevant because facial features interact in complex ways that influence the overall
                     prediction. For instance, in certain East Asian cultures, culturally desirable traits
                     may include larger eyes with a "double eyelid" and a more "V-shaped" face with a
                     slender jawline.
                       In contrast, Transformer-based models such as the Vision Transformer (ViT) of
                     Dosovitskiy et al. (2021) address both challenges mentioned above. ViT captures both
                     local and global contextual information in images, and it required fewer
                     computational resources to train than state-of-the-art CNNs [4]. For these reasons,
                     a supervised pre-trained ViT is used in this paper.
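
                       As a rough illustration of why ViT can relate distant facial regions, the sketch
                     below shows the patch tokenization step described by Dosovitskiy et al.: the image
                     is cut into fixed-size, non-overlapping patches, and each flattened patch becomes
                     one token, so that self-attention can compare every patch with every other patch.
                     The function name `split_into_patches`, the 4x4 toy image, and the patch size of 2
                     are assumptions made for illustration only; this is not the implementation used in
                     this paper.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (a list of rows) into flattened, non-overlapping square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    # Walk over the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical 4x4 single-channel "image" with pixel values 0..15.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = split_into_patches(toy_image, patch_size=2)
print(len(tokens))  # number of patch tokens
print(tokens[0])    # flattened top-left patch
```

                       In a full model, each flattened patch would be linearly projected and combined
                     with a position embedding before the attention layers; those steps are omitted here.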

                     3      Method


                     3.1    Dataset


                     The availability of the SCUT-FBP5500 dataset has contributed to the development of
                     more accurate and efficient machine learning models for predicting facial beauty, with
                     potential applications in various industries and fields such as cosmetics, advertising,
                     and social media [10].
                       The SCUT-FBP5500 dataset contains 5,500 face images of subjects aged 15 to 60,
                     with 2,750 images of males and 2,750 images of females. The dataset is split into
                     2,000 Asian males, 2,000 Asian females, 750 Caucasian males, and 750 Caucasian
                     females.
                       Each image in the SCUT-FBP5500 dataset is rated by multiple human evaluators to
                     capture a diverse, subjective assessment of facial beauty. A total of 60 evaluators
                     independently rate each facial image on a 5-point scale, with 1 as the lowest score.
                     They consider facial attributes such as skin complexion, facial symmetry,
                     attractiveness, and aesthetic appeal. The final beauty score for each image is the
                     average of the evaluators' scores, which reduces individual bias, since different
                     raters may perceive beauty differently.
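
                       The averaging step can be sketched as follows. This is a minimal example under
                     stated assumptions: the helper `beauty_score`, the image names, and the five
                     ratings per image are hypothetical (the dataset itself uses 60 raters per image),
                     and the range check simply encodes the 1-to-5 scale described above.

```python
from statistics import mean


def beauty_score(ratings):
    """Average the per-rater scores (1-5 scale) of one image into a single label."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return mean(ratings)


# Hypothetical ratings; the real dataset has 60 ratings per image.
ratings_by_image = {
    "image_a": [3, 4, 4, 5, 3],
    "image_b": [2, 2, 3, 1, 2],
}
scores = {name: beauty_score(r) for name, r in ratings_by_image.items()}
print(scores)
```

                       The resulting per-image averages are the regression targets a model would be
                     trained to predict.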













                     ISBN: 978-604-80-8083-9                                                  CITA 2023