Proceedings of the 12th Scientific Conference: Information Technology and Applications in Various Fields (CITA 2023)


Duy Tran, Thang Le, Khoa Tran, Hoang Le, Cuong Do, Thanh Ha 47

architecture is currently restricted due to difficulties in implementation, and 3D
datasets for facial beauty prediction remain limited because 3D face data are difficult
to collect.
An alternative approach is to use CNNs. A CNN is a data-driven model that
automatically extracts features and learns the associations between rating scores and

Christopher Pramerdorfer and Martin Kampel discussed a limitation of using CNNs for
this kind of task: CNNs tend to focus on local regions of an image and may not capture
global spatial information effectively [9]. This limitation is particularly relevant
because facial features interact in complex ways to shape the overall prediction. For
instance, in certain East Asian cultures, desirable traits may include larger eyes with
a "double eyelid" and a more "V-shaped" face with a slender jawline, a property of the
overall facial contour rather than of any single local region.
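To make the locality argument concrete: for a stack of stride-1 convolutions, the receptive field grows only linearly with depth, by (k - 1) pixels per layer of kernel size k. The minimal sketch below assumes 3x3 kernels with no pooling, striding, or dilation; real CNNs downsample and therefore enlarge their receptive field faster, so the numbers only illustrate the trend. The function names are hypothetical.

```python
def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Receptive field width (pixels) of stacked stride-1 convolutions.

    Assumes no pooling, striding, or dilation: each layer with kernel size k
    widens the field by (k - 1) pixels.
    """
    return 1 + num_layers * (kernel_size - 1)


def layers_to_cover(width: int, kernel_size: int = 3) -> int:
    """Smallest number of such layers whose receptive field spans `width` pixels."""
    n = 0
    while receptive_field(n, kernel_size) < width:
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        rf = receptive_field(n)
        print(f"{n:>2} layers of 3x3 conv -> {rf}x{rf} px")
    # Under these assumptions, a neuron needs 112 layers to "see" a whole 224 px image.
    print("layers needed to span 224 px:", layers_to_cover(224))
```

Under these simplifying assumptions, relating the eyes to the jawline requires many layers, whereas a self-attention layer relates any two image regions directly.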
In contrast, Transformer-based models, as demonstrated by Dosovitskiy et al.
(2021), offer a way to address both challenges discussed above. The Vision Transformer
(ViT) not only captures both local and global contextual information in images, but
also requires substantially fewer computational resources to pre-train than
state-of-the-art CNNs [4]. For these reasons, this paper adopts a supervised
pre-trained ViT to overcome these limitations.
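The mechanism behind the ViT's global context can be sketched as follows: the image is cut into fixed-size patches that become tokens, and self-attention lets every token weight every other token within a single layer. This pure-Python sketch illustrates only that mechanism under simplifying assumptions (single head, raw patches used as queries and keys, no learned projections, position embeddings, or MLP blocks). It is not the model used in this paper, and all function names are hypothetical.

```python
import math
import random


def num_patches(image_size: int, patch: int) -> int:
    """Number of non-overlapping patch x patch tokens in a square image."""
    return (image_size // patch) ** 2


def to_patches(image, patch):
    """Split a square grayscale image (list of rows) into flattened patch tokens.

    Assumes the image side is divisible by `patch`.
    """
    size = len(image)
    tokens = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(tokens):
    """Single-head scaled dot-product attention weights over all tokens.

    Simplification: raw patches serve directly as queries and keys; a real ViT
    applies learned projections and adds position embeddings first.
    """
    d = len(tokens[0])
    return [softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
            for q in tokens]


if __name__ == "__main__":
    random.seed(0)
    image = [[random.random() for _ in range(8)] for _ in range(8)]  # toy 8x8 image
    tokens = to_patches(image, patch=4)  # 4 tokens of 16 values each
    weights = self_attention_weights(tokens)
    for i, row in enumerate(weights):
        # Every weight is positive: each patch attends to every patch in one layer.
        print(f"patch {i}:", [round(w, 3) for w in row])
    print("tokens for a 224 px image with 16 px patches:", num_patches(224, 16))
```

Each row of attention weights sums to 1 and covers all patches, which is the sense in which one attention layer already has a global view; a 224 px input with 16 px patches, as in ViT-B/16, yields 196 such tokens.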

3 Method

3.1 Dataset

The availability of the SCUT-FBP5500 dataset has contributed to the development of
more accurate and efficient machine learning models for predicting facial beauty, with
potential applications in industries such as cosmetics, advertising, and social
media [10].
The SCUT-FBP5500 dataset contains 5,500 face images of participants aged 15 to 60,
divided evenly into 2,750 images of males and 2,750 images of females. It consists of
2,000 Asian males, 2,000 Asian females, 750 Caucasian males, and 750 Caucasian
females.
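The four subsets listed for SCUT-FBP5500 can be tallied to check the totals by gender and by ethnicity. The dictionary below simply restates the subset sizes given in the text; the helper name is hypothetical.

```python
from collections import Counter

# Subset sizes of SCUT-FBP5500 as listed in the text: (ethnicity, gender) -> images.
SUBSETS = {
    ("Asian", "male"): 2000,
    ("Asian", "female"): 2000,
    ("Caucasian", "male"): 750,
    ("Caucasian", "female"): 750,
}


def totals_by(index):
    """Sum image counts over one key position: 0 = ethnicity, 1 = gender."""
    counts = Counter()
    for key, n in SUBSETS.items():
        counts[key[index]] += n
    return dict(counts)


if __name__ == "__main__":
    print("total images:", sum(SUBSETS.values()))  # 5500
    print("by gender:", totals_by(1))              # {'male': 2750, 'female': 2750}
    print("by ethnicity:", totals_by(0))           # {'Asian': 4000, 'Caucasian': 1500}
```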
Each image in SCUT-FBP5500 is rated by multiple human evaluators, so the assessment
reflects a diverse range of subjective judgments of facial beauty. A total of 60
evaluators individually and independently rate the beauty of each image on a 5-point
scale, from 1 (lowest) to 5 (highest). They consider various factors such as skin
complexion, facial symmetry, attractiveness, and aesthetic appeal. The final beauty
score for each image is the average of the evaluators' scores; averaging reduces the
influence of any single rater's bias, since different raters may perceive beauty
differently.
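The averaging step can be sketched as follows, with the final score of an image taken as the arithmetic mean of its ratings on the 1-5 scale. The function name and the 60 rating values are hypothetical illustrations, not the dataset's actual annotation files or tooling. The example also shows why averaging dampens individual bias: with 60 raters, one rater moving a score by 3 points shifts the mean by only 3/60 = 0.05.

```python
from statistics import mean


def beauty_score(ratings):
    """Final score of one image: the mean of its raters' scores on the 1-5 scale."""
    if not ratings:
        raise ValueError("an image needs at least one rating")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return mean(ratings)


if __name__ == "__main__":
    # Hypothetical ratings from 60 raters for one image (not real SCUT-FBP5500 data).
    ratings = [3] * 20 + [4] * 30 + [5] * 10
    print(round(beauty_score(ratings), 3))  # 3.833
    # One harsh rater turning a 4 into a 1 moves the mean by only 3/60.
    harsh = ratings.copy()
    harsh[20] = 1
    print(round(beauty_score(ratings) - beauty_score(harsh), 3))  # 0.05
```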

ISBN: 978-604-80-8083-9 CITA 2023
