Proceedings of the 12th Scientific Conference on Information Technology and Its Applications in Various Fields (CITA 2023)


Facial Beauty Prediction with Vision Transformer

Duy Tran 1*, Thang Le 1*, Khoa Tran 1*, Hoang Le 1*, Cuong Do 1*, Thanh Ha 1*
* These authors contributed equally to this work
1 The George Washington Institute of Data Science and Artificial Intelligence,
The International Society of Data Scientists, MA, USA
{duy.tran, thang.le, khoa.tran, hoang.le, cuong.do, thanh.ha}@isods.org

Abstract. Beyond plastic surgery and aesthetics, Facial Beauty Prediction has
applications in areas such as advertising and social media, where it can be
used to optimize marketing strategies and help individuals enhance their online
presence. This study introduces an approach to evaluating facial beauty with a
transformer-based architecture. Although Convolutional Neural Networks (CNNs)
are the conventional method for this task, our experiments show that our Vision
Transformer (ViT) based model outperforms two strong CNN baselines, VGG16 and
ResNet-50, on the widely used benchmark dataset SCUT-FBP5500. It achieves lower
Mean Absolute Error (MAE) and Mean Squared Error (MSE) than both baselines,
despite using a simple data pipeline without any data augmentation. These
results suggest that transformer-based architectures offer a more effective
means of evaluating facial beauty and open new avenues for further research in
this field.

Keywords: Artificial Intelligence, Vision Transformer, Facial feature
extraction, VGG-16, ResNet-50, ViT-based-16-2k.

1 Introduction

Facial Beauty Prediction is a Computer Vision task that uses machine learning
methods to extract and analyze facial features, attributes, and landmarks. The
extracted features are then fed to machine learning algorithms, such as deep
neural networks, which learn the relationship between facial patterns and facial
beauty. The output of a beauty prediction algorithm is usually a numerical score
representing facial attractiveness.
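Because the output is a numerical score, predictions are naturally compared with human ratings using regression errors such as the Mean Absolute Error (MAE) and Mean Squared Error (MSE) reported in this paper. The following is a minimal sketch of the two metrics in plain Python. The function names and the rating values are made up for illustration and are not taken from the dataset or from the authors' code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between human ratings and predicted scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared gap; penalizes large errors more heavily than MAE."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical human ratings and model predictions (illustrative values only).
ratings = [3.0, 2.5, 4.0, 3.5]
predictions = [3.5, 2.5, 3.0, 3.0]

mae = mean_absolute_error(ratings, predictions)
mse = mean_squared_error(ratings, predictions)
print(f"MAE = {mae}, MSE = {mse}")  # MAE = 0.5, MSE = 0.375
```

Lower values are better for both metrics. MSE squares each error, so the single one-point miss in the example contributes most of its value.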
At present, most state-of-the-art results for facial beauty prediction are
achieved by Convolutional Neural Networks (CNNs). Xu et al. (2018) combined
VGG16 feature extraction with Bayesian Ridge regression to achieve relatively
high performance on SCUT-FBP5500, with a Mean Absolute Error of 0.2595 [1].
Following the success of CNNs, Bougourzi et al. (2021) experimented with a variety of

ISBN: 978-604-80-8083-9 CITA 2023
