Fig. 1. YOLO algorithm diagram. Source: guidetomlandai.com
YOLO V3. YOLO-V3 has a similar architecture to YOLO-V2 but is about three times faster than YOLO-V2, thousands of times faster than R-CNN, and hundreds of times faster than Fast R-CNN. The improvements in YOLO-V3 include:
+ A new backbone: skip connections are added to the backbone and the number of convolutional layers is increased.
+ A Feature Pyramid Network, so that predictions are made at 3 scales (see the sketch after this list).
+ A new loss function.
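To make the 3-scale prediction concrete, the following is a minimal PyTorch sketch (not the original YOLO-V3 code) of an FPN-style head that merges three backbone feature maps and predicts at three scales; the ToyMultiScaleHead name, channel sizes and strides are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMultiScaleHead(nn.Module):
    # Toy FPN-style head: merges coarse maps into finer ones and predicts at 3 scales.
    def __init__(self, channels=(256, 512, 1024), num_outputs=255):
        super().__init__()
        # One 1x1 prediction conv per scale; 255 = 3 anchors x (80 COCO classes + 5).
        self.heads = nn.ModuleList(nn.Conv2d(c, num_outputs, 1) for c in channels)
        # Lateral 1x1 convs that adapt an upsampled coarse map to the next finer map.
        self.laterals = nn.ModuleList(
            nn.Conv2d(channels[i + 1], channels[i], 1) for i in range(len(channels) - 1))

    def forward(self, feats):
        f8, f16, f32 = feats                      # feature maps at strides 8, 16, 32
        p32 = f32
        p16 = f16 + F.interpolate(self.laterals[1](p32), scale_factor=2)
        p8 = f8 + F.interpolate(self.laterals[0](p16), scale_factor=2)
        return [head(p) for head, p in zip(self.heads, (p8, p16, p32))]

# Dummy backbone outputs for a 416x416 input.
feats = [torch.randn(1, 256, 52, 52), torch.randn(1, 512, 26, 26), torch.randn(1, 1024, 13, 13)]
print([o.shape for o in ToyMultiScaleHead()(feats)])  # one prediction map per scale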
YOLO V5. YOLO-V5 does not change much compared to YOLO-V4, but focuses on speed and ease of use. YOLOv5 is released in 4 model sizes:
- Yolov5-s: the smallest version.
- Yolov5-m: the medium version.
- Yolov5-l: the large version.
- Yolov5-x: the extra large version.
Other changes in YOLOv5 include:
+ Data augmentation: Mosaic augmentation, Copy-Paste augmentation, and MixUp augmentation.
+ Loss function: a scale factor is added to the objectness loss.
+ Anchor boxes: AutoAnchor, which tunes anchors with a genetic algorithm (GA).
+ Some other minor changes (a short usage sketch of the four model sizes follows this list).
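As a brief illustration of the four sizes, the following sketch loads them through the PyTorch Hub interface published by Ultralytics and compares their parameter counts; it requires internet access, and the example image URL is the one used in the Ultralytics documentation.

import torch

# Load each YOLOv5 size from the Ultralytics hub and report its parameter count.
for size in ("yolov5s", "yolov5m", "yolov5l", "yolov5x"):
    model = torch.hub.load("ultralytics/yolov5", size, pretrained=True)
    print(f"{size}: {sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")

# Run inference with the last loaded model (yolov5x) on a sample image.
results = model("https://ultralytics.com/images/zidane.jpg")
results.print()  # detected classes, confidences and boxes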
YOLO V7. YOLO-V7 surpasses all known object detection models in both speed and accuracy over the range of 5 FPS to 160 FPS, and achieves the highest accuracy, 56.8% AP, among all real-time object detectors running at 30 FPS or higher. YOLOv7-E6 (56 FPS on a V100, 55.9% AP) outperforms transformer-based detectors such as Cascade-Mask R-CNN with a Transformer backbone, and even detectors with high-end CNN backbones such as ConvNeXt-XL Cascade-Mask R-CNN. YOLOv7 also surpasses YOLOR, YOLOX, Scaled-YOLOv4 and YOLOv5, as well as many other object detectors, in both speed and accuracy. Furthermore, YOLOv7 was trained on COCO from scratch without using any pre-trained weights [9].
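Speed figures such as these are throughput measurements; the following is a minimal sketch, assuming a single GPU and batch size 1, of how one might estimate the FPS of any detector implemented as a torch.nn.Module (the estimate_fps name and its defaults are illustrative, not the benchmark protocol of the cited works).

import time
import torch

def estimate_fps(model, img_size=640, n_warmup=10, n_runs=100, device="cuda"):
    # Rough end-to-end FPS estimate for a detector at batch size 1.
    model = model.to(device).eval()
    dummy = torch.randn(1, 3, img_size, img_size, device=device)
    with torch.no_grad():
        for _ in range(n_warmup):      # warm-up runs are excluded from timing
            model(dummy)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_runs):
            model(dummy)
        torch.cuda.synchronize()
    return n_runs / (time.time() - start)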