Page 83 - Proceedings of the 12th Scientific Conference - Information Technology and Applications in Various Fields (CITA 2023)





















                                   Fig. 1. YOLO algorithm diagram. Source: guidetomlandai.com


                     YOLO V3. YOLO-V3 has a similar architecture to YOLO-V2. YOLO-V3 is reported to
                     run about three times faster than YOLO-V2, thousands of times faster than R-CNN,
                     and hundreds of times faster than Fast R-CNN. The improvements in YOLO-V3
                     include:
                       + A new backbone: skip connections are added to the backbone and the number
                     of convolutional layers is increased.
                       + A Feature Pyramid Network is added, with predictions made at 3 scales.
                       + A new loss function.
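
                     The effect of predicting at 3 scales can be illustrated with a short Python
                     sketch. The input size (416), the strides (32, 16, 8), the 3 anchors per scale,
                     and the 80 classes are assumptions taken from the commonly used COCO
                     configuration of YOLO-V3; they are not stated in this paper, and the function
                     names are hypothetical.

```python
def yolo_v3_output_shapes(input_size=416, num_classes=80,
                          anchors_per_scale=3, strides=(32, 16, 8)):
    """Return (grid_h, grid_w, channels) for each prediction scale.

    All default values are assumptions (common COCO setup), not values
    taken from the paper.
    """
    if any(input_size % s for s in strides):
        raise ValueError("input_size must be divisible by every stride")
    # Each anchor predicts 4 box offsets + 1 objectness score + class scores.
    channels = anchors_per_scale * (5 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


def total_predictions(shapes, anchors_per_scale=3):
    """Count candidate boxes over all scales, before non-maximum suppression."""
    return sum(h * w * anchors_per_scale for h, w, _ in shapes)


shapes = yolo_v3_output_shapes()
for stride, shape in zip((32, 16, 8), shapes):
    print(f"stride {stride:>2}: grid {shape[0]}x{shape[1]}, {shape[2]} channels")
print("candidate boxes:", total_predictions(shapes))
```

                     Under these assumptions the coarse grid (stride 32) handles large objects and
                     the fine grid (stride 8) handles small ones, which is the motivation for adding
                     the pyramid.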



                     YOLO V5. YOLO-V5 makes few changes compared to YOLO-V4 and focuses instead on
                     speed and ease of use. YOLOv5 is released in 4 versions:
                       - YOLOv5-s: the smallest version.
                       - YOLOv5-m: the medium version.
                       - YOLOv5-l: the large version.
                       - YOLOv5-x: the extra-large version.
                       Other changes in YOLOv5 include:
                          + Data augmentation: Mosaic augmentation, Copy-Paste augmentation, and
                     mixed-image augmentation (MixUp).
                          + Loss function: a scale factor is added to the objectness loss.
                          + Anchor boxes: Auto Anchor, which tunes the anchor boxes with a genetic
                     algorithm (GA).
                          + Some other minor changes.
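
                     The idea behind tuning anchor boxes with a genetic algorithm can be sketched as
                     a mutate-and-select loop. This is a simplified illustration, not the YOLOv5
                     implementation: the fitness measure, the mutation strength, the toy box sizes,
                     and the names anchor_fitness and evolve_anchors are all assumptions made for
                     this example.

```python
import random


def anchor_fitness(anchors, boxes):
    """Mean over boxes of the best anchor match.

    A match is the worse of the width ratio and the height ratio,
    so 1.0 means some anchor fits the box exactly. (Assumed metric.)
    """
    total = 0.0
    for bw, bh in boxes:
        best = 0.0
        for aw, ah in anchors:
            rw = min(bw / aw, aw / bw)
            rh = min(bh / ah, ah / bh)
            best = max(best, min(rw, rh))
        total += best
    return total / len(boxes)


def evolve_anchors(anchors, boxes, generations=500, sigma=0.1, seed=0):
    """Mutation-only evolution: keep a mutated child only if it fits better."""
    rng = random.Random(seed)
    best = [tuple(a) for a in anchors]
    best_fit = anchor_fitness(best, boxes)
    for _ in range(generations):
        # Mutation: scale every width and height by a random factor near 1,
        # clamped so that sizes stay positive.
        child = [
            (max(w * (1 + rng.gauss(0, sigma)), 1.0),
             max(h * (1 + rng.gauss(0, sigma)), 1.0))
            for w, h in best
        ]
        fit = anchor_fitness(child, boxes)
        if fit > best_fit:  # selection
            best, best_fit = child, fit
    return best, best_fit


# Toy ground-truth box sizes (width, height) in pixels; purely illustrative.
boxes = [(12, 18), (30, 61), (45, 40), (90, 120), (150, 200), (320, 280)]
start = [(10, 13), (60, 60), (300, 300)]
before = anchor_fitness(start, boxes)
evolved, after = evolve_anchors(start, boxes)
print(f"fitness before: {before:.3f}, after: {after:.3f}")
```

                     Because a child replaces its parent only when its fitness is higher, the
                     fitness never decreases from one generation to the next.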



                     YOLO V7. YOLO-V7 surpasses all known object detection models in both speed and
                     accuracy in the range from 5 FPS to 160 FPS, and achieves the highest accuracy,
                     56.8% AP, among all real-time object detection models running at 30 FPS or
                     more. YOLOv7-E6 (56 FPS on the V100, 55.9% AP) outperforms both the
                     transformer-based Cascade-Mask R-CNN model and high-end CNN backbones such as
                     ConvNeXt-XL Cascade-Mask R-CNN. YOLOv7 also surpasses YOLOR, YOLOX,
                     Scaled-YOLOv4, YOLOv5, and many other object detection models in both speed and
                     accuracy. Furthermore, YOLOv7 was trained on COCO from scratch, without using
                     any pre-trained weights [9].
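
                     The frame rates quoted above translate directly into a per-frame time budget.
                     The small Python sketch below performs that conversion; the helper name
                     latency_ms is hypothetical, and the calculation assumes frames are processed
                     one at a time.

```python
def latency_ms(fps):
    """Convert frames per second to milliseconds available per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Frame rates mentioned in the text: the 5-160 FPS range, the 30 FPS
# real-time threshold, and the 56 FPS figure for YOLOv7-E6.
for fps in (5, 30, 56, 160):
    print(f"{fps:>3} FPS -> {latency_ms(fps):.2f} ms per frame")
```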





                     ISBN: 978-604-80-8083-9                                                  CITA 2023