Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/4294
Full metadata record
DC Field | Value | Language
dc.contributor.author | Choi, Yehwan | -
dc.contributor.author | Nguyen, Duy Linh | -
dc.contributor.author | Vo, Xuan Thuy | -
dc.contributor.author | Jo, Kang-Hyun | -
dc.date.accessioned | 2024-12-06T08:59:19Z | -
dc.date.available | 2024-12-06T08:59:19Z | -
dc.date.issued | 2024-11 | -
dc.identifier.isbn | 978-3-031-74126-5 | -
dc.identifier.uri | https://elib.vku.udn.vn/handle/123456789/4294 | -
dc.identifier.uri | https://doi.org/10.1007/978-3-031-74127-2_31 | -
dc.description | Lecture Notes in Networks and Systems (LNNS, volume 882); The 13th Conference on Information Technology and Its Applications (CITA 2024); pp. 372-383 | vi_VN
dc.description.abstract | This paper introduces the development of an essential deep-learning model for surveillance systems utilizing high-mounted CCTV or drones. Objects seen from elevated angles often look smaller and may appear at different angles than they do from ground level. To improve the detection of small objects, we propose a network incorporating an element-wise multiplication module based on the vanilla Vision Transformer (ViT) architecture. However, traditional transformer models require significant computational resources, which may not be practical for edge devices such as CCTV cameras or drones. We therefore apply the Attention-Free Transformer (AFT) to reduce computational requirements, enabling real-time operation on low-capacity devices. We validate the performance by combining ViT and AFT with the YOLOv5 real-time object detection model. Practical applicability is confirmed by implementing it on a low-capacity device, the ODROID H3+. Validation datasets include Autonomous Driving Drone, VisDrone, Aerial Maritime, and PKLot, all containing numerous small objects. Experimental results on the VisDrone dataset show that YOLOv5 nano + AFT reduces the parameter count by 4.6% while increasing accuracy by 1%, making it an efficient network. At 3.7 MB, the model size is suitable for edge-device implementation. The Aerial Maritime and PKLot datasets likewise show reduced parameter counts and increased accuracy. Hence, the proposed deep-learning model is applicable to aerial surveillance systems. | vi_VN
dc.language.iso | en | vi_VN
dc.publisher | Springer Nature | vi_VN
dc.subject | To improve the detection of small objects, we propose a network incorporating an element-wise multiplication module based on the vanilla Vision Transformer (ViT) architecture | vi_VN
dc.subject | However, traditional transformer models need significant computational resources, which may not be practical for edge devices like CCTV cameras or drones | vi_VN
dc.title | Small Object Detection Without Attention for Aerial Surveillance | vi_VN
dc.type | Working Paper | vi_VN
Appears in Collections: CITA 2024 (International)
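
Note on the method described in the abstract: the paper's key efficiency idea is replacing quadratic self-attention with the Attention-Free Transformer (AFT) so a ViT-style module can run in real time on edge hardware. The full text requires sign-in, so the sketch below is not the authors' implementation; it is a minimal PyTorch rendering of the AFT-simple variant from Zhai et al. (2021), the element-wise, linear-cost operation the abstract refers to. The class name AFTSimple and the single-projection layout are illustrative assumptions.

    import torch
    import torch.nn as nn

    class AFTSimple(nn.Module):
        """Attention-Free Transformer layer, AFT-simple variant (Zhai et al., 2021).

        Replaces the (tokens x tokens) attention map with a global element-wise
        weighting:  Y = sigmoid(Q) * sum_t( softmax(K over tokens)_t * V_t ),
        so cost grows linearly with the number of tokens.
        """
        def __init__(self, dim: int):
            super().__init__()
            self.to_q = nn.Linear(dim, dim)
            self.to_k = nn.Linear(dim, dim)
            self.to_v = nn.Linear(dim, dim)
            self.proj = nn.Linear(dim, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, tokens, dim)
            q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
            # Softmax over the token axis turns K into per-channel token weights.
            weights = torch.softmax(k, dim=1)
            # Global context: weighted sum of values, one summary vector per channel.
            context = (weights * v).sum(dim=1, keepdim=True)   # (batch, 1, dim)
            # Element-wise gating by sigmoid(Q); no quadratic attention map is formed.
            return self.proj(torch.sigmoid(q) * context)

For example, layer = AFTSimple(dim=64) applied to torch.randn(1, 196, 64) returns a (1, 196, 64) tensor. Because no (tokens x tokens) matrix is ever materialized, memory and compute scale linearly with token count, which is the property the abstract relies on for low-capacity devices such as the ODROID H3+.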

Files in This Item: sign-in required to read.
