YOLO-SA: An Efficient Object Detection Model Based on Self-attention Mechanism
Paper/Presentation Title | YOLO-SA: An Efficient Object Detection Model Based on Self-attention Mechanism |
---|---|
Presentation Type | Paper |
Authors | Li, Ang; Song, Xiangyu; Sun, ShiJie; Zhang, Zhaoyang; Cai, Taotao; Song, Huansheng |
Journal or Proceedings Title | Proceedings of the 7th International Joint Conference on Asia-Pacific Web and Web-Age Information Management (APWeb-WAIM 2023) |
Journal Citation | 14334, pp. 1-15 |
Number of Pages | 15 |
Year | 2024 |
Publisher | Springer |
Place of Publication | Singapore |
ISBN | 9789819724215; 9789819724208 |
Digital Object Identifier (DOI) | https://doi.org/10.1007/978-981-97-2421-5_1 |
Web Address (URL) of Paper | https://link.springer.com/chapter/10.1007/978-981-97-2421-5_1 |
Web Address (URL) of Conference Proceedings | https://link.springer.com/book/10.1007/978-981-97-2421-5 |
Conference/Event | 7th International Joint Conference on Asia-Pacific Web and Web-Age Information Management (APWeb-WAIM 2023) |
Event Details | 7th International Joint Conference on Asia-Pacific Web and Web-Age Information Management (APWeb-WAIM 2023); Parent: Joint International Conference on Asia-Pacific Web Conference (APWeb)/Web-Age Information Management (WAIM); Delivery: In person; Event Date: 06 to 08 Oct 2023; Event Location: Wuhan, China |
Abstract | Object detectors based on CNN structures are widely used in object detection, object classification, and other tasks. Traditional CNN modules usually adopt a complex multi-branch design, which reduces inference speed and memory utilization. Moreover, many works add an attention mechanism to the object detector to extract rich spatial features, but these mechanisms usually serve as additional modules alongside convolution and do not fundamentally overcome the limitations of the convolution operation. Finally, traditional object detectors often have coupled detection heads, which can compromise model performance. To solve these problems, we propose a new object detection model, YOLO-SA, based on the popular object detector YOLOv5. We introduce a new reparameterized module, RepVGG, to replace the original DarkNet53 structure of the YOLOv5 model, which greatly reduces the complexity of the model and improves detection accuracy. We introduce a self-attention module in the feature fusion part of the model, which is independent of the other convolutional layers and performs better than other mainstream attention modules. We replace the coupled detection head in the YOLOv5 model with an anchor-based decoupled detection head, which greatly improves convergence speed during training. Experiments show that the detection accuracy of the proposed YOLO-SA model reaches 71.2% and 75.8% on the COCO2014 and VOC2012 datasets respectively, outperforming the YOLOv5s baseline and other mainstream object detection models. |
Keywords | Object detection; CNN architecture; Attention mechanism; Decoupled detection head |
Contains Sensitive Content | Does not contain sensitive content |
ANZSRC Field of Research 2020 | 460299. Artificial intelligence not elsewhere classified |
Public Notes | Files associated with this item cannot be displayed due to copyright restrictions. |
Series | Lecture Notes in Computer Science |
Byline Affiliations | Chang'an University, China; Swinburne University of Technology; Macquarie University |
https://research.usq.edu.au/item/z9v20/yolo-sa-an-efficient-object-detection-model-based-on-self-attention-mechanism