YOLO-SA: An Efficient Object Detection Model Based on Self-attention Mechanism

Paper

Li, Ang, Song, Xiangyu, Sun, ShiJie, Zhang, Zhaoyang, Cai, Taotao and Song, Huansheng. 2024. "YOLO-SA: An Efficient Object Detection Model Based on Self-attention Mechanism." 7th International Joint Conference on Asia-Pacific Web and Web-Age Information Management (APWeb-WAIM 2023). Wuhan, China 06 - 08 Oct 2023 Singapore. Springer. https://doi.org/10.1007/978-981-97-2421-5_1

Paper/Presentation Title	YOLO-SA: An Efficient Object Detection Model Based on Self-attention Mechanism
Presentation Type	Paper
Authors	Li, Ang, Song, Xiangyu, Sun, ShiJie, Zhang, Zhaoyang, Cai, Taotao and Song, Huansheng
Journal or Proceedings Title	Proceedings of the 7th International Joint Conference on Asia-Pacific Web and Web-Age Information Management (APWeb-WAIM 2023)
Journal Citation	14334, pp. 1-15
Number of Pages	15
Year	2024
Publisher	Springer
Place of Publication	Singapore
ISBN	9789819724215
	9789819724208
Digital Object Identifier (DOI)	https://doi.org/10.1007/978-981-97-2421-5_1
Web Address (URL) of Paper	https://link.springer.com/chapter/10.1007/978-981-97-2421-5_1
Web Address (URL) of Conference Proceedings	https://link.springer.com/book/10.1007/978-981-97-2421-5
Conference/Event	7th International Joint Conference on Asia-Pacific Web and Web-Age Information Management (APWeb-WAIM 2023)
Event Details	7th International Joint Conference on Asia-Pacific Web and Web-Age Information Management (APWeb-WAIM 2023)
	Parent: Joint International Conference on Asia-Pacific Web Conference (APWeb)/Web-Age Information Management (WAIM)
	Delivery: In person
	Event Date: 06 to end of 08 Oct 2023
	Event Location: Wuhan, China
Abstract	CNN-based object detectors are widely used in object detection, object classification, and related tasks. Traditional CNN modules usually adopt complex multi-branch designs, which reduce inference speed and memory utilization. Moreover, many works add an attention mechanism to the object detector to extract rich spatial features, but these mechanisms usually serve as add-on modules to convolution and do not fundamentally overcome the limitations of the convolution operation. Finally, traditional object detectors often use coupled detection heads, which can compromise model performance. To address these problems, we propose a new object detection model, YOLO-SA, based on the popular object detector YOLOv5. We introduce a reparameterized module, RepVGG, to replace the original DarkNet53 structure of the YOLOv5 model, which greatly reduces model complexity and improves detection accuracy. We introduce a self-attention module in the feature fusion part of the model; it is independent of the other convolutional layers and outperforms other mainstream attention modules. We replace the coupled detection head of YOLOv5 with an anchor-based decoupled detection head, which greatly improves convergence speed during training. Experiments show that the proposed YOLO-SA model reaches a detection accuracy of 71.2% on COCO2014 and 75.8% on VOC2012, outperforming the YOLOv5s baseline and other mainstream object detection models.
Keywords	Object detection; CNN architecture; Attention mechanism; Decoupled detection head
Contains Sensitive Content	Does not contain sensitive content
ANZSRC Field of Research 2020	460299. Artificial intelligence not elsewhere classified
Public Notes	Files associated with this item cannot be displayed due to copyright restrictions.
Series	Lecture Notes in Computer Science
Byline Affiliations	Chang'an University, China
	Swinburne University of Technology
	Macquarie University

Permalink -

https://research.usq.edu.au/item/z9v20/yolo-sa-an-efficient-object-detection-model-based-on-self-attention-mechanism

Related outputs

Causal integration in graph neural networks toward enhanced classification: benchmarking and advancements for robust performance

Job, Simi, Tao, Xiaohui, Cai, Taotao, Li, Lin, Sheng, Quan Z., Xie, Haoran and Yong, Jianming. 2025. "Causal integration in graph neural networks toward enhanced classification: benchmarking and advancements for robust performance." World Wide Web. 28 (3). https://doi.org/10.1007/s11280-025-01343-1

HebCGNN: Hebbian-enabled causal classification integrating dynamic impact valuing

Job, Simi, Tao, Xiaohui, Cai, Taotao, Li, Lin, Xie, Haoran, Xu, Cai and Yong, Jianming. 2025. "HebCGNN: Hebbian-enabled causal classification integrating dynamic impact valuing." Knowledge-Based Systems. 311. https://doi.org/10.1016/j.knosys.2025.113094

A Survey on Truth Discovery: Concepts, Methods, Applications, and Opportunities

Wang, Shuang, Zhang, He, Sheng, Quan Z., Li, Xiaoping, Sun, Zhu, Cai, Taotao, Zhang, Wei Emma, Yang, Jian and Gao, Qing. 2025. "A Survey on Truth Discovery: Concepts, Methods, Applications, and Opportunities." IEEE Transactions on Big Data. 11 (2), pp. 314-332. https://doi.org/10.1109/TBDATA.2024.3423677

ECS-STPM: An Efficient Model for Tunnel Fire Anomaly Detection

Song, Huansheng, Wen, Ya, Song, Xiangyu, Sun, ShiJie, Cai, Taotao and Li, Jianxin. 2024. "ECS-STPM: An Efficient Model for Tunnel Fire Anomaly Detection." 7th International Joint Conference on Asia-Pacific Web and Web-Age Information Management (APWeb-WAIM 2023). Wuhan, China 06 - 08 Oct 2023 Singapore. Springer. https://doi.org/10.1007/978-981-97-2421-5_19

MDCGA-Net: Multi-Scale Direction Context-Aware Network with Global Attention for Building Extraction from Remote Sensing Images

Niu, Penghui, Gu, Junhua, Zhang, Yajuan, Zhang, Ping, Cai, Taotao, Xu, Wenjia and Han, Jungong. 2024. "MDCGA-Net: Multi-Scale Direction Context-Aware Network with Global Attention for Building Extraction from Remote Sensing Images." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 17, pp. 8461-8476. https://doi.org/10.1109/JSTARS.2024.3387969

Special issue on link prediction in complex networks

Sheng, Michael, Cai, Taotao and Mahmood, Adnan. 2024. "Special issue on link prediction in complex networks." Computing. 106 (7), pp. 2079-2079. https://doi.org/10.1007/s00607-024-01298-7

Optimal Treatment Strategies for Critical Patients with Deep Reinforcement Learning

Job, Simi, Tao, Xiaohui, Li, Lin, Xie, Haoran, Cai, Taotao, Yong, Jianming and Li, Qing. 2024. "Optimal Treatment Strategies for Critical Patients with Deep Reinforcement Learning." ACM Transactions on Intelligent Systems and Technology. 15 (2), pp. 1-22. https://doi.org/10.1145/3643856

Robust equivalent circuit model parameters identification scheme for State of Charge (SOC) estimation based on maximum correntropy criterion

Zhang, Kexin, Zhao, Xuezhuan, Chen, Yu, Wu, Di, Cai, Taotao, Wang, Yi, Li, Lingling and Zhang, Ji. 2024. "Robust equivalent circuit model parameters identification scheme for State of Charge (SOC) estimation based on maximum correntropy criterion." International Journal of Electrochemical Science. 19 (5). https://doi.org/10.1016/j.ijoes.2024.100558

FRAMU: Attention-based Machine Unlearning using Federated Reinforcement Learning

Shaik, Thanveer, Tao, Xiaohui, Li, Lin, Xie, Haoran, Cai, Taotao, Zhu, Xiaofeng and Li, Qing. 2024. "FRAMU: Attention-based Machine Unlearning using Federated Reinforcement Learning." IEEE Transactions on Knowledge and Data Engineering. 36 (10), pp. 5153-5167. https://doi.org/10.1109/TKDE.2024.3382726

Reconnecting the Estranged Relationships: Optimizing the Influence Propagation in Evolving Networks

Cai, Taotao, Lei, Qi, Sheng, Quan Z., Cui, Ningning, Yang, Shuiqiao, Yang, Jian, Zhang, Wei Emma and Mahmood, Adnan. 2024. "Reconnecting the Estranged Relationships: Optimizing the Influence Propagation in Evolving Networks." IEEE Transactions on Knowledge and Data Engineering. 36 (5), pp. 2151-2165. https://doi.org/10.1109/TKDE.2023.3316268

Top-k socio-spatial co-engaged location selection for social users

Hasan Haldar, Nur Al, Li, Jianxin, Ali, Mohammed Eunus, Cai, Taotao, Chen, Yunliang, Sellis, Timos and Reynolds, Mark. 2023. "Top-k socio-spatial co-engaged location selection for social users." IEEE Transactions on Knowledge and Data Engineering. 35 (5), pp. 5325-5340. https://doi.org/10.1109/TKDE.2022.3151095

Towards Multi-User, Secure, and Verifiable kNN Query in Cloud Database

Cui, Ningning, Qian, Kang, Cai, Taotao, Li, Jianxin, Yang, Xiaochun, Cui, Jie and Zhong, Hong. 2023. "Towards Multi-User, Secure, and Verifiable kNN Query in Cloud Database." IEEE Transactions on Knowledge and Data Engineering. 35 (9), pp. 9333-9349. https://doi.org/10.1109/TKDE.2023.3237879

Incremental graph computation: Anchored Vertex Tracking in Dynamic Social Networks

Cai, Taotao, Yang, Shuiqiao, Li, Jianxin, Sheng, Quan Z., Yang, Jian, Wang, Xin, Zhang, Wei Emma and Gao, Longxiang. 2023. "Incremental graph computation: Anchored Vertex Tracking in Dynamic Social Networks." IEEE Transactions on Knowledge and Data Engineering. 35 (7), pp. 7030-7044. https://doi.org/10.1109/TKDE.2022.3199494

Dynamic Correlation Adjacency Matrix Based Graph Neural Network for Traffic Flow Prediction

Gu, Junhua, Jia, Zhihao, Cai, Taotao, Song, Xiangyu and Mahmood, Adnan. 2023. "Dynamic Correlation Adjacency Matrix Based Graph Neural Network for Traffic Flow Prediction." Sensors. 23 (6). https://doi.org/10.3390/s23062897

Robust cross-network node classification via constrained graph mutual information

Yang, Shuiqiao, Cai, Borui, Cai, Taotao, Song, Xiangyu, Jiang, Jiaojiao, Li, Bing and Li, Jianxin. 2022. "Robust cross-network node classification via constrained graph mutual information." Knowledge-Based Systems. 257. https://doi.org/10.1016/j.knosys.2022.109852

A survey on deep learning based knowledge tracing

Song, Xiangyu, Li, Jianxin, Cai, Taotao, Yang, Shuiqiao, Yang, Tingting and Liu, Chengfei. 2022. "A survey on deep learning based knowledge tracing." Knowledge-Based Systems. 258. https://doi.org/10.1016/j.knosys.2022.110036

Target-Aware Holistic Influence Maximization in Spatial Social Networks

Cai, Taotao, Li, Jianxin, Mian, Ajmal, Li, Rong-Hua, Sellis, Timos and Yu, Jeffrey Xu. 2022. "Target-Aware Holistic Influence Maximization in Spatial Social Networks." IEEE Transactions on Knowledge and Data Engineering. 34 (4), pp. 1993-2007. https://doi.org/10.1109/TKDE.2020.3003047

Community-diversity Driven Influence Maximization on Social Networks

Li, Jianxin, Cai, Taotao, Ke, Deng, Wang, Xinjue, Sellis, Timos and Xia, Feng. 2020. "Community-diversity Driven Influence Maximization on Social Networks." Information Systems. 92. https://doi.org/10.1016/j.is.2020.101522

Anchor vertex selection for enhanced reliability of traffic offloading service in edge-enabled mobile P2P social networks

Zhang, Hengda, Wang, Xiaofei, Fan, Hao, Cai, Taotao, Li, Jianxin, Li, Xiuhua and Leung, Victor C. M. 2020. "Anchor vertex selection for enhanced reliability of traffic offloading service in edge-enabled mobile P2P social networks." Journal of Communications and Information Networks. 5 (2), pp. 217-224. https://doi.org/10.23919/JCIN.2020.9130437

Anchored Vertex Exploration for Community Engagement in Social Networks

Cai, Taotao, Li, Jianxin, Hasan Haldar, Nur Al, Mian, Ajmal, Yearwood, John and Sellis, Timos. 2020. "Anchored Vertex Exploration for Community Engagement in Social Networks." 2020 IEEE 36th International Conference on Data Engineering (ICDE). Dallas, United States 20 - 24 Apr 2020 United States. IEEE (Institute of Electrical and Electronics Engineers). https://doi.org/10.1109/ICDE48307.2020.00042

Correlate Influential News Article Events to Stock Quote Movement

Mandalapu, Arun Chaitanya, Gunabalan, Saranya, Sadineni, Avinash, Cai, Taotao, Hasan, Nur Al Hasan and Li, Jianxin. 2019. "Correlate Influential News Article Events to Stock Quote Movement." Li, Jianxin, Wang, Sen, Qin, Shaowen, Li, Xue and Wang, Shuliang (ed.) 15th International Conference on Advanced Data Mining and Applications. Dalian, China 21 - 23 Nov 2019 Switzerland. Springer. https://doi.org/10.1007/978-3-030-35231-8_24

Holistic Influence Maximization for Targeted Advertisements in Spatial Social Networks

Li, Jianxin, Cai, Taotao, Mian, Ajmal, Li, Rong-Hua, Sellis, Timos and Yu, Jeffrey Xu. 2018. "Holistic Influence Maximization for Targeted Advertisements in Spatial Social Networks." 2018 IEEE 34th International Conference on Data Engineering (ICDE). Paris, France 16 - 19 Apr 2018 United States. IEEE (Institute of Electrical and Electronics Engineers). https://doi.org/10.1109/ICDE.2018.00145

Efficient Distance-based Representative Skyline Computation in 2D Space

Mao, Rui, Cai, Taotao, Li, Rong-Hua, Yu, Jeffrey Xu and Li, Jianxin. 2017. "Efficient Distance-based Representative Skyline Computation in 2D Space." World Wide Web. 20 (4), pp. 621-638. https://doi.org/10.1007/s11280-016-0406-0

Efficient Algorithms for Distance-Based Representative Skyline Computation in 2D Space

Cai, Taotao, Li, Rong-Hua, Yu, Jeffrey Xu, Mao, Rui and Cai, Yadi. 2015. "Efficient Algorithms for Distance-Based Representative Skyline Computation in 2D Space." 17th Asia-Pacific Web Conference (APWeb2015). Guangzhou, China 18 - 20 Sep 2015 Switzerland. Springer. https://doi.org/10.1007/978-3-319-25255-1_10