KANs-DETR: Enhancing Detection Transformer with Kolmogorov-Arnold Networks for small object

Jingyu Zhang , Wentao Peng , Anyan Xiao , Tao Liu , Junchao Fu , Jian Chen , Zhuo Yan

High-Confidence Computing, 2026, Vol. 6, Issue (1): 100336. DOI: 10.1016/j.hcc.2025.100336


Abstract

This research proposed KANs-DETR, an end-to-end object detection network based on Kolmogorov-Arnold Networks (KANs) and the Detection Transformer (DETR). A KANs block was introduced into the encoder-decoder structure in place of the fully connected layers, dynamically learning activation functions to improve the robustness and accuracy of the model. Experiments showed that KANs-DETR outperformed counterparts using HGNetv2 and Swin Transformer as the backbone on multi-category object detection. Furthermore, to address insensitivity to small objects, the Squeeze-and-Excitation module was applied for feature fusion and yielded better performance. KANs-DETR achieved high detection accuracy and efficiency when handling small objects in complex scenes, providing a new perspective for network optimization.
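The core idea described above, replacing fixed activations in fully connected layers with a learnable univariate function on every edge, can be illustrated with a toy sketch. This is a minimal NumPy illustration only: it uses a fixed radial-basis expansion in place of the B-splines typical of KAN implementations, and all class, function, and parameter names here are assumptions for illustration, not taken from the paper.

```python
import numpy as np

class ToyKANLayer:
    """Toy Kolmogorov-Arnold layer: each edge (i, j) carries its own
    learnable univariate function phi_ij, expanded here in a fixed
    Gaussian radial basis (a stand-in for B-splines)."""

    def __init__(self, in_dim, out_dim, n_basis=8, rng=None):
        rng = rng or np.random.default_rng(0)
        # Basis centres spread over the expected input range [-1, 1].
        self.centres = np.linspace(-1.0, 1.0, n_basis)          # (K,)
        self.width = self.centres[1] - self.centres[0]
        # One coefficient vector per edge: shape (in_dim, out_dim, K).
        # These coefficients are what training would learn.
        self.coef = rng.normal(scale=0.1, size=(in_dim, out_dim, n_basis))

    def __call__(self, x):
        # x: (batch, in_dim). Expand each scalar input in the radial basis.
        basis = np.exp(-((x[..., None] - self.centres) / self.width) ** 2)
        # y_j = sum_i phi_ij(x_i): contract over inputs i and basis index k.
        return np.einsum("bik,iok->bo", basis, self.coef)

layer = ToyKANLayer(in_dim=4, out_dim=2)
x = np.random.default_rng(1).uniform(-1.0, 1.0, size=(3, 4))
y = layer(x)
print(y.shape)  # (3, 2)
```

Unlike a standard fully connected layer, where each edge contributes a fixed scalar weight followed by a shared nonlinearity, here each edge's contribution is a whole learned function of its input, which is the property the abstract refers to as "dynamically learning the activation function."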

Keywords

Object detection / SE networks / DETR / Kolmogorov-Arnold networks / Swin transformer

Cite this article

Jingyu Zhang, Wentao Peng, Anyan Xiao, Tao Liu, Junchao Fu, Jian Chen, Zhuo Yan. KANs-DETR: Enhancing Detection Transformer with Kolmogorov-Arnold Networks for small object. High-Confidence Computing, 2026, 6(1): 100336. DOI: 10.1016/j.hcc.2025.100336


CRediT authorship contribution statement

Jingyu Zhang: Writing - review & editing, Conceptualization, Validation, Methodology. Wentao Peng: Resources, Writing - original draft, Validation. Anyan Xiao: Formal analysis, Supervision, Writing - review & editing. Tao Liu: Funding acquisition. Junchao Fu: Investigation. Jian Chen: Investigation. Zhuo Yan: Project administration, Conceptualization, Supervision, Methodology, Formal analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was funded by the Achievements Transformation and Industrialization of the New Generation Intelligent Airport Operation Control System (2024ZHCG0189) and the Achievement Transformation and Industrialization of the Hub Airport Ground Support and Intelligent Dispatch System (2024ZHCG0013).

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

