Multi-instance multi-label position-aware doubly graph convolutional networks

Zhi LI , Teng ZHANG , Yilin WANG , Caiwu JIANG , Xuanhua SHI , Hai JIN

Front. Comput. Sci., 2026, 20(2): 2002312. DOI: 10.1007/s11704-025-41159-w
Artificial Intelligence
RESEARCH ARTICLE


Abstract

Multi-instance multi-label learning is a general framework in which each sample is represented as a bag of instances associated with multiple labels. However, two weaknesses remain and hinder its performance in real-world tasks. One is that bag generators often neglect positional information (e.g., the location of a pixel in an image) when generating bags, making position-related labels indistinguishable. The other is that the standard multi-instance learning (MIL) assumption does not always hold: in some real-world tasks, labels have hierarchical low-level concepts, and these concepts are related to certain combinations of instances rather than to a single instance. In this paper, we propose the Position-Aware Doubly Graph Convolutional Networks (padGCN). On the one hand, padGCN generates bags by arranging instances in a multi-instance graph and aggregates instance features by exploiting the positional relationships among them. The aggregated instance representations are then fed into a neural network to obtain sub-sub-concepts used for multi-label learning. On the other hand, padGCN learns sub-concepts from labels and organizes sub-sub-concepts, sub-concepts, and labels in a tripartite multi-label graph in hyperbolic space to exploit their hierarchical structure. Experiments are conducted on six image and text datasets. Compared to the state-of-the-art (SOTA) methods, padGCN achieves an average improvement of 5% across seven evaluation metrics. Pairwise t-test results on 42 experiments indicate that padGCN is significantly better than the SOTA methods in 30 experiments, comparable in 12, and never worse, which verifies its superiority and robustness. Runtime experiments show that padGCN is comparable to the SOTA methods and computationally efficient.
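
To make the first component concrete, the following minimal sketch (not taken from the paper; the function names, the distance threshold, and the toy data are assumptions) builds a position-based adjacency over the instances of one bag and applies a single standard GCN propagation step, so that each instance embedding aggregates the features of its spatial neighbors.

import numpy as np

def position_adjacency(positions, radius=1.5):
    # Connect instances whose spatial positions lie within `radius`
    # (hypothetical rule; the paper's actual graph construction may differ).
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    adj = (dist < radius).astype(float)
    np.fill_diagonal(adj, 1.0)  # self-loops, as in standard GCNs
    return adj

def gcn_layer(features, adj, weight):
    # One propagation step: H' = ReLU(D^{-1/2} A D^{-1/2} H W).
    d_inv_sqrt = np.diag(1.0 / np.sqrt(adj.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ adj @ d_inv_sqrt @ features @ weight, 0.0)

# Toy bag: 5 instances (e.g., image patches) with 2-D positions and 8-D features.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 3.0, size=(5, 2))
features = rng.normal(size=(5, 8))
weight = rng.normal(size=(8, 4))

embeddings = gcn_layer(features, position_adjacency(positions), weight)
print(embeddings.shape)  # (5, 4): position-aware instance embeddings

In this sketch the aggregated embeddings would play the role of the inputs from which sub-sub-concepts are learned; the tripartite multi-label graph in hyperbolic space is a separate component of padGCN and is not sketched here.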

Keywords

multi-instance / multi-label / GCN

Cite this article

Zhi LI, Teng ZHANG, Yilin WANG, Caiwu JIANG, Xuanhua SHI, Hai JIN. Multi-instance multi-label position-aware doubly graph convolutional networks. Front. Comput. Sci., 2026, 20(2): 2002312. DOI: 10.1007/s11704-025-41159-w



RIGHTS & PERMISSIONS

Higher Education Press
