Federated Bimodal Graph Neural Networks for Text-Image Retrieval
Xueming Yan, Chuyue Wang, Yaochu Jin
International Journal of Network Dynamics and Intelligence, 2025, Vol. 4, Issue 2: 100009
Text-image retrieval is a key challenge in computer vision and natural language processing, aiming to retrieve the most semantically relevant image or text given a query in the opposite modality. However, growing privacy and security concerns make traditional centralized learning approaches increasingly unsuitable for handling sensitive multimodal data. In this paper, we propose FedBi-GNNs, a federated learning framework for bimodal graph neural networks, which enables collaborative training across decentralized clients without sharing private data. Each client independently constructs heterogeneous graphs from local text and image data and learns correspondences via bimodal graph matching. These local representations are then aggregated at a central server using a heterogeneous federated aggregation scheme. Empirical results on the MSCOCO benchmark demonstrate that FedBi-GNNs significantly outperform existing state-of-the-art methods, offering improved retrieval accuracy, enhanced privacy preservation, and greater robustness to data heterogeneity across clients.
federated learning / bimodal graph neural networks / text-image retrieval / graph matching
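
The server-side aggregation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the heterogeneous federated aggregation scheme, so the code below swaps in plain sample-size-weighted averaging (FedAvg-style) as a stand-in. The function name `federated_average`, the parameter names `text_encoder` and `image_encoder`, and the client sizes are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# A client's model is represented here as named flat lists of floats.
# This layout is an assumption for illustration, not the paper's format.
Params = Dict[str, List[float]]


def federated_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Sample-size-weighted average of client parameters (FedAvg-style).

    Stand-in for the paper's heterogeneous aggregation scheme, which the
    abstract does not describe in detail.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Weight each client's value by its share of the total samples.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical example: two clients holding 1 and 3 local samples.
clients = [
    {"text_encoder": [1.0, 2.0], "image_encoder": [0.0]},
    {"text_encoder": [3.0, 4.0], "image_encoder": [4.0]},
]
sizes = [1, 3]
global_params = federated_average(clients, sizes)
```

Only model parameters and sample counts reach the server in this sketch; the raw text and image data stay on each client, which is the property the abstract emphasizes.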