Robust cross-modal retrieval with alignment refurbishment

Jinyi GUO, Jieyu DING

Front. Inform. Technol. Electron. Eng., 2023, Vol. 24, Issue 10: 1403-1415. DOI: 10.1631/FITEE.2200514
Original Article


Abstract

Cross-modal retrieval aims to achieve mutual retrieval between modalities by establishing consistent alignments between data of different modalities. Many cross-modal retrieval methods have been proposed and have achieved excellent results; however, they are trained on clean cross-modal pairs, which are semantically matched but costly to collect, compared with easily available data with noisy alignments (i.e., paired but semantically mismatched). When these methods are trained on noisily aligned data, their performance degrades dramatically. Therefore, we propose robust cross-modal retrieval with alignment refurbishment (RCAR), which significantly reduces the impact of noise on the model. Specifically, RCAR first conducts multi-task learning to slow down overfitting to the noise, making the data separable. Then, RCAR uses a two-component beta-mixture model to divide the pairs into clean and noisy alignments, and refurbishes each label according to the posterior probability of the noise-alignment component. In addition, we define partial and complete noises within the noisy-alignment paradigm. Experimental results show that, compared with popular cross-modal retrieval methods, RCAR achieves more robust performance under both types of noise.
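As a rough illustration of the refurbishment step described above, the sketch below fits a two-component beta-mixture model to per-pair losses with EM and then softens each alignment label by the posterior probability of the high-loss (noisy) component. This is a minimal sketch, assuming per-pair losses have been min-max scaled into (0, 1); the function names, the median-split initialization, and the moment-matching M-step are our own illustrative choices, not the authors' released implementation.

```python
import numpy as np
from scipy.stats import beta

def fit_beta_mixture(losses, n_iter=30, eps=1e-6):
    """Fit a two-component beta mixture to per-pair losses in (0, 1) via EM."""
    x = np.clip(losses, eps, 1 - eps)
    # Initialize responsibilities with a median split so both components get mass.
    resp = np.stack([x < np.median(x), x >= np.median(x)], axis=1).astype(float)
    pi = resp.mean(axis=0)                       # mixing weights
    params = np.empty((2, 2))                    # (alpha, beta) per component
    for _ in range(n_iter):
        # M-step: weighted moment matching of each beta component.
        for k in range(2):
            w = resp[:, k] / (resp[:, k].sum() + eps)
            m = np.sum(w * x)                    # weighted mean
            v = np.sum(w * (x - m) ** 2) + eps   # weighted variance
            common = m * (1 - m) / v - 1
            params[k] = [max(m * common, eps), max((1 - m) * common, eps)]
        pi = resp.mean(axis=0)
        # E-step: posterior responsibility of each component for each pair.
        like = np.stack([pi[k] * beta.pdf(x, *params[k]) for k in range(2)], axis=1)
        resp = like / (like.sum(axis=1, keepdims=True) + eps)
    return pi, params, resp

def refurbish_labels(labels, losses):
    """Soften alignment labels by the posterior of the noisy component."""
    _, params, resp = fit_beta_mixture(losses)
    means = params[:, 0] / params.sum(axis=1)    # mean loss of each component
    noisy = int(np.argmax(means))                # higher-mean component = noisy
    p_noisy = resp[:, noisy]
    # A likely-mismatched pair gets its positive label scaled toward zero.
    return (1 - p_noisy) * labels
```

In this reading, `losses` would hold the per-pair retrieval losses of the current model (low for pairs it fits well, high for pairs it resists memorizing, which is why the multi-task warm-up that keeps the two groups separable matters) and `labels` would be 1 for every observed pair; the returned soft labels down-weight pairs the mixture judges to be mismatched.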

Keywords

Cross-modal retrieval / Robust learning / Alignment correction / Beta-mixture model

Cite this article

Jinyi GUO, Jieyu DING. Robust cross-modal retrieval with alignment refurbishment. Front. Inform. Technol. Electron. Eng., 2023, 24(10): 1403-1415. https://doi.org/10.1631/FITEE.2200514

RIGHTS & PERMISSIONS

© 2023 Zhejiang University Press