Invariant graph learning meets information bottleneck for out-of-distribution generalization

Wenyu MAO, Jiancan WU, Haoyang LIU, Yongduo SUI, Xiang WANG

Front. Comput. Sci., 2026, 20(1): 2001305. DOI: 10.1007/s11704-025-40798-3
Artificial Intelligence
RESEARCH ARTICLE


Abstract

Graph out-of-distribution (OOD) generalization remains a major challenge in graph learning, since graph neural networks (GNNs) often suffer severe performance degradation under distribution shifts. Invariant learning, which aims to extract features that remain stable across distributions, has recently emerged as a promising approach to OOD generalization. Despite its great success on Euclidean data (e.g., images), its exploration on graph data remains constrained by the complex nature of graphs. Invariant features exist at both the attribute and structural levels, and prior knowledge of environmental factors is typically absent, which makes the invariance and sufficiency conditions of invariant learning hard to satisfy on graphs. Existing approaches, such as data augmentation or causal intervention, either disrupt invariance during graph manipulation or face reliability issues due to the lack of supervised signals for the causal parts. In this work, we propose a novel framework, Invariant Graph Learning based on Information bottleneck theory (InfoIGL), to extract the invariant features of graphs and enhance models' generalization to unseen distributions. Specifically, InfoIGL introduces a redundancy filter that compresses task-irrelevant information related to environmental factors. In cooperation with our multi-level contrastive learning, it maximizes the mutual information among graphs of the same class in downstream classification tasks, thereby preserving the invariant features needed for prediction. An appealing property of InfoIGL is its strong generalization without depending on supervised signals of invariance. Experiments on both synthetic and real-world datasets demonstrate that our method achieves state-of-the-art performance on OOD generalization for graph classification. The source code is available at github.com/maowenyu-11/InfoIGL.
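To make the two ingredients concrete: the information-bottleneck principle that InfoIGL builds on is conventionally written as learning a representation Z of an input graph G that stays predictive of the label Y while compressing everything else. The display below is the classical formulation (Tishby et al.), not a formula quoted from this paper; the redundancy filter described above plausibly targets the compression term I(Z; G).

```latex
% Classical information-bottleneck objective:
% keep Z predictive of Y, compress its dependence on the input G,
% with \beta trading off prediction against compression.
\max_{Z}\; I(Z; Y) \;-\; \beta\, I(Z; G)
```

Likewise, "maximizing mutual information among graphs of the same class" is commonly realized with a supervised InfoNCE-style contrastive loss. The sketch below is a minimal PyTorch illustration of that general idea, not the authors' implementation; the function name, temperature value, and batch construction are all assumptions for the example.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z: torch.Tensor, labels: torch.Tensor,
                                temperature: float = 0.5) -> torch.Tensor:
    """InfoNCE-style loss over a batch of graph embeddings z (n x d):
    embeddings sharing a class label are treated as positives, a common
    tractable surrogate for maximizing mutual information among graphs
    of the same class."""
    z = F.normalize(z, dim=1)                        # unit-norm embeddings
    sim = z @ z.t() / temperature                    # scaled cosine similarity
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))  # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # mean log-likelihood of same-class pairs per anchor; zeroing the
    # non-positive entries also neutralizes the -inf on the diagonal
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    mean_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return -mean_pos[pos_mask.any(dim=1)].mean()

# Usage: 8 hypothetical graph-level embeddings from a GNN encoder
z = torch.randn(8, 64)
labels = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
loss = supervised_contrastive_loss(z, labels)
```

Minimizing such a loss pulls same-class graph representations together and pushes other classes apart, which is the sense in which contrastive learning can preserve class-discriminative, invariant information.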

Keywords

graph OOD / contrastive learning / information bottleneck theory / invariant learning

Cite this article

Wenyu MAO, Jiancan WU, Haoyang LIU, Yongduo SUI, Xiang WANG. Invariant graph learning meets information bottleneck for out-of-distribution generalization. Front. Comput. Sci., 2026, 20(1): 2001305. DOI: 10.1007/s11704-025-40798-3



RIGHTS & PERMISSIONS

Higher Education Press
