Mitigating scale imbalance and conflicting gradients in deep multi-task learning

Yuepeng JIANG, Yunhao GOU, Wenbo ZHANG, Xuehao WANG, Yu ZHANG, Qiang YANG

Front. Comput. Sci., 2026, Vol. 20, Issue (2): 2002318  DOI: 10.1007/s11704-024-40632-2
Artificial Intelligence
RESEARCH ARTICLE


Abstract

While deep learning systems perform well in many fields such as computer vision, natural language processing, and computational biology, time and data efficiency remain two major challenges. Deep multi-task learning, in which a single network produces predictions for multiple tasks, has emerged as a promising approach that offers fast inference and strong performance. However, balancing the learning of the individual tasks is difficult. In this paper, we present a combined method, called POMSI, that projects conflicting gradients and mitigates scale imbalance in multi-task learning. POMSI can be trained end-to-end with arbitrary loss functions and makes no distributional assumptions. Moreover, it is model-agnostic and can be applied on top of existing multi-task architectures for further improvement. Extensive experiments on benchmark datasets show that POMSI achieves substantial performance gains over state-of-the-art methods.
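The two ingredients named in the abstract, projecting away conflicting gradient components and rescaling per-task gradients to a common magnitude, can be illustrated with a minimal sketch. The snippet below is only an approximation of these general ideas (projection in the spirit of PCGrad-style gradient surgery, norm rescaling in the spirit of GradNorm-style balancing), not the POMSI implementation; the function names `project_conflicting` and `rescale_to_common_norm` and the choice of the mean norm as the target scale are hypothetical.

```python
import numpy as np

def rescale_to_common_norm(grads, eps=1e-12):
    # Mitigate scale imbalance: rescale each task gradient to the mean
    # gradient norm so no single task dominates the shared update
    # (illustrative choice of target scale).
    norms = [np.linalg.norm(g) + eps for g in grads]
    target = float(np.mean(norms))
    return [g * (target / n) for g, n in zip(grads, norms)]

def project_conflicting(grads, eps=1e-12):
    # PCGrad-style surgery: if task i's gradient conflicts with task j's
    # (negative inner product), remove the conflicting component of g_i.
    projected = [g.astype(float).copy() for g in grads]
    for i, g_i in enumerate(projected):
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            dot = float(np.dot(g_i, g_j))
            if dot < 0.0:
                g_i -= (dot / (float(np.dot(g_j, g_j)) + eps)) * g_j
    return projected

# Toy example: two task gradients that conflict and differ in scale.
g1 = np.array([1.0, 0.0])
g2 = np.array([-10.0, 10.0])
balanced = rescale_to_common_norm([g1, g2])
surgery = project_conflicting(balanced)
update = sum(surgery) / len(surgery)  # aggregated shared-parameter update
print(update)
```

In an actual training loop, each entry of `grads` would be the gradient of one task's loss with respect to the shared parameters, and the aggregated update would then be applied by the optimizer.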


Keywords

artificial intelligence / machine learning / multi-task learning

Cite this article

Yuepeng JIANG, Yunhao GOU, Wenbo ZHANG, Xuehao WANG, Yu ZHANG, Qiang YANG. Mitigating scale imbalance and conflicting gradients in deep multi-task learning. Front. Comput. Sci., 2026, 20(2): 2002318. DOI: 10.1007/s11704-024-40632-2



RIGHTS & PERMISSIONS

Higher Education Press
