Robust Transfer Regression with Corrupted Labels

Sheng Pan

doi:10.1007/s40304-025-00473-2

Communications in Mathematics and Statistics: 1-41. DOI: 10.1007/s40304-025-00473-2


Abstract

In this paper, we introduce a robust transfer regression method designed to handle corrupted labels in target data, in settings where the corruption affects a substantial portion of the labels and the locations of the corrupted labels are unknown. Our theoretical analysis decomposes the estimation error into three interpretable components, arising from (1) the source data, (2) domain shift, and (3) label corruption. This framework guarantees that our method consistently outperforms target-only estimation. We validate our method through numerical experiments on reconstructing corrupted compressed signals, which show that it remains robust even when a high fraction of labels is corrupted, especially when some source data are structurally similar to the target data. Additionally, we apply our method to analyze the association between O6-methylguanine-DNA methyltransferase (MGMT) methylation and gene expression in glioblastoma (GBM) patients.
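
The setting described in the abstract can be illustrated with a small simulation. The Python sketch below is not the estimator analyzed in the paper. It is a generic toy that combines two standard ideas under stated assumptions: a Lasso fit on source data, followed by a target-data step that estimates a sparse shift together with a sparse corruption vector. The function names (`soft`, `lasso_cd`, `robust_transfer`), the tuning values, and the simulated data are all hypothetical.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()           # residual y - Xb (b starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]          # remove coordinate j from the fit
            b[j] = soft(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * b[j]          # add the updated coordinate back
    return b

def robust_transfer(Xs, ys, Xt, yt, lam_src, lam_shift, tau, n_outer=20):
    """Hypothetical two-step estimator (an assumption, not the paper's method).

    Step 1: Lasso on the source data gives w.
    Step 2: on the target data, alternately minimise
        (1/2n)||yt - Xt(w + d) - e||^2 + lam_shift*||d||_1 + (tau/n)*||e||_1
    over the shift d and the sparse corruption vector e.
    """
    w = lasso_cd(Xs, ys, lam_src)
    d = np.zeros(Xt.shape[1])
    e = np.zeros(Xt.shape[0])
    for _ in range(n_outer):
        e = soft(yt - Xt @ (w + d), tau)              # exact update for e
        d = lasso_cd(Xt, yt - Xt @ w - e, lam_shift)  # Lasso update for d
    return w + d, e

# Simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
p, n_src, n_tgt = 30, 300, 60
beta_src = np.zeros(p)
beta_src[:5] = 1.0
beta_tgt = beta_src.copy()
beta_tgt[5] = 0.5                                     # small domain shift
Xs = rng.standard_normal((n_src, p))
ys = Xs @ beta_src + 0.1 * rng.standard_normal(n_src)
Xt = rng.standard_normal((n_tgt, p))
yt = Xt @ beta_tgt + 0.1 * rng.standard_normal(n_tgt)
bad = rng.choice(n_tgt, size=12, replace=False)       # 20% corrupted labels
yt[bad] += 10.0 * rng.choice([-1.0, 1.0], size=12)

beta_hat, e_hat = robust_transfer(Xs, ys, Xt, yt, 0.05, 0.1, 1.0)
beta_target_only = lasso_cd(Xt, yt, 0.1)
err_transfer = np.linalg.norm(beta_hat - beta_tgt)
err_target_only = np.linalg.norm(beta_target_only - beta_tgt)
print(err_transfer, err_target_only)
```

In this toy setup the positions of the corrupted labels are never given to the estimator; the nonzero entries of `e_hat` are its guess at where they are. Comparing `err_transfer` with `err_target_only` gives a rough sense of how much a similar source data set can help when a fifth of the target labels are corrupted.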

Keywords

Robust transfer regression / Adversarial corruption / Lasso / High-dimensional / Signal recovery

Mathematics Subject Classification: 62F35 / 68T05 / 62J07

