In this paper, we introduce a robust transfer regression method designed to handle corrupted labels in target data, in settings where the corruption affects a substantial portion of the labels and the locations of the corrupted labels are unknown. Our theoretical analysis decomposes the estimation error into three interpretable components, attributable to (1) the source data, (2) domain shift, and (3) label corruption. This framework guarantees that our method consistently outperforms target-only estimation. We validate our method through numerical experiments on reconstructing corrupted compressed signals, which show robustness even when a high fraction of labels is corrupted, especially when some source data are structurally similar to the target data. Additionally, we apply our method to analyze the association between O6-methylguanine-DNA methyltransferase (MGMT) methylation and gene expression in glioblastoma (GBM) patients.
Funding
National Key R&D Program of China (102022YFA1003701)
Rights & Permissions
School of Mathematical Sciences, University of Science and Technology of China and Springer-Verlag GmbH Germany, part of Springer Nature