Data complexity-based batch sanitization method against poison in distributed learning

Silv Wang, Kai Fan, Kuan Zhang, Hui Li, Yintang Yang

2024, Vol. 10, Issue 2: 416-428. DOI: 10.1016/j.dcan.2022.12.001

Research article

Abstract

The security of Federated Learning (FL) and Distributed Machine Learning (DML) is gravely threatened by data poisoning attacks, which destroy model usability by contaminating training samples; such attacks are therefore classed as causative availability indiscriminate attacks. Because existing data sanitization methods involve tedious processing and heavy computation, they are difficult to apply in real-time applications. We therefore propose a new supervised batch detection method for poison that can quickly sanitize the training dataset before local model training. We design a training dataset generation method that helps to improve detection accuracy, and we use data complexity features to train a detection model, which is then applied in an efficient batch hierarchical detection process. Our model accumulates knowledge about poisoning, and this knowledge can be expanded by retraining to adapt to new attacks. Being neither attack-specific nor scenario-specific, our method is applicable to FL/DML as well as to other online or offline scenarios.
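
As a rough illustration of the kind of pipeline the abstract describes, the sketch below scores each labelled training batch with a single Ho-Basu style data complexity measure (Fisher's discriminant ratio) and trains a supervised detector on batches with known clean/poisoned labels, then screens new batches before local training. The use of scikit-learn, the helper names, and the choice of a random-forest classifier are assumptions made for illustration, not the paper's exact design.

```python
# Illustrative sketch only (not the authors' exact pipeline): per-batch data
# complexity features plus a supervised batch-level detector.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fisher_discriminant_ratio(X, y):
    """F1 measure: largest per-feature ratio of between-class to within-class spread."""
    classes = np.unique(y)
    ratios = []
    for j in range(X.shape[1]):
        means = np.array([X[y == c, j].mean() for c in classes])
        within = sum(X[y == c, j].var() for c in classes) + 1e-12
        between = ((means - means.mean()) ** 2).sum()
        ratios.append(between / within)
    return max(ratios)

def batch_features(X, y):
    """Complexity-feature vector for one labelled batch; more measures could be added."""
    return np.array([fisher_discriminant_ratio(X, y)])

def train_batch_detector(batches, batch_labels):
    """batches: list of (X, y) arrays; batch_labels: 1 = poisoned batch, 0 = clean batch."""
    feats = np.vstack([batch_features(X, y) for X, y in batches])
    detector = RandomForestClassifier(n_estimators=100, random_state=0)
    detector.fit(feats, batch_labels)
    return detector

def sanitize(detector, batches):
    """Keep only the batches the detector judges clean, before local model training."""
    return [(X, y) for X, y in batches
            if detector.predict(batch_features(X, y).reshape(1, -1))[0] == 0]
```

In a deployment the feature vector would combine several complexity measures, and the detector would be retrained as new attack types are encountered, in line with the abstract's note that the model's knowledge of poison can be expanded by retraining.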

Keywords

Distributed machine learning security / Federated learning / Data poisoning attacks / Data sanitization / Batch detection / Data complexity

Cite this article

Silv Wang, Kai Fan, Kuan Zhang, Hui Li, Yintang Yang. Data complexity-based batch sanitization method against poison in distributed learning. 2024, 10(2): 416-428. DOI: 10.1016/j.dcan.2022.12.001


