Enhancing data quality with effective feature selection and privacy protection

Lu-Yao WANG, Zhu-Sen LIU, Qi FENG, Wei-Bin WU, Lu ZHOU

Front. Comput. Sci., 2026, Vol. 20, Issue (3): 2003607

DOI: 10.1007/s11704-025-41074-0
Information Systems
RESEARCH ARTICLE


Abstract

Privacy-preserving feature selection identifies the most important features while keeping the data private, thereby enhancing data quality. Secure multiparty computation (MPC) is a cryptographic technique that enables effective data processing without a trusted third party. However, most MPC-based feature selection schemes overlook correlations between features and perform poorly in model training on datasets that contain both numerical and categorical attributes. This paper proposes a feature selection scheme, MPC-Relief, that selects the relevant features while preserving privacy. To achieve security under MPC, we transform all complex computational steps from data-dependent to data-oblivious with faithful implementations. Specifically, we construct bidirectional vectors to partition subsets and propose an MPC-based nonlinear function, MN-Ramp, to calculate the difference between mixed attributes. In addition, we apply a mapping method to the distance calculation to eliminate the need for conditional judgments. We evaluate the computational and communication overhead of the MN-Ramp function in both WAN and LAN environments and validate its effectiveness across various datasets. The comparative analysis demonstrates that our scheme achieves an accuracy improvement of up to 18% over other schemes on nonlinear datasets. The results of the classification task based on the selected features indicate that our scheme notably enhances the performance of subsequent models while providing strong privacy guarantees.
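To illustrate what replacing conditional judgments with data-oblivious arithmetic can look like, the following is a minimal plaintext sketch of a Relief-style difference function over mixed attributes. It is not the MN-Ramp protocol itself, whose construction is not given in the abstract. It assumes the standard ramp function from the Relief literature; the names `ramp_diff`, `mixed_diff`, `is_cat`, `t_eq`, and `t_diff` are hypothetical, and in an actual MPC setting the absolute value, clipping, and inequality test would each require a secure protocol over secret-shared values.

```python
import numpy as np

def ramp_diff(a, b, t_eq, t_diff):
    """Ramp difference for numerical attributes (plaintext sketch).

    Returns 0 when |a - b| <= t_eq, 1 when |a - b| >= t_diff, and a
    linear interpolation in between. Clipping replaces the three-way
    branch, so the same operations run whatever the data is.
    Assumes t_diff > t_eq (hypothetical thresholds).
    """
    d = np.abs(a - b)
    return np.clip((d - t_eq) / (t_diff - t_eq), 0.0, 1.0)

def mixed_diff(x, y, is_cat, t_eq, t_diff):
    """Per-attribute difference between two instances with mixed attributes.

    is_cat is a 0/1 mask marking categorical attributes (encoded as numbers).
    Both candidate results are always computed and then combined
    arithmetically, instead of branching on the attribute type.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    is_cat = np.asarray(is_cat, dtype=float)
    cat = (x != y).astype(float)           # 0 if equal, 1 otherwise
    num = ramp_diff(x, y, t_eq, t_diff)    # value in [0, 1]
    return is_cat * cat + (1.0 - is_cat) * num

# Two numerical attributes followed by one categorical attribute.
print(mixed_diff([0.10, 0.50, 2.0], [0.15, 0.90, 3.0], [0, 0, 1],
                 t_eq=0.1, t_diff=0.5))
```

In this sketch the first attribute differs by less than `t_eq` and contributes 0, the second falls on the linear part of the ramp, and the categorical attribute contributes 1 because the two codes differ.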

Keywords

feature selection / secure multiparty computation / machine learning as a service / secret sharing

Cite this article

Lu-Yao WANG, Zhu-Sen LIU, Qi FENG, Wei-Bin WU, Lu ZHOU. Enhancing data quality with effective feature selection and privacy protection. Front. Comput. Sci., 2026, 20(3): 2003607. DOI: 10.1007/s11704-025-41074-0


RIGHTS & PERMISSIONS

Higher Education Press
