Discovering top-k patterns with differential privacy–an accurate approach

Xiaojian ZHANG; Xiaofeng MENG

doi:10.1007/s11704-014-3230-7

Front. Comput. Sci. ›› 2014, Vol. 8 ›› Issue (5) :816 -827. DOI: 10.1007/s11704-014-3230-7

RESEARCH ARTICLE

Discovering top-k patterns with differential privacy–an accurate approach

Xiaojian ZHANG ¹^,²^,^*
, Xiaofeng MENG ¹

Author information +

History +

PDF (557KB)

Abstract

Frequent pattern mining discovers sets of items that frequently appear together in a transactional database; these can serve valuable economic and research purposes. However, if the database contains sensitive data (e.g., user behavior records, electronic health records), directly releasing the discovered frequent patterns with support counts will carry significant risk to the privacy of individuals. In this paper, we study the problem of how to accurately find the top-k frequent patterns with noisy support counts on transactional databases while satisfying differential privacy. We propose an algorithm, called differentially private frequent pattern (DFP-Growth), that integrates a Laplace mechanism and an exponential mechanism to avoid privacy leakage. We theoretically prove that the proposed method is (λ, δ)-useful and differentially private. To boost the accuracy of the returned noisy support counts, we take consistency constraints into account to conduct constrained inference in the post-processing step. Extensive experiments, using several real datasets, confirm that our algorithm generates highly accurate noisy support counts and top-k frequent patterns.

Keywords

frequent pattern mining / differential privacy / constrained inference

Cite this article

Download citation ▾

Xiaojian ZHANG, Xiaofeng MENG. Discovering top-k patterns with differential privacy–an accurate approach. Front. Comput. Sci., 2014, 8 (5) : 816-827 DOI:10.1007/s11704-014-3230-7

登录浏览全文

4963

注册一个新账户忘记密码

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases. 1994, 487-499

[2]	Atzori M, Bonchi F, Giannotti F, Pedreschi D. Anonymity preserving pattern discovery. Very Large Data Bases Journal, 2008, 17(4): 703-727

[3]	Xu Y, Wang K, Fu A, Yu P S. Anonymizing Transaction Databases for Publication. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and DataMining. 2008, 767-775

[4]	Ganta S R, Kasiviswanathan S P, Smith A. Composition attacks and auxiliary information in data privacy. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, 265-273

[5]	Wong R C W, Fu A, Wang K, Yu P S, Pei J. Can the utility of anonymized data be used for privacy breaches. ACM Transaction on Knowledge Discovery from Data, 2011, 5(3): 16

[6]	Dwork C. Differential privacy. In: Proceedings of the 33th Colloquium on Automata, Languages and Programming. 2006, 1-12

[7]	Bhaskar R, Laxman S, Smith A, Thakurta A. Discovering frequent patterns in sensitive data. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 503-512

[8]	Li N, Qardaji W, Su D, Cao J. PrivBasis: frequent item set mining with differential privacy. Very Large Data Bases Endowment, 2012, 5(11): 1340-1351

[9]	Han J, Pei J, Yin Y, Mao R. Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining and Knowledge Discovery, 2004, 8(1): 53-87

[10]	Dwork C, McSherry F, Smith A. Calibrating noise to sensitivity in private data analysis. In: Proceedings of the 3rd Theory of Cryptography Conference. 2006, 265-284

[11]	Hay M, Rastogi V, Miklau G, Suciu D. Boosting the accuracy of differentially private histogram through consistency. Very Large Data Bases Endowment, 2010, 3(1): 1021-1032

[12]	Chen R, Mohammed N, Fung B C M, Desai B C, Xiong L. Publishing set-valued data via differential privacy. Very Large Data Bases Endowment, 2011, 4(11): 1087-1098

[13]	Götz M, Machanavajjhala A, Wang G, Xiao X, Gehrke J. Publishing search logs-a comparative study of privacy guarantees. IEEE Transaction Knowledge and Data Engineering, 2012, 24(3): 520-532

[14]	McSherry F, Mironov I. Differentially private recommender systems: building privacy into the Netflix prize contenders. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2009, 627-636

[15]	Ding B, Winslett M, Han J, Li Z. Differentially private data cubes: optimizing noise sources and consistency. In: Proceedings of the 31th International Conference on Management of Data. 2011, 217-228

[16]	Friedman A, Schuster A. Data mining with differential privacy. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 493-502

[17]	Mohammed N, Chen R, Fung B C M, Yu P S. Differentially private data release for data mining. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2011, 493-501

[18]	Chen R, Fung B C M, Desai B C, Sossou N M. Differentially private transit data publication: a case study on the Montreal transportation system. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2012, 213-221

[19]	Terrovitis M, Mamoulis N, Kalnis P. Privacy-preserving anonymization of set-valued data. Very Large Data Bases Endowment, 2008, 1(1): 115-125

[20]	He Y, Naughton J F. Anonymization of set-valued data via top-down, local generalization. Very Large Data Bases Endowment, 2009, 2(1): 934-945

[21]	McSherry F. Mechanism design via differential privacy. In: Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science. 2007, 94-103

[22]	McSherry F. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2009, 19-30

[23]	Xiao Y, Xiong L, Yuan C. Differentially private data release through multidimensional partitioning. In: Proceedings of the 7th Very Large Data Bases Workshop on Secure Data Management. 2010, 150-168

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg

PDF (557KB)

1945

Accesses

Citation

Detail

Sections

Recommended

About the journal

Aims & scope

Description

Editorial board

Abstracting / indexing

Contact us

Browse

Just accepted

All volumes and issues

Collections

Featured articles

Most accessed

Most cited

Collections

Multimedia collections

Authors & reviewers

Online submission

Call for papers

Guidelines for authors

Download templates

Guidelines for reviewers

Abstract

Keywords

Cite this article

References

RIGHTS & PERMISSIONS