Classifying and clustering in negative databases
Ran LIU, Wenjian LUO, Lihua YUE
Classifying and clustering in negative databases
Recently, negative databases (NDBs) are proposed for privacy protection. Similar to the traditional databases, some basic operations could be conducted over the NDBs, such as select, intersection, update, delete and so on. However, both classifying and clustering in negative databases have not yet been studied. Therefore, two algorithms, i.e., a k nearest neighbor (kNN) classification algorithm and a k-means clustering algorithm in NDBs, are proposed in this paper, respectively. The core of these two algorithms is a novel method for estimating the Hamming distance between a binary string and an NDB. Experimental results demonstrate that classifying and clustering in NDBs are promising.
negative databases / classification / clustering / k nearest neighbor / k-means / hamming distance
[1] |
Esponda F. Everything that is not important: negative databases [research frontier]. IEEE Computational Intelligence Magazine, 2008, 3(2): 60-63
CrossRef
Google scholar
|
[2] |
Esponda F, Forrest S, Helman P. Enhancing privacy through negative representations of data. Technical Report, DTIC Document. 2004
|
[3] |
Esponda F, Forrest S, Helman P. Negative representations of information. International Journal of Information Security, 2009, 8(5): 331-345
CrossRef
Google scholar
|
[4] |
Esponda F, Ackley E S, Helman P, Jia H, Forrest S. Protecting data privacy through hard-to-reverse negative databases. International Journal of Information Security, 2007, 6(6): 403-415
CrossRef
Google scholar
|
[5] |
Liu R, Luo W, Wang X. A hybrid of the prefix algorithm and the q-hidden algorithm for generating single negative databases. In: Proceedings of 2011 IEEE Symposium on Computational Intelligence in Cyber Security (CICS). 2011, 31-38
CrossRef
Google scholar
|
[6] |
Jia H,Moore C, Strain D. Generating hard satisfiable formulas by hiding solutions deceptively. Journal of Artificial Intelligence Research, 2007, 28(1): 107-118
|
[7] |
Jia H, Moore C, Strain D. Generating hard satisfiable formulas by hiding solutions deceptively. In: Proceedings of the National Conference on Artificial Intelligence. 2005, 384
|
[8] |
Esponda F. Negative surveys. arXiv preprint math.ST/0608176. 2006
|
[9] |
Bao Y, Luo W, Zhang X. Estimating positive surveys from negative surveys. Statistics & Probability Letters, 2013, 83(2): 551-558
CrossRef
Google scholar
|
[10] |
Xie H, Kulik L, Tanin E. Privacy-aware collection of aggregate spatial data. Data & Knowledge Engineering, 2011, 70(6): 576-595
CrossRef
Google scholar
|
[11] |
Horey J, Groat M M, Forrest S, Esponda F. Anonymous data collection in sensor networks. In: Proceedings of the 4th Annual International Conference on Mobile and Ubiquitous Systems: Networking & Services. 2007, 1-8
|
[12] |
Dasgupta D, Saha S. Password security through negative filtering. In: Proceedings of 2010 International Conference on Emerging Security Technologies (EST). 2010, 83-89
CrossRef
Google scholar
|
[13] |
Dasgupta D, Saha S. A biologically inspired password authentication system. In: Proceedings of the 5th Annual Workshop on Cyber Security and Information Intelligence Research: Cyber Security and Information Intelligence Challenges and Strategies. 2009, 41
|
[14] |
Dasgupta D, Azeem R. An investigation of negative authentication systems. In: Proceedings of the 3rd International Conference on Information Warfare and Security. 2008, 117-126
|
[15] |
Bringer J, Chabanne H. Negative databases for biometric data. In: Proceedings of the 12th ACMWorkshop on Multimedia and Security. 2010, 55-62
|
[16] |
Esponda F, Trias E D, Ackley E S, Forrest S. A relational algebra for negative databases. University of New Mexico Technical Report, 2007
|
[17] |
Esponda F, Ackley E S, Forrest S, Helman P. Online negative databases. International Journal of Unconventional Computing, 1993, 1(3): 201-220
|
[18] |
Esponda F, Ackley E S, Forrest S, Helman P. Online negative databases. In: Proceedings of the 3rd International Conference on Artificial Immune Systems. 2004, 175-188
CrossRef
Google scholar
|
[19] |
Selman B, Kautz H, Cohen B. Local search strategies for satisfiability testing. Cliques, Coloring, and Satisfiability: Second DIMACS implementation challenge, 1993, 26: 521-532
|
[20] |
Fu Z, Marhajan Y, Malik S. Zchaffsat solver. Technical Report. Department of Electrical Engineering, Princeton University. 2004
|
[21] |
Huang Z. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 1998, 2(3): 283-304
CrossRef
Google scholar
|
/
〈 | 〉 |