Similarity measure design for high dimensional data

Sang-hyuk Lee, Sun Yan, Yoon-su Jeong, Seung-soo Shin

Journal of Central South University, 2014, Vol. 21, Issue (9): 3534-3540. DOI: 10.1007/s11771-014-2333-5


Abstract

Information analysis of high dimensional data was carried out through the application of a similarity measure. High dimensional data were treated as an atypical structure. Overlapped and non-overlapped data were introduced, and similarity measure analysis was illustrated and compared with a conventional similarity measure. For overlapped data, the similarity could be expressed with the conventional similarity measure, whereas the similarity analysis of non-overlapped data provided the clue to solving the similarity of high dimensional data. The high dimensional data analysis was designed with consideration of neighborhood information, and conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud among multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented. With the proposed similarity measure, high dimensional personal data were evaluated for how similar they are to financial fraud. The calculation results show that actual fraud has a rather high similarity measure compared with the average, ranging from a minimum of 0.0609 to a maximum of 0.1667.

Keywords

high dimensional data / similarity measure / difference / neighborhood information / financial fraud

Cite this article

Sang-hyuk Lee, Sun Yan, Yoon-su Jeong, Seung-soo Shin. Similarity measure design for high dimensional data. Journal of Central South University, 2014, 21(9): 3534-3540. DOI: 10.1007/s11771-014-2333-5


