A parallel computing framework for big data

Guoliang CHEN , Rui MAO , Kezhong LU

Front. Comput. Sci. ›› 2017, Vol. 11 ›› Issue (4) : 608 -621.

PDF (627KB)
Front. Comput. Sci. ›› 2017, Vol. 11 ›› Issue (4) : 608 -621. DOI: 10.1007/s11704-016-5003-y
RESEARCH ARTICLE

A parallel computing framework for big data

Author information +
History +
PDF (627KB)

Abstract

Big data has received great attention in research and application. However, most of the current efforts focus on system and application to handle the challenges of “volume” and “velocity”, and not much has been done on the theoretical foundation and to handle the challenge of “variety”. Based on metric-space indexing and computationalcomplexity theory, we propose a parallel computing framework for big data. This framework consists of three components, i.e., universal representation of big data by abstracting various data types into metric space, partitioning of big data based on pair-wise distances in metric space, and parallel computing of big data with the NC-class computing theory.

Keywords

NC-computing / metric space / data partitioning / parallel computing

Cite this article

Download citation ▾
Guoliang CHEN, Rui MAO, Kezhong LU. A parallel computing framework for big data. Front. Comput. Sci., 2017, 11(4): 608-621 DOI:10.1007/s11704-016-5003-y

登录浏览全文

4963

注册一个新账户 忘记密码

References

[1]

ZhouB, Liu W, FanC . Big Data: Strategy, Technology and Practice. Beijing: Publishing House of Electronics Industry, 2013

[2]

TuZ. Big Data. Guilin: Guangxi Normal University Press, 2012

[3]

FanW, GeertsF, NevenF. Making queries tractable on big data with preprocessing: through the eyes of complexity theory. Proceedings of the VLDB Endowment, 2013, 6(9): 685–696

[4]

ChenG. Design and Analysis of Parallel Algorithms. Beijing: Higher Education Press, 2011

[5]

CookS A. Deterministic CFL’s are accepted simultaneously in polynomial time and log squared space. In: Proceedings of the 11th Annual ACM Symposium on Theory of Computing. 1979, 338–345

[6]

MaoR, XuH L, WuW B, Li J Q, LiY , LuM H. Overcoming the challenge of variety: big data abstraction, the next evolution of data management for AAL communication system. IEEE Communications Magazine, 2015, 53(1): 42–47

[7]

XiongJ, LuJ, TanF. Topology. Beijing:China Machine Press, 2013

[8]

ChávezE, Navarro G, Baeza-YatesR , MarroquínJ L. Searching in metric spaces. ACM Computing Surveys, 2001, 33(3): 273–321

[9]

ZezulaP, AmatoG, DohnalV, Batko M. Similarity Search: the Metric Space Approach. Springer Science & Business Media, 2006

[10]

MaoR, ZhangP H, LiX L, Liu X, LuM H . Pivot selection for metricspace indexing. International Journal of Machine Learning and Cybernetics, 2016, 7(2): 311–323

[11]

MaoR, Miranker W, MirankerD P . Pivot selection: dimension reduction for distance-based indexing. Journal of Discrete Algorithms, 2012, 13: 32–46

[12]

ShiH M, Schaeffer J. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 1992, 14(4): 361–372

[13]

ValiantL G. Parallelism in comparison problems. SIAM Journal on Computing, 1975, 4(3): 348–355

[14]

ShiloachY, Vishkin U. Finding the maximum. Merging and Sorting in a Parallel Computational Model, 1981, 2(1): 88–102

[15]

ChenG. Balanced (n, m)-selection networks. Computer Research and Development, 1984, 21(11): 9–22

[16]

Samet, H. Foundations of Multidimensional and Metric Data Structures. San Francisco: Morgan-Kaufmann, 2006

[17]

HjaltasonG R, SametH. Index-driven similarity search in metric spaces. ACM Transactions on Database Systems, 2003. 28(4): 517–580

[18]

UhlmannJ K. Satisfying general proximity/similarity queries withmetric trees. Information Processing Letter, 1991, 40(4): 175–179

[19]

YianilosP N. Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms. 1993

[20]

MaoR, LiuS, XuH L, Zhang D, MirankerD P . On data partitioning in tree structure metric-space indexes. In: Proceedings of the 19th International Conference on Database Systems for Advanced Applications. 2014, 141–155

[21]

BozkayaT, Ozsoyoglu M. Indexing large metric spaces for similarity search queries. ACM Transactions on Database Systems, 1999, 24(3): 361–404

[22]

SergeyB. Near neighbor search in large metric spaces. In: Proceedings of the 21st International Conference on Very Large Data Bases. 1995

[23]

CiacciaP, Patella M, ZezulaP . M-tree: an efficient access method for similarity search in metric spaces. In: proceedings of the 23rd International Conference on Very Large Data Bases. 1997

[24]

NavarroG. Searching in metric spaces by spatial approximation. The VLDB Journal, 2002, 11(1): 28–46

[25]

JagadishH V, OoiB C, TanK L, Yu C, ZhangR . iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS), 2005, 30(2): 364–397

[26]

ChavezE, Navarro G. Unbalancing: the key to index high dimensional metric spaces. Technical Report. 1999

[27]

MaoR, XuW, RamakrishnanS . NuckollsG, Miranker D P. On optimizing distance-based similarity search for biological databases. In: Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference. 2005, 351–361

[28]

MacQueenJ B. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. 1967, 281–297

[29]

EsterM, Kriegel H P, SanderJ , XuX. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd ACM SIGKDD Conference. 1996. 226–231

[30]

CullerD, KarpR, PattersonD, Sahay A, SchauserK E , SantosE, Subramonian R, von EickenT . LogP: towards a realistic model of parallel computation. In: Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 1993, 1–12

[31]

TuB B, ZouM, ZhanJ F, Zhao X F, FanJ P . Research on parallel computation model with memory hierarchy on multi-core clusters. Chinese Journal of Computers, 2008, 31(11): 1948–1955

[32]

LadnerR E. The circuit value problem is log space complete for P. ACM SIGACT News, 1975, 7(1): 18–20

[33]

PippengerN J. On simultaneous resource bounds. In: Proceedings of the 20th IEEE Annual Symposium on Foundations of Computer Science. 1979, 307–311

[34]

ChandraA K, Stockmeyer L J. Alternation. In: Proceedings of the 17th IEEE Annual Symposium on Foundations of Computer Science. 1976, 98–108

[35]

GoldschlagerL M. Synchronous parallel computation. Dissertation for the Doctoral Degree. Toronto: University of Toronto, 1977

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg

AI Summary AI Mindmap
PDF (627KB)

Supplementary files

FCS-0608-15003-RM_suppl_1

1163

Accesses

0

Citation

Detail

Sections
Recommended

AI思维导图

/