A parallel computing framework for big data

Guoliang CHEN, Rui MAO, Kezhong LU

PDF(627 KB)
PDF(627 KB)
Front. Comput. Sci. ›› 2017, Vol. 11 ›› Issue (4) : 608-621. DOI: 10.1007/s11704-016-5003-y
RESEARCH ARTICLE

A parallel computing framework for big data

Author information +
History +

Abstract

Big data has received great attention in research and application. However, most of the current efforts focus on system and application to handle the challenges of “volume” and “velocity”, and not much has been done on the theoretical foundation and to handle the challenge of “variety”. Based on metric-space indexing and computationalcomplexity theory, we propose a parallel computing framework for big data. This framework consists of three components, i.e., universal representation of big data by abstracting various data types into metric space, partitioning of big data based on pair-wise distances in metric space, and parallel computing of big data with the NC-class computing theory.

Keywords

NC-computing / metric space / data partitioning / parallel computing

Cite this article

Download citation ▾
Guoliang CHEN, Rui MAO, Kezhong LU. A parallel computing framework for big data. Front. Comput. Sci., 2017, 11(4): 608‒621 https://doi.org/10.1007/s11704-016-5003-y

References

[1]
ZhouB, Liu W, FanC . Big Data: Strategy, Technology and Practice. Beijing: Publishing House of Electronics Industry, 2013
[2]
TuZ. Big Data. Guilin: Guangxi Normal University Press, 2012
[3]
FanW, GeertsF, NevenF. Making queries tractable on big data with preprocessing: through the eyes of complexity theory. Proceedings of the VLDB Endowment, 2013, 6(9): 685–696
CrossRef Google scholar
[4]
ChenG. Design and Analysis of Parallel Algorithms. Beijing: Higher Education Press, 2011
[5]
CookS A. Deterministic CFL’s are accepted simultaneously in polynomial time and log squared space. In: Proceedings of the 11th Annual ACM Symposium on Theory of Computing. 1979, 338–345
CrossRef Google scholar
[6]
MaoR, XuH L, WuW B, Li J Q, LiY , LuM H. Overcoming the challenge of variety: big data abstraction, the next evolution of data management for AAL communication system. IEEE Communications Magazine, 2015, 53(1): 42–47
CrossRef Google scholar
[7]
XiongJ, LuJ, TanF. Topology. Beijing:China Machine Press, 2013
[8]
ChávezE, Navarro G, Baeza-YatesR , MarroquínJ L. Searching in metric spaces. ACM Computing Surveys, 2001, 33(3): 273–321
CrossRef Google scholar
[9]
ZezulaP, AmatoG, DohnalV, Batko M. Similarity Search: the Metric Space Approach. Springer Science & Business Media, 2006
[10]
MaoR, ZhangP H, LiX L, Liu X, LuM H . Pivot selection for metricspace indexing. International Journal of Machine Learning and Cybernetics, 2016, 7(2): 311–323
CrossRef Google scholar
[11]
MaoR, Miranker W, MirankerD P . Pivot selection: dimension reduction for distance-based indexing. Journal of Discrete Algorithms, 2012, 13: 32–46
CrossRef Google scholar
[12]
ShiH M, Schaeffer J. Parallel sorting by regular sampling. Journal of Parallel and Distributed Computing, 1992, 14(4): 361–372
CrossRef Google scholar
[13]
ValiantL G. Parallelism in comparison problems. SIAM Journal on Computing, 1975, 4(3): 348–355
CrossRef Google scholar
[14]
ShiloachY, Vishkin U. Finding the maximum. Merging and Sorting in a Parallel Computational Model, 1981, 2(1): 88–102
[15]
ChenG. Balanced (n, m)-selection networks. Computer Research and Development, 1984, 21(11): 9–22
[16]
Samet, H. Foundations of Multidimensional and Metric Data Structures. San Francisco: Morgan-Kaufmann, 2006
[17]
HjaltasonG R, SametH. Index-driven similarity search in metric spaces. ACM Transactions on Database Systems, 2003. 28(4): 517–580
CrossRef Google scholar
[18]
UhlmannJ K. Satisfying general proximity/similarity queries withmetric trees. Information Processing Letter, 1991, 40(4): 175–179
CrossRef Google scholar
[19]
YianilosP N. Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms. 1993
[20]
MaoR, LiuS, XuH L, Zhang D, MirankerD P . On data partitioning in tree structure metric-space indexes. In: Proceedings of the 19th International Conference on Database Systems for Advanced Applications. 2014, 141–155
CrossRef Google scholar
[21]
BozkayaT, Ozsoyoglu M. Indexing large metric spaces for similarity search queries. ACM Transactions on Database Systems, 1999, 24(3): 361–404
CrossRef Google scholar
[22]
SergeyB. Near neighbor search in large metric spaces. In: Proceedings of the 21st International Conference on Very Large Data Bases. 1995
[23]
CiacciaP, Patella M, ZezulaP . M-tree: an efficient access method for similarity search in metric spaces. In: proceedings of the 23rd International Conference on Very Large Data Bases. 1997
[24]
NavarroG. Searching in metric spaces by spatial approximation. The VLDB Journal, 2002, 11(1): 28–46
CrossRef Google scholar
[25]
JagadishH V, OoiB C, TanK L, Yu C, ZhangR . iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS), 2005, 30(2): 364–397
CrossRef Google scholar
[26]
ChavezE, Navarro G. Unbalancing: the key to index high dimensional metric spaces. Technical Report. 1999
[27]
MaoR, XuW, RamakrishnanS . NuckollsG, Miranker D P. On optimizing distance-based similarity search for biological databases. In: Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference. 2005, 351–361
CrossRef Google scholar
[28]
MacQueenJ B. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. 1967, 281–297
[29]
EsterM, Kriegel H P, SanderJ , XuX. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd ACM SIGKDD Conference. 1996. 226–231
[30]
CullerD, KarpR, PattersonD, Sahay A, SchauserK E , SantosE, Subramonian R, von EickenT . LogP: towards a realistic model of parallel computation. In: Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 1993, 1–12
CrossRef Google scholar
[31]
TuB B, ZouM, ZhanJ F, Zhao X F, FanJ P . Research on parallel computation model with memory hierarchy on multi-core clusters. Chinese Journal of Computers, 2008, 31(11): 1948–1955
CrossRef Google scholar
[32]
LadnerR E. The circuit value problem is log space complete for P. ACM SIGACT News, 1975, 7(1): 18–20
CrossRef Google scholar
[33]
PippengerN J. On simultaneous resource bounds. In: Proceedings of the 20th IEEE Annual Symposium on Foundations of Computer Science. 1979, 307–311
CrossRef Google scholar
[34]
ChandraA K, Stockmeyer L J. Alternation. In: Proceedings of the 17th IEEE Annual Symposium on Foundations of Computer Science. 1976, 98–108
CrossRef Google scholar
[35]
GoldschlagerL M. Synchronous parallel computation. Dissertation for the Doctoral Degree. Toronto: University of Toronto, 1977

RIGHTS & PERMISSIONS

2016 Higher Education Press and Springer-Verlag Berlin Heidelberg
AI Summary AI Mindmap
PDF(627 KB)

Accesses

Citations

Detail

Sections
Recommended

/