D-Ocean: an unstructured data management system for data ocean environment

Yueting ZHUANG, Yaoguang WANG, Jian SHAO, Ling CHEN, Weiming LU, Jianling SUN, Baogang WEI, Jiangqin WU

Front. Comput. Sci., 2016, Vol. 10, Issue 2: 353-369. DOI: 10.1007/s11704-015-5045-6
RESEARCH ARTICLE


Abstract

Together with the big data movement, many organizations collect their own big data and build distinctive applications. To provide smart services on top of big data, massive and varied data should be well linked and organized to form a Data Ocean, which places particular emphasis on deep exploration of the relationships among unstructured data to support smart services. Currently, almost all of these applications deal with unstructured data by integrating various analysis and search techniques on top of massive storage and processing infrastructure at the application level, which greatly increases the difficulty and cost of application development.

This paper presents D-Ocean, an unstructured data management system for the data ocean environment. D-Ocean has an open and scalable architecture, which consists of a core platform, pluggable components and auxiliary tools. It uses a unified storage framework to store data in different kinds of data stores, integrates batch and incremental processing mechanisms to process unstructured data, and provides a combined search engine to conduct compound queries. Furthermore, a process modeling approach called RAISE is proposed to support the whole process of Repository, Analysis, Index, Search and Environment modeling, which can greatly simplify application development. Experiments and production use cases demonstrate the efficiency and usability of D-Ocean.
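The abstract names four pieces: a unified storage layer, batch plus incremental processing, an index, and a compound search. The toy sketch below shows how such pieces could fit together. It is not D-Ocean's actual API, which the abstract does not describe. Every name in it (`Item`, `UnifiedStore`, `Index`, `compound_search`) is hypothetical, and in-memory dictionaries stand in for real data stores.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    kind: str            # e.g. "text" or "image" (hypothetical categories)
    text: str = ""
    feature: tuple = ()  # toy feature vector standing in for analysis output


class UnifiedStore:
    """Hypothetical facade: one put API over several backing stores."""

    def __init__(self):
        self.stores = {}  # kind -> dict acting as a stand-in data store

    def put(self, item):
        self.stores.setdefault(item.kind, {})[item.item_id] = item

    def all_items(self):
        for store in self.stores.values():
            yield from store.values()


class Index:
    """Toy inverted index with a batch build and an incremental add."""

    def __init__(self):
        self.postings = {}  # term -> set of item ids

    def add(self, item):
        # Incremental path: index a single newly arrived item.
        for term in set(item.text.lower().split()):
            self.postings.setdefault(term, set()).add(item.item_id)

    def build(self, items):
        # Batch path: index everything already in the repository.
        for item in items:
            self.add(item)


def compound_search(store, index, keyword, query_feature, top_k=2):
    """Compound query: keyword filter, then ranking by feature distance."""
    candidates = index.postings.get(keyword.lower(), set())
    hits = [i for i in store.all_items() if i.item_id in candidates]

    def squared_distance(item):
        return sum((a - b) ** 2 for a, b in zip(item.feature, query_feature))

    return [i.item_id for i in sorted(hits, key=squared_distance)[:top_k]]


store = UnifiedStore()
store.put(Item("a", "text", "ocean data", (0.0, 0.0)))
store.put(Item("b", "image", "ocean photo", (1.0, 1.0)))

index = Index()
index.build(store.all_items())          # batch processing of existing data

late = Item("c", "image", "ocean map", (0.1, 0.1))
store.put(late)
index.add(late)                         # incremental processing of new data

result = compound_search(store, index, "ocean", (0.0, 0.0))
print(result)
```

The point of the sketch is only the separation of concerns: applications call one storage facade and one search function, rather than wiring stores, analyzers and indexes together themselves.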

Keywords

unstructured data / storage / analysis / index / search / RAISE process modeling

Cite this article

Yueting ZHUANG, Yaoguang WANG, Jian SHAO, Ling CHEN, Weiming LU, Jianling SUN, Baogang WEI, Jiangqin WU. D-Ocean: an unstructured data management system for data ocean environment. Front. Comput. Sci., 2016, 10(2): 353‒369 https://doi.org/10.1007/s11704-015-5045-6


RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg