A survey of RDF data management systems
M. Tamer ÖZSU
RDF is increasingly used to encode data for the semantic web and for data exchange. A large body of work addresses RDF data management, following a variety of approaches. In this paper we provide an overview of these works. The review covers centralized solutions (referred to as warehousing approaches), distributed solutions, and the techniques that have been developed for querying linked data. Within each category, further classifications are provided to help readers understand the distinguishing characteristics of the different approaches.
Keywords: RDF, SPARQL, linked object data