A best-effort approach to an infrastructure for Chinese Web related research

Weining QIAN; Aoying ZHOU; Minqi ZHOU

doi:10.1007/s11460-011-0137-z

Front. Electr. Electron. Eng., 2011, Vol. 6, Issue 2: 388-396. DOI: 10.1007/s11460-011-0137-z

RESEARCH ARTICLE


Abstract

This paper introduces the design of the Chinese Web Infrastructure (CWI), a prototype system aimed at forum data analysis. CWI takes a best-effort approach: 1) it tries its best to extract or annotate semantics over web data; 2) it provides flexible schemes for users to transform web data into eXtensible Markup Language (XML) forms with richer semantic annotations, which are better suited to further analytical tasks; 3) a distributed graph repository, called DISGR, is used as the backend for managing web data. The paper introduces the design issues, reports the progress of the implementation, and discusses the research issues under study.

Keywords

Chinese Web infrastructure / semantic entity / graph data model / distributed storage

Cite this article


Weining QIAN, Aoying ZHOU, Minqi ZHOU. A best-effort approach to an infrastructure for Chinese Web related research. Front. Electr. Electron. Eng., 2011, 6(2): 388-396. DOI: 10.1007/s11460-011-0137-z


RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg
