Sep 2008, Volume 2 Issue 3

  • ZHOU Aoying
  • YANG Bin, QIAN Weining, ZHOU Aoying
    With the development of the World Wide Web (WWW), the storage and utilization of web data have become a major challenge for the data management research community. Web data are essentially heterogeneous and may change schema frequently, so the traditional relational data model is inappropriate for web data management. A new data model, called Wide Table (WT), was introduced for this task. The WT model has several characteristics. First, a WT is usually highly sparsely populated, so that most data can fit into a single line or record. Second, queries are composed on only a small subset of the attributes. Thus, existing query processing and optimization techniques for relational databases with normalized tables no longer work efficiently. Furthermore, a WT is usually of extremely large volume, and it is thought that only large-scale distributed storage can accommodate such massive data sets. In this paper, the requirements and challenges of web data management are discussed. Existing techniques for WT, including logical presentation, physical storage, and query processing, are introduced and analyzed in detail.
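    As a rough illustration of the sparsity and narrow-query properties described above, the following sketch stores each row as a map of only its populated attributes and projects a small attribute subset. The names `SparseRow`, `to_triples`, and `project` are hypothetical and are not taken from the paper; this is one possible logical representation, not the surveyed systems' actual storage format.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical sparse representation: each row stores only the
# attributes it actually has, instead of NULLs for every column.
SparseRow = Dict[str, Any]


def to_triples(rows: Dict[int, SparseRow]) -> List[Tuple[int, str, Any]]:
    """Flatten sparse rows into (row_id, attribute, value) triples."""
    return [
        (row_id, attr, value)
        for row_id, row in sorted(rows.items())
        for attr, value in sorted(row.items())
    ]


def project(rows: Dict[int, SparseRow], attrs: List[str]) -> Dict[int, SparseRow]:
    """Keep only the requested attributes; drop rows that have none of them."""
    result: Dict[int, SparseRow] = {}
    for row_id, row in rows.items():
        hit = {a: row[a] for a in attrs if a in row}
        if hit:
            result[row_id] = hit
    return result


# Two rows of a conceptually very wide table; most attributes are absent.
rows = {
    1: {"title": "A", "price": 10},
    2: {"title": "B", "isbn": "x"},
}
print(to_triples(rows))
print(project(rows, ["price"]))
```

    A query that touches only `price` never reads the other attributes, which is the access pattern that motivates non-relational layouts for such tables.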
  • ZHANG Rong, ZETTSU Koji, KIDAWARA Yutaka, KIYOKI Yasushi
    With the development of hardware and software, large-scale, flexible, distributed, secure, and coordinated resource sharing has attracted much attention. One of the major challenges is to support distributed group-based resource management, e.g., interest-based organization, in which resources and services are classifiable. Although there have been some proposals to address this challenge, they share the same weakness: they use either servers or super peers to keep global knowledge, and achieve good search efficiency at the expense of system scalability. As a result, such designs cannot maintain both search efficiency and system scalability. To that end, this paper proposes a group-based distributed architecture. It organizes the nodes inside each group with the Chord protocol, a classical peer-to-peer (P2P) technology, and defines a new communication protocol for nodes in different groups, while removing servers and super peers from group management. Such a design keeps the resource-classifiable property together with good system performance. The main characteristics of this architecture are its convenience for group activity analysis, promising scalability, high search efficiency, and robustness. The experimental performance results presented in the paper demonstrate the efficiency of the design.
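    For readers unfamiliar with Chord, the core idea it relies on is that each key is owned by its successor on an identifier ring. The sketch below shows only that successor rule with a sorted list of node identifiers; it omits finger tables and the paper's inter-group protocol, and the helper names `ring_id` and `successor` as well as the 16-bit identifier space are assumptions made for illustration.

```python
import bisect
import hashlib
from typing import List


def ring_id(name: str, bits: int = 16) -> int:
    """Hash a name onto a 2**bits identifier ring (illustrative size)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def successor(sorted_node_ids: List[int], key_id: int) -> int:
    """Return the first node id >= key_id, wrapping around the ring."""
    index = bisect.bisect_left(sorted_node_ids, key_id)
    return sorted_node_ids[index % len(sorted_node_ids)]


nodes = [10, 50, 200]
print(successor(nodes, 60))    # key 60 is owned by node 200
print(successor(nodes, 201))   # wraps around to node 10
```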
  • LI Mei, LEE Wang-Chien, SIVASUBRAMANIAM Anand, ZHAO Jizhong
    Peer-to-peer (P2P) systems have been widely used for sharing and exchanging data and resources among numerous computer nodes. Various data objects identifiable by high-dimensional feature vectors, such as text, images, and genome sequences, are starting to leverage P2P technology. Most existing work has focused on queries over data objects with one or a few attributes and is thus not applicable to high-dimensional data objects. In this study, we investigate the K nearest neighbors (KNN) query on high-dimensional data objects in P2P systems. An efficient query algorithm and solutions that address various technical challenges raised by high dimensionality, such as search space resolution and incremental search space refinement, are proposed. An extensive simulation using both synthetic and real data sets demonstrates that our proposal efficiently supports KNN queries on high-dimensional data in P2P systems.
  • XU Jianliang, TANG Xueyan, LEE Wang-Chien
    Wireless sensor networks are used in a large array of applications to capture, collect, and analyze physical environmental data. Many existing sensor systems instruct sensor nodes to report their measurements to central repositories outside the network, which is expensive in energy cost. Recent technological advances in flash memory have given rise to the development of storage-centric sensor networks, where sensor nodes are equipped with high-capacity flash memory storage so that sensor data can be stored and managed inside the network to reduce expensive communication. This novel architecture calls for new data management techniques to fully exploit distributed in-network data storage. This paper describes some of our research on distributed query processing in such flash-based sensor networks. Of particular interest are the issues that arise in the design of storage management and indexing structures that take into account both the sensor system workload and the read/write/erase characteristics of flash memory.
  • LI Fengrong, IIDA Takuya, ISHIKAWA Yoshiharu
    In recent years, peer-to-peer (P2P) technologies have been used for flexible and scalable information exchange on the Internet, but problems remain to be solved before such exchange is reliable. To enhance the reliability of exchanged information, it is important to trace how data circulates between peers and how it is modified along the way before reaching its destination. However, such lineage tracing is not easy in current P2P networks, since data replications and modifications are performed independently by autonomous peers; this undermines the reliability of the records exchanged. In this paper, we propose a framework for traceable record exchange in a P2P network. By managing historical information in distributed peers, we make the modification and exchange histories of records traceable. One feature of our work is that database technologies are used to realize the framework. Histories are maintained in a relational database in each peer, and tracing queries are written in the Datalog query language and executed in a P2P network by cooperating peers. This paper describes the concept of the framework and gives an overview of the approach to query processing.
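    To give a flavor of what a recursive tracing query computes, the sketch below evaluates a Datalog-style ancestor rule by naive fixed-point iteration over a set of facts. The relation name `derived` (a record and the record it was copied or modified from) and the function `trace_lineage` are hypothetical; the paper's actual schema and its distributed evaluation across peers are not reproduced here.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]  # (record, record it was derived from)


def trace_lineage(derived: Set[Fact]) -> Set[Fact]:
    """Naive fixed-point evaluation of the rules:
         ancestor(X, Y) :- derived(X, Y).
         ancestor(X, Z) :- ancestor(X, Y), derived(Y, Z).
    """
    ancestor = set(derived)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(ancestor):
            for (y2, z) in derived:
                if y == y2 and (x, z) not in ancestor:
                    ancestor.add((x, z))
                    changed = True
    return ancestor


# r3 was derived from r2, which was derived from r1.
history = {("r3", "r2"), ("r2", "r1")}
print(sorted(trace_lineage(history)))
```

    In a P2P setting the `derived` facts would be spread across peers, so the same recursion has to be evaluated cooperatively rather than over one local set.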
  • TANG Yuanyan
    Pattern recognition has become one of the fastest growing research topics in computer science and electrical and electronic engineering in recent years. Advanced research and development in pattern recognition have found numerous applications in areas such as artificial intelligence, information security, biometrics, military science and technology, finance and economics, weather forecasting, image processing, communication, biomedical engineering, document processing, robot vision, transportation, and many others, with many encouraging results. Progress in pattern recognition is likely to benefit from new developments in theoretical mathematics, including wavelet analysis. This paper gives a brief survey of pattern recognition with wavelet theory. It covers the following aspects: analysis and detection of singularities with wavelets; wavelet descriptors for object shapes; invariant representation of patterns; handwritten and printed character recognition; texture analysis and classification; image indexing and retrieval; classification and clustering; document analysis with wavelets; iris pattern recognition; face recognition using the wavelet transform; hand gesture classification; character processing with the B-spline wavelet transform; wavelet-based image fusion; and others.
  • YANG Jian, YANG Jingyu, ZHANG David
    In existing Linear Discriminant Analysis (LDA) models, the class population mean is always estimated by the class sample average. In small sample size problems, such as face and palm recognition, however, the class sample average computed from only a few samples does not provide an accurate estimate of the class population mean, particularly when there are outliers in the training set. To overcome this weakness, the class median vector is used to estimate the class population mean in LDA modeling. The class median vector has two advantages over the class sample average: (1) the class median (image) vector preserves useful details in the sample images, and (2) the class median vector is robust to outliers in the training sample set. In addition, a weighting mechanism is adopted to refine the characterization of the within-class scatter so as to further improve the robustness of the proposed model. The proposed Median Fisher Discriminator (MFD) method was evaluated using the Yale and AR face image databases and the PolyU (Polytechnic University) palmprint database. The experimental results demonstrated the robustness and effectiveness of the proposed method.
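    The robustness claim can be seen on a toy class with one outlier. The sketch below compares the class sample average with a class median vector, assuming the median is taken component-wise (an assumption; the abstract does not define it). The function names and the sample values are invented for illustration, and the weighting of the within-class scatter is not shown.

```python
from statistics import mean, median
from typing import List

Vector = List[float]


def class_mean(samples: List[Vector]) -> Vector:
    """Class sample average, one mean per feature."""
    return [mean(column) for column in zip(*samples)]


def class_median(samples: List[Vector]) -> Vector:
    """Component-wise class median vector (assumed definition)."""
    return [median(column) for column in zip(*samples)]


# Four similar samples plus one outlier in the same class.
samples = [
    [1.0, 2.0],
    [1.2, 2.1],
    [0.9, 1.9],
    [1.1, 2.2],
    [9.0, 9.0],  # outlier
]
print(class_mean(samples))    # pulled toward the outlier
print(class_median(samples))  # stays near the bulk of the samples
```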
  • HAO Guoshun, MA Shilong, LV Jianghua, SUI Yuefei
    Data integration is the problem of retrieving and combining data residing at distributed and heterogeneous sources, and of providing users with transparent access so that they need not be aware of the details of the sources. Data integration is very important because it concerns the data infrastructure of coordinated computing systems. Despite its importance, the following key challenges make data integration one of the longest-standing problems around: 1) how to resolve system heterogeneity; 2) how to build a global model; 3) how to process queries automatically and correctly; and 4) how to resolve semantic heterogeneity. This paper presents an extended dynamic description logic language to describe systems with dynamic actions. Using this language, a universal and unified model for relational database systems and a model for data integration are presented. This paper presents a universal and unified description logic model for relational databases. The model is universal because any relational database system can be automatically transformed into the model; it is unified because it integrates three essential components of relational databases: description logic knowledge bases modeling the relational data, atomic modalities modeling the atomic relational operations, and combined modalities modeling the combined relational operations, that is, queries. Furthermore, a description logic model for data integration is proposed, which contains four layers of ontologies. Based on the model, a solution for each key challenge is proposed: a universal model eliminates system heterogeneity; a novel global model including three ontologies is proposed with some important benefits; a query processing mechanism is provided by which user queries can be decomposed into queries over the sources; and, for resolving semantic heterogeneity, this paper provides a framework under which semantic relations can be expressed and inferred. In summary, this paper presents a dynamic knowledge base framework based on an extended description logic language. Under the framework, databases and data integration systems are modeled, the query processing problem is converted into a semantics-preserving rewriting problem, and many other issues of data integration can be formally studied.