Virtually everyone, from big Web companies to traditional enterprises to physical science researchers to social scientists, is either already experiencing or anticipating unprecedented growth in the amount of data available in their world, along with new opportunities and great untapped value. This paper reviews big data challenges from a data management perspective. In particular, we discuss big data diversity, big data reduction, big data integration and cleaning, big data indexing and query, and finally big data analysis and mining. Our survey gives a brief overview of big-data-oriented research and problems.
With computing systems having undergone a fundamental transformation from the single-processor devices of the turn of the century to ubiquitous networked devices and warehouse-scale computing via the cloud, parallelism has become ubiquitous at many levels. At the micro level, parallelism is being explored from the underlying circuits to pipelining and instruction-level parallelism on multi-core or many-core chips as well as within a machine. At the macro level, parallelism is being promoted from multiple machines on a rack, to many racks in a data center, to the globally shared infrastructure of the Internet. With the push of big data, we are entering a new era of parallel computing driven by novel and groundbreaking research innovation on elastic parallelism and scalability. In this paper, we give an overview of computing infrastructure for big data processing, focusing on the architectural, storage, and networking challenges of supporting big data. We briefly discuss emerging computing infrastructure and technologies that are promising for improving data parallelism and task parallelism and for encouraging vertical and horizontal computation parallelism.
Location-based social networks (LBSNs) are at the forefront of emerging trends in social network services (SNS) because LBSN users can “check in” at the places (locations) they visit. The accurate geographical and temporal information of these check-in actions is provided by the end users' GPS-enabled mobile devices and recorded by the LBSN system. In this paper, we analyze and mine a large LBSN dataset, Gowalla, which we collected ourselves. First, we investigate the relationship between spatio-temporal co-occurrences and social ties, and the results show that co-occurrences are strongly correlated with social ties. Second, we present a study of predicting whether two users will meet (co-occur) at a place at a given future time by exploring their check-in habits. In particular, we first introduce two new concepts,
To provide citizens with safe, convenient, and comfortable services and infrastructure in a metropolis, the prediction of passenger flows in the metro-net of a subway system has become more important than ever before. Although a great number of prediction methods have been presented in the field of transportation, all of them follow the station-oriented approach, which is not well suited to the Beijing subway system. This paper proposes a novel metro-net-oriented method, called the probability tree based passenger flow model, which is also based on historic origin-destination (OD) information. First, it learns the appearance probabilities for each kind of OD pair. Then, for each real-time origin datum, the destination datum is calculated, and several kinds of passenger flow in the metro-net can be predicted by gathering all the contributions. The results of experiments using the historical data of the Beijing subway show that, although the proposed method performs worse than existing prediction approaches at forecasting exit passenger flows, it is able to predict several additional kinds of passenger flow in stations and throughout the subway system, and it is a more feasible, suitable, and advanced passenger flow prediction model for the Beijing subway system.
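The two steps named in the abstract above (learning OD appearance probabilities, then gathering the contributions of real-time entries) can be illustrated with a minimal sketch. This is not the paper's probability tree model: it ignores travel times and the tree structure, and the function names, station labels, and counts are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_od_probabilities(trips):
    """Estimate P(destination | origin) from historical (origin, destination) trips."""
    counts = defaultdict(Counter)
    for origin, destination in trips:
        counts[origin][destination] += 1
    return {
        origin: {dest: n / sum(dests.values()) for dest, n in dests.items()}
        for origin, dests in counts.items()
    }


def predict_exit_flows(entries, od_prob):
    """Spread real-time entry counts over destinations and sum the contributions."""
    exits = defaultdict(float)
    for origin, n_entries in entries.items():
        for dest, p in od_prob.get(origin, {}).items():
            exits[dest] += n_entries * p
    return dict(exits)


# Hypothetical history: station A sends 2/3 of its riders to B and 1/3 to C.
history = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C")]
od_prob = learn_od_probabilities(history)
expected_exits = predict_exit_flows({"A": 30, "B": 10}, od_prob)
```

Because the prediction is built from OD pairs rather than per-station time series, the same table could in principle be aggregated along other cuts of the network (for example per line or per transfer station), which is the kind of flexibility the abstract attributes to a metro-net-oriented method.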
Microarray technologies have become a widespread research technique that lets biomedical researchers assess tens of thousands of gene expression values simultaneously in a single experiment. Microarray data analysis for biological discovery requires computational tools. In this research, a novel two-dimensional hierarchical clustering technique is presented. From the review, it is evident that previous research has applied clustering to gene expression data in a way that assigns each gene to only one cluster, which leads to biological complexity. This is mainly because of the nature of proteins and their interactions. Since proteins normally interact with different groups of proteins in order to serve different biological roles, the genes that produce these proteins are expected to co-express with more than one group of genes. This implies that, in microarray gene expression data, a gene may be present in more than one cluster. In this research, multi-level microarray clustering, performed in two dimensions by the proposed two-dimensional hierarchical clustering technique, can be used to represent the existence of genes in one or more clusters, consistent with the nature of the gene and its attributes, and to prevent biological complexities.
Periodic control systems (PCSs) are widely used in the real-time embedded system domain. However, traditional manual requirement analysis relies on expert knowledge and is laborious and error-prone. This paper proposes a novel requirement analysis approach that supports the automated validation of informal requirement specifications. Based on normalized initial requirement documents, our approach can construct an intermediate SPARDL model with both formal syntax and semantics. To check the overall system behaviors, our approach can transform SPARDL models into executable code for simulation. The prototype simulator derived from SPARDL models enables testing-based validation of system behavior. Moreover, our approach enables the analysis of dataflow relations in SPARDL models. By revealing input/output and affecting relations, our dataflow analysis techniques can help software engineers identify potential data dependencies between SPARDL modules. This is very useful for module reuse when a new version of the system is developed. A study of our approach using an industrial design demonstrates its practicality and effectiveness.
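The idea of deriving "affecting" relations from module input/output sets can be sketched generically as follows. This does not use SPARDL syntax or the authors' tooling; the module names, variable names, and both functions are hypothetical, and the sketch assumes each module is described only by the sets of variables it reads and writes.

```python
def direct_dependencies(modules):
    """Map each module to the modules that read at least one of its outputs.

    modules: name -> (set of input variables, set of output variables).
    """
    return {
        a: {b for b, (b_in, _) in modules.items() if b != a and a_out & b_in}
        for a, (_, a_out) in modules.items()
    }


def affected_modules(deps, start):
    """Return every module reachable from `start` along dataflow edges."""
    seen, stack = set(), [start]
    while stack:
        for nxt in deps[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical control loop: sensor -> filter -> control, with a logger tap.
modules = {
    "sensor": (set(), {"raw"}),
    "filter": ({"raw"}, {"clean"}),
    "control": ({"clean"}, {"cmd"}),
    "logger": ({"raw"}, set()),
}
deps = direct_dependencies(modules)
downstream_of_sensor = affected_modules(deps, "sensor")
```

A relation of this kind suggests which modules need re-validation when one module changes, which is the reuse scenario the abstract mentions.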
Confinement is used to protect safety-critical objects from unintended access. Approaches for specifying and verifying confinement have been proposed over the last twenty years, but their application has been held back. We develop a novel framework for specifying and verifying object confinement in object-oriented (OO) programs. Instead of expressing the confinement requirements within a class for possible future usage, as with ownership types, we specify the confinement requirements of the class in its usage class, which actually intends to confine the parts, i.e., internal representations. Syntactically, an optional conf clause is introduced in class declarations for annotating the confined attribute-paths. A “same type and confinement” notation is introduced for expressing type and confinement dependence among variables, parameters, and return values of methods. Based on the extension to a Java-like language and existing techniques of alias analysis, we define a sound type system for checking the well-confinedness of OO programs with respect to the confinement specifications.
In software reuse research, feature models have been widely adopted to capture, organize, and reuse the requirements of a set of similar applications in a software domain. However, the construction, and especially the refinement, of feature models is a labor-intensive process, and there is no effective way to aid domain engineers in refining feature models. In this paper, we propose a new approach to support interactive refinement of feature models based on the view updating technique. The basic idea of our approach is to first extract features and relationships of interest from a possibly large and complicated feature model, then organize them into a comprehensible view, and finally refine the feature model through modifications on the view. The main characteristics of this approach are twofold: a set of powerful rules (as the slicing criterion) to slice the feature model into a view automatically, and a novel use of a bidirectional transformation language to make the view updatable. We have developed a tool, and a nontrivial case study shows the feasibility of this approach.
The global avalanche characteristics (the sum-of-squares indicator and the absolute indicator) measure the overall avalanche characteristics of a cryptographic Boolean function. Sung et al. (1999) gave the lower bound on the sum-of-squares indicator for a balanced Boolean function satisfying the propagation criterion with respect to some vectors. In this paper, for balanced Boolean functions that satisfy the propagation criterion with respect to some vectors, we give three necessary and sufficient conditions on the auto-correlation distribution under which these functions attain the lower bound on the sum-of-squares indicator. We also find all 3-variable, 4-variable, and 5-variable Boolean functions attaining the lower bound on the sum-of-squares indicator.
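For readers unfamiliar with the two indicators, the brute-force sketch below computes them from the standard definitions: the auto-correlation is Δ_f(α) = Σ_x (-1)^(f(x) ⊕ f(x ⊕ α)), the sum-of-squares indicator is Σ_α Δ_f(α)², and the absolute indicator is the maximum of |Δ_f(α)| over α ≠ 0. The function names and the two example functions are illustrative choices, not taken from the paper, and inputs are assumed to be encoded as integers whose bits are the variables.

```python
def autocorrelation(f, n, alpha):
    """Delta_f(alpha): sum over all x of (-1)^(f(x) XOR f(x XOR alpha))."""
    return sum(1 if f(x) == f(x ^ alpha) else -1 for x in range(2 ** n))


def sum_of_squares_indicator(f, n):
    """Sum of Delta_f(alpha)^2 over every alpha, including alpha = 0."""
    return sum(autocorrelation(f, n, a) ** 2 for a in range(2 ** n))


def absolute_indicator(f, n):
    """Largest |Delta_f(alpha)| over all nonzero alpha."""
    return max(abs(autocorrelation(f, n, a)) for a in range(1, 2 ** n))


def linear3(x):
    """Balanced linear function f(x) = x0 on 3 variables (worst avalanche)."""
    return x & 1


def bent4(x):
    """Bent (hence unbalanced) function x0*x1 XOR x2*x3 on 4 variables."""
    return ((x & 1) & ((x >> 1) & 1)) ^ (((x >> 2) & 1) & ((x >> 3) & 1))
```

The linear function gives the largest possible values (2^(3n) and 2^n), while the bent function gives the smallest (2^(2n) and 0). Bent functions are never balanced, which is why bounds for balanced functions, such as the one studied in the abstract, are a separate question.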
In certified email (CEM) protocols, trusted third party (TTP) transparency is an important security requirement that helps to avoid bad publicity and to protect individual users’ privacy. Cederquist et al. proposed an optimistic certified email protocol, which employs key chains to reduce the storage requirement of the TTP. We extend their protocol to satisfy the property of TTP transparency, using existing verifiably encrypted signature schemes. An implementation with the scheme based on bilinear pairing makes our extension one of the most efficient CEM protocols satisfying strong fairness, timeliness, and TTP transparency. We formally verify the security requirements of the extended protocol. The properties of fairness, timeliness, and effectiveness are checked in the model checker Mocha, and TTP transparency is formalised and analysed using the toolsets
This paper introduces a novel interconnection network called KMcube (Kautz-Möbius cube). KMcube is a compound graph of a Kautz digraph and Möbius cubes. That is, it uses Möbius cubes as the unit clusters and connects many such clusters by means of a Kautz digraph, at the cost of only one additional arc being added to each node in each Möbius cube. The topological benefits of both basic graphs are preserved in the compound network. It utilizes the topological properties of Möbius cubes to conveniently embed parallel algorithms into each cluster and the short diameter of a Kautz digraph to support efficient inter-cluster communication. Additionally, KMcube provides other attractive properties, such as regularity, symmetry, and expandability. The proposed methodology for KMcube is further applied to compound graphs of a Kautz digraph and other Möbius-like graphs with a diameter similar to that of a Möbius cube. Moreover, other hybrid graphs of Kautz digraphs and Möbius cubes are proposed and compared.
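The "short diameter" of the Kautz digraph can be checked on small instances with the sketch below. It builds only the standard Kautz digraph K(d, n), whose nodes are length-n words over d + 1 symbols with no two adjacent symbols equal, and does not attempt the KMcube compound construction, whose details are not given in the abstract. The function names are illustrative.

```python
from collections import deque
from itertools import product


def kautz_digraph(d, n):
    """Adjacency dict of K(d, n): shift the word left and append a new last symbol."""
    symbols = range(d + 1)
    nodes = [
        w for w in product(symbols, repeat=n)
        if all(a != b for a, b in zip(w, w[1:]))
    ]
    # Each arc drops the first symbol and appends any symbol differing from the last.
    return {w: [w[1:] + (s,) for s in symbols if s != w[-1]] for w in nodes}


def diameter(graph):
    """Longest shortest-path length, via BFS from every node (assumes strong connectivity)."""
    worst = 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


k23 = kautz_digraph(2, 3)
```

K(2, 3) has (d + 1) · d^(n-1) = 12 nodes, each with out-degree 2, and diameter 3, so the number of nodes grows exponentially in n while the diameter grows only linearly.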