MapReduce is a popular framework for large-scale data analysis. Because data access is critical to MapReduce's performance, recent work has applied different storage models, such as column-store or PAX-store, to MapReduce platforms. However, data access patterns vary widely across queries, and no single storage model achieves optimal performance on its own. In this paper, we study how MapReduce can benefit from the presence of two different column-oriented models: pure column-store and PAX-store. We propose a hybrid storage system called hybrid column-store (HC-store). Based on the characteristics of the incoming MapReduce tasks, our storage model determines whether to access the underlying pure column-store or PAX-store. We study the properties of the different storage models and create a cost model to decide the data access strategy at runtime. We have implemented HC-store on top of Hadoop. Our experimental results show that HC-store outperforms both PAX-store and column-store, especially when confronted with diverse workloads.
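To give a rough feel for such a runtime decision, the sketch below chooses between pure column-store and PAX-store access from the number of columns a task reads. The cost constants and the decision rule are illustrative assumptions of ours, not the cost model published for HC-store.

```python
# Hypothetical cost-based dispatcher: chooses a storage layout per task.
# The constants and the decision rule are illustrative assumptions,
# not the cost model used by HC-store.

SEEK_COST_PER_COLUMN = 5.0   # extra seek a pure column-store pays per referenced column
SCAN_COST_PER_BYTE = 0.001   # sequential scan cost, identical for both layouts

def column_store_cost(referenced_columns, bytes_per_column):
    # Pure column-store reads only the referenced columns, but pays a seek per column.
    return (len(referenced_columns) * SEEK_COST_PER_COLUMN
            + sum(bytes_per_column[c] for c in referenced_columns) * SCAN_COST_PER_BYTE)

def pax_store_cost(referenced_columns, bytes_per_column):
    # PAX-store reads pages containing all columns, but in one sequential pass.
    total_bytes = sum(bytes_per_column.values())
    return SEEK_COST_PER_COLUMN + total_bytes * SCAN_COST_PER_BYTE

def choose_layout(referenced_columns, bytes_per_column):
    cs = column_store_cost(referenced_columns, bytes_per_column)
    pax = pax_store_cost(referenced_columns, bytes_per_column)
    return "column-store" if cs < pax else "PAX-store"

# A task touching 2 of 20 columns favours the pure column-store;
# a task touching all columns favours PAX-store.
widths = {f"c{i}": 100_000 for i in range(20)}
print(choose_layout(["c0", "c1"], widths))   # column-store
print(choose_layout(list(widths), widths))   # PAX-store
```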
The delta-based accumulative iterative computation (DAIC) model has recently been proposed to support iterative algorithms in either a synchronous or an asynchronous way. However, the synchronous and asynchronous DAIC models each perform well only under certain conditions and degrade under others, the former because of high synchronization cost and the latter because of many redundant activations. As a result, the overall performance of both DAIC models suffers from the serious network jitter and load jitter caused by multitenancy in the cloud. In this paper, we develop a system, named HybIter, to guarantee the performance of iterative algorithms under different conditions. Through an adaptive execution-model selection scheme, it can efficiently switch between the synchronous and asynchronous DAIC models to adapt to changing conditions, always achieving the best performance in the cloud. Experimental results show that our approach can improve the performance of current solutions by up to 39.0%.
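The following sketch illustrates, under our own assumptions, what an adaptive execution-model selection loop could look like: it watches how much time is spent waiting at barriers versus how many activations turned out to be redundant, and switches modes accordingly. The metric names, thresholds, and switching rule are hypothetical and are not taken from HybIter.

```python
# Hypothetical model-selection rule in the spirit of HybIter: switch between
# synchronous and asynchronous DAIC based on runtime measurements.
# Thresholds, metric names and the rule itself are illustrative assumptions.

def choose_daic_mode(sync_wait_ratio, redundant_activation_ratio, current_mode):
    """sync_wait_ratio: fraction of iteration time spent waiting at barriers.
    redundant_activation_ratio: fraction of activations in the last sampling
    window that did not change the accumulated delta."""
    if current_mode == "sync" and sync_wait_ratio > 0.4:
        return "async"   # barriers dominate (e.g., network/load jitter): go asynchronous
    if current_mode == "async" and redundant_activation_ratio > 0.5:
        return "sync"    # too many wasted activations: go back to synchronous
    return current_mode

mode = "sync"
for wait, redundant in [(0.1, 0.2), (0.6, 0.2), (0.3, 0.7)]:
    mode = choose_daic_mode(wait, redundant, mode)
    print(mode)          # sync, async, sync
```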
Increasingly mature cloud computing technology is promoting virtual desktop technology, which can solve many problems of traditional computing models. However, virtual desktop solutions raise the thorny problem of how to deliver a real desktop experience to users, as if they were using the desktop locally, especially when playing video. The SPICE (simple protocol for independent computing environments) virtual desktop solution provides several image compression algorithms to address this problem, with the purpose of making virtual desktops feel as real as possible. Although each compression algorithm performs well on certain kinds of images, switching between them is problematic: detecting the image type consumes substantial resources and causes jitter in the virtual desktop. This paper proposes a new solution, called SPICEx, which uses the JPEG2000 compression algorithm with dynamic compression ratios to solve this problem, and validates that its performance is better than that of SPICE. By improving the quality of user experience while reducing bandwidth consumption, SPICEx is valuable for virtual desktops and can be widely applied.
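As a purely illustrative aside, one simple way to pick a dynamic compression ratio is to derive it from a bandwidth budget: compress just enough that the frame stream fits the available link. The formula and constants below are our assumptions and are not the policy implemented in SPICE or SPICEx.

```python
# Hypothetical choice of a JPEG2000 compression ratio from a bandwidth budget.
# The formula and the 20% protocol-overhead headroom are illustrative assumptions.

def target_compression_ratio(width, height, bytes_per_pixel, fps, bandwidth_bps,
                             min_ratio=5.0, max_ratio=200.0):
    raw_bits_per_second = width * height * bytes_per_pixel * 8 * fps
    budget_bits_per_second = 0.8 * bandwidth_bps    # keep headroom for protocol overhead
    ratio = raw_bits_per_second / budget_bits_per_second
    return max(min_ratio, min(max_ratio, ratio))

# 1080p desktop at 30 fps over a 20 Mbit/s link
print(round(target_compression_ratio(1920, 1080, 4, 30, 20e6), 1))
```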
Compressed sensing (CS) theory relates the sampling rate to signal structure and content: CS samples and compresses a signal simultaneously at a rate far below the Nyquist sampling frequency. However, CS considers only intra-signal correlations, without taking the correlations among multiple signals into account. Distributed compressed sensing (DCS) is an extension of CS that takes advantage of both inter- and intra-signal correlations and is widely used as a powerful method for multi-signal sensing and compression in many fields. In this paper, the characteristics of DCS and related work are reviewed, and the DCS framework is introduced. The main components of DCS, namely sparse representation, measurement matrix selection, and joint reconstruction, are classified and summarized. The applications of DCS are also categorized and discussed. Finally, concluding remarks and directions for further research are provided.
In this paper, an efficient scheme for recognition of handwritten Odia numerals using hidden Markov models (HMMs) is proposed. Three different feature vectors are generated for each numeral through a polygonal approximation of the object contour. Subsequently, an aggregated feature vector for each numeral is derived from these three primary feature vectors using a fuzzy inference system. The final feature vector is divided into three levels and interpreted as three different states for the HMM. Ten three-state ergodic HMMs are thus constructed, one for each numeral class, and their parameters are estimated. To recognize a probe numeral, its log-likelihood against each of these models is computed to decide its class label. The proposed scheme is implemented on a dataset of 2,500 handwritten samples, achieving a recognition accuracy of 96.3%, and is compared with other competing schemes.
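The classification step, picking the class whose HMM assigns the highest log-likelihood to a probe sequence, can be sketched with a scaled forward algorithm. The model parameters below are random placeholders and the probe sequence is invented; the actual models in the paper are trained on fuzzy-aggregated contour features quantized into three levels.

```python
import numpy as np

# Minimal forward-algorithm scorer: given a discrete observation sequence and the
# parameters of one three-state HMM, return log P(observations | model).
# Parameters are random placeholders, not the trained models from the paper.

def log_likelihood(obs, start_prob, trans, emit):
    # obs: sequence of symbol indices; trans: (3, 3); emit: (3, n_symbols)
    alpha = start_prob * emit[:, obs[0]]
    log_scale = 0.0
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        s = alpha.sum()
        log_scale += np.log(s)
        alpha /= s                      # rescale to avoid numerical underflow
    return log_scale + np.log(alpha.sum())

rng = np.random.default_rng(0)

def random_hmm(n_states=3, n_symbols=3):
    start = rng.dirichlet(np.ones(n_states))
    trans = rng.dirichlet(np.ones(n_states), size=n_states)
    emit = rng.dirichlet(np.ones(n_symbols), size=n_states)
    return start, trans, emit

models = [random_hmm() for _ in range(10)]   # one ergodic HMM per numeral class 0-9
probe = [0, 1, 2, 2, 1, 0, 1]                # quantized feature sequence of a probe numeral
scores = [log_likelihood(probe, *m) for m in models]
print("predicted class:", int(np.argmax(scores)))
```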
A well-designed graph plays a fundamental role in graph-based semi-supervised learning; however, the topological structure of the constructed neighborhood is unstable in most current approaches, since they are very sensitive to high-dimensional, sparse, and noisy data, which generally leads to dramatic performance degradation. To deal with this issue, we developed a relative manifold based semi-supervised dimensionality reduction (RMSSDR) approach that uses the relative manifold to construct a better neighborhood graph with fewer short-circuit edges. Based on the relative cognitive law and manifold distance, a relative transformation is used to construct the relative space and the relative manifold. The relative transformation improves the ability to distinguish between data points and reduces the impact of noise, making the representation more intuitive, and the relative manifold more faithfully reflects the manifold structure, since data sets commonly have a nonlinear structure. Specifically, RMSSDR makes full use of pairwise constraints, which define the edge weights of the neighborhood graph, by minimizing the local reconstruction error, and it preserves both the global and local geometric structures of the data set. Experimental results on face data sets demonstrate that RMSSDR outperforms current state-of-the-art methods in both classification performance and robustness.
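The sketch below shows one simplified reading of the relative transformation: each sample is re-represented by its vector of distances to all samples, and the neighborhood graph is built in this relative space. This is only one ingredient of RMSSDR as we understand it; the full method additionally uses the manifold distance, pairwise constraints, and local reconstruction weights.

```python
import numpy as np

# Simplified illustration of the relative transformation and the neighborhood
# graph built in relative space. Not the complete RMSSDR algorithm.

def relative_transform(X):
    # X: (n_samples, n_features) -> (n_samples, n_samples) distance representation
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def knn_graph(R, k=3):
    # Symmetric k-nearest-neighbor adjacency built from the relative representations.
    n = R.shape[0]
    D = relative_transform(R)            # distances between rows of R
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        neighbors = np.argsort(D[i])[1:k + 1]   # skip the point itself
        A[i, neighbors] = True
    return A | A.T

X = np.random.default_rng(1).normal(size=(30, 50))   # high-dimensional toy data
R = relative_transform(X)                            # relative space representation
print(knn_graph(R, k=3).sum())   # nonzero entries of the symmetric adjacency matrix
```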
In data analysis tasks, we are often confronted with very high-dimensional data. Given the purpose of a data analysis study, feature selection finds and selects the relevant subset of features from the original features. Many feature selection algorithms have been proposed in classical data analysis, but very few in symbolic data analysis (SDA), an extension of classical data analysis that uses rich objects instead of simple matrices. Compared to the data used in classical data analysis, a symbolic object can describe not only individuals but, most of the time, a cluster of individuals. In this paper we present an unsupervised feature selection algorithm on probabilistic symbolic objects (PSOs), with the purpose of discrimination. A PSO is a symbolic object that describes a cluster of individuals by modal variables, using a relative frequency distribution associated with each value. This paper presents new dissimilarity measures between PSOs, which are used as feature selection criteria, and explains how to reduce the complexity of the algorithm by using the discrimination matrix.
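To make the selection criterion concrete, the sketch below scores one feature by the average pairwise dissimilarity of its relative frequency distributions across PSOs, so that a feature that separates the clusters well scores high. Total variation distance is used here as a simple stand-in; the dissimilarity measures actually proposed in the paper differ.

```python
# Illustrative dissimilarity between two probabilistic symbolic objects (PSOs) on
# one modal variable, each described by a relative frequency distribution over the
# same category set. Total variation distance is a stand-in for the paper's measures.

def tv_distance(p, q):
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)

def feature_discrimination(psos, feature):
    """Average pairwise dissimilarity of a feature across a list of PSOs:
    a feature that discriminates the clusters well scores high."""
    pairs = [(i, j) for i in range(len(psos)) for j in range(i + 1, len(psos))]
    return sum(tv_distance(psos[i][feature], psos[j][feature]) for i, j in pairs) / len(pairs)

# Two hypothetical clusters described by two modal variables.
cluster_a = {"color": {"red": 0.7, "blue": 0.3}, "size": {"small": 0.5, "large": 0.5}}
cluster_b = {"color": {"red": 0.1, "blue": 0.9}, "size": {"small": 0.5, "large": 0.5}}
for f in ("color", "size"):
    print(f, feature_discrimination([cluster_a, cluster_b], f))   # color scores higher
```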
This paper investigates the transition function and the reachability conditions of finite automata using the semi-tensor product of matrices, a new and powerful matrix analysis tool. The states and input symbols are first expressed in vector form, and the transition function is then described in algebraic form. Using this algebraic representation, a necessary and sufficient condition for the reachability of any two states is proposed, based on which an algorithm is developed for discovering all the paths from one state to another. Furthermore, a mechanism is established to recognize the language accepted by a finite automaton. Finally, illustrative examples show that the results and algorithms presented in this paper are suitable for both deterministic finite automata (DFA) and nondeterministic finite automata (NFA).
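The flavor of the algebraic representation can be sketched with plain Boolean matrices: states are canonical basis vectors and each input symbol gets a transition matrix, so reachability reduces to matrix-vector iteration. This is only an analogue of the idea and does not use the semi-tensor product itself, which the paper employs to obtain a single algebraic transition equation; it applies equally to DFA and NFA.

```python
import numpy as np

# States are basis vectors; each symbol 'a' has a Boolean matrix M_a with
# M_a[j, i] = 1 iff reading 'a' in state i can lead to state j.

def reachable(transition_matrices, source, target, n_states):
    # Union over all symbols gives the one-step reachability matrix; iterate to a fixed point.
    step = np.zeros((n_states, n_states), dtype=bool)
    for M in transition_matrices.values():
        step |= M
    frontier = np.zeros(n_states, dtype=bool)
    frontier[source] = True
    seen = frontier.copy()
    while frontier.any():
        frontier = (step.astype(int) @ frontier.astype(int) > 0) & ~seen
        seen |= frontier
    return bool(seen[target])

# NFA with 3 states: on 'a', state 0 nondeterministically goes to {0, 1}; on 'b', state 1 goes to 2.
M_a = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 0]], dtype=bool)
M_b = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0]], dtype=bool)
print(reachable({"a": M_a, "b": M_b}, source=0, target=2, n_states=3))   # True
```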
Atomic blocks, a high-level language construct that lets programmers explicitly specify the atomicity of operations without worrying about the implementation, are a promising approach to simplifying concurrent programming. Temporal logic, on the other hand, is a successful model in logic programming and concurrency verification, but none of the existing temporal programming models yet supports concurrent programming with atomic blocks. In this paper, we propose a temporal programming model (αPTL) that extends projection temporal logic (PTL) to support concurrent programming with atomic blocks. The novel construct that formulates atomic execution of code blocks, which we call atomic interval formulas, is always interpreted over two consecutive states, with the internal states of the block abstracted away. We show that the framing mechanism of projection temporal logic also works in the new model, which consequently supports our development of an executable language. The language supports concurrency by introducing a loose interleaving semantics that tracks only the mutual exclusion between atomic blocks. We demonstrate the usage of αPTL by modeling and verifying both fine-grained and coarse-grained concurrency.
Various redundancy tactics can be modeled at the design stage of safety-critical systems, thereby providing a set of fault-tolerance guidelines for subsequent development activities. However, existing approaches usually interweave redundancy tactics with the functional models, making them complex and cluttered; maintaining such models is time-consuming and error-prone. To address this problem, we provide a modeling approach that separates the redundancy tactics from the base functional models using aspect-oriented modeling. More specifically, the conceptual models of the redundancy tactics and their semantic constraints are first defined for deriving the relevant aspects. Subsequently, a UML profile is proposed to specify the tactic aspects, and these concepts are mapped to the corresponding concepts of aspect-oriented modeling based on pre-defined principles. In accordance with the proposed profile, reuse directives are applied to handle the overlap of structural features between redundancy tactics and other kinds of tactics. Based on the tactic aspects and their configured attributes, a weaving algorithm is proposed to associate the tactic aspects with the base functional models. The proposed approach is compared with a traditional tactic modeling approach on two safety-critical systems, revealing that: 1) our approach significantly reduces the number of extra model elements needed in the tactic design stage; and 2) our approach largely avoids the impact of changes to the base functional model as it evolves.
Traditional similar code detection approaches are limited in detecting semantically similar code, which impedes their application in practice. In this paper, we improve both the traditional metrics-based approach and the graph-based approach and present a combined metrics-based and graph-based approach. First, source code is represented as augmented system dependence graphs. Then, metrics-based candidate similar code extraction is performed to filter out most of the dissimilar code pairs and thus lower the computational complexity. After that, code normalization is performed on the candidate similar code to remove code variations so that similar code can be detected at the semantic level. Finally, program matching is performed on the normalized control dependence trees to output semantically similar code. Experimental results show that our approach can detect similar code with code variations and that it can be applied to large software systems.
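The metrics-based filtering step can be illustrated as follows: each code fragment is summarized by a small vector of counting metrics, and only pairs whose vectors are close survive to the expensive graph-based comparison. The particular metrics, the tolerance, and the use of Python's `ast` module are illustrative choices of ours, not those fixed by the paper.

```python
import ast

# Illustrative metrics-based candidate extraction: cheap metric vectors filter out
# obviously dissimilar pairs before any graph matching is attempted.

def metrics_vector(source):
    tree = ast.parse(source)
    counts = {"stmts": 0, "calls": 0, "branches": 0, "loops": 0}
    for node in ast.walk(tree):
        if isinstance(node, ast.stmt):
            counts["stmts"] += 1
        if isinstance(node, ast.Call):
            counts["calls"] += 1
        if isinstance(node, ast.If):
            counts["branches"] += 1
        if isinstance(node, (ast.For, ast.While)):
            counts["loops"] += 1
    return counts

def candidate_pair(a, b, tolerance=0.25):
    va, vb = metrics_vector(a), metrics_vector(b)
    # Keep the pair only if every metric differs by at most `tolerance` relatively.
    return all(abs(va[k] - vb[k]) <= tolerance * max(va[k], vb[k], 1) for k in va)

f1 = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s"
f2 = "def add_all(vals):\n    acc = 0\n    for v in vals:\n        acc = acc + v\n    return acc"
f3 = "def greet(name):\n    print('hello', name)"
print(candidate_pair(f1, f2))   # True: similar metric profile, passed to graph matching
print(candidate_pair(f1, f3))   # False: filtered out early
```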
To ensure the authenticity and integrity of bundles, the in-transit PDUs of the bundle protocol (BP) in space delay/disruption-tolerant networks (DTNs), the bundle security protocol specification (IRTF RFC 6257) suggests applying a digital signature directly over each bundle. However, when bundle fragmentation services are needed, this mechanism incurs heavy computational costs, bandwidth overhead, and energy consumption. In this paper, we address the fragment authentication issue for BP by combining RS error-correction and erasure codes with the batch transmission characteristic of DTNs. The RS error-correction and erasure codes allow receivers to locate false or injected fragments and to reconstruct the single signature shared by all fragments of a bundle, even if some fragments are lost or routed along a different path. With only a subset of authentic fragments, a DTN node is able to detect and filter false or injected fragments and to authenticate the origin of a bundle. Such an approach tolerates high delays, unexpected link disruption, and the BP behavior of routing fragments of the same bundle along possibly different paths. The performance analysis demonstrates that both of our schemes, which follow our generic RS-code-based idea, significantly reduce bandwidth overhead and computational costs compared to prior work.
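The overall structure, signing the bundle once, erasure-coding the signature, and attaching one share to each fragment so the receiver can rebuild it from a subset of fragments, is sketched below. For brevity a single XOR parity share is used as a toy stand-in that tolerates the loss of one fragment; the paper's schemes use RS error-correction/erasure codes, which tolerate more losses and also let the receiver locate false or injected fragments.

```python
from functools import reduce

# Toy erasure-coding stand-in (single XOR parity) showing how one signature,
# shared by all fragments of a bundle, can be carried as per-fragment shares
# and rebuilt even when a fragment is lost. Real schemes use RS codes instead.

def xor_bytes(blocks):
    return bytes(reduce(lambda acc, blk: [a ^ b for a, b in zip(acc, blk)],
                        blocks, [0] * len(blocks[0])))

def make_shares(signature, k):
    # Split the signature into k equal data shares plus one parity share.
    assert len(signature) % k == 0
    size = len(signature) // k
    data = [signature[i * size:(i + 1) * size] for i in range(k)]
    return data + [xor_bytes(data)]

def rebuild(shares):
    # `shares` has exactly one entry missing (None); recover the signature.
    missing = shares.index(None)
    data = shares[:-1]
    if missing < len(data):            # a data share was lost: recover it via parity
        data[missing] = xor_bytes([s for s in shares if s is not None])
    return b"".join(data)

signature = bytes(range(32))              # stand-in for a real digital signature
shares = make_shares(signature, k=4)      # one share is attached to each of 4 fragments
shares[2] = None                          # one fragment was lost in transit
print(rebuild(shares) == signature)       # True: signature recovered and verifiable
```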
This paper discusses optimal binary codes and pure binary quantum codes created using the Steane construction. First, a local search algorithm for a special subclass of quasi-cyclic codes is proposed, and five binary quasi-cyclic codes are built. Second, three classical construction methods for deriving new codes from old ones are generalized so that they are suitable for constructing binary self-orthogonal codes, and 62 binary codes and six subcode chains of the obtained self-orthogonal codes are designed. Third, six pure binary quantum codes are constructed from the code pairs obtained through the Steane construction. In total, 66 good binary codes are obtained, including 12 optimal linear codes, 45 known optimal linear codes, and nine known optimal self-orthogonal codes. The six pure binary quantum codes all match the performance of their additive counterparts obtained by the quaternary construction and are thus known optimal codes.
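For readers less familiar with self-orthogonal codes, the basic property being constructed can be checked directly: a binary linear code with generator matrix G is self-orthogonal exactly when G G^T = 0 over GF(2), i.e., every generator row has even weight and every pair of rows has even overlap. The snippet below is a generic utility for this check, not one of the construction methods of the paper.

```python
import numpy as np

# Self-orthogonality check for a binary linear code given its generator matrix G:
# the code is self-orthogonal iff G @ G.T = 0 over GF(2).

def is_self_orthogonal(G):
    G = np.asarray(G) % 2
    return not ((G @ G.T) % 2).any()

# Generator matrix of the [8,4] extended Hamming code, which is self-dual
# (hence self-orthogonal).
G = [[1, 0, 0, 0, 0, 1, 1, 1],
     [0, 1, 0, 0, 1, 0, 1, 1],
     [0, 0, 1, 0, 1, 1, 0, 1],
     [0, 0, 0, 1, 1, 1, 1, 0]]
print(is_self_orthogonal(G))    # True
```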