Deduplication technology has been increasingly used to reduce storage costs. Although it has been successfully applied to backup and archival systems, existing techniques are difficult to deploy in primary storage systems because of the latency cost of detecting duplicate data: every unit must be checked against a substantially large fingerprint index before it is written. In this paper we introduce Leach, a self-learning in-memory fingerprint cache for inline primary storage that reduces the write cost in deduplication systems. Leach is motivated by a characteristic of real-world I/O workloads: the access patterns of duplicate data are highly skewed. Leach uses a splay tree to organize the on-disk fingerprint index, automatically learns access patterns, and maintains hot working sets in cache memory, with the goal of servicing the majority of duplicate-data detections. Leveraging the working-set property, Leach also optimizes splay operations on the fingerprint index and cache updates to reduce their cost. In comprehensive experiments on several real-world datasets, Leach outperforms the conventional LRU (least recently used) cache policy by reducing the number of cache misses, and significantly improves write performance without noticeably harming the cache hit ratio.
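To make the write path concrete, here is a minimal sketch of a deduplicating write with an in-memory fingerprint cache in front of the on-disk index. All names are hypothetical, and a simple LRU dictionary stands in for Leach's splay-tree-backed working set; the sketch only illustrates why a high cache hit rate removes most index lookups from the write path.

```python
import hashlib
from collections import OrderedDict

class FingerprintCache:
    """Minimal LRU fingerprint cache (a stand-in for Leach's learned
    hot working set; the paper's design is splay-tree based)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()          # fingerprint -> block address

    def lookup(self, fp):
        if fp in self.cache:
            self.cache.move_to_end(fp)      # mark as recently used
            return self.cache[fp]
        return None

    def insert(self, fp, addr):
        self.cache[fp] = addr
        self.cache.move_to_end(fp)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

def write_block(data, cache, disk_index, next_addr):
    """Deduplicating write path: returns (block address, was_duplicate)."""
    fp = hashlib.sha1(data).hexdigest()
    addr = cache.lookup(fp)                  # fast path: in-memory cache
    if addr is not None:
        return addr, True
    addr = disk_index.get(fp)                # slow path: on-disk index
    if addr is not None:
        cache.insert(fp, addr)               # promote to the working set
        return addr, True
    disk_index[fp] = next_addr               # new data: index and store it
    cache.insert(fp, next_addr)
    return next_addr, False
```

Every cache hit avoids one lookup in the large on-disk index, which is where the write latency saving comes from.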
AGM postulates are for belief revision (revision by a single belief), and DP postulates are for iterated revision (revision by a finite sequence of beliefs). An R-calculus is given for R-configurations Δ|Γ, where Δ is a set of atomic formulas or negations of atomic formulas, and Γ is a finite set of formulas. We shall give two R-calculi
We present a symbolic-numeric hybrid method, based on sum-of-squares (SOS) relaxation and rational vector recovery, to compute inequality invariants and ranking functions for proving total correctness and generating preconditions for programs. The SOS relaxation method is used to compute approximate invariants and approximate ranking functions with floating-point coefficients. Gauss-Newton refinement and rational vector recovery are then applied to the approximate polynomials to obtain candidate polynomials with rational coefficients that exactly satisfy the conditions for invariants and ranking functions. Finally, several examples are given to show the effectiveness of our method.
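As a small illustration of the rational vector recovery step, the sketch below rounds floating-point coefficients to nearby rationals with bounded denominators using Python's `Fraction.limit_denominator`. This is an assumption-laden stand-in for the paper's recovery procedure (real implementations may use simultaneous Diophantine approximation); the recovered candidate must still be verified exactly against the invariant or ranking-function conditions.

```python
from fractions import Fraction

def recover_rational_vector(float_coeffs, max_den=1000):
    """Round each floating-point coefficient to the nearest rational with a
    bounded denominator -- a simplified sketch of rational vector recovery."""
    return [Fraction(c).limit_denominator(max_den) for c in float_coeffs]

# Example: coefficients from a numerical SOS solver, slightly off the
# exact values 1/3 and -1/2 due to floating-point error.
approx = [0.333333, -0.499999]
exact = recover_rational_vector(approx)     # -> [Fraction(1, 3), Fraction(-1, 2)]
```

After recovery, the candidate polynomial is checked in exact rational arithmetic, which is what makes the final certificate sound despite the numerical intermediate steps.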
Very large scale integration (VLSI) circuit partitioning is an important problem in the design automation of VLSI chips and multichip systems; it is an NP-hard combinatorial optimization problem. In this paper, an effective hybrid multi-objective partitioning algorithm based on discrete particle swarm optimization (DPSO) with a local search strategy, called MDPSO-LS, is presented to solve VLSI two-way partitioning with simultaneous minimization of cutsize and circuit delay. Inspired by genetic algorithms, uniform crossover and random two-point exchange operators are designed to avoid generating infeasible solutions. Furthermore, the phenotype sharing function of the objective space is applied to circuit partitioning to obtain a better approximation of the true Pareto front, and the theory of Markov chains is used to prove global convergence. To improve local exploration, the Fiduccia-Mattheyses (FM) strategy is applied to further improve the cutsize of each particle, and a local search strategy for improving the circuit delay objective is also designed. Experiments on ISCAS89 benchmark circuits show that the proposed algorithm is efficient.
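The two variation operators can be sketched for bit-vector encodings of a two-way partition (0/1 = block A/B) as follows. This is a hypothetical illustration, not the paper's implementation; feasibility handling such as area balance is omitted, and `cutsize` shows the standard cut objective the operators are evaluated against.

```python
import random

def uniform_crossover(p1, p2, rng):
    """Gene-wise mix of two parent partition vectors."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

def two_point_exchange(sol, rng):
    """Swap the block assignments of two random cells; when the two cells
    lie in different blocks this preserves the A/B size balance."""
    s = sol[:]
    i, j = rng.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s

def cutsize(sol, nets):
    """A net is cut if its cells span both blocks."""
    return sum(1 for net in nets if len({sol[c] for c in net}) > 1)
```

Because both operators only recombine or permute existing 0/1 assignments, they never produce values outside the solution encoding, which is the sense in which such operators avoid infeasible solutions.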
In this paper, we propose a novel ECG arrhythmia classification method using power spectral-based features and a support vector machine (SVM) classifier. The method extracts the electrocardiogram's spectral features and three timing-interval features. Non-parametric power spectral density (PSD) estimation methods are used to extract the spectral features. The proposed approach optimizes the relevant parameters of the SVM classifier through an intelligent algorithm based on particle swarm optimization (PSO). These parameters are the Gaussian radial basis function (GRBF) kernel parameter σ and the penalty parameter C of the SVM classifier. ECG records from the MIT-BIH arrhythmia database are selected as test data. It is observed that the proposed power spectral-based hybrid particle swarm optimization-support vector machine (PSO-SVM) classification method offers significantly improved performance over an SVM whose parameters are constant and manually selected.
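A minimal PSO loop of the kind used to tune (C, σ) might look like the sketch below, where a toy quadratic objective stands in for the SVM's cross-validation error. The parameter values (inertia 0.7, acceleration coefficients 1.5) are conventional defaults, not the paper's settings, and all names are illustrative.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, iters=60, seed=0):
    """Minimal particle swarm optimizer over a box-constrained space,
    e.g. (C, sigma) for an RBF-kernel SVM."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # per-particle best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]    # swarm-wide best
    w, c1, c2 = 0.7, 1.5, 1.5
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d],
                                    bounds[d][0]), bounds[d][1])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In the actual method, `objective` would train an SVM with the candidate (C, σ) and return its cross-validation error rather than a closed-form function.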
Most current research into facial expression recognition focuses on the visible spectrum, which is sensitive to illumination change. In this paper, we focus on integrating thermal infrared data with visible spectrum images for spontaneous facial expression recognition. First, active appearance model (AAM) parameters and three defined head motion features are extracted from visible spectrum images, and several thermal statistical features are extracted from infrared (IR) images. Second, feature selection is performed using the F-test statistic. Third, Bayesian networks (BNs) and support vector machines (SVMs) are proposed for both decision-level and feature-level fusion. Experiments on the natural visible and infrared facial expression (NVIE) spontaneous database show the effectiveness of the proposed methods and demonstrate the supplementary role of thermal IR images in visible facial expression recognition.
The goal of infrared (IR) and visible image fusion is for the fused image to contain IR object features from the IR image while retaining the visual details provided by the visible image. The disadvantage of traditional fusion methods based on independent component analysis (ICA) is that both the primary feature information, which describes the IR objects, and the secondary feature information in the IR image are fused into the result, and the secondary feature information can degrade the visual quality of the fused image. A novel ICA-based IR and visible image fusion scheme is proposed in this paper. ICA is employed to extract features from the infrared image, and the primary and secondary features are then distinguished by the kurtosis of the ICA base coefficients. The secondary features of the IR image are discarded during fusion, and the fused image is obtained by fusing the primary features into the visible image. Experimental results show that the proposed method provides better perceptual quality.
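The kurtosis-based split can be illustrated as follows: sparse, spiky coefficient distributions (high excess kurtosis) are treated as primary, object-bearing features, while near-Gaussian or flat distributions are secondary. This is a simplified sketch with a hypothetical zero threshold, not the paper's exact criterion.

```python
def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (0 for a Gaussian)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0

def split_features(coeff_rows, threshold=0.0):
    """Partition ICA coefficient rows into primary (spiky, high-kurtosis)
    and secondary (flat, low-kurtosis) feature indices."""
    primary, secondary = [], []
    for idx, row in enumerate(coeff_rows):
        (primary if excess_kurtosis(row) > threshold else secondary).append(idx)
    return primary, secondary
```

In the fusion scheme, only the rows classified as primary would be reconstructed back into the visible image.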
The value difference metric (VDM) is one of the best-known and most widely used distance functions for nominal attributes. This work applies the instance-weighting technique to improve VDM. An instance-weighted value difference metric (IWVDM) is proposed here. Different from prior work, IWVDM uses naive Bayes (NB) to find weights for training instances. Because earlier work has shown that there is a close relationship between VDM and NB, some work on NB can be applied to VDM. The weight of a training instance x that belongs to the class c is assigned according to the difference between the estimated conditional probability
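For reference, the unweighted VDM that IWVDM builds on can be sketched as follows: two nominal values are considered close when they induce similar conditional class distributions. The function names and the choice q = 2 are illustrative.

```python
from collections import defaultdict

def vdm_tables(X, y):
    """Per-attribute counts for estimating P(c | attribute value)."""
    counts = [defaultdict(lambda: defaultdict(int)) for _ in X[0]]
    totals = [defaultdict(int) for _ in X[0]]
    for row, c in zip(X, y):
        for a, v in enumerate(row):
            counts[a][v][c] += 1
            totals[a][v] += 1
    return counts, totals

def vdm_distance(x1, x2, counts, totals, classes, q=2):
    """VDM: sum over attributes and classes of |P(c|v1) - P(c|v2)|^q."""
    d = 0.0
    for a, (v1, v2) in enumerate(zip(x1, x2)):
        for c in classes:
            p1 = counts[a][v1][c] / totals[a][v1] if totals[a][v1] else 0.0
            p2 = counts[a][v2][c] / totals[a][v2] if totals[a][v2] else 0.0
            d += abs(p1 - p2) ** q
    return d
```

IWVDM would additionally weight each training instance's contribution to these counts, rather than counting every instance equally as above.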
Discovering the hierarchical structures of different classes of object behaviors can satisfy the requirements of various degrees of abstraction in association analysis, behavior modeling, data preprocessing, pattern recognition, decision making, etc. In this paper, we call this process associative categorization, which differs from classical clustering, associative classification, and associative clustering. Focusing on representing the associations of behaviors and the corresponding uncertainties, we propose a method for constructing a Markov network (MN) from the results of frequent pattern mining, called the item-associative Markov network (IAMN), in which nodes and edges represent the frequent patterns and their associations, respectively. We further discuss the properties of a probabilistic graphical model to theoretically guarantee the IAMN's correctness. We then adopt the concept of chordality to reflect the closeness of nodes in the IAMN. Adapting the algorithm for constructing join trees from an MN, we give an algorithm for IAMN-based associative categorization by hierarchical bottom-up aggregation of nodes. Experimental results show the effectiveness, efficiency, and correctness of our methods.
The rapid increase of user-generated content (UGC) is a rich source for reputation management of entities, products, and services. Taking online product reviews as a concrete example: in reviews, customers usually give opinions on multiple attributes of products, so the challenge is to automatically extract and cluster the attributes that are mentioned. In this paper, we investigate efficient attribute extraction models using a semi-supervised approach. Specifically, we formulate attribute extraction as a sequence labeling task and design a bootstrapping scheme to train the extraction models by leveraging a small quantity of labeled reviews and a larger number of unlabeled reviews. In addition, we propose a clustering-by-committee (CBC) approach to cluster attributes according to their semantic similarity. Experimental results on real-world datasets show that the proposed approach is effective.
Recommender systems are one of the most important technologies in e-commerce for helping users filter out the overload of information. However, current mainstream recommendation algorithms, such as the collaborative filtering (CF) family, suffer from problems such as scalability and sparsity, which hinder the further development of recommender systems. We propose a new recommendation algorithm based on item quality and user rating preferences, which significantly decreases computational complexity. Besides, it is interpretable and works better when the data is sparse. Through extensive experiments on three benchmark datasets, we show that our algorithm achieves higher accuracy in rating prediction than the traditional approaches. Furthermore, the results demonstrate that rating prediction depends strongly on item quality and user rating preferences, thus opening new paths for further study.
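A minimal baseline in this spirit can be sketched by reading "item quality" as an item's mean rating and a "user rating preference" as the user's mean deviation from the quality of the items they rated. This reading and all names are our assumptions for illustration, not necessarily the paper's exact model.

```python
from collections import defaultdict

def fit_baseline(ratings):
    """ratings: list of (user, item, rating) triples.
    Returns per-item quality, per-user preference, and the global mean."""
    by_item = defaultdict(list)
    for _, i, r in ratings:
        by_item[i].append(r)
    quality = {i: sum(rs) / len(rs) for i, rs in by_item.items()}
    dev = defaultdict(list)
    for u, i, r in ratings:
        dev[u].append(r - quality[i])       # how far above/below item quality
    pref = {u: sum(ds) / len(ds) for u, ds in dev.items()}
    global_mean = sum(r for _, _, r in ratings) / len(ratings)
    return quality, pref, global_mean

def predict(u, i, quality, pref, global_mean):
    """Predicted rating = item quality + user rating preference."""
    return quality.get(i, global_mean) + pref.get(u, 0.0)
```

Note the appeal of such a model: fitting is a single pass over the ratings, which is where the claimed drop in computational complexity relative to neighborhood-based CF would come from.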
Traffic engineers and city planners increasingly need movement patterns and behavioral models of people. Such observations can be used to reason about mobility and its sustainability and to support decision makers with reliable information. Knowledge about human movement and behavior extracted from these data is also valuable to urban planners for localizing new services, organizing logistics systems, and detecting changes in movement behavior as they occur. It is likewise interesting to investigate movement in places such as a shopping area or a working district, whether for commercial purposes or for improving service quality. Tracking data of this kind are made available by wireless and mobile communication technologies: it is now possible to record and collect a large number of mobile phone calls in a city. As object-tracking technologies have recently become affordable and reliable, we were able to collect mobile phone data from a city in China from January 1, 2008 to December 31, 2008. Phone call records from mobile operators can be regarded as sensors carried by individuals, indicating how many people are present in a given area and how many are entering or leaving. Each phone call record usually contains the caller and callee IDs, the date and time, and the base station where the call was made. Since mobile phones are widely used in daily life, many human behaviors can be revealed by analyzing mobile phone data, including users' locations and the communications between them.
In this work, we propose a comprehensive visual analysis system named MViewer (Mobile phone spatiotemporal data Viewer), which is the first system to visualize and analyze a population's mobility patterns from millions of phone call records. Our system consists of three major components: 1) visual analysis of user groups in a base station; 2) visual analysis of the mobility patterns of different user groups making phone calls in certain base stations; 3) visual analysis of handoff phone call records. Well-established visualization techniques such as parallel coordinates and pixel-based representations have been integrated into our system. We also develop a novel visualization scheme, a Voronoi-diagram-based visual encoding, to reveal the unique features of mobile phone data. We have applied our system to real mobile phone datasets kindly provided by our project partners and obtained interesting findings regarding people's mobility patterns.
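The Voronoi-based encoding rests on the fact that assigning each event to its nearest base station partitions the plane into the stations' Voronoi cells. A hypothetical sketch of that assignment and a per-cell aggregation (the data and names are invented; the real system renders the cells rather than just counting into them):

```python
def nearest_station(point, stations):
    """Index of the nearest base station: the point lies in that
    station's Voronoi cell by definition."""
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(range(len(stations)), key=lambda i: d2(point, stations[i]))

def cell_counts(events, stations):
    """Count phone-call events per Voronoi cell, e.g. to color cells
    by call volume in a Voronoi-based visual encoding."""
    counts = [0] * len(stations)
    for p in events:
        counts[nearest_station(p, stations)] += 1
    return counts
```

Each cell's count (or any other per-station statistic) can then be mapped to color or height in the visualization.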
Traditional positioning technologies, such as GPS and wireless local positioning, rely on underlying infrastructure. In a subway environment, however, such positioning systems are not available for tasks such as detecting train arrivals for passengers on the train. An alternative approach is to exploit the contextual information available in the mobile devices of subway riders to detect train arrivals. To this end, we propose to exploit multiple contextual features extracted from the mobile devices of subway riders to precisely detect train arrivals. Following this line, we first investigate potential contextual features that may be effective for detecting train arrivals, based on observations from 3D accelerometers and GSM radio. Furthermore, we propose to explore the maximum entropy (MaxEnt) model for training a train arrival detector by learning the correlation between contextual features and train arrivals. Finally, we perform extensive experiments on several real-world datasets collected from two major lines of the Beijing subway system. Experimental results validate both the effectiveness and the efficiency of the proposed approach.
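For a binary arrival/no-arrival decision, a MaxEnt model reduces to logistic regression over the contextual features. The sketch below trains such a detector on tiny hand-made feature vectors (say, accelerometer variance and GSM signal change); the features, data, and hyperparameters are invented for illustration and are not the paper's.

```python
import math

def train_logreg(X, y, lr=0.5, epochs=300):
    """Train a binary maximum-entropy model (logistic regression) by
    stochastic gradient descent. Returns weights; last entry is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xs, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, xs)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))      # P(arrival | features)
            g = p - t                           # gradient of the log loss
            for d in range(len(xs)):
                w[d] -= lr * g * xs[d]
            w[-1] -= lr * g
    return w

def detect(w, xs):
    """1 = train arrival predicted, 0 = no arrival."""
    z = sum(wi * xi for wi, xi in zip(w, xs)) + w[-1]
    return 1 if z > 0 else 0
```

The learned weights directly expose how strongly each contextual feature correlates with arrivals, which matches the abstract's framing of learning feature-arrival correlations.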
Geometry meshes introduce user control into texture synthesis and editing, and bring more variation to the synthesized results. However, two related problems still need better solutions. One is efficiently generating meshes with the desired size and pattern from simpler user inputs; the other is improving the quality of the synthesized results using mesh information. We present a new two-step texture design and synthesis method that addresses these two problems. Besides the example texture, a small mesh sketch, drawn by hand or detected from the example texture, is input to our algorithm. A mesh synthesis method in geometry space is then provided to avoid cell-by-cell optimization. Distance and orientation features are introduced to improve the quality of mesh rasterization. Results show that with our method, users can design and synthesize textures from mesh sketches easily and interactively.