Multicast transfer can efficiently save the bandwidth consumption and reduce the load on the source node than a series of independent unicast transfers. Nowadays, many applications employ the content replica strategy to improve the robustness and efficiency; hence each file and its replicas are usually distributed among multiple sources. In such scenarios, the traditional deterministic multicast develops into the Uncertain multicast, which has more flexibility in the source selection. In this paper, we focus on building and maintaining a minimal cost forest (MCF) for any uncertain multicast, whose group members (source nodes and destination nodes) may join or leave after constructing a MCF. We formulate this dynamic minimal cost forest (DMCF) problem as a mixed integer programming model. We then design three dedicated methods to approximate the optimal solution. Among them, our a-MCF aims to efficiently construct an MCF for any given uncertain multicast, without dynamic behaviors of multicast group members. The d-MCF method motivates to slightly update the existing MCF via local modifications once appearing a dynamic behavior. It can achieve the balance between the minimal cost and the minimal modifications to the existing forest. The last r-MCF is a supplement method to the d-MCF method, since many rounds of local modifications maymake the resultant forest far away from the optimal forest. Accordingly, our r-MCF method monitors the accumulated degradation and triggers the rearrangement process to reconstruct an new MCF when necessary. The comprehensive evaluation results demonstrate that our methods can well tackle the proposed DMCF problem.
Uncertain data are data with uncertainty information, which exist widely in database applications. In recent years, uncertainty in data has brought challenges in almost all database management areas such as data modeling, query representation, query processing, and data mining. There is no doubt that uncertain data management has become a hot research topic in the field of data management. In this study, we explore problems in managing uncertain data, present state-of-the-art solutions, and provide future research directions in this area. The discussed uncertain data management techniques include data modeling, query processing, and data mining in uncertain data in the forms of relational, XML, graph, and stream.
In recent years, the rapid development of Internet of Things and sensor networks makes the time series data experiencing explosive growth. OpenTSDB and other emerging systems begin to use Hadoop, HBase to store massive time series data, and how to use these platforms to query and mine time series data has become a current research hotspot. As a typical time series distance measurementmethod, correlation coefficient is widely used in various applications. However, it requires a large amount of I/O and network transmission to compute the correlation coefficient of long time sequence on HBase in real time, and therefore cannot be applied to interactive query. To address this problem, in this paper, we present two methods to estimate the correlation coefficients of two sequences on HBase. We first propose a fast estimation algorithm for the upper and lower bounds of correlation coefficient, named as DCE. In order to further reduce the cost of I/O, we extend the DCE algorithm, and propose the ADCE algorithm, which can estimate the correlation coefficient quickly with an iterative manner. Experiments show that the algorithms proposed in this paper can quickly calculate the correlation coefficient of the long time series.
In this paper, we tackle the problem of segmenting out a sequence of actions from videos. The videos contain background and actions which are usually composed of ordered sub-actions. We refer the sub-actions and the background as semantic units. Considering the possible overlap between two adjacent semantic units, we propose a bidirectional sliding window method to generate the label distributions for various segments in the video. The label distribution covers a certain number of semantic unit labels, representing the degree to which each label describes the video segment. The mapping from a video segment to its label distribution is then learned by a Label Distribution Learning (LDL) algorithm. Based on the LDL model, a soft video parsing method with segmental regular grammars is proposed to construct a tree structure for the video. Each leaf of the tree stands for a video clip of background or sub-action. The proposed method shows promising results on the THUMOS’14, MSR-II and UCF101 datasets and its computational complexity is much less than the compared state-of-the-art video parsing method.
Although many graph processing systems have been proposed, graphs in the real-world are often dynamic. It is important to keep the results of graph computation up-todate. Incremental computation is demonstrated to be an efficient solution to update calculated results. Recently, many incremental graph processing systems have been proposed to handle dynamic graphs in an asynchronous way and are able to achieve better performance than those processed in a synchronous way. However, these solutions still suffer from suboptimal convergence speed due to their slow propagation of important vertex state (important to convergence speed) and poor locality. In order to solve these problems, we propose a novel graph processing framework. It introduces a dynamic partition method to gather the important vertices for high locality, and then uses a priority-based scheduling algorithm to assign them with a higher priority for an effective processing order. By such means, it is able to reduce the number of updates and increase the locality, thereby reducing the convergence time. Experimental results show that our method reduces the number of updates by 30%, and reduces the total execution time by 35%, compared with state-of-the-art systems.
Image reranking is an effective post-processing step to adjust the similarity order in image retrieval. As key components of initialized ranking lists, top-ranked neighborhoods of a given query usually play important roles in constructing dissimilarity measure. However, the number of pertinent candidates varies with respect to different queries. Thus the images with short lists of ground truth suffer from insufficient contextual information. It consequently introduces noises when using k-nearest neighbor rule to define the context. In order to alleviate this problem, this paper proposes auxiliary points which are added as assistant neighbors in an unsupervised manner. These extra points act on revealing implicit similarity in the metric space and clustering matched image pairs. By isometrically embedding each constructed metric space into the Euclidean space, the image relationships on underlying topological manifolds are locally represented by distance descriptions. Furthermore, by combining Jaccard index with our auxiliary points, we present a contextual modeling on auxiliary points (CMAP) method for image reranking.With richer contextual activations, the Jaccard similarity coefficient defined by local distribution achieves more reliable outputs as well as more stable parameters. Extensive experiments demonstrate the robustness and effectiveness of the proposed method.
Co-training is a famous semi-supervised learning algorithm which can exploit unlabeled data to improve learning performance. Generally it works under a two-view setting (the input examples have two disjoint feature sets in nature), with the assumption that each view is sufficient to predict the label. However, in real-world applications due to feature corruption or feature noise, both views may be insufficient and co-training will suffer from these insufficient views. In this paper, we propose a novel algorithm named Weighted Co-training to deal with this problem. It identifies the newly labeled examples that are probably harmful for the other view, and decreases their weights in the training set to avoid the risk. The experimental results show that Weighted Co-training performs better than the state-of-art co-training algorithms on several benchmarks.
In this paper, a novel approach, Inforence, is proposed to isolate the suspicious codes that likely contain faults. Inforence employs a feature selection method, based on mutual information, to identify those bug-related statements that may cause the program to fail. Because the majority of a program faults may be revealed as undesired joint effect of the program statements on each other and on program termination state, unlike the state-of-the-art methods, Inforence tries to identify and select groups of interdependent statements which altogether may affect the program failure. The interdependence amongst the statements is measured according to their mutual effect on each other and on the program termination state. To provide the context of failure, the selected bug-related statements are chained to each other, considering the program static structure. Eventually, the resultant causeeffect chains are ranked according to their combined causal effect on program failure. To validate Inforence, the results of our experimentswith seven sets of programs include Siemens suite, gzip, grep, sed, space, make and bash are presented. The experimental results are then compared with those provided by different fault localization techniques for the both single-fault and multi-fault programs. The experimental results prove the outperformance of the proposed method compared to the state-of-the-art techniques.
This paper presents a novel tracking algorithm which integrates two complementary trackers. Firstly, an improved Bayesian tracker(B-tracker) with adaptive learning rate is presented. The classification score of B-tracker reflects tracking reliability, and a low score usually results from large appearance change. Therefore, if the score is low, we decrease the learning rate to update the classifier fast so that B-tracker can adapt to the variation and vice versa. In this way, B-tracker is more suitable than its traditional version to solve appearance change problem. Secondly, we present an improved incremental subspace learning method tracker(Stracker). We propose to calculate projected coordinates using maximum posterior probability, which results in a more accurate reconstruction error than traditional subspace learning tracker. Instead of updating at every time, we present a stopstrategy to deal with occlusion problem. Finally, we present an integrated framework(BAST), in which the pair of trackers run in parallel and return two candidate target states separately. For each candidate state, we define a tracking reliability metrics to measure whether the candidate state is reliable or not, and the reliable candidate state will be chosen as the target state at the end of each frame. Experimental results on challenging sequences show that the proposed approach is very robust and effective in comparison to the state-of-the-art trackers.
Blockchain(BC), as an emerging distributed database technology with advanced security and reliability, has attracted much attention from experts who devoted to e-finance, intellectual property protection, the Internet of Things (IoT) and so forth. However, the inefficient transaction processing speed, which hinders the BC’s widespread, has not been well tackled yet. In this paper, we propose a novel architecture, called Dual-Channel Parallel Broadcast model (DCPB), which could address such a problem to a greater extent by using three methods which are dual communication channels, parallel pipeline processing and block broadcast strategy. In the dual-channel model, one channel processes transactions, and the other engages in the execution of BFT. The parallel pipeline processing allows the system to operate asynchronously. The block generation strategy improves the efficiency and speed of processing. Extensive experiments have been applied to BeihangChain, a simplified prototype for BC system, illustrates that its transaction processing speed could be improved to 16K transaction per second which could well supportmany real-world scenarios such as BC-based energy trading system andMicro-film copyright trading system in CCTV.