Data deduplication has been widely utilized in large-scale storage systems, particularly backup systems. Data deduplication systems typically divide data streams into chunks and identify redundant chunks by comparing chunk fingerprints. Maintaining all fingerprints in memory is not cost-effective because fingerprint indexes are typically very large. Many data deduplication systems maintain a fingerprint cache in memory and exploit fingerprint prefetching to accelerate the deduplication process. Although fingerprint prefetching can improve the performance of data deduplication systems by leveraging the locality of workloads, inaccurately prefetched fingerprints may pollute the cache by evicting useful fingerprints. We observed that most of the prefetched fingerprints in a wide variety of applications are never used or used only once, which severely limits the performance of data deduplication systems. We introduce a prefetch-aware fingerprint cache management scheme for data deduplication systems (PreCache) to alleviate prefetch-related cache pollution. We propose three prefetch-aware fingerprint cache replacement policies (PreCache-UNU, PreCache-UOO, and PreCache-MIX) to handle different types of cache pollution. Additionally, we propose an adaptive policy selector to select suitable policies for prefetch requests. We implement PreCache on two representative data deduplication systems (Block Locality Caching and SiLo) and evaluate its performance utilizing three real-world workloads (Kernel, MacOS, and Homes). The experimental results reveal that PreCache improves deduplication throughput by up to 32.22% based on a reduction of on-disk fingerprint index lookups and improvement of the deduplication ratio by mitigating prefetch-related fingerprint cache pollution.
Finding entities of interest in indoor commercial places, such as the merchandise in supermarkets and shopping malls, is an essential issue for customers, especially when they are unfamiliar with an ad hoc indoor environment. This type of location-based indoor service requires comprehensive knowledge of indoor entities, including locations as well as their semantic information. However, the existing indoor localization approaches fail to directly localize these general entities without dedicated devices. This paper first focuses on the problem of discovering large-scale general entities of interest in indoor commercial spaces without pre-deployed infrastructure.We present a unique entity localization approach that leverages the localization results from multiple independent users to accurately determine the location of corresponding entities. Our key idea is to exploit the short-distance estimation with dead reckoning to guarantee the accuracy of entity localization. We develop a prototype system based on the crowdsourcing method, iScan, and test it in one of the biggest supermarkets in Changsha, China, to validate the performance of our design. Extensive experimental results show that our approach can achieve meter-level accuracy in a single day with 70 participants. Moreover, in a monthly evaluation with 500 effective participants, iScan discovered more than 200 entities and localized approximately 75% of them within 2 m.
Cloud computing users are faced with a wide variety of services to choose from. Consequently, a number of cloud service brokers (CSBs) have emerged to help users in their service selection process. This paper reviews the recent approaches that have been introduced and used for cloud service brokerage and discusses their challenges accordingly. We propose a set of attributes for a CSB to be considered effective. Different CSBs’ approaches are classified as either single service or multiple service models. The CSBs are then assessed, analyzed, and compared with respect to the proposed set of attributes. Based on our studies, CSBs with multiple service models that support more of the proposed effective CSB attributes have wider application in cloud computing environments.
Barely acceptable block I/O performance prevents virtualization from being widely used in the High-Performance Computing field. Although the virtio paravirtual framework brings great I/O performance improvement, there is a sharp performance degradation when accessing high-performance NAND-flash-based devices in the virtual machine due to their data parallel design. The primary cause of this fact is the deficiency of block I/O parallelism in hypervisor, such as KVM and Xen. In this paper, we propose a novel design of block I/O layer for virtualization, named VBMq. VBMq is based on virtio paravirtual I/O model, aiming to solve the block I/O parallelism issue in virtualization. It uses multiple dedicated I/O threads to handle I/O requests in parallel. In the meanwhile, we use polling mechanism to alleviate overheads caused by the frequent context switches of the VM’s notification to and from its hypervisor. Each dedicated I/O thread is assigned to a non-overlapping core to improve performance by avoiding unnecessary scheduling. In addition, we configure CPU affinity to optimize I/O completion for each request. The CPU affinity setting is very helpful to reduce CPU cache miss rate and increase CPU efficiency. The prototype system is based on Linux 4.1 kernel and QEMU 2.3.1. Our measurements show that the proposed method scales graciously in the multi-core environment, and provides performance which is 39.6x better than the baseline at most, and approaches bare-metal performance.
Solid-state drives (SSDs) have been widely used as caching tier for disk-based RAID systems to speed up dataintensive applications. However, traditional cache schemes fail to effectively boost the parity-based RAID storage systems (e.g., RAID-5/6), which have poor random write performance due to the small-write problem. What’s worse, intensive cache writes can wear out the SSD quickly, which causes performance degradation and cost increment. In this article, we present the design and implementation of KDD, an efficient SSD-based caching system which Keeps Data and Deltas in SSD. When write requests hit in the cache, KDD dispatches the data to the RAID storage without updating the parity blocks to mitigate the small write penalty, and compactly stores the compressed deltas in SSD to reduce the cache write traffic while guaranteeing reliability in case of disk failures. In addition, KDD organizes the metadata partition on SSD as a circular log to make the cache persistent with low overhead.We evaluate the performance of KDD via both simulations and prototype implementations. Experimental results show that KDD effectively reduces the small write penalty while extending the lifetime of the SSD-based cache by up to 6.85 times.
Entity alignment is the problem of identifying which entities in a data source refer to the same real-world entity in the others. Identifying entities across heterogeneous data sources is paramount to many research fields, such as data cleaning, data integration, information retrieval and machine learning. The aligning process is not only overwhelmingly expensive for large data sources since it involves all tuples from two or more data sources, but also need to handle heterogeneous entity attributes. In this paper, we propose an unsupervised approach, called EnAli, to match entities across two or more heterogeneous data sources. EnAli employs a generative probabilistic model to incorporate the heterogeneous entity attributes via employing exponential family, handle missing values, and also utilize the locality sensitive hashing schema to reduce the candidate tuples and speed up the aligning process. EnAli is highly accurate and efficient even without any ground-truth tuples. We illustrate the performance of EnAli on re-identifying entities from the same data source, as well as aligning entities across three real data sources. Our experimental results manifest that our proposed approach outperforms the comparable baseline.
The traditional dynamic random-access memory (DRAM) storage medium can be integrated on chips via modern emerging 3D-stacking technology to architect a DRAM shared cache in multicore systems. Compared with static random-access memory (SRAM), DRAM is larger but slower. In the existing research, a lot of work has been devoted to improving the workload performance using SRAM and stacked DRAM together in shared cache systems, ranging from SRAM structure improvement to optimizing cache tags and data access. However, little attention has been paid to designing a shared cache scheduling scheme for multiprogrammed workloads with different memory footprints in multicore systems. Motivated by this, we propose a hybrid shared cache scheduling scheme that allows a multicore system to utilize SRAM and 3D-stacked DRAM efficiently, thus achieving better workload performance. This scheduling scheme employs (1) a cache monitor, which is used to collect cache statistics; (2) a cache evaluator, which is used to evaluate the cache information during the process of programs being executed; and (3) a cache switcher, which is used to self-adaptively choose SRAM or DRAM shared cache modules. A cache data migration policy is naturally developed to guarantee that the scheduling scheme works correctly. Extensive experiments are conducted to evaluate the workload performance of our proposed scheme. The experimental results showed that our method can improve the multiprogrammed workload performance by up to 25% compared with state-of-the-art methods (including conventional and DRAM cache systems).
Recent advances in supervised salient object detection modeling has resulted in significant performance improvements on benchmark datasets. However, most of the existing salient object detection models assume that at least one salient object exists in the input image. Such an assumption often leads to less appealing saliencymaps on the background images with no salient object at all. Therefore, handling those cases can reduce the false positive rate of a model. In this paper, we propose a supervised learning approach for jointly addressing the salient object detection and existence prediction problems. Given a set of background-only images and images with salient objects, as well as their salient object annotations, we adopt the structural SVM framework and formulate the two problems jointly in a single integrated objective function: saliency labels of superpixels are involved in a classification term conditioned on the salient object existence variable, which in turn depends on both global image and regional saliency features and saliency labels assignments. The loss function also considers both image-level and regionlevel mis-classifications. Extensive evaluation on benchmark datasets validate the effectiveness of our proposed joint approach compared to the baseline and state-of-the-art models.
Design patterns are often used in the development of object-oriented software. It offers reusable abstract information that is helpful in solving recurring design problems. Detecting design patterns is beneficial to the comprehension and maintenance of object-oriented software systems. Several pattern detection techniques based on static analysis often encounter problems when detecting design patterns for identical structures of patterns. In this study, we attempt to detect software design patterns by using software metrics and classification-based techniques. Our study is conducted in two phases: creation of metrics-oriented dataset and detection of software design patterns. The datasets are prepared by using software metrics for the learning of classifiers. Then, pattern detection is performed by using classification-based techniques. To evaluate the proposed method, experiments are conducted using three open source software programs, JHotDraw, QuickUML, and JUnit, and the results are analyzed.
Delay optimization has recently attracted significant attention. However, few studies have focused on the delay optimization of mixed-polarity Reed-Muller (MPRM) logic circuits. In this paper, we propose an efficient delay optimization approach (EDOA) for MPRM logic circuits under the unit delay model, which can derive an optimal MPRM logic circuit with minimum delay. First, the simplest MPRM expression with the fewest number of product terms is obtained using a novel Reed-Muller expression simplification approach (RMESA) considering don’t-care terms. Second, a minimum delay decomposition approach based on a Huffman tree construction algorithm is utilized on the simplestMPRM expression. Experimental results on MCNC benchmark circuits demonstrate that compared to the Berkeley SIS 1.2 and ABC, the EDOA can significantly reduce delay for most circuits. Furthermore, for a few circuits, while reducing delay, the EDOA incurs an area penalty.