Network is dynamic and requires update in the operation. However, many confusions and problems can be caused by careless schedule in the update process. Although the problem has been investigated for many years in traditional networks where the control plane is distributed, software defined networking (SDN) brings new opportunities and solutions to this problem by the separation of control and data plane, as well as the centralized control. This paper makes a survey on the problems caused by network update, including forwarding loop, forwarding black hole, link congestion, network policy violation, etc., as well as the state-of-the-art SDN solutions to these problems. Furthermore, we summarize the network configuration strength and discuss the open issues of network update in the SDN paradigm.
As one of the most classic fields in computer vision, image categorization has attracted widespread interests. Numerous algorithms have been proposed in the community, and many of them have advanced the state-of-the-art. However, most existing algorithms are designed without consideration for the supply of computing resources. Therefore, when dealing with resource constrained tasks, these algorithms will fail to give satisfactory results. In this paper, we provide a comprehensive and in-depth introduction of recent developments of the research in image categorization with resource constraints. While a large portion is based on our own work, we will also give a brief description of other elegant algorithms. Furthermore, we make an investigation into the recent developments of deep neural networks, with a focus on resource constrained deep nets.
How to identify the critical links of the urban road network for actual traffic management and intelligent transportation control is an urgent problem, especially in the congestion environment. Most previous methods focus on traffic static characteristics for traffic planning and design. However, actual traffic management and intelligent control need to identify relevant sections by dynamic traffic information for solving the problems of variable transportation system. Therefore, a city-wide traffic model that consists of three relational algorithms, is proposed to identify significant links of the road network by using macroscopic fundamental diagram (MFD) as traffic dynamic characteristics. Firstly, weightedtraffic flow and density extraction algorithm is provided with simulation modeling and regression analysis methods, based on MFD theory. Secondly, critical links identification algorithm is designed on the first algorithm, under specified principles. Finally, threshold algorithm is developed by cluster analysis. In addition, the algorithms are analyzed and applied in the simulation experiment of the road network of the central district in Hefei city, China. The results show that the model has good maneuverability and improves the shortcomings of the threshold judged by human. It provides an approach to identify critical links for actual traffic management and intelligent control, and also gives a new method for evaluating the planning and design effect of the urban road network.
Bike sharing systems are booming globally as a green and flexible transportationmode, but the flexibility also brings difficulties in keeping the bike stations balanced with enough bikes and docks. Understanding the spatio-temporal bike trip patterns in a bike sharing system, such as the popular trip origins and destinations during rush hours, is important for researchers to design models for bike scheduling and station management. However, due to privacy and operational concerns, bike trip data are usually not publicly available in many cities. Instead, the station feeds about real-time bike and dock number in stations are usually public, which we refer to as bike sharing system open data. In this paper, we propose an approach to infer the spatio-temporal bike trip patterns from the public station feeds. Since the number of possible trips (i.e., origin-destination station pairs) is much larger than the number of stations, we define the trip inference as an ill-posed inverse problem. To solve this problem, we identify the sparsity and locality properties of bike trip patterns, and propose a sparse and weighted regularization model to impose both properties in the solution. We evaluate our method using real-world data fromWashington, D.C. and New York City. Results show that our method can effectively infer the spatio-temporal bike trip patterns and outperform the baselines in both cities.
People often have to queue for a busy service in many places around a city, and knowing the queue time can be helpful for making better activity plans to avoid long queues. Traditional solutions to the queue time monitoring are based on pre-deployed infrastructures, such as cameras and infrared sensors, which are costly and fail to deliver the queue time information to scattered citizens. This paper presents CrowdQTE, a mobile crowdsensing system, which utilizes the sensor-enhanced mobile devices and crowd human intelligence to monitor and provide real-time queue time information for various queuing scenarios. When people are waiting in a line, we utilize the accelerometer sensor data and ambient contexts to automatically detect the queueing behavior and calculate the queue time. When people are not waiting in a line, it estimates the queue time based on the information reported manually by participants. We evaluate the performance of the system with a two-week and 12-person deployment using commercially-available smartphones. The results demonstrate that CrowdQTE is effective in estimating queuing status.
To facilitate the travel preparation process to a city, a lot of work has been done to recommend a POI or a sequence of POIs automatically to satisfy users’ needs. However, most of the existing work ignores the issue of planning the detailed travel routes between POIs, leaving the task to online map services or commercial GPS navigators. Such a service or navigator in terms of suggesting the shortest travel distance or time, which cannot meet the diverse requirements of users. For instance, in the case of traveling by driving for leisure purpose, the scenic view along the travel routes would be of great importance to users, and a good planning service should put the sceneries of the route in higher priority rather than the distance or time taken. To this end, in this paper, we propose a novel framework called ScenicPlanner for route recommendation, leveraging a combination of geotagged image and check-in digital footprints from locationbased social networks (LBSNs). First, we enrich the road network and assign a proper scenic view score to each road segment to model the scenic road network, by extracting relevant information from geo-tagged images and check-ins. Then, we apply heuristic algorithms to iteratively add road segment and determine the travelling order of added road segments with the objective of maximizing the total scenic view score while satisfying the user-specified constraints (i.e., origin, destination and the total travel distance). Finally, to validate the efficiency and effectiveness of the proposed framework, we conduct extensive experiments on three real-world data sets from the Bay Area in the city of San Francisco, which contain a road network crawled from OpenStreetMap, more than 31 000 geo-tagged images generated by 1 571 Flickr users in one year, and 110 214 check-ins left by 15 680 Foursquare users in six months.
In data center, applications of big data analytics pose a big challenge to massive storage systems. It is significant to achieve high availability, high performance and high scalability for PB-scale or EB-scale storage systems. Metadata server (MDS) cluster architecture is one of the most effective solutions to meet the requirements of applications in data center. Workload migration can achieve load balance and energy saving of cluster systems. In this paper, a hybrid workload migration mechanism of MDS cluster is proposed and named as HWM. In HWM, workload of MDS is classified into two categories: metadata service and state service, and they can be migrated rapidly from a source MDS to a target MDS in different ways. Firstly, in metadata service migration, all the dirty metadata of one sub file system is flushed to a shared storage pool by the source MDS, and then is loaded by the target MDS. Secondly, in state service migration, all the states of that sub file system are migrated from source MDS to target MDS through network at file granularity, and then all of the related structures of these states are reconstructed in targetMDS. Thirdly, in the process of workload migration, instead of blocking client requests, the source MDS can decide which MDS will respond to each request according to the operation type and the migration stage. The proposed mechanismis implemented in the BlueWhaleMDS cluster. The performance measurements show that the HWM mechanism is efficient to migrate the workload of a MDS cluster system and provides low-latency access to metadata and states.
Data items are usually replicated in modern distributed data stores to obtain high performance and availability. However, the availability-consistency and latencyconsistency trade-offs exist in data replication, thus system designers intend to choose weak consistency models, such as eventual consistency, which may result in stale reads. Since stale data items may lead to serious application semantic problems, we consider how to increase the probability of data recency which provides a uniform view on recent versions of data items for all clients. In this work, we propose HARP, a framework that can enhance data recency of eventually consistent distributed data stores in an efficient and highly available way. Through detecting possible stale reads under failures or not, HARP can perform reread operations to eliminate stale results only when needed based on our analysis on write/read processes. We also present solutions on how to deal with some practical anomalies in HARP, including delayed, reordered and dropped messages and clock drift, and show how to extend HARP to multiple datacenters. Finally we implement HARP based on Cassandra, and the experiments show that HARP can effectively eliminate stale reads, with a low overhead (less than 6.9%) compared with original eventually consistent Cassandra.
Reducing the power consumption has become one of the most important challenges in designing modern data centers due to the explosive growth of data. The traditional approaches employed to decrease the power consumption normally do not consider the power of IT devices and the power of cooling system simultaneously. In contrast to existing works, this paper proposes a power model which can minimize the overall power consumption of data centers by balancing the computing power and cooling power. Furthermore, an enhanced genetic algorithm (EGA) is designed to explore the solution space of the power model since the model is a liner programming problem. However, EGA is computing intensive and the performance gradually decreases with the growth of the problem size. Therefore, heuristic greedy sequence (HGS) is proposed to simplify the calculation by leveraging the nature of greed. In contrast to EGA, HGS can determine the workload allocation of a specific data center layout with only one calculation. Experimental results demonstrate that both the EGA and HGS can significantly reduce the power consumption of data centers in contrast to the random algorithm. Additionally, HGS significantly outperforms EGA in terms of the continuity of workload allocation and execution performance.
Counting the cardinality of flows for massive high-speed traffic over sliding windows is still a challenging work under time and space constrains, but plays a key role in many network applications, such as traffic management and routing optimization in software defined network. In this paper, we propose a novel data structure (called LRU-Sketch) to address the problem. The significant contributions are as follows. 1) The proposed data structure adapts a well-known probabilistic sketch to sliding window model; 2) By using the least-recently used (LRU) replacement policy, we design a highly time-efficient algorithm for timely forgetting stale information, which takes constant (O(1)) time per time slot; 3) Moreover, a further memory-reducing schema is given at a cost of very little loss of accuracy; 4) Comprehensive experiments, performed on two real IP trace files, confirm that the proposed schema attains high accuracy and high time efficiency.
Recent years have witnessed a processor development trend that integrates central processing unit (CPU) and graphic processing unit (GPU) into a single chip. The integration helps to save some host-device data copying that a discrete GPU usually requires, but also introduces deep resource sharing and possible interference between CPU and GPU. This work investigates the performance implications of independently co-running CPU and GPU programs on these platforms. First, we perform a comprehensive measurement that covers a wide variety of factors, including processor architectures, operating systems, benchmarks, timing mechanisms, inputs, and power management schemes. These measurements reveal a number of surprising observations.We analyze these observations and produce a list of novel insights, including the important roles of operating system (OS) context switching and power management in determining the program performance, and the subtle effect of CPU-GPU data copying. Finally, we confirm those insights through case studies, and point out some promising directions to mitigate anomalous performance degradation on integrated heterogeneous processors.
Entity resolution is a key aspect in data quality and data integration, identifying which records correspond to the same real world entity in data sources. Many existing approaches require manually designed match rules to solve the problem, which always needs domain knowledge and is time consuming. We propose a novel genetic algorithm based entity resolution approach via active learning. It is able to learn effective match rules by logically combining several different attributes’ comparisons with proper thresholds. We use active learning to reduce manually labeled data and speed up the learning process. The extensive evaluation shows that the proposed approach outperforms the sate-of-the-art entity resolution approaches in accuracy.
Although amazing progress has been made in machine learning to achieve high generalization accuracy and efficiency, there is still very limited work on deriving meaningful decision-making actions from the resulting models. However, in many applications such as advertisement, recommendation systems, social networks, customer relationship management, and clinical prediction, the users need not only accurate prediction, but also suggestions on actions to achieve a desirable goal (e.g., high ads hit rates) or avert an undesirable predicted result (e.g., clinical deterioration). Existing works for extracting such actionability are few and limited to simple models such as a decision tree. The dilemma is that those models with high accuracy are often more complex and harder to extract actionability from. In this paper, we propose an effective method to extract actionable knowledge from additive tree models (ATMs), one of the most widely used and best off-the-shelf classifiers. We rigorously formulate the optimal actionable planning (OAP) problem for a given ATM, which is to extract an actionable plan for a given input so that it can achieve a desirable output while maximizing the net profit. Based on a state space graph formulation, we first propose an optimal heuristic search method which intends to find an optimal solution. Then, we also present a sub-optimal heuristic search with an admissible and consistent heuristic function which can remarkably improve the efficiency of the algorithm. Our experimental results demonstrate the effectiveness and efficiency of the proposed algorithms on several real datasets in the application domain of personal credit and banking.