Android applications have become increasingly powerful in recent years. While their functionality is still of paramount importance to users, the energy efficiency of these applications is also gaining more and more attention. Researchers have discovered various types of energy defects in Android applications, which can quickly drain the battery of mobile devices. Such defects not only inconvenience users but also frustrate Android developers, as diagnosing the energy inefficiency of a software product is a non-trivial task. In this work, we perform a literature review to understand the state of the art of energy inefficiency diagnosis for Android applications. We identified 55 research papers published in recent years and classified existing studies from four perspectives: power estimation method, hardware component, type of energy defect, and program analysis approach. We also conducted a cross-perspective analysis to summarize and compare the studied techniques. We hope that our review can help structure and unify the literature and shed light on future research, as well as draw developers' attention to building energy-efficient Android applications.
In this paper, we propose a lightweight network with an adaptive batch normalization module, called Meta-BN Net, for few-shot classification. Unlike existing few-shot learning methods, which consist of complex models or algorithms, our approach extends batch normalization, an essential part of current deep neural network training, whose potential has not been fully explored. In particular, a meta-module is introduced to learn to generate more powerful affine transformation parameters, known as
Deep neural networks have achieved great success in a variety of artificial intelligence fields. Since training a good deep model is often challenging and costly, such models are of great value and can even be key commercial intellectual property. Recently, intellectual property protection for deep models has drawn great attention from both academia and industry, and numerous works have been proposed. However, most of them focus on the classification task. In this paper, we present the first attempt at protecting deep semantic segmentation models from potential infringement. In detail, we design a new hybrid intellectual property protection framework that combines trigger-set based and passport based watermarking. Within it, the trigger-set based watermarking mechanism forces the network to output copyright watermarks for a pre-defined set of trigger images, which enables black-box remote ownership verification. The passport based watermarking mechanism is designed to eliminate the ambiguity attack risk of trigger-set based watermarking by adding an extra passport layer to the target model. Extensive experiments show that the proposed framework is not only effective on existing segmentation models but also strongly robust to different attack techniques.
Crowd counting, which aims to count the number of people in crowded scenes, has recently become a hot research topic. Existing methods mainly follow a training-testing pattern and rely on large amounts of training data, so they fail to count the crowd accurately in real-world scenes because of the limited generalization capability of the model. To alleviate this issue, this paper proposes a scene-adaptive crowd counting method based on meta-learning with a Dual-illumination Merging Network (DMNet). Based on learning-to-learn and few-shot learning, the proposed method is able to adapt to different scenes that contain only a few labeled images. To generate high-quality density maps and count the crowd in low-lighting scenes, DMNet contains a Multi-scale Feature Extraction module and an Element-wise Fusion module. The Multi-scale Feature Extraction module extracts image features with multi-scale convolutions, which helps to improve network accuracy. The Element-wise Fusion module fuses the low-lighting feature and the illumination-enhanced feature, which supplements the missing illumination in low-lighting environments. Experimental results on the WorldExpo’10, DISCO, UCSD, and Mall benchmarks show that the proposed method outperforms existing state-of-the-art methods in accuracy and achieves satisfactory results.
Music is the language of emotions. In recent years, music emotion recognition has attracted widespread attention in academia and industry, since it can be widely used in fields such as recommendation systems, automatic music composition, psychotherapy, and music visualization. Especially with the rapid development of artificial intelligence, deep learning-based music emotion recognition is gradually becoming mainstream. This paper gives a detailed survey of music emotion recognition. Starting with some preliminary knowledge of music emotion recognition, this paper first introduces some commonly used evaluation metrics. Then a three-part research framework is put forward. Based on this framework, the knowledge and algorithms involved in each part are introduced with detailed analysis, including commonly used datasets, emotion models, feature extraction, and emotion recognition algorithms. After that, the challenging problems and development trends of music emotion recognition technology are discussed, and finally, the whole paper is summarized.
We introduce a new notion called accountable attribute-based authentication with fine-grained access control (AccABA), which achieves (i) fine-grained access control that prevents ineligible users from authenticating; (ii) anonymity, such that no one can recognize the identity of a user; and (iii) public accountability, i.e., if a user authenticates two different messages, the corresponding authentications can be easily identified and linked, and anyone can reveal the user’s identity without any help from a trusted third party. Then, we formalize the security requirements in terms of unforgeability, anonymity, linkability, and traceability, and give a generic construction that fulfills these requirements. Based on AccABA, we further present the first attribute-based, fair, anonymous, and publicly traceable crowdsourcing scheme on blockchain, which is designed to select qualified workers to participate in tasks, ensures fair competition among workers, and balances the tension between anonymity and accountability.
Cognitive diagnosis, the assessment of a student’s cognitive ability, is a widespread concern in educational science. The cognitive diagnosis model (CDM) is an essential method for realizing cognitive diagnosis measurement. This paper presents recent research on cognitive diagnosis models and introduces four individual aspects of probability-based CDMs and deep learning-based CDMs: higher-order latent traits, polytomous responses, polytomous attributes, and multilevel latent traits. The paper also organizes the underlying ideas, model structures, and respective characteristics of these models, and provides directions for the future development of cognitive diagnosis.
The latest advance in recommendation shows that better user and item representations can be learned by performing graph convolutions on the user-item interaction graph. However, such findings are mostly restricted to the collaborative filtering (CF) scenario, where interaction contexts are not available. In this work, we extend the advantages of graph convolutions to context-aware recommender systems (CARS, a generic type of model that can handle various side information). We propose Graph Convolution Machine (GCM), an end-to-end framework that consists of three components: an encoder, graph convolution (GC) layers, and a decoder. The encoder projects users, items, and contexts into embedding vectors, which are passed to the GC layers that refine user and item embeddings with context-aware graph convolutions on the user-item graph. The decoder digests the refined embeddings to output the prediction score by considering the interactions among user, item, and context embeddings. We conduct experiments on three real-world datasets from Yelp and Amazon, validating the effectiveness of GCM and the benefits of performing graph convolutions for CARS.
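To make "refining user and item embeddings with graph convolutions on the user-item graph" concrete, here is one propagation layer under stated assumptions: LightGCN-style symmetric degree normalization with no feature transforms. Unlike GCM, this sketch ignores context embeddings entirely, and `propagate` is a hypothetical helper, not part of GCM.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def propagate(user_emb: Dict[str, Vector],
              item_emb: Dict[str, Vector],
              interactions: List[Tuple[str, str]]) -> Tuple[Dict[str, Vector], Dict[str, Vector]]:
    """One graph-convolution layer on the bipartite user-item graph (no contexts):
    e_u' = sum_{i in N(u)} e_i / sqrt(|N(u)| * |N(i)|), and symmetrically for items."""
    user_nbrs: Dict[str, List[str]] = {u: [] for u in user_emb}
    item_nbrs: Dict[str, List[str]] = {i: [] for i in item_emb}
    for u, i in interactions:
        user_nbrs[u].append(i)
        item_nbrs[i].append(u)
    dim = len(next(iter(user_emb.values())))

    def aggregate(nbrs: List[str], other_nbrs: Dict[str, List[str]],
                  other_emb: Dict[str, Vector]) -> Vector:
        out = [0.0] * dim
        for n in nbrs:
            # symmetric degree normalization keeps high-degree nodes from dominating
            norm = 1.0 / math.sqrt(len(nbrs) * len(other_nbrs[n]))
            for k in range(dim):
                out[k] += norm * other_emb[n][k]
        return out

    new_users = {u: aggregate(ns, item_nbrs, item_emb) for u, ns in user_nbrs.items()}
    new_items = {i: aggregate(ns, user_nbrs, user_emb) for i, ns in item_nbrs.items()}
    return new_users, new_items
```

Stacking several such layers lets information flow over multi-hop paths (user, item, user, ...); a context-aware variant would additionally inject context embeddings into the messages.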
Unsupervised text simplification has attracted much attention due to the scarcity of high-quality parallel text simplification corpora. Recently, an unsupervised statistical text simplification method based on a phrase-based machine translation system (UnsupPBMT) achieved good performance; it initializes its phrase tables with similar words obtained by word embedding modeling. Since word embedding modeling only considers the relevance between words, the phrase table in UnsupPBMT contains many dissimilar words. In this paper, we propose an unsupervised statistical text simplification method that uses the pre-trained language model BERT for initialization. Specifically, we use BERT as a general linguistic knowledge base for predicting similar words. Experimental results show that our method outperforms state-of-the-art unsupervised text simplification methods on three benchmarks and even outperforms some supervised baselines.
Searchable symmetric encryption (SSE) has been introduced to securely outsource encrypted databases to cloud storage while preserving searchability. Most SSE schemes assume that the server is honest but curious, whereas in the real world the server may be untrusted. To deal with a malicious server that does not perform queries honestly, verifiable SSE (VSSE) schemes have been constructed to ensure the verifiability of search results. However, existing VSSE constructions either focus only on single-keyword search or incur heavy computational cost during verification. To address this challenge, we present an efficient VSSE scheme, built on the OXT protocol (Cash et al., CRYPTO 2013), for conjunctive keyword queries with sublinear search overhead. The proposed VSSE scheme is based on a privacy-preserving hash-based accumulator that leverages a well-established cryptographic primitive, Symmetric Hidden Vector Encryption (SHVE). Our scheme enables verification of both the correctness and the completeness of the results without pairing operations, thus greatly reducing the computational cost of verification. In addition, the proposed scheme can still provide a proof when the search result is empty. Finally, security analysis and experimental evaluation demonstrate the security and practicality of the proposed scheme.
Activity hijacking is one of the most powerful attacks in Android. Though promising, all prior activity hijacking attacks suffer from limitations and have limited attack capabilities. They no longer pose security threats in recent Android versions due to the presence of effective defense mechanisms. In this work, we propose the first automated and adaptive activity hijacking attack, named VenomAttack, which enables a spectrum of customized attacks (e.g., phishing, spoofing, and DoS) on a large scale in recent Android versions, even when state-of-the-art defense mechanisms are deployed. Specifically, we propose to use hotpatch techniques to identify vulnerable devices and update the attack payload without re-installation and re-distribution, thereby bypassing offline detection. We present a newly discovered flaw in Android and a bug in Android derivatives, each of which allows us to check whether a target app is running in the background, so that we can determine the right attack timing via a specially designed transparent activity. We also propose an automated fake activity generation approach that enables large-scale attacks. Requiring only the common INTERNET permission, we can hijack activities at the right time without destroying the GUI integrity of the foreground app. Our proof-of-concept attacks show that VenomAttack poses severe security risks on recent Android versions. A user study demonstrates the effectiveness of VenomAttack in real-world scenarios, achieving a high success rate (95%) without users’ awareness. These findings call for more attention from stakeholders such as Google.
Underwater images often exhibit severe color deviations and degraded visibility, which limits many practical applications in ocean engineering. Although extensive research has been conducted on underwater image enhancement, few methods demonstrate significant robustness and generalization across diverse real-world underwater scenes. In this paper, we propose an adaptive color correction algorithm based on maximum likelihood estimation of Gaussian parameters, which effectively removes the color casts of a variety of underwater images. We also propose a novel algorithm for accurate background light estimation that uses a weighted combination of gradient maps in HSV color space and the absolute difference of intensity, which circumvents the influence of white or bright regions that challenge existing physical model-based methods. To enhance the contrast of the resulting images, a piecewise affine transform is applied to the transmission map estimated via background light differential. Finally, with the estimated background light and transmission map, the scene radiance is recovered by solving the inverse problem of the image formation model. Extensive experiments show that our results have a natural appearance and genuine color, and that our method achieves competitive performance against state-of-the-art methods in terms of objective evaluation metrics, which further validates the better robustness and higher generalization ability of our enhancement model.
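The abstract bases color correction on maximum likelihood estimation of Gaussian parameters but does not give the correction rule. As a rough sketch under that gap, the code below fits the MLE mean and standard deviation of each RGB channel and linearly stretches the range μ ± λσ to [0, 255]; the stretch rule and the parameter `lam` are assumptions for illustration, not the paper's method.

```python
import math
from typing import List, Tuple

Pixel = Tuple[float, float, float]

def gaussian_mle(values: List[float]) -> Tuple[float, float]:
    """Maximum-likelihood mean and standard deviation of a Gaussian.

    Note that the MLE of the variance divides by n, not n - 1."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, math.sqrt(var)

def color_correct(pixels: List[Pixel], lam: float = 2.0) -> List[Pixel]:
    """Hypothetical per-channel correction: map [mu - lam*sigma, mu + lam*sigma]
    linearly onto [0, 255] and clip, which removes a global color cast."""
    out_channels = []
    for c in range(3):
        channel = [p[c] for p in pixels]
        mu, sigma = gaussian_mle(channel)
        lo, hi = mu - lam * sigma, mu + lam * sigma
        if hi - lo < 1e-12:  # flat channel: nothing to stretch
            out_channels.append([float(v) for v in channel])
            continue
        scaled = [(v - lo) / (hi - lo) * 255.0 for v in channel]
        out_channels.append([min(255.0, max(0.0, v)) for v in scaled])
    # Re-assemble per-pixel (R, G, B) tuples from the corrected channels.
    return [tuple(ch[i] for ch in out_channels) for i in range(len(pixels))]
```

Because each channel is normalized by its own statistics, a channel attenuated by water absorption (typically red) is stretched more than the dominant blue-green channels.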
Accurate prediction of sea surface temperature (SST) is extremely important for forecasting oceanic environmental events and for ocean studies. However, existing SST prediction methods do not consider the seasonal periodicity and abnormal fluctuation characteristics of SST or the importance of historical SST data from different times; thus, these methods suffer from low prediction accuracy. To solve this problem, we comprehensively consider the effects on prediction accuracy of the seasonal periodicity and abnormal fluctuation characteristics of SST data, as well as of historical data from different periods. We propose a novel ensemble learning approach that combines the Predictive Recurrent Neural Network (PredRNN) and an attention mechanism for effective SST field prediction. In this approach, the XGBoost model is used to learn the long-period fluctuation pattern of SST and to extract seasonal periodic features from SST data. The exponential smoothing method is used to mitigate the impact of severely abnormal SST fluctuations and to extract a priori features of SST data. The outputs of these two models and the original SST data are stacked and used as inputs to the next model, the PredRNN network. PredRNN is a recently developed spatiotemporal deep learning network that models both spatial and temporal representations and can transfer memory across layers and time steps. Therefore, we use it to extract the spatiotemporal correlations of SST data and predict future SSTs. Finally, an attention mechanism is added to capture the importance of different historical SST data, weight the output of each step of the PredRNN network, and improve the prediction accuracy. Experimental results on two ocean datasets confirm that the proposed approach achieves higher training efficiency and prediction accuracy than existing SST field prediction approaches.
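Exponential smoothing is the step used to damp abnormal SST fluctuations. A minimal sketch of simple (single-parameter) exponential smoothing on one grid cell's time series follows; which smoothing variant the paper uses and the value of `alpha` are assumptions, since the abstract does not state them.

```python
from typing import List

def exponential_smoothing(series: List[float], alpha: float) -> List[float]:
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    The first smoothed value is initialized to the first observation.
    Smaller alpha damps sudden spikes more strongly."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed
```

For example, a one-step spike `[20.0, 20.0, 30.0, 20.0]` with `alpha=0.5` becomes `[20.0, 20.0, 25.0, 22.5]`, so the anomaly is halved and then decays instead of dominating the input to the next model.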
Nowadays, smart buildings rely on Internet of Things (IoT) technology derived from the cloud and fog computing paradigms to coordinate connected objects and enable collaboration among them. Fog computing is characterized by low latency and a wide spread of geographically distributed nodes that support mobility, real-time interaction, and location-based services. To provide occupants of modern buildings with an optimal quality of life, we rely on a holistic framework designed to decrease latency and to improve energy savings and the efficiency of services with different capabilities. Discrete EVent system Specification (DEVS) is a formalism for describing simulation models in a modular way. In this work, the sub-models of the connected objects in the building are designed accurately and independently; after they are coupled together, we easily obtain an integrated model that conforms to the fog computing framework. Simulation results show that this new approach significantly improves the energy efficiency of buildings and reduces latency. Additionally, with DEVS, we can easily add sub-models to or remove them from the overall model, allowing us to continually improve our designs.
Temporal localization is crucial for action video recognition. Since manual annotation of videos is expensive and time-consuming, temporal localization with weak video-level labels is challenging but indispensable. In this paper, we propose a weakly-supervised temporal action localization approach for untrimmed videos. To address this problem, we train the model based on proxies for each action class. The proxies are used to measure the distances between action segments and the original features of different actions. We use a proxy-based metric to cluster instances of the same action together and to separate actions from the background. Compared with state-of-the-art methods, our method achieves competitive results on the THUMOS14 and ActivityNet1.2 datasets.
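A proxy-based metric scores each segment feature by its distance to one learned proxy vector per class. The sketch below assumes squared Euclidean distance and a softmax over negative distances, in the style of Proxy-NCA losses; the paper's exact metric and its handling of background are not given in the abstract, and the `"background"` proxy in the usage note is a hypothetical illustration.

```python
import math
from typing import Dict, List

Vector = List[float]

def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def proxy_probabilities(feature: Vector, proxies: Dict[str, Vector]) -> Dict[str, float]:
    """Softmax over negative squared distances from a segment feature to each class proxy."""
    logits = {c: -squared_distance(feature, p) for c, p in proxies.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - m) for c, v in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def proxy_loss(feature: Vector, label: str, proxies: Dict[str, Vector]) -> float:
    """Negative log-probability of the correct class: small when the feature is
    close to its own proxy and far from all other proxies."""
    return -math.log(proxy_probabilities(feature, proxies)[label])
```

With proxies such as `{"jump": [1.0, 0.0], "run": [0.0, 1.0], "background": [-1.0, -1.0]}`, a segment feature `[0.9, 0.1]` is assigned mostly to `"jump"`; minimizing `proxy_loss` pulls same-action segments toward a shared proxy and pushes them away from the background proxy.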
It is well known that deep learning depends on large amounts of clean data. Because of the high cost of annotation, various methods have been developed to annotate data automatically. However, these methods introduce a large number of noisy labels into the datasets, which is a challenging problem. In this paper, we propose a new method for selecting training data accurately. Specifically, our approach fits a mixture model to the per-sample losses computed with the raw label and with the predicted label, and uses the mixture model to dynamically divide the training set into a correctly labeled set, a correctly predicted set, and a wrong set. A network is then trained on these sets in a supervised manner. To counter the confirmation bias problem, we train two networks alternately, and each network produces the data division used to teach the other network. When the network parameters are optimized, the labels of the samples are fused according to the probabilities from the mixture model. Experiments on CIFAR-10, CIFAR-100, and Clothing1M demonstrate that this method performs on par with or better than state-of-the-art methods.
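The core mechanism of fitting a mixture model to per-sample losses can be sketched as a two-component 1-D Gaussian mixture fitted by EM. Each sample's posterior probability of belonging to the low-loss component serves as its "clean" probability. The paper fits losses for both the raw and the predicted label to form three sets; this sketch covers only a single loss vector, and the iteration count and variance floor are assumptions.

```python
import math
from typing import List, Tuple

def _normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_two_gaussians(losses: List[float], iters: int = 50,
                      min_var: float = 1e-4) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 2-component 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    n = len(losses)
    weights, means = [0.5, 0.5], [min(losses), max(losses)]  # one component per extreme
    overall = sum(losses) / n
    var0 = max(min_var, sum((x - overall) ** 2 for x in losses) / n)
    variances = [var0, var0]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in losses:
            p = [weights[k] * _normal_pdf(x, means[k], variances[k]) for k in range(2)]
            s = sum(p) or 1e-300
            resp.append([pk / s for pk in p])
        # M-step: re-estimate mixing weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            variances[k] = max(min_var,
                               sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk)
    return weights, means, variances

def clean_probability(losses: List[float]) -> List[float]:
    """Posterior probability that each sample belongs to the low-loss (likely clean) component."""
    weights, means, variances = fit_two_gaussians(losses)
    clean = 0 if means[0] <= means[1] else 1
    probs = []
    for x in losses:
        p = [weights[k] * _normal_pdf(x, means[k], variances[k]) for k in range(2)]
        s = sum(p) or 1e-300
        probs.append(p[clean] / s)
    return probs
```

Thresholding these probabilities (for example at 0.5, an assumed value) splits the data into likely-clean and likely-noisy subsets. The probabilities themselves can also weight the fusion of the raw label with the predicted label.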
The minimum independent dominating set (MIDS) problem is an important variant of the dominating set problem with a range of applications. In this work, we present an improved master-apprentice evolutionary algorithm based on a path-breaking strategy, called MAE-PB, for solving the MIDS problem. The proposed MAE-PB algorithm uses a construction function both for initial solution generation and for restarting candidate solutions. It is a multiple neighborhood-based local search algorithm that improves solution quality using a path-breaking strategy, which recombines master and apprentice solutions, and a perturbation strategy, which disturbs the solution when the algorithm cannot improve its quality within a certain number of steps. We show the competitiveness of MAE-PB with computational results on classical benchmarks from the literature and on a suite of massive graphs from real-world applications. The results show that MAE-PB achieves high performance. In particular, it obtains the best-known results for seven instances of the classical benchmarks and improves the best-known results for 62 instances of the massive graphs. We also analyze the key ingredients of the algorithm to determine their impact on its performance.
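To make the MIDS constraints concrete, here is a plain greedy construction together with a feasibility check. The greedy rule is an assumed, generic heuristic. It is not MAE-PB and not necessarily the paper's construction function. Graphs are assumed to be undirected, given as symmetric adjacency sets.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]

def greedy_independent_dominating_set(graph: Graph) -> Set[int]:
    """Repeatedly pick the undominated vertex that dominates the most undominated
    vertices, then mark its closed neighborhood as dominated. Every pick is
    non-adjacent to earlier picks, so the result is a maximal independent set,
    which is always an independent dominating set."""
    undominated = set(graph)
    solution: Set[int] = set()
    while undominated:
        # Iterating in sorted order makes ties resolve to the smallest vertex id.
        v = max(sorted(undominated), key=lambda u: len((graph[u] | {u}) & undominated))
        solution.add(v)
        undominated -= graph[v] | {v}
    return solution

def is_independent_dominating(graph: Graph, s: Set[int]) -> bool:
    """No two chosen vertices are adjacent, and every vertex is chosen or has a chosen neighbor."""
    independent = all(not (graph[v] & s) for v in s)
    dominating = all(v in s or bool(graph[v] & s) for v in graph)
    return independent and dominating
```

On the path 1-2-3-4-5 the greedy rule returns `{2, 4}`. A local search such as MAE-PB would start from solutions like this and try to shrink them while keeping `is_independent_dominating` true.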
The authors of this paper previously proposed the global virtual data space system (GVDS) to aggregate the scattered and autonomous storage resources in China’s national supercomputer grid (National Supercomputing Center in Guangzhou, National Supercomputing Center in Jinan, National Supercomputing Center in Changsha, Shanghai Supercomputing Center, and Computer Network Information Center in Chinese Academy of Sciences) into a storage system that spans the wide area network (WAN), realizing unified management of global storage resources in China. At present, the GVDS has been successfully deployed in the China National Grid environment. However, when accessing and sharing remote data over the WAN, the GVDS transmits data redundantly and wastes a lot of network bandwidth. In this paper, we propose an edge cache system as a supplementary system to the GVDS to improve the performance of upper-level applications that access and share remote data. Specifically, we first design the architecture of the edge cache system, and then study the key technologies of this architecture: an edge cache index mechanism based on double-layer hashing, an edge cache replacement strategy based on the GDSF algorithm, request routing based on consistent hashing, and cluster membership maintenance based on the SWIM protocol. The experimental results show that the edge cache system implements the relevant file operations (read, write, deletion, modification, etc.) and is functionally compatible with the POSIX interface. In terms of performance, when the accessed file is located in the edge cache system, it greatly reduces the amount of data transmitted and increases the data access bandwidth, bringing its performance close to that of a network file system in a local area network (LAN).
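The replacement strategy named here is GDSF (Greedy-Dual-Size-Frequency). Each cached file gets priority H = L + frequency × cost / size, and on eviction the inflation value L is raised to the evicted file's priority so that long-idle files age out. Below is a minimal in-memory sketch assuming a uniform retrieval cost of 1 by default. The class and method names are hypothetical, and the sketch ignores the double-layer hash index, consistent-hashing routing, and SWIM membership described above.

```python
from typing import Dict

class GDSFCache:
    """Hypothetical Greedy-Dual-Size-Frequency cache keyed by file name."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0
        self.clock = 0.0                      # inflation value L
        self.size: Dict[str, int] = {}
        self.freq: Dict[str, int] = {}
        self.cost: Dict[str, float] = {}
        self.priority: Dict[str, float] = {}

    def _refresh(self, key: str) -> None:
        # H(f) = L + freq(f) * cost(f) / size(f): small, popular files score high.
        self.priority[key] = self.clock + self.freq[key] * self.cost[key] / self.size[key]

    def get(self, key: str) -> bool:
        """Return True on a hit (and bump the file's frequency), False on a miss."""
        if key not in self.size:
            return False
        self.freq[key] += 1
        self._refresh(key)
        return True

    def put(self, key: str, size: int, cost: float = 1.0) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        if size > self.capacity:
            return                            # can never fit; do not cache
        if key in self.size:                  # treat re-insertion as an access
            self.get(key)
            return
        while self.used + size > self.capacity:
            victim = min(self.priority, key=self.priority.get)
            self.clock = self.priority[victim]  # age everything relative to the victim
            self.used -= self.size[victim]
            for table in (self.size, self.freq, self.cost, self.priority):
                del table[victim]
        self.size[key], self.freq[key], self.cost[key] = size, 1, cost
        self.used += size
        self._refresh(key)
```

Compared with plain LRU, the size term lets many small, frequently read files stay at the edge while a single large, rarely read file is evicted first.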
Action recognition is an important research topic in video analysis that remains very challenging. Effective recognition relies on learning a good representation of both spatial information (for appearance) and temporal information (for motion). These two kinds of information are highly correlated but have quite different properties, so both connecting independent models (e.g., CNN-LSTM) and direct unbiased co-modeling (e.g., 3DCNN) lead to unsatisfactory results. Besides, a long-standing practice in this task with deep learning models is to use only 8 or 16 consecutive frames as input, which makes it hard to extract discriminative motion features. In this work, we propose a novel network structure called ResLNet (Deep Residual LSTM network), which can take longer inputs (e.g., 64 frames) and, under the residual structure, lets convolutions collaborate with LSTM more effectively to learn better spatial-temporal representations, without extra computational cost thanks to the proposed embedded variable stride convolution. The superiority of this proposal and its ablation study are shown on three widely used benchmark datasets: Kinetics, HMDB51, and UCF101. The proposed network could be applied to various features, such as RGB and optical flow. Due to the limited computing power of our experimental equipment and the real-time requirement, the proposed network is tested on RGB input only, where it performs well.
The haze phenomenon seriously interferes with image acquisition and reduces image quality. Due to many uncertain factors, dehazing is a challenging task in image processing. Most existing deep learning-based dehazing approaches apply the atmospheric scattering model (ASM) or a similar physical model, which originally comes from traditional dehazing methods. However, the datasets used to train deep models do not match this model well, for three reasons. Firstly, the atmospheric illumination in ASM is obtained from prior experience, which is not accurate for dehazing real scenes. Secondly, it is difficult to obtain the depth of outdoor scenes for ASM. Thirdly, haze is a complex natural phenomenon, and it is difficult to find an accurate physical model and related parameters to describe it. In this paper, we propose a black-box method in which haze is treated as an image quality problem, without using any physical model such as ASM. Analytically, we propose a novel dehazing equation that combines two mechanisms: an interference item and a detail enhancement item. The interference item estimates the haze information for dehazing the image, and the detail enhancement item then repairs and enhances the details of the dehazed image. Based on the new equation, we design an anti-interference and detail enhancement dehazing network (AIDEDNet), which differs dramatically from existing dehazing networks in that it is fed haze-free images for training. Specifically, we propose a new way to construct haze patches on the fly during network training: each patch is randomly selected from the input images, and the thickness of the haze is also set randomly. Extensive experimental results show that AIDEDNet outperforms state-of-the-art methods on both synthetic and real-world haze scenes.
Although few-shot learning (FSL) has made great progress, it remains an enormous challenge, especially when the source and target sets come from different domains, a setting known as cross-domain few-shot learning (CD-FSL). Utilizing more source domain data is an effective way to improve the performance of CD-FSL. However, knowledge from different source domains may become entangled and confused, which hurts performance on the target domain. Therefore, we propose team-knowledge distillation networks (TKD-Net) to tackle this problem, which explore a strategy to help multiple teachers cooperate. Specifically, we distill knowledge from the cooperating teacher networks into a single student network in a meta-learning framework. It incorporates task-oriented knowledge distillation and cooperation among multiple teachers to train an efficient student with better generalization ability on unseen tasks. Moreover, TKD-Net employs both response-based and relation-based knowledge to transfer more comprehensive and effective knowledge. Extensive experimental results on four fine-grained datasets demonstrate the effectiveness and superiority of the proposed TKD-Net approach.
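Response-based distillation from several teachers can be illustrated with temperature-softened distributions. The sketch below assumes the "team" target is a weighted average of the teachers' softened distributions. The student is penalized by KL divergence scaled by T², as in standard knowledge distillation. TKD-Net's actual cooperation mechanism and its relation-based term are not described in the abstract, so the weighting rule and `temperature` default are assumptions.

```python
import math
from typing import List

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def multi_teacher_kd_loss(student_logits: List[float],
                          teacher_logits: List[List[float]],
                          teacher_weights: List[float],
                          temperature: float = 4.0) -> float:
    """Hypothetical team-distillation loss: KL between the weighted mixture of softened
    teacher distributions and the softened student distribution, scaled by T^2."""
    total_w = sum(teacher_weights)
    teacher_probs = [softmax(t, temperature) for t in teacher_logits]
    target = [sum(w * tp[i] for w, tp in zip(teacher_weights, teacher_probs)) / total_w
              for i in range(len(student_logits))]
    student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(target, student)
```

The loss is zero when the student reproduces the team's softened prediction exactly. Learning the teacher weights per task, instead of fixing them, would be one way to let teachers "cooperate".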
Transformers have been widely studied in many natural language processing (NLP) tasks, as they can capture dependencies across the whole sentence with high parallelizability thanks to multi-head attention and the position-wise feed-forward network. However, these two components of transformers are position-independent, which makes transformers weak at modeling sentence structure. Existing studies commonly use positional encoding or mask strategies to capture the structural information of sentences. In this paper, we aim to strengthen the ability of transformers to model the linear structure of sentences from three aspects: the absolute position of tokens, the relative distance between tokens, and the direction between tokens. We propose a novel bidirectional Transformer with absolute-position aware relative position encoding (BiAR-Transformer) that combines positional encoding and the mask strategy. We model the relative distance between tokens together with the absolute position of tokens through a novel absolute-position aware relative position encoding. Meanwhile, we apply a bidirectional mask strategy to model the direction between tokens. Experimental results on natural language inference, paraphrase identification, sentiment classification, and machine translation tasks show that BiAR-Transformer outperforms other strong baselines.
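Two generic ingredients mentioned here are clipped relative distances between tokens and forward/backward (directional) attention masks. The sketch below illustrates them. How BiAR-Transformer combines them with absolute positions is not specified in the abstract, so the clipping bound `max_distance`, the head-group arrangement mentioned in the comments, and all function names are assumptions.

```python
import math
from typing import List, Tuple

Mask = List[List[bool]]

def relative_distance_matrix(n: int, max_distance: int) -> List[List[int]]:
    """Clipped relative distance j - i between query token i and key token j,
    usable as an index into a table of learned relative-position embeddings."""
    return [[max(-max_distance, min(max_distance, j - i)) for j in range(n)] for i in range(n)]

def direction_masks(n: int) -> Tuple[Mask, Mask]:
    """Forward mask: token i may attend to tokens j <= i; backward mask: j >= i.
    One assumed arrangement applies each mask to a different group of heads."""
    forward = [[j <= i for j in range(n)] for i in range(n)]
    backward = [[j >= i for j in range(n)] for i in range(n)]
    return forward, backward

def masked_softmax(scores: List[float], mask: List[bool]) -> List[float]:
    """Softmax over one row of attention scores; masked-out positions get weight 0."""
    m = max(s for s, keep in zip(scores, mask) if keep)
    exps = [math.exp(s - m) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the diagonal is allowed by both masks, every row keeps at least one position. With the mask alone, a token can tell whether a neighbor lies to its left or to its right, information that plain self-attention does not have.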
While researchers have proposed many techniques to mitigate contention on the shared cache and memory bandwidth, none of them has considered memory bus contention due to split locks. Our study shows that split locks may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split locks, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. The meter characterizes the tolerance of jobs to the “pressure” of bus contention and builds a tolerance model with polynomial regression. The job scheduler allocates user jobs to physical nodes in a contention-aware manner. We design three scheduling policies that, respectively, minimize the number of required nodes while ensuring the Service Level Agreement (SLA) of all user jobs; minimize the number of jobs that suffer SLA violations when there are not enough nodes; and maximize overall performance without considering SLA violations. With these three policies, Kronos reduces the number of required nodes by 42.1% while ensuring the SLA of all jobs, reduces the number of jobs that suffer SLA violations when nodes are insufficient by 72.8%, and improves overall performance by 35.2% without considering the SLA.
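The tolerance meter builds its model with polynomial regression. A dependency-free sketch of ordinary least-squares polynomial fitting follows, treating contention "pressure" as x and normalized job performance as y. These variable meanings, the degree, and the function names are assumptions for illustration, not Kronos's actual interface.

```python
from typing import List

def polyfit(xs: List[float], ys: List[float], degree: int) -> List[float]:
    """Least-squares polynomial fit by solving the normal equations (X^T X) c = X^T y.

    Returns [c0, c1, ..., c_degree] for c0 + c1*x + ... + c_degree*x^degree.
    Adequate for low degrees; a QR-based solver is numerically safer for high ones."""
    m = degree + 1
    if len(xs) < m:
        raise ValueError("need at least degree + 1 points")
    # Normal-equation matrix A = X^T X and right-hand side b = X^T y.
    a = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in reversed(range(m)):
        rest = sum(a[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - rest) / a[r][r]
    return coeffs

def predict(coeffs: List[float], x: float) -> float:
    return sum(c * x ** i for i, c in enumerate(coeffs))
```

A scheduler could then, for instance, co-locate a job on a node only if `predict(coeffs, node_pressure)` stays above the job's SLA threshold. That admission rule is likewise a hypothetical illustration.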
Massive sequence view (MSV) is a classic timeline-based dynamic network visualization approach. However, it is vulnerable to visual clutter caused by overlapping edges, which can lead to misunderstanding of the time-varying trends of network communications. This study presents a new edge sampling algorithm called edge-based multi-class blue noise (E-MCBN) to reduce visual clutter in MSV. Our main idea is inspired by the multi-class blue noise (MCBN) sampling algorithm, which is commonly used to declutter multi-class scatterplots. First, we take a node pair as an edge class, analogous to a class in a multi-class scatterplot. Second, we propose two indicators, namely, the class overlap degree and the inter-class conflict degree, to measure the overlap and the mutual exclusion between edge classes, respectively. These indicators lay the foundation for migrating MCBN sampling from multi-class scatterplots to dynamic network sampling. Finally, we propose three strategies to accelerate MCBN sampling and a partitioning strategy to preserve locally high-density edges in the MSV. The results show that our approach can effectively reduce visual clutter and improve the readability of MSV. Moreover, our approach overcomes the disadvantages of MCBN sampling (i.e., long running time and failure to preserve locally high-density communication areas in MSV). This study is the first to introduce MCBN sampling into dynamic network sampling.
A threshold signature is a special digital signature in which the
Image super-resolution (SR) is one of the classic computer vision tasks. This paper proposes a super-resolution network based on adaptive frequency component upsampling, named SR-AFU. The network is composed of multiple cascaded dilated convolution residual blocks (CDCRB), which extract multi-resolution features representing image semantics, and multiple multi-size convolutional upsampling blocks (MCUB), which adaptively upsample different frequency components using the CDCRB features. The paper also defines a new loss function based on the discrete wavelet transform, which makes the reconstructed SR images closer to human perception. Experiments on benchmark datasets show that, compared with existing methods, SR-AFU achieves a higher peak signal-to-noise ratio (PSNR), significantly faster training, and more realistic visual effects.
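The abstract defines a loss based on the discrete wavelet transform without giving its form. A plausible minimal version follows. It assumes a one-level orthonormal 2-D Haar transform and a weighted mean absolute difference per sub-band, with hypothetical per-band `weights`. Sub-band naming conventions (LH versus HL) differ between libraries.

```python
from typing import Dict, List

Image = List[List[float]]

def haar_dwt2(img: Image) -> Dict[str, Image]:
    """One-level orthonormal 2-D Haar transform of a grayscale image with even height and width."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    bands: Dict[str, Image] = {name: [] for name in ("LL", "LH", "HL", "HH")}
    for i in range(0, h, 2):
        rows: Dict[str, List[float]] = {name: [] for name in bands}
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rows["LL"].append((a + b + c + d) / 2)  # low-frequency approximation
            rows["LH"].append((a + b - c - d) / 2)  # difference between rows
            rows["HL"].append((a - b + c - d) / 2)  # difference between columns
            rows["HH"].append((a - b - c + d) / 2)  # diagonal detail
        for name in bands:
            bands[name].append(rows[name])
    return bands

def wavelet_l1_loss(sr: Image, hr: Image, weights: Dict[str, float]) -> float:
    """Weighted mean absolute difference between the Haar sub-bands of two images."""
    sb, hb = haar_dwt2(sr), haar_dwt2(hr)
    loss = 0.0
    for name, wgt in weights.items():
        diffs = [abs(p - q) for rp, rq in zip(sb[name], hb[name]) for p, q in zip(rp, rq)]
        loss += wgt * sum(diffs) / len(diffs)
    return loss
```

Giving the high-frequency bands (LH, HL, HH) larger weights penalizes blurred edges and textures more than a plain pixel-wise L1 loss would, which is one way such a loss can favor perceptually sharper results.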
A new meaningful image encryption algorithm based on compressive sensing (CS) and integer wavelet transformation (IWT) is proposed in this study. First of all, the initial values of the chaotic system are encrypted with the RSA algorithm and then published as public keys. To make the chaotic sequence more random, a mathematical model is constructed to improve its random performance. Then, the plain image is compressed and encrypted to obtain the secret image. Secondly, the secret image is padded with zeros to the same size as the plain image. After IWT is applied to the carrier image and discrete wavelet transformation (DWT) to the padded image, the secret image is embedded into the carrier image. Finally, a meaningful carrier image containing the secret plain image is obtained by inverse IWT. Here, the measurement matrix is built from both the chaotic system and a Hadamard matrix, so it not only retains the characteristics of the Hadamard matrix but also has the control and synchronization properties of the chaotic system. In particular, the information entropy of the plain image is used to produce the initial conditions of the chaotic system. As a result, the proposed algorithm can resist known-plaintext attacks (KPA) and chosen-plaintext attacks (CPA). With the help of the asymmetric cipher RSA, no extra transmission is needed in the communication. Experimental simulations show that the normalized correlation (NC) values between the host image and the cipher image are high; that is, the proposed encryption algorithm is imperceptible and has a good hiding effect.
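The measurement matrix combines a chaotic system with a Hadamard matrix and is seeded from the plain image's information entropy. The abstract does not name the chaotic map or the seeding rule. The sketch below therefore rests on explicit assumptions: a logistic map with μ = 3.99, rows of a Sylvester-construction Hadamard matrix selected by ranking the chaotic sequence, and a hypothetical entropy-to-seed mapping in `seed_from_entropy`.

```python
import math
from collections import Counter
from typing import List

def shannon_entropy(pixels: List[int]) -> float:
    """Shannon entropy (in bits) of the gray-level histogram of an 8-bit image."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def seed_from_entropy(pixels: List[int]) -> float:
    """Hypothetical mapping of entropy in [0, 8] bits to a logistic-map seed in [0.1, 0.9]."""
    return 0.1 + 0.8 * shannon_entropy(pixels) / 8.0

def sylvester_hadamard(order: int) -> List[List[int]]:
    """Hadamard matrix of order 2^k via Sylvester's construction H_2n = [[H, H], [H, -H]]."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def logistic_sequence(x0: float, length: int, mu: float = 3.99, burn_in: int = 100) -> List[float]:
    """Iterate the logistic map x <- mu * x * (1 - x), discarding an initial transient."""
    x, out = x0, []
    for i in range(burn_in + length):
        x = mu * x * (1.0 - x)
        if i >= burn_in:
            out.append(x)
    return out

def chaotic_measurement_matrix(m: int, n: int, x0: float) -> List[List[int]]:
    """Keep m rows of an n x n Hadamard matrix, chosen by ranking a chaotic sequence."""
    seq = logistic_sequence(x0, n)
    order = sorted(range(n), key=lambda i: seq[i])  # chaotic permutation of row indices
    h = sylvester_hadamard(n)
    return [h[i] for i in order[:m]]
```

Because the seed depends on the plain image's entropy, changing the image changes the selected rows. This is the property the abstract links to resisting known- and chosen-plaintext attacks. The selected rows stay mutually orthogonal, as rows of a Hadamard matrix.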