Reactive synthesis is a technique for automatic generation of a reactive system from a high level specification. The system is reactive in the sense that it reacts to the inputs from the environment. The specification is in general given as a linear temporal logic (LTL) formula. The behaviour of the system interacting with the environment can be represented as a game in which the system plays against the environment. Thus, a problem of reactive synthesis is commonly treated as solving such a game with the specification as the winning condition. Reactive synthesis has been thoroughly investigated for more two decades. A well-known challenge is to deal with the complex uncertainty of the environment. We understand that a major issue is due to the lack of a sufficient treatment of probabilistic properties in the traditional models. For example, a two-player game defined by a standard Kriple structure does not consider probabilistic transitions in reaction to the uncertain physical environment; and a Markov Decision Process (MDP) in general does not explicitly separate the system from its environment and it does not describe the interaction between system and the environment. In this paper, we propose a new and more general model which combines the two-player game and the MDP. Furthermore, we study probabilistic reactive synthesis for the games of General Reactivity of Rank 1 (i.e., GR(1)) defined in this model. More specifically, we present an algorithm, which for given model
Pull-based development has become an important paradigm for distributed software development. In this model, each developer independently works on a copied repository (i.e., a fork) from the central repository. It is essential for developers to maintain awareness of the state of other forks to improve collaboration efficiency. In this paper, we propose a method to automatically generate a summary of a fork. We first use the random forest method to generate the label of a fork, i.e., feature implementation or a bug fix. Based on the information of the fork-related commits, we then use the TextRank algorithm to generate detailed activity information of the fork. Finally, we apply a set of rules to integrate all related information to construct a complete fork summary. To validate the effectiveness of our method, we conduct 30 groups of manual experiment and 77 groups of case studies on Github. We propose
As one of the most important components in knowledge graph construction, entity linking has been drawing more and more attention in the last decade. In this paper, we propose two improvements towards better entity linking. On one hand, we propose a simple but effective coarse-to-fine unsupervised knowledge base(KB) extraction approach to improve the quality of KB, through which we can conduct entity linking more efficiently. On the other hand, we propose a highway network framework to bridge key words and sequential information captured with a self-attention mechanism to better represent both local and global information. Detailed experimentation on six public entity linking datasets verifies the great effectiveness of both our approaches.
Word-embedding acts as one of the backbones of modern natural language processing (NLP). Recently, with the need for deploying NLP models to low-resource devices, there has been a surge of interest to compress word embeddings into hash codes or binary vectors so as to save the storage and memory consumption. Typically, existing work learns to encode an embedding into a compressed representation from which the original embedding can be reconstructed. Although these methods aim to preserve most information of every individual word, they often fail to retain the relation between words, thus can yield large loss on certain tasks. To this end, this paper presents Relation Reconstructive Binarization (R2B) to transform word embeddings into binary codes that can preserve the relation between words. At its heart, R2B trains an auto-encoder to generate binary codes that allow reconstructing the word-by-word relations in the original embedding space. Experiments showed that our method achieved significant improvements over previous methods on a number of tasks along with a space-saving of up to 98.4%. Specifically, our method reached even better results on word similarity evaluation than the uncompressed pre-trained embeddings, and was significantly better than previous compression methods that do not consider word relations.
There has been a growing interest in the side-channel analysis (SCA) field based on deep learning (DL) technology. Various DL network or model has been developed to improve the efficiency of SCA. However, few studies have investigated the impact of the different models on attack results and the exact relationship between power consumption traces and intermediate values. Based on the convolutional neural network and the autoencoder, this paper proposes a Template Analysis Pre-trained DL Classification model named TAPDC which contains three sub-networks. The TAPDC model detects the periodicity of power trace, relating power to the intermediate values and mining the deeper features by the multi-layer convolutional net. We implement the TAPDC model and compare it with two classical models in a fair experiment. The evaluative results show that the TAPDC model with autoencoder and deep convolution feature extraction structure in SCA can more effectively extract information from power consumption trace. Also, Using the classifier layer, this model links power information to the probability of intermediate value. It completes the conversion from power trace to intermediate values and greatly improves the efficiency of the power attack.
N6-methyladenosine (m 6A) is a prevalent methylation modification and plays a vital role in various biological processes, such as metabolism, mRNA processing, synthesis, and transport. Recent studies have suggested that m 6A modification is related to common diseases such as cancer, tumours, and obesity. Therefore, accurate prediction of methylation sites in RNA sequences has emerged as a critical issue in the area of bioinformatics. However, traditional high-throughput sequencing and wet bench experimental techniques have the disadvantages of high costs, significant time requirements and inaccurate identification of sites. But through the use of traditional experimental methods, researchers have produced many large databases of m 6A sites. With the support of these basic databases and existing deep learning methods, we developed an m 6A site predictor named DeepM6ASeq-EL, which integrates an ensemble of five LSTM and CNN classifiers with the combined strategy of hard voting. Compared to the state-of-the-art prediction method WHISTLE (average AUC 0.948 and 0.880), the DeepM6ASeq-EL had a lower accuracy in m 6A site prediction (average AUC: 0.861 for the full transcript models and 0.809 for the mature messenger RNA models) when tested on six independent datasets.
Closely related to the economy, the analysis and management of electricity consumption has been widely studied. Conventional approaches mainly focus on the prediction and anomaly detection of electricity consumption, which fails to reveal the in-depth relationships between electricity consumption and various factors such as industry, weather etc.. In the meantime, the lack of analysis tools has increased the difficulty in analytical tasks such as correlation analysis and comparative analysis. In this paper, we introduce EcoVis, a visual analysis system that supports the industrial-level spatio-temporal correlation analysis in the electricity consumption data. We not only propose a novel approach to model spatio-temporal data into a graph structure for easier correlation analysis, but also introduce a novel visual representation to display the distributions of multiple instances in a single map. We implement the system with the cooperation with domain experts. Experiments are conducted to demonstrate the effectiveness of our method.
Stream processing has emerged as a useful technology for applications which require continuous and low latency computation on infinite streaming data. Since stream processing systems (SPSs) usually require distributed deployment on clusters of servers in face of large-scale of data, it is especially common to meet with failures of processing nodes or communication networks, but should be handled seriously considering service quality. A failed system may produce wrong results or become unavailable, resulting in a decline in user experience or even significant financial loss. Hence, a large amount of fault tolerance approaches have been proposed for SPSs. These approaches often have their own priorities on specific performance concerns, e.g., runtime overhead and recovery efficiency. Nevertheless, there is a lack of a systematic overview and classification of the state-of-the-art fault tolerance approaches in SPSs, which will become an obstacle for the development of SPSs. Therefore, we investigate the existing achievements and develop a taxonomy of the fault tolerance in SPSs. Furthermore, we propose an evaluation framework tailored for fault tolerance, demonstrate the experimental results on two representative open-sourced SPSs and exposit the possible disadvantages in current designs. Finally, we specify future research directions in this domain.
Accurate diagnosis is a significant step in cancer treatment. Machine learning can support doctors in prognosis decision-making, and its performance is always weakened by the high dimension and small quantity of genetic data. Fortunately, deep learning can effectively process the high dimensional data with growing. However, the problem of inadequate data remains unsolved and has lowered the performance of deep learning. To end it, we propose a generative adversarial model that uses non target cancer data to help target generator training. We use the reconstruction loss to further stabilize model training and improve the quality of generated samples. We also present a cancer classification model to optimize classification performance. Experimental results prove that mean absolute error of cancer gene made by our model is 19.3% lower than DC-GAN, and the classification accuracy rate of our produced data is higher than the data created by GAN. As for the classification model, the classification accuracy of our model reaches 92.6%, which is 7.6% higher than the model without any generated data.
Detection and segmentation of defocus blur is a challenging task in digital imaging applications as the blurry images comprise of blur and sharp regions that wrap significant information and require effective methods for information extraction. Existing defocus blur detection and segmentation methods have several limitations i.e., discriminating sharp smooth and blurred smooth regions, low recognition rate in noisy images, and high computational cost without having any prior knowledge of images i.e., blur degree and camera configuration. Hence, there exists a dire need to develop an effective method for defocus blur detection, and segmentation robust to the above-mentioned limitations. This paper presents a novel features descriptor local directional mean patterns (LDMP) for defocus blur detection and employ KNN matting over the detected LDMP-Trimap for the robust segmentation of sharp and blur regions. We argue/hypothesize that most of the image fields located in blurry regions have significantly less specific local patterns than those in the sharp regions, therefore, proposed LDMP features descriptor should reliably detect the defocus blurred regions. The fusion of LDMP features with KNN matting provides superior performance in terms of obtaining high-quality segmented regions in the image. Additionally, the proposed LDMP features descriptor is robust to noise and successfully detects defocus blur in high-dense noisy images. Experimental results on Shi and Zhao datasets demonstrate the effectiveness of the proposed method in terms of defocus blur detection. Evaluation and comparative analysis signify that our method achieves superior segmentation performance and low computational cost of 15 seconds.
Multivariate dynamic networks indicate networks whose topology structure and vertex attributes are evolving along time. They are common in multimedia applications. Anomaly detection is one of the essential tasks in analyzing these networks though it is not well addressed. In this paper, we combine a rare category detection method and visualization techniques to help users to identify and analyze anomalies in multivariate dynamic networks. We conclude features of rare categories and two types of anomalies of rare categories. Then we present a novel rare category detection method, called DIRAD, to detect rare category candidates with anomalies. We develop a prototype system called iNet, which integrates two major visualization components, including a glyph-based rare category identifier, which helps users to identify rare categories among detected substructures, a major view, which assists users to analyze and interpret the anomalies of rare categories in network topology and vertex attributes. Evaluations, including an algorithm performance evaluation, a case study, and a user study, are conducted to test the effectiveness of proposed methods.
Security of databases has always been a hot topic in the field of information security. Privacy protection can be realized by encrypting data, while data copyright can be protected by using digital watermarking technology. By combining these two technologies, a database’s copyright and privacy problems in the cloud can be effectively solved. Based on order-preserving encryption scheme (OPES), circular histogram and digital watermarking technology, this paper proposes a new robust watermarking scheme for protection of databases in the encrypted domain. Firstly, the OPES is used to encrypt data to avoid exposing the data in the cloud. Then, the encrypted data are grouped and modified by the use of a circular histogram for embedding a digital watermark. The common data query operations in database are available for the encrypted watermarking database. In receivers, the digital watermark and the original data can be restored through a secret key and a key table. Experimental results have shown that the proposed algorithm is robust against common database attacks in the encrypted domain.
Searchable encryption provides an effective way for data security and privacy in cloud storage. Users can retrieve encrypted data in the cloud under the premise of protecting their own data security and privacy. However, most of the current content-based retrieval schemes do not contain enough semantic information of the article and cannot fully reflect the semantic information of the text. In this paper, we propose two secure and semantic retrieval schemes based on BERT (bidirectional encoder representations from transformers) named SSRB-1, SSRB-2. By training the documents with BERT, the keyword vector is generated to contain more semantic information of the documents, which improves the accuracy of retrieval and makes the retrieval result more consistent with the user’s intention. Finally, through testing on real data sets, it is shown that both of our solutions are feasible and effective.
Searchable encryption is an effective way to ensure the security and availability of encrypted outsourced cloud data. Among existing solutions, the keyword exact search solution is relatively inflexible, while the fuzzy keyword search solution either has a high index overhead or suffers from the false-positive. Furthermore, no existing fuzzy keyword search solution considers the homoglyph search on encrypted data. In this paper, we propose an efficient privacy-preserving homoglyph search scheme supporting arbitrary languages (POSA, in short). We enhance the performance of the fuzzy keyword search in three aspects. Firstly, we formulate the similarity of homoglyph and propose a privacy-preserving homoglyph search. Secondly, we put forward an index build mechanism without the false-positive, which reduces the storage overhead of the index and is suitable for arbitrary languages. Thirdly, POSA returns just the user’s search, i.e., all returned documents contain the search keyword or its homoglyph. The theoretical analysis and experimental evaluations on real-world datasets demonstrate the effectiveness and efficiency of POSA.