User profiles are widely used in the age of big data. However, generating and releasing user profiles may cause serious privacy leakage, since large amounts of personal data are collected and analyzed. In this paper, we propose a differentially private user profile construction method, DP-UserPro, which is composed of DP-CLIQUE and private top-k tag selection. DP-CLIQUE is a differentially private high-dimensional data clustering algorithm based on CLIQUE. The multidimensional tag space is divided into cells, and Laplace noise is added to the count of each cell. Based on breadth-first search, the largest connected set of dense cells is grouped into a cluster. Then a private top-k tag selection approach is proposed based on a score function over the tags, to select the k most important tags that best represent the characteristics of the cluster. Finally, the privacy and utility of DP-UserPro are analyzed theoretically and evaluated experimentally. Comparison experiments with the Tag Suppression algorithm are carried out on two real datasets to measure the False Negative Rate (FNR) and precision. The results show that DP-UserPro outperforms Tag Suppression on FNR by 62.5% in the best case and 14.25% in the worst case, and that DP-UserPro achieves about 21.1% higher precision than Tag Suppression on average.
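The DP-CLIQUE step described above (noisy cell counts, then breadth-first search over dense cells) can be sketched as follows. This is a minimal illustration under our own assumptions: the density threshold, the axis-aligned neighbourhood definition, and all parameter names are ours, not the paper's, and the sensitivity of each cell count is assumed to be 1.

```python
import math
import random
from collections import deque

def laplace(scale):
    # Sample Laplace(0, scale) noise via the inverse-CDF method.
    u = random.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def dp_clique(cell_counts, epsilon, density_threshold):
    """cell_counts: dict mapping grid-cell coordinates (tuples) to tag counts.
    Returns the largest connected component of noisy-dense cells."""
    # Step 1: add Laplace noise (assumed sensitivity 1) to each cell count.
    noisy = {c: n + laplace(1.0 / epsilon) for c, n in cell_counts.items()}
    # Step 2: keep cells whose noisy count reaches the density threshold.
    dense = {c for c, n in noisy.items() if n >= density_threshold}
    # Step 3: BFS over axis-aligned neighbours to find connected components;
    # the largest component is reported as the cluster.
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            cell = queue.popleft()
            comp.append(cell)
            for dim in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(comp)
    return max(clusters, key=len) if clusters else []
```

With a large epsilon (little noise), two adjacent high-count cells form the largest cluster while isolated or sparse cells are excluded.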
Attribute-based encryption with keyword search (ABKS) achieves both fine-grained access control and keyword search. However, in previous ABKS schemes, the search algorithm requires every keyword in the target keyword set to be identical to one in the ciphertext keyword set; otherwise, the algorithm outputs no search result, which limits usability. Moreover, previous ABKS schemes are vulnerable to what we call a peer-decryption attack: the ciphertext may be eavesdropped and decrypted by an adversary who has sufficient authority but no information about the ciphertext keywords.
In this paper, we propose a new system for fog computing: ciphertext-policy attribute-based encryption with dynamic keyword search (ABDKS). In ABDKS, the search algorithm requires only one keyword to be identical between the two keyword sets and outputs a correlation that reflects the number of keywords shared by the two sets. In addition, our ABDKS is resistant to the peer-decryption attack, since decryption requires not only sufficient authority but also at least one keyword of the ciphertext. Beyond that, ABDKS shifts most computational overhead from resource-constrained users to fog nodes. The security analysis shows that ABDKS can resist Chosen-Plaintext Attack (CPA) and Chosen-Keyword Attack (CKA).
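The matching semantics described above can be illustrated in plaintext as follows. This sketch shows only the input/output behavior of the search (one shared keyword suffices, and the correlation counts shared keywords); the actual scheme performs this test over ciphertexts, which is not modeled here.

```python
def dynamic_keyword_match(target_keywords, ciphertext_keywords):
    """Plaintext illustration of dynamic keyword search semantics.
    Returns the correlation (number of shared keywords) when at least
    one keyword matches, and None otherwise."""
    correlation = len(set(target_keywords) & set(ciphertext_keywords))
    # Classical ABKS would instead require the two sets to match entirely
    # before producing any search result.
    return correlation if correlation >= 1 else None
```

For example, target set {a, b, c} against ciphertext set {b, c, d} yields a correlation of 2, whereas classical ABKS-style matching would return nothing.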
In this paper, a binary-extensible quality status encoding scheme, named IQSCT (IoT quality status code table), is proposed for PCB-based products with available recovery options in remanufacturing. IQSCT is achieved by code evolution based on binary logic, in which the product flow and the quality information flow are integrated, and three key features of PCB-based products (PCB-module association, assembly-disassembly logic, and disassembly risk) are involved in production costing. With IQSCT, the manufacturer can make better decisions to reduce remanufacturing cost and improve resource utilization, which is verified by a case study based on real data from the BOM cost and corresponding estimations of the Apple iPhone 11 series.
Combinatorial optimization in the face of uncertainty is a challenge in both operational research and machine learning. In this paper, we consider a special and important class of problems, adversarial online combinatorial optimization with semi-bandit feedback, in which a player repeatedly makes combinatorial decisions and receives the corresponding feedback. While existing algorithms focus on the regret guarantee or assume there exists an efficient offline oracle, it remains a challenge to solve this problem efficiently when the offline counterpart is NP-hard. We propose a variant of the Follow-the-Perturbed-Leader (FPL) algorithm to solve this problem. Unlike the existing FPL approach, our method employs an approximation algorithm as an offline oracle and perturbs the collected data by adding nonnegative random variables. Our approach is simple and computationally efficient. Moreover, it can guarantee a sublinear (1 + ε)-scaled regret of order O(
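A single decision round of an FPL-style method with nonnegative perturbations might look like the sketch below. This is one plausible reading of the abstract, not the paper's algorithm: the exponential noise distribution, the learning-rate parameter eta, and the reward-maximization convention are our assumptions.

```python
import random

def fpl_decision(cum_reward, oracle, eta):
    """One FPL-style round: add nonnegative exponential perturbations to
    the collected (estimated cumulative) rewards, then let an offline
    oracle pick a feasible combinatorial action.
    cum_reward: list of estimated cumulative rewards, one per base arm.
    oracle: callable mapping a reward vector to a feasible action.
    eta: rate of the Exp(eta) perturbation (larger eta = less noise)."""
    perturbed = [r + random.expovariate(eta) for r in cum_reward]
    # Crucially, the oracle only needs to be an approximation algorithm
    # for the offline problem, not an exact (possibly NP-hard) solver.
    return oracle(perturbed)
```

As a toy oracle, selecting the top-2 arms by perturbed reward stands in for the combinatorial offline solver; any feasible-set selector with an approximation guarantee could be plugged in instead.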
Traditional first-order logic has four definitions for quantifiers, which are given in terms of the universal and existential quantifiers. In L3-valued (three-valued) first-order logic, there are eight kinds of definitions for quantifiers; the corresponding Gentzen deduction systems are given, and their soundness and completeness theorems are proved.
Solving the optimization problem of approaching a Nash Equilibrium point plays an important role in imperfect-information games, e.g., StarCraft and poker. Neural Fictitious Self-Play (NFSP) is an effective algorithm that learns an approximate Nash Equilibrium of imperfect-information games purely from self-play, without prior domain knowledge. However, it needs to train a neural network in an off-policy manner to approximate the action values. For games with large search spaces, the training may suffer from unnecessary exploration and sometimes fails to converge. In this paper, we propose a new Neural Fictitious Self-Play algorithm that combines Monte Carlo tree search with NFSP, called MC-NFSP, to improve performance in real-time zero-sum imperfect-information games. With experiments and empirical analysis, we demonstrate that the proposed MC-NFSP algorithm can approximate a Nash Equilibrium in games with large-scale search depth, whereas NFSP cannot. Furthermore, we develop an Asynchronous Neural Fictitious Self-Play framework (ANFSP). It uses an asynchronous and parallel architecture to collect game experience and improves both training efficiency and policy quality. Experiments with games with hidden state information (Texas Hold'em) and FPS (first-person shooter) games demonstrate the effectiveness of our algorithms.
Effective exploration is key to a successful search process. The recently proposed negatively correlated search (NCS) tries to achieve this by coordinated parallel exploration, where a set of search processes are driven to be negatively correlated so that different promising areas of the search space can be visited simultaneously. Despite successful applications of NCS, the negatively correlated search behaviors were mostly devised by intuition, while a deeper (e.g., mathematical) understanding is missing. In this paper, a more principled NCS, namely NCNES, is presented, showing that the parallel exploration is equivalent to a process of seeking probabilistic models that both lead to solutions of high quality and are distant from previously obtained probabilistic models. Reinforcement learning, for which exploration is of particular importance, is considered for empirical assessment. The proposed NCNES is applied to directly train a deep convolutional network with 1.7 million connection weights for playing Atari games. Empirical results show that the significant advantages of NCNES, especially on games with uncertain and delayed rewards, can be largely attributed to its effective parallel exploration ability.
Traditional recommendation algorithms predict the latent interest of an active user by collecting rating information from other similar users or items. Recently, more and more recommendation systems attempt to involve social relations to improve recommendation performance. However, existing works either leave out user reliability or cannot capture the correlation between two users who are similar but not socially connected. Besides, they also treat the trust value between users as either 0 or 1, thus degrading prediction accuracy. In this paper, we propose an efficient social affect model, multi-affect(ed), for recommendation by incorporating both users' reliability and influence propagation. Specifically, the model contains two main components: the computation of user reliability and influence propagation, and the design of a user-shared feature space. First, a reliability calculation strategy based on user similarity is developed for measuring the recommendation accuracy between users. Then, the influence propagation relationships among users are taken into consideration. Finally, the multi-affect(ed) model is developed with the user-shared feature space to generate the predicted ratings.
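A similarity-based reliability score of the kind described above could be sketched as follows. The exact formula here (cosine similarity over co-rated items) is our stand-in for illustration only; the paper's reliability definition may differ.

```python
import math

def reliability(ratings_u, ratings_v):
    """Cosine similarity over co-rated items, used here as a stand-in
    for a similarity-based user reliability score.
    ratings_u, ratings_v: dicts mapping item id -> rating."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        # No co-rated items: no evidence, so reliability defaults to 0.
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)
```

A real-valued score in [0, 1] like this avoids the binary 0/1 trust values the abstract criticizes, and could then be propagated along social links.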
Nowadays, human activity recognition is becoming an increasingly significant topic with a wide range of real-world applications. Sensor data is an important data source in engineering and applications. At present, some studies in human activity recognition based on sensor data have been carried out from a macroscopic perspective. However, many of them face limitations. One pivotal limitation is the uncontrollable segment length of different kinds of activities; the choice of features and data form are also influencing factors. This paper studies the problem from a microscopic perspective, with an emphasis on the logic and relevance between data segments, applying ideas from natural language processing and the method of data symbolization to human activity recognition in an attempt to solve the problems above. In this paper, several activity-element definitions and three algorithms are proposed: a dictionary-building algorithm, a corpus-building algorithm, and an activity recognition algorithm improved from a natural language analysis method, TF-IDF. Numerous experiments on different aspects of this model are conducted on six complex and representative single-level sensor datasets, namely the UCI Sports and Daily dataset, Skoda dataset, WISDM Phoneacc dataset, WISDM Watchacc dataset, Healthy Older People dataset, and HAPT dataset, which show that this model can be applied to different datasets and achieves satisfactory recognition results.
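Once sensor segments have been symbolized into activity elements, the TF-IDF idea above applies directly, treating each segment as a "document" of symbols. This sketch uses a standard smoothed TF-IDF; the paper's improved variant, as well as the symbol names, are not reproduced here.

```python
import math
from collections import Counter

def tfidf_scores(document, corpus):
    """TF-IDF weights for one symbolized segment.
    document: list of activity-element symbols for one data segment.
    corpus: list of such symbol lists (the 'activity corpus')."""
    tf = Counter(document)
    n_docs = len(corpus)
    scores = {}
    for sym, count in tf.items():
        # Document frequency: in how many segments the symbol appears.
        df = sum(1 for doc in corpus if sym in doc)
        idf = math.log((1 + n_docs) / (1 + df))  # smoothed idf
        scores[sym] = (count / len(document)) * idf
    return scores
```

Symbols that occur in every segment (e.g., a resting posture common to all activities) get weight near zero, while symbols distinctive of one activity are weighted up, which is exactly why a TF-IDF-style scheme suits recognition.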
A sememe is defined as the minimum semantic unit of languages in linguistics. Sememe knowledge bases are built by manually annotating sememes for words and phrases. HowNet is the most well-known sememe knowledge base. It has been extensively utilized in many natural language processing tasks in the era of statistical natural language processing and has proven effective and helpful for understanding and using languages. In the era of deep learning, although data are thought to be of vital importance, some studies have worked on incorporating sememe knowledge bases like HowNet into neural network models to enhance system performance. Successful attempts have been made in tasks including word representation learning, language modeling, semantic composition, etc. In addition, considering the high cost of manual annotation and update for sememe knowledge bases, some work has tried to use machine learning methods to automatically predict sememes for words and phrases to expand sememe knowledge bases. Besides, some studies have tried to extend HowNet to other languages by automatically predicting sememes for words and phrases in a new language. In this paper, we summarize recent studies on the application and expansion of sememe knowledge bases and point out some future directions of research on sememes.
Multi-label classification aims to assign a set of proper labels to each instance, where distance metric learning can help improve the generalization ability of instance-based multi-label classification models. Existing multi-label metric learning techniques work by utilizing pairwise constraints to enforce that examples with similar label assignments should have close distance in the embedded feature space. In this paper, a novel distance metric learning approach for multi-label classification is proposed by modeling structural interactions between the instance space and the label space. On one hand, a compositional distance metric is employed, represented as a weighted sum of rank-1 PSD matrices built from component bases. On the other hand, the compositional weights are optimized by exploiting triplet similarity constraints derived from both the instance and label spaces. Due to the compositional nature of the employed distance metric, the resulting problem admits a quadratic programming formulation with linear optimization complexity w.r.t. the number of training examples. We also derive the generalization bound for the proposed approach based on algorithmic robustness analysis of the compositional metric. Extensive experiments on sixteen benchmark data sets clearly validate the usefulness of the compositional metric in yielding an effective distance metric for multi-label classification.
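The compositional form above, M = Σ_k w_k b_k b_kᵀ with w_k ≥ 0, means the squared distance is a weighted sum of squared projections onto the bases, which the following sketch computes directly. The function and variable names are ours; how the bases and weights are learned (the triplet-constrained QP) is not shown.

```python
def compositional_distance(x, y, bases, weights):
    """Distance under M = sum_k w_k * b_k b_k^T (rank-1 PSD components):
    d(x, y) = sqrt( sum_k w_k * (b_k . (x - y))^2 ).
    Nonnegative weights guarantee M is PSD, hence a valid pseudo-metric."""
    diff = [a - b for a, b in zip(x, y)]
    d2 = 0.0
    for w, b in zip(weights, bases):
        proj = sum(bi * di for bi, di in zip(b, diff))
        d2 += w * proj * proj
    return d2 ** 0.5
```

With the standard basis and unit weights this reduces to the Euclidean distance, which is a convenient sanity check; learning then only adjusts the scalar weights, which is what keeps the optimization a QP.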
Software systems are present all around us and play vital roles in our daily life. The correct functioning of these systems is of prime concern. In addition to classical testing techniques, formal techniques like model checking are used to reinforce the quality and reliability of software systems. However, obtaining the behavior model of an unknown software system, which is essential for model-based techniques, is a challenging task. To mitigate this problem, an emerging black-box analysis technique, called Model Learning, can be applied. It complements existing model-based testing and verification approaches by providing behavior models of black-box systems fully automatically. This paper surveys the model learning technique, which has recently attracted much attention from researchers, especially in the domains of testing and verification. First, we review the background and foundations of model learning, which form the basis of the subsequent sections. Second, we present some well-known model learning tools and discuss their merits and shortcomings in a comparison table. Third, we describe successful applications of model learning in multidisciplinary fields, current challenges along with possible future work, and concluding remarks.
Multi-user collaborative editors are useful computer-aided tools to support human-to-human collaboration. For multi-user collaborative editors, selective undo is an essential utility enabling users to undo any editing operation at any time. Collaborative editors usually adopt operational transformation (OT) to address concurrency and consistency issues. However, it is still a great challenge to design an efficient and correct OT algorithm capable of handling both normal do operations and user-initiated undo operations, because these two kinds of operations can interfere with each other in various forms. In this paper, we propose a semi-transparent selective undo algorithm that handles both do and undo in a unified framework, which separates the processing of do operations from the processing of undo operations. Formal proofs are provided to verify the proposed algorithm against well-established correctness criteria. Theoretical analysis and experimental evaluation show that the proposed algorithm outperforms prior OT-based selective undo algorithms.
The emergence of non-volatile memory (NVM) has introduced new opportunities for performance optimizations in existing storage systems. To better utilize its byte-addressability and near-DRAM performance, NVM can be attached on the memory bus and accessed via load/store memory instructions rather than the conventional block interface. In this scenario, a cache line (usually 64 bytes) becomes the data transfer unit between volatile and non-volatile devices. However, the failure atomicity of writes on NVM is the memory bit width (usually 8 bytes). This mismatch between the data transfer unit and the atomicity unit may introduce write amplification and compromise the data consistency of node-based data structures such as B+-trees. In this paper, we propose WOBTree, a Write-Optimized B+-Tree for NVM, to address the mismatch problem without expensive logging. WOBTree minimizes the update granularity from a tree node to a much smaller subnode and carefully arranges the write operations in it to ensure crash consistency and reduce write amplification. Experimental results show that compared with previous persistent B+-tree solutions, WOBTree reduces write amplification by up to 86× and improves write performance by up to 61× while maintaining similar search performance.
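The effect of shrinking the update granularity can be seen with back-of-envelope arithmetic. The node and subnode sizes below are illustrative assumptions, not the paper's configuration, and the sketch counts only cache-line flushes, ignoring ordering fences and the subnode layout itself.

```python
CACHE_LINE = 64  # data transfer unit between CPU cache and NVM (bytes)

def lines_flushed(update_granularity):
    """Cache lines that must be flushed to persist one in-place update,
    when the structure rewrites `update_granularity` bytes per update."""
    return -(-update_granularity // CACHE_LINE)  # ceiling division

# A node-grained persistent B+-tree rewrites a whole (here, 4 KiB) node
# per update; a subnode-grained design in the WOBTree spirit rewrites
# only a cache-line-sized subnode.
node_grained = lines_flushed(4096)   # flushes 64 cache lines
subnode_grained = lines_flushed(64)  # flushes 1 cache line
```

Under these assumed sizes the subnode design flushes 64× fewer cache lines per update, which is the mechanism behind the large write-amplification reductions reported.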
Recently, a growing number of scientific applications have been migrated into the cloud. To deal with the problems brought by clouds, more and more researchers have started to consider multiple optimization goals in workflow scheduling. However, previous works ignore some details that are challenging but essential. Most existing multi-objective workflow scheduling algorithms overlook weight selection, which may result in the quality degradation of solutions. Besides, we find that the well-known partial critical path (PCP) strategy, which has been widely used to meet deadline constraints, cannot accurately reflect the situation at each time step. Workflow scheduling is an NP-hard problem, so self-optimizing algorithms are more suitable for solving it.
In this paper, the aim is to solve a workflow scheduling problem with a deadline constraint. We design a deadline-constrained scientific workflow scheduling algorithm based on multi-objective reinforcement learning (RL), called DCMORL. DCMORL uses the Chebyshev scalarization function to scalarize its Q-values; this method is well suited to choosing weights for the objectives. We also propose an improved version of the PCP strategy called MPCP. The sub-deadlines in MPCP are regularly updated during the scheduling phase, so they accurately reflect the situation at each time step. The optimization objectives in this paper are minimizing the execution cost and energy consumption within a given deadline. Finally, we use four scientific workflows to compare DCMORL with several representative scheduling algorithms. The results indicate that DCMORL outperforms those algorithms. To the best of our knowledge, this is the first work to apply RL to a deadline-constrained workflow scheduling problem.
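Chebyshev scalarization of multi-objective Q-values, as used above, picks actions by their weighted worst-case distance to a utopian reference point. The sketch below shows the generic formulation; the reference point, weights, and variable names are generic illustrations, not DCMORL's actual settings.

```python
def chebyshev_scalarize(q_values, weights, utopia):
    """Chebyshev scalarized value of one (state, action) pair.
    q_values: per-objective Q estimates; weights: objective weights;
    utopia: reference (ideal) point z*. Smaller is better."""
    return max(w * abs(q - z) for q, w, z in zip(q_values, weights, utopia))

def greedy_action(q_table, weights, utopia):
    # Pick the action whose scalarized distance to the utopia point is
    # smallest; unlike a linear weighted sum, this can prefer balanced
    # trade-offs on non-convex Pareto fronts.
    return min(q_table,
               key=lambda a: chebyshev_scalarize(q_table[a], weights, utopia))
```

For example, with utopia (10, 10) and equal weights, an action with Q-values (9, 9) beats one with (10, 2): the max-norm criterion penalizes the large shortfall on the second objective, which a linear scalarization with these weights would not.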