Latest issue

Oct 2023, Volume 17 Issue 5
  • Select all
    Jinyang GUO, Lu ZHANG, José ROMERO HUNG, Chao LI, Jieru ZHAO, Minyi GUO

    Cloud vendors are actively adopting FPGAs into their infrastructures for enhancing performance and efficiency. As cloud services continue to evolve, FPGA (field programmable gate array) systems would play an even important role in the future. In this context, FPGA sharing in multi-tenancy scenarios is crucial for the wide adoption of FPGA in the cloud. Recently, many works have been done towards effective FPGA sharing at different layers of the cloud computing stack.

    In this work, we provide a comprehensive survey of recent works on FPGA sharing. We examine prior art from different aspects and encapsulate relevant proposals on a few key topics. On the one hand, we discuss representative papers on FPGA resource sharing schemes; on the other hand, we also summarize important SW/HW techniques that support effective sharing. Importantly, we further analyze the system design cost behind FPGA sharing. Finally, based on our survey, we identify key opportunities and challenges of FPGA sharing in future cloud scenarios.

    Yi ZHONG, Mengyu SHI, Youran XU, Chunrong FANG, Zhenyu CHEN

    With the benefits of reducing time and workforce, automated testing has been widely used for the quality assurance of mobile applications (APPs). Compared with automated testing, manual testing can achieve higher coverage in complex interactive Activities. And the effectiveness of manual testing is highly dependent on the user operation process (UOP) of experienced testers. Based on the UOP, we propose an iterative Android automated testing (IAAT) method that automatically records, extracts, and integrates UOPs to guide the test logic of the tool across the complex Activity iteratively. The feedback test results can train the UOPs to achieve higher coverage in each iteration. We extracted 50 UOPs and conducted experiments on 10 popular mobile APPs to demonstrate IAAT’s effectiveness compared with Monkey and the initial automated tests. The experimental results show a noticeable improvement in the IAAT compared with the test logic without human knowledge. Under the 60 minutes test time, the average code coverage is improved by 13.98% to 37.83%, higher than the 27.48% of Monkey under the same conditions.

    Mingzhao WANG, Henry HAN, Zhao HUANG, Juanying XIE

    It is a significant and challenging task to detect the informative features to carry out explainable analysis for high dimensional data, especially for those with very small number of samples. Feature selection especially the unsupervised ones are the right way to deal with this challenge and realize the task. Therefore, two unsupervised spectral feature selection algorithms are proposed in this paper. They group features using advanced Self-Tuning spectral clustering algorithm based on local standard deviation, so as to detect the global optimal feature clusters as far as possible. Then two feature ranking techniques, including cosine-similarity-based feature ranking and entropy-based feature ranking, are proposed, so that the representative feature of each cluster can be detected to comprise the feature subset on which the explainable classification system will be built. The effectiveness of the proposed algorithms is tested on high dimensional benchmark omics datasets and compared to peer methods, and the statistical test are conducted to determine whether or not the proposed spectral feature selection algorithms are significantly different from those of the peer methods. The extensive experiments demonstrate the proposed unsupervised spectral feature selection algorithms outperform the peer ones in comparison, especially the one based on cosine similarity feature ranking technique. The statistical test results show that the entropy feature ranking based spectral feature selection algorithm performs best. The detected features demonstrate strong discriminative capabilities in downstream classifiers for omics data, such that the AI system built on them would be reliable and explainable. It is especially significant in building transparent and trustworthy medical diagnostic systems from an interpretable AI perspective.

    Yao ZHANG, Liangxiao JIANG, Chaoqun LI

    Crowdsourcing provides an effective and low-cost way to collect labels from crowd workers. Due to the lack of professional knowledge, the quality of crowdsourced labels is relatively low. A common approach to addressing this issue is to collect multiple labels for each instance from different crowd workers and then a label integration method is used to infer its true label. However, to our knowledge, almost all existing label integration methods merely make use of the original attribute information and do not pay attention to the quality of the multiple noisy label set of each instance. To solve these issues, this paper proposes a novel three-stage label integration method called attribute augmentation-based label integration (AALI). In the first stage, we design an attribute augmentation method to enrich the original attribute space. In the second stage, we develop a filter to single out reliable instances with high-quality multiple noisy label sets. In the third stage, we use majority voting to initialize integrated labels of reliable instances and then use cross-validation to build multiple component classifiers on reliable instances to predict all instances. Experimental results on simulated and real-world crowdsourced datasets demonstrate that AALI outperforms all the other state-of-the-art competitors.

    Jiaran LI, Richong ZHANG, Samuel MENSAH, Wenyi QIN, Chunming HU

    When a crowdsourcing approach is used to assist the classification of a set of items, the main objective is to classify this set of items by aggregating the worker-provided labels. A secondary objective is to assess the workers’ skill levels in this process. A classical model that achieves both objectives is the famous Dawid-Skene model. In this paper, we consider a third objective in this context, namely, to learn a classifier that is capable of labelling future items without further assistance of crowd workers. By extending the Dawid-Skene model to include the item features into consideration, we develop a Classification-Oriented Dawid Skene (CODS) model, which achieves the three objectives simultaneously. The effectiveness of CODS on this three dimensions of the problem space is demonstrated experimentally.

    Xiaoming SHI, Wanxiang CHE

    Slot filling, to extract entities for specific types of information (slot), is a vitally important modular of dialogue systems for automatic diagnosis. Doctor responses can be regarded as the weak supervision of patient queries. In this way, a large amount of weakly labeled data can be obtained from unlabeled diagnosis dialogue, alleviating the problem of costly and time-consuming data annotation. However, weakly labeled data suffers from extremely noisy samples. To alleviate the problem, we propose a simple and effective Co-Weak-Teaching method. The method trains two slot filling models simultaneously. These two models learn from two different weakly labeled data, ensuring learning from two aspects. Then, one model utilizes selected weakly labeled data generated by the other, iteratively. The model, obtained by the Co-Weak-Teaching on weakly labeled data, can be directly tested on testing data or sequentially fine-tuned on a small amount of human-annotated data. Experimental results on these two settings illustrate the effectiveness of the method with an increase of 8.03% and 14.74% in micro and macro f1 scores, respectively.

    Yi ZHU, Xindong WU, Jipeng QIANG, Yunhao YUAN, Yun LI

    The purpose of unsupervised domain adaptation is to use the knowledge of the source domain whose data distribution is different from that of the target domain for promoting the learning task in the target domain. The key bottleneck in unsupervised domain adaptation is how to obtain higher-level and more abstract feature representations between source and target domains which can bridge the chasm of domain discrepancy. Recently, deep learning methods based on autoencoder have achieved sound performance in representation learning, and many dual or serial autoencoder-based methods take different characteristics of data into consideration for improving the effectiveness of unsupervised domain adaptation. However, most existing methods of autoencoders just serially connect the features generated by different autoencoders, which pose challenges for the discriminative representation learning and fail to find the real cross-domain features. To address this problem, we propose a novel representation learning method based on an integrated autoencoders for unsupervised domain adaptation, called IAUDA. To capture the inter- and inner-domain features of the raw data, two different autoencoders, which are the marginalized autoencoder with maximum mean discrepancy (mAEMMD) and convolutional autoencoder (CAE) respectively, are proposed to learn different feature representations. After higher-level features are obtained by these two different autoencoders, a sparse autoencoder is introduced to compact these inter- and inner-domain representations. In addition, a whitening layer is embedded for features processed before the mAEMMD to reduce redundant features inside a local area. Experimental results demonstrate the effectiveness of our proposed method compared with several state-of-the-art baseline methods.

    Cairui YAN, Huifang MA, Qingqing LI, Fanyi YANG, Zhixin LI

    Community search is an important problem in network analysis, which has attracted much attention in recent years. As a query-oriented variant of community detection problem, community search starts with some given nodes, pays more attention to local network structures, and gets personalized resultant communities quickly. The existing community search method typically returns a single target community containing query nodes by default. This is a strict requirement and does not allow much flexibility. In many real-world applications, however, query nodes are expected to be located in multiple communities with different semantics. To address this limitation of existing methods, an efficient spectral-based Multi-Scale Community Search method (MSCS) is proposed, which can simultaneously identify the multi-scale target local communities to which query node belong. In MSCS, each node is equipped with a graph Fourier multiplier operator. The access of the graph Fourier multiplier operator helps nodes to obtain feature representations at various community scales. In addition, an efficient algorithm is proposed for avoiding the large number of matrix operations due to spectral methods. Comprehensive experimental evaluations on a variety of real-world datasets demonstrate the effectiveness and efficiency of the proposed method.

    Jinwei LUO, Mingkai HE, Weike PAN, Zhong MING

    Session-based recommendation (SBR) and multi-behavior recommendation (MBR) are both important problems and have attracted the attention of many researchers and practitioners. Different from SBR that solely uses one single type of behavior sequences and MBR that neglects sequential dynamics, heterogeneous SBR (HSBR) that exploits different types of behavioral information (e.g., examinations like clicks or browses, purchases, adds-to-carts and adds-to-favorites) in sequences is more consistent with real-world recommendation scenarios, but it is rarely studied. Early efforts towards HSBR focus on distinguishing different types of behaviors or exploiting homogeneous behavior transitions in a sequence with the same type of behaviors. However, all the existing solutions for HSBR do not exploit the rich heterogeneous behavior transitions in an explicit way and thus may fail to capture the semantic relations between different types of behaviors. However, all the existing solutions for HSBR do not model the rich heterogeneous behavior transitions in the form of graphs and thus may fail to capture the semantic relations between different types of behaviors. The limitation hinders the development of HSBR and results in unsatisfactory performance. As a response, we propose a novel behavior-aware graph neural network (BGNN) for HSBR. Our BGNN adopts a dual-channel learning strategy for differentiated modeling of two different types of behavior sequences in a session. Moreover, our BGNN integrates the information of both homogeneous behavior transitions and heterogeneous behavior transitions in a unified way. We then conduct extensive empirical studies on three real-world datasets, and find that our BGNN outperforms the best baseline by 21.87%, 18.49%, and 37.16% on average correspondingly. A series of further experiments and visualization studies demonstrate the rationality and effectiveness of our BGNN. An exploratory study on extending our BGNN to handle more than two types of behaviors show that our BGNN can easily and effectively be extended to multi-behavior scenarios.

    Quan FENG, Songcan CHEN

    Multi-task learning is to improve the performance of the model by transferring and exploiting common knowledge among tasks. Existing MTL works mainly focus on the scenario where label sets among multiple tasks (MTs) are usually the same, thus they can be utilized for learning across the tasks. However, the real world has more general scenarios in which each task has only a small number of training samples and their label sets are just partially overlapped or even not. Learning such MTs is more challenging because of less correlation information available among these tasks. For this, we propose a framework to learn these tasks by jointly leveraging both abundant information from a learnt auxiliary big task with sufficiently many classes to cover those of all these tasks and the information shared among those partially-overlapped tasks. In our implementation of using the same neural network architecture of the learnt auxiliary task to learn individual tasks, the key idea is to utilize available label information to adaptively prune the hidden layer neurons of the auxiliary network to construct corresponding network for each task, while accompanying a joint learning across individual tasks. Extensive experimental results demonstrate that our proposed method is significantly competitive compared to state-of-the-art methods.

    En XU, Zhiwen YU, Nuo LI, Helei CUI, Lina YAO, Bin GUO

    The sequential recommendation is a compelling technology for predicting users’ next interaction via their historical behaviors. Prior studies have proposed various methods to optimize the recommendation accuracy on different datasets but have not yet explored the intrinsic predictability of sequential recommendation. To this end, we consider applying the popular predictability theory of human movement behavior to this recommendation context. Still, it would incur serious bias in the next moment measurement of the candidate set size, resulting in inaccurate predictability. Therefore, determining the size of the candidate set is the key to quantifying the predictability of sequential recommendations. Here, different from the traditional approach that utilizes topological constraints, we first propose a method to learn inter-item associations from historical behaviors to restrict the size via logical constraints. Then, we extend it by 10 excellent recommendation algorithms to learn deeper associations between user behavior. Our two methods show significant improvement over existing methods in scenarios that deal with few repeated behaviors and large sets of behaviors. Finally, a prediction rate between 64% and 80% has been obtained by testing on five classical datasets in three domains of the recommender system. This provides a guideline to optimize the recommendation algorithm for a given dataset.

    Huifang YU, Ning WANG

    Network coding can improve the information transmission efficiency and reduces the network resource consumption, so it is a very good platform for information transmission. Certificateless proxy signatures are widely applied in information security fields. However, certificateless proxy signatures based on classical number theory are not suitable for the network coding environment and cannot resist the quantum computing attacks. In view of this, we construct certificateless network coding proxy signatures from lattice (LCL-NCPS). LCL-NCPS is new multi-source signature scheme which has the characteristics of anti-quantum, anti-pollution and anti-forgery. In LCL-NCPS, each source node user can output a message vector to intermediate node and sink node, and the message vectors from different source nodes will be linearly combined to achieve the aim of improving the network transmission rate and network robustness. In terms of efficiency analysis of space dimension, LCL-NCPS can obtain the lower computation complexity by reducing the dimension of proxy key. In terms of efficiency analysis of time dimension, LCL-NCPS has higher computation efficiency in signature and verification.

    Yunbo YANG, Xiaolei DONG, Zhenfu CAO, Jiachen SHEN, Shangmin DOU

    Oblivious Cross-Tags (OXT) [1] is the first efficient searchable encryption (SE) protocol for conjunctive queries in a single-writer single-reader framework. However, it also has a trade-off between security and efficiency by leaking partial database information to the server. Recent attacks on these SE schemes show that the leakages from these SE schemes can be used to recover the content of queried keywords. To solve this problem, Lai et al. [2] propose Hidden Cross-Tags (HXT), which reduces the access pattern leakage from Keyword Pair Result Pattern (KPRP) to Whole Result Pattern (WRP). However, the WRP leakage can also be used to recover some additional contents of queried keywords. This paper proposes Improved Cross-Tags (IXT), an efficient searchable encryption protocol that achieves access and searches pattern hiding based on the labeled private set intersection. We also prove the proposed labeled private set intersection (PSI) protocol is secure against semi-honest adversaries, and IXT is L-semi-honest secure (L is leakage function). Finally, we do experiments to compare IXT with HXT. The experimental results show that the storage overhead and computation overhead of the search phase at the client-side in IXT is much lower than those in HXT. Meanwhile, the experimental results also show that IXT is scalable and can be applied to various sizes of datasets.

    Jianwei LI, Xiaoming WANG, Qingqing GAN

    When one enterprise acquires another, the electronic data of the acquired enterprise will be transferred to the acquiring enterprise. In particular, if the data system of acquired enterprise contains a searchable encryption mechanism, the corresponding searchability will also be transferred. In this paper, we introduce the concept of Searchable Encryption with Ownership Transfer (SEOT), and propose a secure SEOT scheme. Based on the new structure of polling pool, our proposed searchable encryption scheme not only achieves efficient transfer of outsourced data, but also implements secure transfer of data searchability. Moreover, we optimize the storage cost for user to a desirable value. We prove our scheme can achieve the secure characteristics, then carry out the performance evaluation and experiments. The results demonstrate that our scheme is superior in efficiency and practicability.

    Yan JIANG, Youwen ZHU, Jian WANG, Xingxin LI

    Identity-based threshold signature (IDTS) is a forceful primitive to protect identity and data privacy, in which parties can collaboratively sign a given message as a signer without reconstructing a signing key. Nevertheless, most IDTS schemes rely on a trusted key generation center (KGC). Recently, some IDTS schemes can achieve escrow-free security against corrupted KGC, but all of them are vulnerable to denial-of-service attacks in the dishonest majority setting, where cheaters may force the protocol to abort without providing any feedback. In this work, we present a fully decentralized IDTS scheme to resist corrupted KGC and denial-of-service attacks. To this end, we design threshold protocols to achieve distributed key generation, private key extraction, and signing generation which can withstand the collusion between KGCs and signers, and then we propose an identification mechanism that can detect the identity of cheaters during key generation, private key extraction and signing generation. Finally, we formally prove that the proposed scheme is threshold unforgeability against chosen message attacks. The experimental results show that the computation time of both key generation and signing generation is <1 s, and private key extraction is about 3 s, which is practical in the distributed environment.

    Long LI, Jianbo HUANG, Liang CHANG, Jian WENG, Jia CHEN, Jingjing LI

    Since smartphones embedded with positioning systems and digital maps are widely used, location-based services (LBSs) are rapidly growing in popularity and providing unprecedented convenience in people’s daily lives; however, they also cause great concern about privacy leakage. In particular, location queries can be used to infer users’ sensitive private information, such as home addresses, places of work and appointment locations. Hence, many schemes providing query anonymity have been proposed, but they typically ignore the fact that an adversary can infer real locations from the correlations between consecutive locations in a continuous LBS. To address this challenge, a novel dual privacy-preserving scheme (DPPS) is proposed that includes two privacy protection mechanisms. First, to prevent privacy disclosure caused by correlations between locations, a correlation model is proposed based on a hidden Markov model (HMM) to simulate users’ mobility and the adversary’s prediction probability. Second, to provide query probability anonymity of each single location, an advanced k-anonymity algorithm is proposed to construct cloaking regions, in which realistic and indistinguishable dummy locations are generated. To validate the effectiveness and efficiency of DPPS, theoretical analysis and experimental verification are further performed on a real-life dataset published by Microsoft, i.e., GeoLife dataset.

    Zhihui YANG, Juan LIU, Xuekai ZHU, Feng YANG, Qiang ZHANG, Hayat Ali SHAH

    Prediction of drug-protein binding is critical for virtual drug screening. Many deep learning methods have been proposed to predict the drug-protein binding based on protein sequences and drug representation sequences. However, most existing methods extract features from protein and drug sequences separately. As a result, they can not learn the features characterizing the drug-protein interactions. In addition, the existing methods encode the protein (drug) sequence usually based on the assumption that each amino acid (atom) has the same contribution to the binding, ignoring different impacts of different amino acids (atoms) on the binding. However, the event of drug-protein binding usually occurs between conserved residue fragments in the protein sequence and atom fragments of the drug molecule. Therefore, a more comprehensive encoding strategy is required to extract information from the conserved fragments.

    In this paper, we propose a novel model, named FragDPI, to predict the drug-protein binding affinity. Unlike other methods, we encode the sequences based on the conserved fragments and encode the protein and drug into a unified vector. Moreover, we adopt a novel two-step training strategy to train FragDPI. The pre-training step is to learn the interactions between different fragments using unsupervised learning. The fine-tuning step is for predicting the binding affinities using supervised learning. The experiment results have illustrated the superiority of FragDPI.

    Yajing GUO, Xiujuan LEI, Lian LIU, Yi PAN

    Circular RNAs (circRNAs) are RNAs with closed circular structure involved in many biological processes by key interactions with RNA binding proteins (RBPs). Existing methods for predicting these interactions have limitations in feature learning. In view of this, we propose a method named circ2CBA, which uses only sequence information of circRNAs to predict circRNA-RBP binding sites. We have constructed a data set which includes eight sub-datasets. First, circ2CBA encodes circRNA sequences using the one-hot method. Next, a two-layer convolutional neural network (CNN) is used to initially extract the features. After CNN, circ2CBA uses a layer of bidirectional long and short-term memory network (BiLSTM) and the self-attention mechanism to learn the features. The AUC value of circ2CBA reaches 0.8987. Comparison of circ2CBA with other three methods on our data set and an ablation experiment confirm that circ2CBA is an effective method to predict the binding sites between circRNAs and RBPs.

    Haisheng LI, Guiqiong LI, Haiying XIA

    Wavelet transform is being widely used in the field of information processing. One-dimension and two-dimension quantum wavelet transforms have been investigated as important tool algorithms. However, three-dimensional quantum wavelet transforms have not been reported. This paper proposes a multi-level three-dimensional quantum wavelet transform theory to implement the wavelet transform for quantum videos. Then, we construct the iterative formulas for the multi-level three-dimensional Haar and Daubechies D4 quantum wavelet transforms, respectively. Next, we design quantum circuits of the two wavelet transforms using iterative methods. Complexity analysis shows that the proposed wavelet transforms offer exponential speed-up over their classical counterparts. Finally, the proposed quantum wavelet transforms are selected to realize quantum video compression as a primary application. Simulation results reveal that the proposed wavelet transforms have better compression performance for quantum videos than two-dimension quantum wavelet transforms.

    Fuqi JIN, Wenquan LI, Lanju KONG, Qingzhong LI