Feb 2023, Volume 17 Issue 1
    

  • Select all
    Architecture
  • RESEARCH ARTICLE
    Jiantong HUO, Yaowen XU, Zhisheng HUO, Limin XIAO, Zhenxue HE

    The authors of this paper have previously proposed the global virtual data space system (GVDS) to aggregate the scattered and autonomous storage resources in China’s national supercomputer grid (National Supercomputing Center in Guangzhou, National Supercomputing Center in Jinan, National Supercomputing Center in Changsha, Shanghai Supercomputing Center, and Computer Network Information Center in Chinese Academy of Sciences) into a storage system that spans the wide area network (WAN), which realizes the unified management of global storage resources in China. At present, the GVDS has been successfully deployed in the China National Grid environment. However, when accessing and sharing remote data in the WAN, the GVDS will cause redundant transmission of data and waste a lot of network bandwidth resources. In this paper, we propose an edge cache system as a supplementary system of the GVDS to improve the performance of upper-level applications accessing and sharing remote data. Specifically, we first designs the architecture of the edge cache system, and then study the key technologies of this architecture: the edge cache index mechanism based on double-layer hashing, the edge cache replacement strategy based on the GDSF algorithm, the request routing based on consistent hashing method, and the cluster member maintenance method based on the SWIM protocol. The experimental results show that the edge cache system has successfully implemented the relevant operation functions (read, write, deletion, modification, etc.) and is compatible with the POSIX interface in terms of function. Further, it can greatly reduce the amount of data transmission and increase the data access bandwidth when the accessed file is located at the edge cache system in terms of performance, i.e., its performance is close to the performance of the network file system in the local area network (LAN).

  • RESEARCH ARTICLE
    Shuai XUE, Shang ZHAO, Quan CHEN, Zhuo SONG, Shanpei CHEN, Tao MA, Yong YANG, Wenli ZHENG, Minyi GUO

    While researchers have proposed many techniques to mitigate the contention on the shared cache and memory bandwidth, none of them has considered the memory bus contention due to split lock. Our study shows that the split lock may cause 9X longer data access latency without saturating the memory bandwidth. To minimize the impact of split lock, we propose Kronos, a runtime system composed of an online bus contention tolerance meter and a bus contention-aware job scheduler. The meter characterizes the tolerance of jobs to the “pressure” of bus contention and builds a tolerance model with the polynomial regression technique. The job scheduler allocates user jobs to the physical nodes in a contention aware manner. We design three scheduling policies that minimize the number of required nodes while ensuring the Service Level Agreement (SLA) of all the user jobs, minimize the number of jobs that suffer from SLA violation without enough nodes, and maximize the overall performance without considering the SLA violation, respectively. Adopting the three policies, Kronos reduces the number of the required nodes by 42.1% while ensuring the SLA of all the jobs, reduces the number of the jobs that suffer from SLA violation without enough nodes by 72.8%, and improves the overall performance by 35.2% without considering SLA.

  • Software
  • LETTER
    Kabir Sulaiman SAID, Liming NIE, Yuanchang LIN, Yaowen ZHENG, Zuohua DING
  • RESEARCH ARTICLE
    Chong TIAN, Haikun LIU, Xiaofei LIAO, Hai JIN

    Unikernels provide an efficient and lightweight way to deploy cloud computing services in application-specialized and single-address-space virtual machines (VMs). They can efficiently deploy hundreds of unikernel-based VMs in a single physical server. In such a cloud computing platform, main memory is the primary bottleneck resource for high-density application deployment. Recently, non-volatile memory (NVM) technologies has become increasingly popular in cloud data centers because they can offer extremely large memory capacity at a low expense. However, there still remain many challenges to utilize NVMs for unikernel-based VMs, such as the difficulty of heterogeneous memory allocation and high performance overhead of address translations.

    In this paper, we present UCat, a heterogeneous memory management mechanism that support multi-grained memory allocation for unikernels. We propose front-end/back-end cooperative address space mapping to expose the host memory heterogeneity to unikernels. UCat exploits large pages to reduce the cost of two-layer address translation in virtualization environments, and leverages slab allocation to reduce memory waste due to internal memory fragmentation. We implement UCat based on a popular unikernel--OSv and conduct extensive experiments to evaluate its efficiency. Experimental results show that UCat can reduce the memory consumption of unikernels by 50% and TLB miss rate by 41%, and improve the throughput of real-world benchmarks such as memslap and YCSB by up to 18.5% and 14.8%, respectively.

  • REVIEW ARTICLE
    Yuxia SUN, Jiefeng FANG, Yanjia CHEN, Yepang LIU, Zhao CHEN, Song GUO, Xinkai CHEN, Ziyuan TAN

    Android applications are becoming increasingly powerful in recent years. While their functionality is still of paramount importance to users, the energy efficiency of these applications is also gaining more and more attention. Researchers have discovered various types of energy defects in Android applications, which could quickly drain the battery power of mobile devices. Such defects not only cause inconvenience to users, but also frustrate Android developers as diagnosing the energy inefficiency of a software product is a non-trivial task. In this work, we perform a literature review to understand the state of the art of energy inefficiency diagnosis for Android applications. We identified 55 research papers published in recent years and classified existing studies from four different perspectives, including power estimation method, hardware component, types of energy defects, and program analysis approach. We also did a cross-perspective analysis to summarize and compare our studied techniques. We hope that our review can help structure and unify the literature and shed light on future research, as well as drawing developers' attention to build energy-efficient Android applications.

  • Artificial Intelligence
  • LETTER
    Lingbin JIN, Li ZHANG, Lei ZHAO
  • RESEARCH ARTICLE
    Momo MATSUDA, Yasunori FUTAMURA, Xiucai YE, Tetsuya SAKURAI

    Single-cell RNA-seq (scRNA-seq) allows the analysis of gene expression in each cell, which enables the detection of highly variable genes (HVG) that contribute to cell-to-cell variation within a homogeneous cell population. HVG detection is necessary for clustering analysis to improve the clustering result. scRNA-seq includes some genes that are expressed with a certain probability in all cells which make the cells indistinguishable. These genes are referred to as background noise. To remove the background noise and select the informative genes for clustering analysis, in this paper, we propose an effective HVG detection method based on principal component analysis (PCA). The proposed method utilizes PCA to evaluate the genes (features) on the sample space. The distortion-free principal components are selected to calculate the distance from the origin to gene as the weight of each gene. The genes that have the greatest distances to the origin are selected for clustering analysis. Experimental results on both synthetic and gene expression datasets show that the proposed method not only removes the background noise to select the informative genes for clustering analysis, but also outperforms the existing HVG detection methods.

  • LETTER
    Qing TIAN, Heyang SUN, Shun PENG, Tinghuai MA
  • RESEARCH ARTICLE
    Ke-Jia CHEN, Mingyu WU, Yibo ZHANG, Zhiwei CHEN

    Image super-resolution (SR) is one of the classic computer vision tasks. This paper proposes a super-resolution network based on adaptive frequency component upsampling, named SR-AFU. The network is composed of multiple cascaded dilated convolution residual blocks (CDCRB) to extract multi-resolution features representing image semantics, and multiple multi-size convolutional upsampling blocks (MCUB) to adaptively upsample different frequency components using CDCRB features. The paper also defines a new loss function based on the discrete wavelet transform, making the reconstructed SR images closer to human perception. Experiments on the benchmark datasets show that SR-AFU has higher peak signal to noise ratio (PSNR), significantly faster training speed and more realistic visual effects compared with the existing methods.

  • RESEARCH ARTICLE
    Hongjia RUAN, Huihui SONG, Bo LIU, Yong CHENG, Qingshan LIU

    Deep neural networks have achieved great success in varieties of artificial intelligent fields. Since training a good deep model is often challenging and costly, such deep models are of great value and even the key commercial intellectual properties. Recently, deep model intellectual property protection has drawn great attention from both academia and industry, and numerous works have been proposed. However, most of them focus on the classification task. In this paper, we present the first attempt at protecting deep semantic segmentation models from potential infringements. In details, we design a new hybrid intellectual property protection framework by combining the trigger-set based and passport based watermarking simultaneously. Within it, the trigger-set based watermarking mechanism aims to force the network output copyright watermarks for a pre-defined trigger image set, which enables black-box remote ownership verification. And the passport based watermarking mechanism is to eliminate the ambiguity attack risk of trigger-set based watermarking by adding an extra passport layer into the target model. Through extensive experiments, the proposed framework not only demonstrates its effectiveness upon existing segmentation models, but also shows strong robustness to different attack techniques.

  • RESEARCH ARTICLE
    Yongchun ZHU, Fuzhen ZHUANG, Xiangliang ZHANG, Zhiyuan QI, Zhiping SHI, Juan CAO, Qing HE

    Many few-shot learning approaches have been designed under the meta-learning framework, which learns from a variety of learning tasks and generalizes to new tasks. These meta-learning approaches achieve the expected performance in the scenario where all samples are drawn from the same distributions (i.i.d. observations). However, in real-world applications, few-shot learning paradigm often suffers from data shift, i.e., samples in different tasks, even in the same task, could be drawn from various data distributions. Most existing few-shot learning approaches are not designed with the consideration of data shift, and thus show downgraded performance when data distribution shifts. However, it is non-trivial to address the data shift problem in few-shot learning, due to the limited number of labeled samples in each task. Targeting at addressing this problem, we propose a novel metric-based meta-learning framework to extract task-specific representations and task-shared representations with the help of knowledge graph. The data shift within/between tasks can thus be combated by the combination of task-shared and task-specific representations. The proposed model is evaluated on popular benchmarks and two constructed new challenging datasets. The evaluation results demonstrate its remarkable performance.

  • RESEARCH ARTICLE
    Haoyu ZHAO, Weidong MIN, Jianqiang XU, Qi WANG, Yi ZOU, Qiyan FU

    Crowd counting is recently becoming a hot research topic, which aims to count the number of the people in different crowded scenes. Existing methods are mainly based on training-testing pattern and rely on large data training, which fails to accurately count the crowd in real-world scenes because of the limitation of model’s generalization capability. To alleviate this issue, a scene-adaptive crowd counting method based on meta-learning with Dual-illumination Merging Network (DMNet) is proposed in this paper. The proposed method based on learning-to-learn and few-shot learning is able to adapt different scenes which only contain a few labeled images. To generate high quality density map and count the crowd in low-lighting scene, the DMNet is proposed, which contains Multi-scale Feature Extraction module and Element-wise Fusion Module. The Multi-scale Feature Extraction module is used to extract the image feature by multi-scale convolutions, which helps to improve network accuracy. The Element-wise Fusion module fuses the low-lighting feature and illumination-enhanced feature, which supplements the missing illumination in low-lighting environments. Experimental results on benchmarks, WorldExpo’10, DISCO, USCD, and Mall, show that the proposed method outperforms the existing state-of-the-art methods in accuracy and gets satisfied results.

  • RESEARCH ARTICLE
    Jipeng QIANG, Feng ZHANG, Yun LI, Yunhao YUAN, Yi ZHU, Xindong WU

    Unsupervised text simplification has attracted much attention due to the scarcity of high-quality parallel text simplification corpora. Recent an unsupervised statistical text simplification based on phrase-based machine translation system (UnsupPBMT) achieved good performance, which initializes the phrase tables using the similar words obtained by word embedding modeling. Since word embedding modeling only considers the relevance between words, the phrase table in UnsupPBMT contains a lot of dissimilar words. In this paper, we propose an unsupervised statistical text simplification using pre-trained language modeling BERT for initialization. Specifically, we use BERT as a general linguistic knowledge base for predicting similar words. Experimental results show that our method outperforms the state-of-the-art unsupervised text simplification methods on three benchmarks, even outperforms some supervised baselines.

  • RESEARCH ARTICLE
    Wei GAO, Mingwen SHAO, Jun SHU, Xinkai ZHUANG

    In this paper, we propose a lightweight network with an adaptive batch normalization module, called Meta-BN Net, for few-shot classification. Unlike existing few-shot learning methods, which consist of complex models or algorithms, our approach extends batch normalization, an essential part of current deep neural network training, whose potential has not been fully explored. In particular, a meta-module is introduced to learn to generate more powerful affine transformation parameters, known as γ and β, in the batch normalization layer adaptively so that the representation ability of batch normalization can be activated. The experimental results on miniImageNet demonstrate that Meta-BN Net not only outperforms the baseline methods at a large margin but also is competitive with recent state-of-the-art few-shot learning methods. We also conduct experiments on Fewshot-CIFAR100 and CUB datasets, and the results show that our approach is effective to boost the performance of weak baseline networks. We believe our findings can motivate to explore the undiscovered capacity of base components in a neural network as well as more efficient few-shot learning methods.

  • RESEARCH ARTICLE
    Le QI, Yu ZHANG, Ting LIU

    Transformers have been widely studied in many natural language processing (NLP) tasks, which can capture the dependency from the whole sentence with a high parallelizability thanks to the multi-head attention and the position-wise feed-forward network. However, the above two components of transformers are position-independent, which causes transformers to be weak in modeling sentence structures. Existing studies commonly utilized positional encoding or mask strategies for capturing the structural information of sentences. In this paper, we aim at strengthening the ability of transformers on modeling the linear structure of sentences from three aspects, containing the absolute position of tokens, the relative distance, and the direction between tokens. We propose a novel bidirectional Transformer with absolute-position aware relative position encoding (BiAR-Transformer) that combines the positional encoding and the mask strategy together. We model the relative distance between tokens along with the absolute position of tokens by a novel absolute-position aware relative position encoding. Meanwhile, we apply a bidirectional mask strategy for modeling the direction between tokens. Experimental results on the natural language inference, paraphrase identification, sentiment classification and machine translation tasks show that BiAR-Transformer achieves superior performance than other strong baselines.

  • Theoretical Computer Science
  • LETTER
    Caiquan XIONG, Hang LIU, Xinyun WU, Na DENG
  • Information Systems
  • REVIEW ARTICLE
    Yingjie LIU, Tiancheng ZHANG, Xuecen WANG, Ge YU, Tao LI

    Cognitive diagnosis is the judgment of the student’s cognitive ability, is a wide-spread concern in educational science. The cognitive diagnosis model (CDM) is an essential method to realize cognitive diagnosis measurement. This paper presents new research on the cognitive diagnosis model and introduces four individual aspects of probability-based CDM and deep learning-based CDM. These four aspects are higher-order latent trait, polytomous responses, polytomous attributes, and multilevel latent traits. The paper also sorts on the contained ideas, model structures and respective characteristics, and provides direction for developing cognitive diagnosis in the future.

  • LETTER
    Xuecheng QI, Huiqi HU, Jinwei GUO, Chenchen HUANG, Xuan ZHOU, Ning XU, Yu FU, Aoying ZHOU
  • LETTER
    Zhigang WANG, Ning WANG, Jie NIE, Zhiqiang WEI, Yu GU, Ge YU
  • RESEARCH ARTICLE
    Baiyou QIAO, Zhongqiang WU, Ling MA, Yicheng Zhou, Yunjiao SUN

    Accurate prediction of sea surface temperature (SST) is extremely important for forecasting oceanic environmental events and for ocean studies. However, the existing SST prediction methods do not consider the seasonal periodicity and abnormal fluctuation characteristics of SST or the importance of historical SST data from different times; thus, these methods suffer from low prediction accuracy. To solve this problem, we comprehensively consider the effects of seasonal periodicity and abnormal fluctuation characteristics of SST data, as well as the influence of historical data in different periods, on prediction accuracy. We propose a novel ensemble learning approach that combines the Predictive Recurrent Neural Network(PredRNN) network and an attention mechanism for effective SST field prediction. In this approach, the XGBoost model is used to learn the long-period fluctuation law of SST and to extract seasonal periodic features from SST data. The exponential smoothing method is used to mitigate the impact of severely abnormal SST fluctuations and extract the a priori features of SST data. The outputs of the two aforementioned models and the original SST data are stacked and used as inputs for the next model, the PredRNN network. PredRNN is the most recently developed spatiotemporal deep learning network, which simulates both spatial and temporal representations and is capable of transferring memory across layers and time steps. Therefore, we used it to extract the spatiotemporal correlations of SST data and predict future SSTs. Finally, an attention mechanism is added to capture the importance of different historical SST data, weigh the output of each step of the PredRNN network, and improve the prediction accuracy. The experimental results on two ocean datasets confirm that the proposed approach achieves higher training efficiency and prediction accuracy than the existing SST field prediction approaches do.

  • Image and Graphics
  • LETTER
    Yanchao GONG, Zhao LI, Zhuang WANG, Kaifang YANG, Ying LIU, Keng Pang LIM
  • RESEARCH ARTICLE
    Yanni PENG, Xiaoping FAN, Rong CHEN, Ziyao YU, Shi LIU, Yunpeng CHEN, Ying ZHAO, Fangfang ZHOU

    Massive sequence view (MSV) is a classic timeline-based dynamic network visualization approach. However, it is vulnerable to visual clutter caused by overlapping edges, thereby leading to unexpected misunderstanding of time-varying trends of network communications. This study presents a new edge sampling algorithm called edge-based multi-class blue noise (E-MCBN) to reduce visual clutter in MSV. Our main idea is inspired by the multi-class blue noise (MCBN) sampling algorithm, commonly used in multi-class scatterplot decluttering. First, we take a node pair as an edge class, which can be regarded as an analogy to classes in multi-class scatterplots. Second, we propose two indicators, namely, class overlap and inter-class conflict degrees, to measure the overlapping degree and mutual exclusion, respectively, between edge classes. These indicators help construct the foundation of migrating the MCBN sampling from multi-class scatterplots to dynamic network samplings. Finally, we propose three strategies to accelerate MCBN sampling and a partitioning strategy to preserve local high-density edges in the MSV. The result shows that our approach can effectively reduce visual clutters and improve the readability of MSV. Moreover, our approach can also overcome the disadvantages of the MCBN sampling (i.e., long-running and failure to preserve local high-density communication areas in MSV). This study is the first that introduces MCBN sampling into a dynamic network sampling.

  • Information Security
  • RESEARCH ARTICLE
    Peng LI, Junzuo LAI, Yongdong WU

    We introduce a new notion called accountable attribute-based authentication with fine-grained access control (AccABA), which achieves (i) fine-grained access control that prevents ineligible users from authenticating; (ii) anonymity such that no one can recognize the identity of a user; (iii) public accountability, i.e., as long as a user authenticates two different messages, the corresponding authentications will be easily identified and linked, and anyone can reveal the user’s identity without any help from a trusted third party. Then, we formalize the security requirements in terms of unforgeability, anonymity, linkability and traceability, and give a generic construction to fulfill these requirements. Based on AccABA, we further present the first attribute-based, fair, anonymous and publicly traceable crowdsourcing scheme on blockchain, which is designed to filter qualified workers to participate in tasks, and ensures the fairness of the competition between workers, and finally balances the tension between anonymity and accountability.

  • RESEARCH ARTICLE
    Pu SUN, Sen CHEN, Lingling FAN, Pengfei GAO, Fu SONG, Min YANG

    Activity hijacking is one of the most powerful attacks in Android. Though promising, all the prior activity hijacking attacks suffer from some limitations and have limited attack capabilities. They no longer pose security threats in recent Android due to the presence of effective defense mechanisms. In this work, we propose the first automated and adaptive activity hijacking attack, named VenomAttack, enabling a spectrum of customized attacks (e.g., phishing, spoofing, and DoS) on a large scale in recent Android, even the state-of-the-art defense mechanisms are deployed. Specifically, we propose to use hotpatch techniques to identify vulnerable devices and update attack payload without re-installation and re-distribution, hence bypassing offline detection. We present a newly-discovered flaw in Android and a bug in derivatives of Android, each of which allows us to check if a target app is running in the background or not, by which we can determine the right attack timing via a designed transparent activity. We also propose an automated fake activity generation approach, allowing large-scale attacks. Requiring only the common permission INTERNET, we can hijack activities at the right timing without destroying the GUI integrity of the foreground app. We conduct proof-of-concept attacks, showing that VenomAttack poses severe security risks on recent Android versions. The user study demonstrates the effectiveness of VenomAttack in real-world scenarios, achieving a high success rate (95%) without users’ awareness. That would call more attention to the stakeholders like Google.