Most accessed

  • RESEARCH ARTICLE
    Shijun XIANG, Guanqi RUAN, Hao LI, Jiayong HE
    Frontiers of Computer Science, 2022, 16(2): 162804. https://doi.org/10.1007/s11704-020-0112-z

    Database security has always been a hot topic in the field of information security. Privacy protection can be realized by encrypting data, while data copyright can be protected by using digital watermarking technology. By combining these two technologies, a database’s copyright and privacy problems in the cloud can be effectively solved. Based on an order-preserving encryption scheme (OPES), a circular histogram, and digital watermarking technology, this paper proposes a new robust watermarking scheme for protecting databases in the encrypted domain. First, the OPES is used to encrypt the data to avoid exposing them in the cloud. Then, the encrypted data are grouped and modified by means of a circular histogram to embed a digital watermark. Common database query operations remain available on the encrypted, watermarked database. At the receiver side, the digital watermark and the original data can be restored using a secret key and a key table. Experimental results show that the proposed algorithm is robust against common database attacks in the encrypted domain.

  • RESEARCH ARTICLE
    Yan-Ping SUN, Min-Ling ZHANG
    Frontiers of Computer Science, 2021, 15(5): 155320. https://doi.org/10.1007/s11704-020-9294-7

    Multi-label classification aims to assign a set of proper labels to each instance, where distance metric learning can help improve the generalization ability of instance-based multi-label classification models. Existing multi-label metric learning techniques work by utilizing pairwise constraints to enforce that examples with similar label assignments should be close in the embedded feature space. In this paper, a novel distance metric learning approach for multi-label classification is proposed by modeling structural interactions between the instance space and the label space. On the one hand, a compositional distance metric is employed, which is represented as a weighted sum of rank-1 PSD matrices based on component bases. On the other hand, the compositional weights are optimized by exploiting triplet similarity constraints derived from both the instance and label spaces. Due to the compositional nature of the employed distance metric, the resulting problem admits a quadratic programming formulation with linear optimization complexity w.r.t. the number of training examples. We also derive a generalization bound for the proposed approach based on algorithmic robustness analysis of the compositional metric. Extensive experiments on sixteen benchmark data sets clearly validate the usefulness of the compositional metric in yielding an effective distance metric for multi-label classification.
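
The compositional metric in the abstract above, a weighted sum of rank-1 PSD matrices, can be illustrated with a minimal sketch. It assumes the metric has the form M = sum_k w_k * b_k b_k^T with non-negative weights, so the squared distance reduces to a weighted sum of squared projections. The function names, bases, weights, and margin are hypothetical, and the learning of the weights from triplet constraints is not shown.

```python
def compositional_distance(x, y, bases, weights):
    """Squared distance under M = sum_k w_k * b_k b_k^T (assumed form)."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep M PSD")
    diff = [xi - yi for xi, yi in zip(x, y)]
    total = 0.0
    for base, weight in zip(bases, weights):
        # (x - y)^T (b b^T) (x - y) equals the squared projection onto b.
        projection = sum(bi * di for bi, di in zip(base, diff))
        total += weight * projection * projection
    return total


def triplet_violated(anchor, similar, dissimilar, bases, weights, margin=1.0):
    """True if the dissimilar example is not farther than the similar one by `margin`."""
    d_pos = compositional_distance(anchor, similar, bases, weights)
    d_neg = compositional_distance(anchor, dissimilar, bases, weights)
    return d_neg - d_pos < margin
```

Because the distance is linear in the weights, each triplet constraint is linear in them as well, which is consistent with the quadratic programming formulation mentioned in the abstract.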

  • REVIEW ARTICLE
    Fanchao QI, Ruobing XIE, Yuan ZANG, Zhiyuan LIU, Maosong SUN
    Frontiers of Computer Science, 2021, 15(5): 155327. https://doi.org/10.1007/s11704-020-0002-4

    A sememe is defined as the minimum semantic unit of languages in linguistics. Sememe knowledge bases are built by manually annotating sememes for words and phrases. HowNet is the most well-known sememe knowledge base. It has been extensively utilized in many natural language processing tasks in the era of statistical natural language processing and proven to be effective and helpful to understanding and using languages. In the era of deep learning, although data are thought to be of vital importance, there are some studies working on incorporating sememe knowledge bases like HowNet into neural network models to enhance system performance. Some successful attempts have been made in the tasks including word representation learning, language modeling, semantic composition, etc. In addition, considering the high cost of manual annotation and update for sememe knowledge bases, some work has tried to use machine learning methods to automatically predict sememes for words and phrases to expand sememe knowledge bases. Besides, some studies try to extend HowNet to other languages by automatically predicting sememes for words and phrases in a new language. In this paper, we summarize recent studies on application and expansion of sememe knowledge bases and point out some future directions of research on sememes.

  • RESEARCH ARTICLE
    Yunhao SUN, Guanyu LI, Jingjing DU, Bo NING, Heng CHEN
    Frontiers of Computer Science, 2022, 16(3): 163606. https://doi.org/10.1007/s11704-020-0360-y

    Subgraph matching is a fundamental problem in graph search and is NP-complete. Recently, subgraph matching has become a popular research topic in the field of knowledge graph analysis, with a wide range of applications including question answering and semantic search. In this paper, we study the problem of subgraph matching on knowledge graphs. Specifically, given a query graph q and a data graph G, the problem of subgraph matching is to find all possible subgraph isomorphic mappings of q on G. A knowledge graph is a directed labeled multi-graph that may have multiple edges between a pair of vertices, and it has denser semantic and structural features than a general graph. To accelerate subgraph matching on knowledge graphs, we propose a novel subgraph matching algorithm based on a subgraph index for knowledge graphs, called FGqT-Match. The subgraph matching algorithm consists of two key designs. One is a subgraph index, the matching-driven flow graph (FGqT), which reduces redundant calculations in advance. The other is a multi-label weight matrix, which evaluates a near-optimal matching tree for minimizing the intermediate candidates. With the aid of these two key designs, all subgraph isomorphic mappings are obtained quickly just by traversing FGqT. Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.

  • RESEARCH ARTICLE
    Juntao CHEN, Quan ZOU, Jing LI
    Frontiers of Computer Science, 2022, 16(2): 162302. https://doi.org/10.1007/s11704-020-0180-0

    N6-methyladenosine (m6A) is a prevalent methylation modification and plays a vital role in various biological processes, such as metabolism, mRNA processing, synthesis, and transport. Recent studies have suggested that m6A modification is related to common diseases such as cancer, tumours, and obesity. Therefore, accurate prediction of methylation sites in RNA sequences has emerged as a critical issue in the area of bioinformatics. However, traditional high-throughput sequencing and wet bench experimental techniques have the disadvantages of high costs, significant time requirements and inaccurate identification of sites. Nevertheless, through the use of traditional experimental methods, researchers have produced many large databases of m6A sites. With the support of these basic databases and existing deep learning methods, we developed an m6A site predictor named DeepM6ASeq-EL, which integrates an ensemble of five LSTM and CNN classifiers with the combined strategy of hard voting. Compared to the state-of-the-art prediction method WHISTLE (average AUC 0.948 and 0.880), DeepM6ASeq-EL had a lower accuracy in m6A site prediction (average AUC: 0.861 for the full transcript models and 0.809 for the mature messenger RNA models) when tested on six independent datasets.
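
The hard-voting strategy named in the abstract above can be sketched as a plain majority vote over per-classifier labels. This is an illustrative sketch only: the function name and the toy predictions are hypothetical, and the LSTM and CNN classifiers themselves are not modeled.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote. `predictions` holds one label list per classifier."""
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must label the same samples")
    voted = []
    for i in range(n_samples):
        # Count the labels that the classifiers assigned to sample i.
        counts = Counter(p[i] for p in predictions)
        voted.append(counts.most_common(1)[0][0])
    return voted
```

With an odd number of binary classifiers, such as the five used in the abstract, a tie cannot occur.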

  • RESEARCH ARTICLE
    Jingya FENG, Lang LI
    Frontiers of Computer Science, 2022, 16(3): 163813. https://doi.org/10.1007/s11704-020-0115-9

    In this paper, we propose a new lightweight block cipher called SCENERY. SCENERY is designed to suit both hardware and software platforms. SCENERY is a 64-bit block cipher supporting 80-bit keys, and its data processing consists of 28 rounds. The round function of SCENERY consists of eight 4 × 4 S-boxes in parallel and a 32 × 32 binary matrix, and SCENERY can be implemented with some basic logic instructions. The hardware implementation of SCENERY requires only 1438 GE based on 0.18 μm CMOS technology, and the software implementation of encrypting or decrypting a block takes approximately 1516 clock cycles on 8-bit microcontrollers and 364 clock cycles on 64-bit processors. Compared with other encryption algorithms, the performance of SCENERY is well balanced for both hardware and software. According to the security analyses, SCENERY achieves a sufficient security margin against known attacks, such as differential cryptanalysis, linear cryptanalysis, impossible differential cryptanalysis and related-key attacks.

  • LETTER
    Haiyong BAO, Beibei LI
    Frontiers of Computer Science, 2021, 15(5): 155812. https://doi.org/10.1007/s11704-020-9402-8
  • RESEARCH ARTICLE
    Kaimin WEI, Tianqi LI, Feiran HUANG, Jinpeng CHEN, Zefan HE
    Frontiers of Computer Science, 2022, 16(2): 162601. https://doi.org/10.1007/s11704-020-0025-x

    Accurate diagnosis is a significant step in cancer treatment. Machine learning can support doctors in prognosis decision-making, but its performance is always weakened by the high dimension and small quantity of genetic data. Fortunately, deep learning can effectively process the high dimensional data with growing. However, the problem of inadequate data remains unsolved and has lowered the performance of deep learning. To this end, we propose a generative adversarial model that uses non-target cancer data to help train the target generator. We use the reconstruction loss to further stabilize model training and improve the quality of generated samples. We also present a cancer classification model to optimize classification performance. Experimental results show that the mean absolute error of the cancer gene data generated by our model is 19.3% lower than that of DC-GAN, and that the classification accuracy obtained with our generated data is higher than that obtained with data created by GAN. As for the classification model, its classification accuracy reaches 92.6%, which is 7.6% higher than that of the model without any generated data.

  • RESEARCH ARTICLE
    Yu OU, Lang LI
    Frontiers of Computer Science, 2022, 16(2): 162303. https://doi.org/10.1007/s11704-020-0209-4

    There has been a growing interest in the side-channel analysis (SCA) field based on deep learning (DL) technology. Various DL networks and models have been developed to improve the efficiency of SCA. However, few studies have investigated the impact of the different models on attack results and the exact relationship between power consumption traces and intermediate values. Based on the convolutional neural network and the autoencoder, this paper proposes a Template Analysis Pre-trained DL Classification model named TAPDC, which contains three sub-networks. The TAPDC model detects the periodicity of the power trace, relates power to the intermediate values, and mines deeper features with a multi-layer convolutional net. We implement the TAPDC model and compare it with two classical models in a fair experiment. The evaluation results show that the TAPDC model, with its autoencoder and deep convolutional feature extraction structure, can extract information from power consumption traces more effectively in SCA. Also, using the classifier layer, this model links power information to the probability of the intermediate value. It completes the conversion from power trace to intermediate values and greatly improves the efficiency of the power attack.

  • LETTER
    Zhuo-Xin ZHAN, Ming-Kai HE, Wei-Ke PAN, Zhong MING
    Frontiers of Computer Science, 2022, 16(2): 162615. https://doi.org/10.1007/s11704-022-1184-8
  • RESEARCH ARTICLE
    Yao QIN, Hua WANG, Shanwen YI, Xiaole LI, Linbo ZHAI
    Frontiers of Computer Science, 2021, 15(5): 155105. https://doi.org/10.1007/s11704-020-9273-z

    Recently, a growing number of scientific applications have been migrated into the cloud. To deal with the problems brought by clouds, more and more researchers have started to consider multiple optimization goals in workflow scheduling. However, previous works ignore some details that are challenging but essential. Most existing multi-objective workflow scheduling algorithms overlook weight selection, which may degrade the quality of solutions. Besides, we find that the well-known partial critical path (PCP) strategy, which has been widely used to meet the deadline constraint, cannot accurately reflect the situation at each time step. Workflow scheduling is an NP-hard problem, so self-optimizing algorithms are more suitable for solving it.

    In this paper, the aim is to solve a workflow scheduling problem with a deadline constraint. We design a deadline constrained scientific workflow scheduling algorithm based on multi-objective reinforcement learning (RL) called DCMORL. DCMORL uses the Chebyshev scalarization function to scalarize its Q-values. This method is good at choosing weights for objectives. We propose an improved version of the PCP strategy called MPCP. The sub-deadlines in MPCP are regularly updated during the scheduling phase, so they can accurately reflect the situation at each time step. The optimization objectives in this paper include minimizing the execution cost and energy consumption within a given deadline. Finally, we use four scientific workflows to compare DCMORL with several representative scheduling algorithms. The results indicate that DCMORL outperforms these algorithms. As far as we know, this is the first time RL has been applied to a deadline constrained workflow scheduling problem.
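
Chebyshev scalarization of multi-objective Q-values, as mentioned in the abstract above, can be sketched as follows. The sketch assumes the common formulation in which each action is scored by its largest weighted distance to a utopian reference point and the greedy action minimizes that score. The action names, weights, and reference point are hypothetical and are not taken from DCMORL.

```python
def chebyshev_scalarize(q_values, weights, utopia):
    """Largest weighted distance of the per-objective Q-values to the utopian point."""
    return max(w * abs(q - z) for q, w, z in zip(q_values, weights, utopia))


def greedy_action(q_table, weights, utopia):
    """Pick the action whose scalarized Q-value is smallest.

    `q_table` maps each action to a list of per-objective Q-values.
    """
    return min(q_table, key=lambda a: chebyshev_scalarize(q_table[a], weights, utopia))
```

For example, with objectives (negative cost, negative energy), `greedy_action({"vm_small": [-4.0, -9.0], "vm_large": [-6.0, -5.0]}, [0.5, 0.5], [-3.0, -4.0])` selects `"vm_large"`, because its worst weighted deviation (1.5) is smaller than that of `"vm_small"` (2.5).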

  • RESEARCH ARTICLE
    Zhangjie FU, Yan WANG, Xingming SUN, Xiaosong ZHANG
    Frontiers of Computer Science, 2022, 16(2): 162802. https://doi.org/10.1007/s11704-021-0277-0

    Searchable encryption provides an effective way to ensure data security and privacy in cloud storage. Users can retrieve encrypted data in the cloud while protecting their own data security and privacy. However, most current content-based retrieval schemes do not capture enough semantic information of the article and cannot fully reflect the semantic information of the text. In this paper, we propose two secure and semantic retrieval schemes based on BERT (bidirectional encoder representations from transformers), named SSRB-1 and SSRB-2. By training on the documents with BERT, the generated keyword vector contains more semantic information of the documents, which improves the accuracy of retrieval and makes the retrieval result more consistent with the user’s intention. Finally, tests on real data sets show that both of our solutions are feasible and effective.

  • LETTER
    Chenchen SUN, Derong SHEN
    Frontiers of Computer Science, 2022, 16(4): 164340. https://doi.org/10.1007/s11704-021-1130-1
  • LETTER
    Zaijun ZHANG, Daoyun XU, Jincheng ZHOU
    Frontiers of Computer Science, 2021, 15(6): 156405. https://doi.org/10.1007/s11704-021-0318-8
  • LETTER
    Jiping ZHENG, Qi DONG, Xianhong QIU, Xingnan HUANG
    Frontiers of Computer Science, 2021, 15(6): 156618. https://doi.org/10.1007/s11704-020-0178-7
  • RESEARCH ARTICLE
    Yi REN, Ning XU, Miaogen LING, Xin GENG
    Frontiers of Computer Science, 2022, 16(1): 161306. https://doi.org/10.1007/s11704-021-0611-6

    Multimodal machine learning (MML) aims to understand the world from multiple related modalities. It has attracted much attention as multimodal data has become increasingly available in real-world applications. It has been shown that MML can perform better than single-modal machine learning, since multiple modalities contain more information that can complement each other. However, fusing the modalities is a key challenge in MML. Different from previous work, we further consider the side-information, which reflects the situation and influences the fusion of the modalities. We recover the multimodal label distribution (MLD) by leveraging the side-information, representing the degree to which each modality contributes to describing the instance. Accordingly, a novel framework named multimodal label distribution learning (MLDL) is proposed to recover the MLD and to fuse the modalities under its guidance to learn an in-depth understanding of the joint feature representation. Moreover, two versions of MLDL are proposed to deal with sequential data. Experiments on multimodal sentiment analysis and disease prediction show that the proposed approaches perform favorably against state-of-the-art methods.

  • RESEARCH ARTICLE
    Yong XIAO, Kaihong ZHENG, Supaporn LONAPALAWONG, Wenjie LU, Zexian CHEN, Bin QIAN, Tianye ZHANG, Xin WANG, Wei CHEN
    Frontiers of Computer Science, 2022, 16(2): 162604. https://doi.org/10.1007/s11704-020-0088-8

    Closely related to the economy, the analysis and management of electricity consumption has been widely studied. Conventional approaches mainly focus on the prediction and anomaly detection of electricity consumption, and fail to reveal the in-depth relationships between electricity consumption and various factors such as industry and weather. Meanwhile, the lack of analysis tools has increased the difficulty of analytical tasks such as correlation analysis and comparative analysis. In this paper, we introduce EcoVis, a visual analysis system that supports industrial-level spatio-temporal correlation analysis of electricity consumption data. We not only propose a novel approach to model spatio-temporal data as a graph structure for easier correlation analysis, but also introduce a novel visual representation to display the distributions of multiple instances in a single map. We implement the system in cooperation with domain experts. Experiments are conducted to demonstrate the effectiveness of our method.

  • RESEARCH ARTICLE
    Zheng HUO, Ping HE, Lisha HU, Huanyu ZHAO
    Frontiers of Computer Science, 2021, 15(5): 155811. https://doi.org/10.1007/s11704-020-9462-9

    User profiles are widely used in the age of big data. However, generating and releasing user profiles may cause serious privacy leakage, since a large amount of personal data is collected and analyzed. In this paper, we propose a differentially private user profile construction method, DP-UserPro, which is composed of DP-CLIQUE and private top-k tag selection. DP-CLIQUE is a differentially private high dimensional data clustering algorithm based on CLIQUE. The multidimensional tag space is divided into cells, and Laplace noise is added to the count value of each cell. Based on breadth-first search, the largest set of connected dense cells is grouped into a cluster. Then a private top-k tag selection approach is proposed, based on the score function of each tag, to select the k most important tags that represent the characteristics of the cluster. Finally, the privacy and utility of DP-UserPro are theoretically analyzed and experimentally evaluated. Comparison experiments are carried out against the Tag Suppression algorithm on two real datasets to measure the False Negative Rate (FNR) and precision. The results show that DP-UserPro outperforms Tag Suppression by 62.5% in the best case and 14.25% in the worst case on FNR, and that DP-UserPro is about 21.1% better than Tag Suppression on precision, on average.
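
The two DP-CLIQUE steps described in the abstract above, adding Laplace noise to cell counts and grouping connected dense cells by breadth-first search, can be sketched as follows. The sketch rests on several assumptions that the abstract does not state: cells are integer grid coordinates, two cells are connected when they differ by one step along a single axis, each count has sensitivity 1, and a cell is dense when its count reaches a fixed threshold. All names are hypothetical.

```python
import random
from collections import deque


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(cell_counts, epsilon, rng):
    """Add Laplace noise to each cell count (sensitivity 1 is assumed)."""
    scale = 1.0 / epsilon
    return {cell: count + laplace_noise(scale, rng) for cell, count in cell_counts.items()}


def largest_dense_cluster(counts, threshold):
    """Return the largest set of connected dense cells, found by breadth-first search."""
    dense = {cell for cell, count in counts.items() if count >= threshold}
    best, seen = set(), set()
    for start in dense:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            cell = queue.popleft()
            # Visit the neighbors that differ by one step along one axis.
            for axis in range(len(cell)):
                for step in (-1, 1):
                    neighbor = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if neighbor in dense and neighbor not in seen:
                        seen.add(neighbor)
                        component.add(neighbor)
                        queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best
```

In use, the clustering would run on the output of `noisy_counts` so that only noisy counts influence which cells are treated as dense.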

  • RESEARCH ARTICLE
    Suyu MEI
    Frontiers of Computer Science, 2022, 16(1): 161901. https://doi.org/10.1007/s11704-021-0476-8

    Rapidly identifying protein complexes is important for elucidating the mechanisms of macromolecular interactions and for further investigating the overlapping clinical manifestations of diseases. To date, existing computational methods mainly focus on developing unsupervised graph clustering algorithms, sometimes in combination with prior biological insights, to detect protein complexes from protein-protein interaction (PPI) networks. However, the outputs of these methods are potentially structural or functional modules within PPI networks. These modules do not necessarily correspond to the actual protein complexes that are formed via spatiotemporal aggregation of subunits. In this study, we propose a computational framework that combines supervised learning and dense subgraph discovery to predict protein complexes. The proposed framework consists of two steps. The first step reconstructs genome-scale protein co-complex networks by training a supervised learning model of l2-regularized logistic regression on experimentally derived co-complexed protein pairs; the second step infers hierarchical and balanced clusters as complexes from the co-complex networks via an effective but computationally intensive k-clique graph clustering method or an efficient maximum modularity clustering (MMC) algorithm. Empirical studies of cross validation and independent tests show that both steps achieve encouraging performance. The proposed framework is fundamentally novel and excels over existing methods in that the complexes inferred from protein co-complex networks are more biologically relevant than those inferred from PPI networks, providing a new avenue for identifying novel protein complexes.

  • LETTER
    Xian MO, Jun PANG, Zhiming LIU
    Frontiers of Computer Science, 2022, 16(2): 162304. https://doi.org/10.1007/s11704-020-0092-z
  • RESEARCH ARTICLE
    Mingyu DENG, Wei YANG, Chao CHEN, Chenxi LIU
    Frontiers of Computer Science, 2022, 16(4): 164316. https://doi.org/10.1007/s11704-020-0007-z

    Understanding how the urban streetscape influences crime is important to crime prevention and urban management. Recently, the development of deep learning technology and the availability of big data of street view images have made it possible to quantitatively explore the relationship between streetscape and crime. This study first computed eight streetscape indexes of the street built environment using Google Street View images. Then, the association between the eight indexes and recorded crime events was revealed with a Poisson regression model and a geographically weighted Poisson regression model. An experiment was conducted in downtown and uptown Manhattan, New York. Global regression results show that the influence of the Motorization Index on crimes is significant and positive, while the effects of the Light View Index and Green View Index on crimes depend heavily on socio-economic factors. From a local perspective, the Pedestrian Space Index, Green View Index, Light View Index and Motorization Index have a significant spatial influence on crimes, while the same visual streetscape factors have different effects on different streets due to differences in the combination of socio-economic, cultural and streetscape elements. The key streetscape elements of a given street that affect a specific criminal activity can be identified according to the strength of the association. The results provide both theoretical and practical implications for crime theories and crime prevention efforts.

  • LETTER
    Qingfeng CHENG, Ting CHEN, Siqi MA, Xinghua LI
    Frontiers of Computer Science, 2022, 16(2): 162803. https://doi.org/10.1007/s11704-020-0194-7
  • RESEARCH ARTICLE
    Fei MENG, Leixiao CHENG, Mingqiang WANG
    Frontiers of Computer Science, 2021, 15(5): 155810. https://doi.org/10.1007/s11704-020-9472-7

    Attribute-based encryption with keyword search (ABKS) achieves both fine-grained access control and keyword search. However, in previous ABKS schemes, the search algorithm requires each keyword to be identical between the target keyword set and the ciphertext keyword set; otherwise, the algorithm does not output any search result, which is inconvenient in practice. Moreover, previous ABKS schemes are vulnerable to what we call a peer-decryption attack, that is, the ciphertext may be eavesdropped and decrypted by an adversary who has sufficient authority but no information about the ciphertext keywords.

    In this paper, we provide a new system in fog computing, the ciphertext-policy attribute-based encryption with dynamic keyword search (ABDKS). In ABDKS, the search algorithm requires only one keyword to be identical between the two keyword sets and outputs the corresponding correlation, which reflects the number of keywords shared by the two sets. In addition, our ABDKS is resistant to the peer-decryption attack, since decryption requires not only sufficient authority but also at least one keyword of the ciphertext. Beyond that, the ABDKS shifts most computational overheads from resource constrained users to fog nodes. The security analysis shows that the ABDKS can resist the Chosen-Plaintext Attack (CPA) and the Chosen-Keyword Attack (CKA).
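
The difference between the two search conditions described above can be illustrated on plaintext keyword sets. This sketch covers only the matching semantics and involves no encryption or access control. It assumes that the earlier ABKS condition amounts to equality of the two sets, and both function names are hypothetical.

```python
def abks_match(target_keywords, ciphertext_keywords):
    """Earlier condition (assumed here to mean set equality): every keyword must match."""
    return set(target_keywords) == set(ciphertext_keywords)


def abdks_correlation(target_keywords, ciphertext_keywords):
    """Dynamic condition: one shared keyword suffices; returns the number shared.

    Returns None when no keyword is shared, i.e. there is no search result.
    """
    common = set(target_keywords) & set(ciphertext_keywords)
    return len(common) if common else None
```

For the query `{"fog", "cloud"}` against a ciphertext tagged `{"fog", "iot", "cloud"}`, the first condition fails while the second reports a correlation of 2.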

  • RESEARCH ARTICLE
    Qiao XUE, Youwen ZHU, Jian WANG
    Frontiers of Computer Science, 2022, 16(3): 163806. https://doi.org/10.1007/s11704-020-0103-0

    The fast development of the Internet and mobile devices has given rise to a crowdsensing business model, where individuals (users) are willing to contribute their data to help an institution (data collector) analyze and release useful information. However, revealing personal data brings huge privacy threats to users, which impedes the wide application of the crowdsensing model. To settle the problem, the definition of local differential privacy (LDP) was proposed. Afterwards, to respond to the varied privacy preferences of users, researchers proposed a new model, i.e., personalized local differential privacy (PLDP), which allows users to specify their own privacy parameters. In this paper, we focus on the basic task of calculating the mean value of a single numeric attribute with PLDP. Based on previous schemes for mean estimation under LDP, we employ the PLDP model to design novel schemes (LAP, DCP, PWP) that provide personalized privacy for each user. We then theoretically analyze the worst-case variance of the three proposed schemes and conduct experiments on synthetic and real datasets to evaluate the performance of the three methods. The theoretical and experimental results show the optimality of PWP in the low privacy regime and a slight advantage of DCP in the high privacy regime.
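
Mean estimation with per-user privacy parameters, the task named in the abstract above, can be sketched with a plain Laplace mechanism. This is a generic illustration and not the LAP, DCP, or PWP scheme from the paper. It assumes each value lies in [-1, 1] (sensitivity 2), that user i perturbs locally with the self-chosen parameter epsilon_i, and that the collector takes an unweighted average. All names are hypothetical.

```python
import random


def perturb(value, epsilon, rng):
    """Locally perturb one value in [-1, 1] with Laplace noise of scale 2 / epsilon."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("value must lie in [-1, 1]")
    scale = 2.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


def estimate_mean(reports):
    """Collector side: the noise has zero mean, so the plain average is unbiased."""
    return sum(reports) / len(reports)
```

Users with a smaller epsilon contribute noisier reports, which is why the variance of the estimate depends on how the privacy parameters are distributed.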

  • RESEARCH ARTICLE
    Xia-an BI, Yiming XIE, Hao WU, Luyun XU
    Frontiers of Computer Science, 2021, 15(6): 156903. https://doi.org/10.1007/s11704-020-9520-3

    Mild cognitive impairment (MCI), a potential sign of serious cognitive decline, can be divided into two stages, i.e., late MCI (LMCI) and early MCI (EMCI). Although the different cognitive states in MCI progression have been clinically defined, effective and accurate identification of differences in neuroimaging data between these stages still needs further study. In this paper, a new method, the clustering-evolutionary weighted support vector machine ensemble (CEWSVME), is presented to investigate the alterations from cognitively normal (CN) to EMCI to LMCI. The CEWSVME mainly includes two steps. The first step is to build multiple SVM classifiers by randomly selecting samples and features. The second step is to introduce the idea of clustering evolution to eliminate inefficient and highly similar SVMs, thereby improving the final classification performance. Additionally, we extracted the optimal features to detect the differential brain regions in MCI progression, and confirmed that these differential brain regions changed dynamically with the development of MCI. More precisely, this study found that some brain regions, such as the parahippocampal gyrus, posterior cingulate gyrus and amygdala, have only durative effects on MCI progression, while the superior temporal gyrus and the middle temporal gyrus have periodic effects on the progression. Our work contributes to understanding the pathogenesis of MCI and provides guidance for its timely diagnosis.

  • RESEARCH ARTICLE
    Haixia ZHAO, Yongzhuang WEI
    Frontiers of Computer Science, 2022, 16(3): 163805. https://doi.org/10.1007/s11704-020-0182-y

    Highly nonlinear resilient functions play a crucial role in nonlinear combiners, which are typical hardware-oriented stream ciphers. During the past three decades, the main idea behind the construction of highly nonlinear resilient functions has been to concatenate a large number of affine subfunctions. However, these resilient functions, as core components of ciphers, usually suffer from the guess-and-determine attack or the algebraic attack, since n-variable nonlinear Boolean functions can easily give rise to partial linear relations when at most n/2 of their variables are fixed. How to design highly nonlinear resilient functions (S-boxes) without concatenating a large number of affine subfunctions with n/2 variables appears to be an important task. In this article, a new construction of highly nonlinear resilient functions is proposed. These functions consist of two classes of subfunctions. More specifically, the first class (nonlinear part) contains both bent functions with 2k variables and some affine subfunctions with n/2 − k variables, which are obtained by using [n/2 − k, m, d] disjoint linear codes. The second class (linear part) includes some linear subfunctions with n/2 variables, which are obtained by using [n/2, m, d] disjoint linear codes. It is shown that these resilient functions have high nonlinearity and high algebraic degree. In particular, unlike previous well-known resilient S-boxes, these new S-boxes cannot be directly decomposed into affine subfunctions with n/2 variables by fixing at most n/2 variables. This means that S-boxes (vectorial Boolean functions) that use these resilient functions as component functions have more favourable cryptographic properties against the guess-and-determine attack and algebraic attacks.

  • RESEARCH ARTICLE
    Zaheer Ullah KHAN, Dechang PI, Shuanglong YAO, Asif NAWAZ, Farman ALI, Shaukat ALI
    Frontiers of Computer Science, 2021, 15(6): 156904. https://doi.org/10.1007/s11704-020-9504-3

    Enhancers are short DNA cis-elements that can be bound by proteins (activators) to increase the possibility that transcription of a particular gene will occur. Enhancers play a significant role in the formation of proteins and in regulating the gene transcription process. Human diseases such as cancer, inflammatory bowel disease, Parkinson’s, addiction, and schizophrenia are due to genetic variation in enhancers. In the current study, we have made an effort to build a more robust and novel computational bi-layered model. The representative feature vector was constructed over a linear combination of six features. The optimum hybrid feature vector was obtained via the novel Cascade Multi-Level Subset Feature Selection (CMSFS) algorithm. The first layer predicts the enhancer, and the second layer predicts its subtype. The baseline model obtained 87.88% accuracy, 95.29% sensitivity, 80.47% specificity, an MCC of 0.766, and an ROC value of 0.9603 on layer 1. Similarly, the model obtained 68.24%, 65.54%, 70.95%, 0.3654, and 0.7568 as accuracy, sensitivity, specificity, MCC, and ROC values on layer 2, respectively. On an independent dataset, piEnPred achieved 80.4% accuracy, 82.5% sensitivity, 78.4% specificity, and an MCC of 0.6099 on layer 1. Subsequently, the proposed predictor obtained 72.5% accuracy, 70.0% sensitivity, 75% specificity, and an MCC of 0.4506 on layer 2. The proposed method performed remarkably well compared with other state-of-the-art predictors. For the convenience of most experimental scientists, a user-friendly, publicly and freely accessible web server has been developed at bienhancer.pythonanywhere.com.

  • REVIEW ARTICLE
    Hong QIAN, Yang YU
    Frontiers of Computer Science, 2021, 15(6): 156336. https://doi.org/10.1007/s11704-020-0241-4

    Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, in which exploration and exploitation also need to be well balanced. Therefore, derivative-free optimization deals with a core issue similar to that of reinforcement learning, and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey on this topic is still lacking. In this article, we summarize methods of derivative-free reinforcement learning to date, and organize the methods in aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article could bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.
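
The sampling-and-updating framework mentioned in the abstract above can be illustrated with a small evolution-strategy-style loop. This is a generic sketch, not a method taken from the survey: the step-size rule, the sample count, and all names are assumptions. In a reinforcement learning setting, the objective would typically be the negative return of a policy with parameters `x`.

```python
import random


def derivative_free_minimize(objective, x0, iterations=1000, samples=8, sigma=0.5, rng=None):
    """Minimize `objective` using only function evaluations."""
    rng = rng or random.Random(0)
    best_x, best_f = list(x0), objective(x0)
    for _ in range(iterations):
        # Sampling: draw candidates around the current solution (exploration).
        candidates = [[xi + rng.gauss(0.0, sigma) for xi in best_x] for _ in range(samples)]
        candidate = min(candidates, key=objective)
        value = objective(candidate)
        # Updating: keep the candidate only if it improves (exploitation),
        # and adapt the sampling radius to recent success or failure.
        if value < best_f:
            best_x, best_f = candidate, value
            sigma *= 1.1
        else:
            sigma *= 0.9
    return best_x, best_f
```

Growing `sigma` after a success and shrinking it after a failure is one simple way to balance exploration against exploitation.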

  • REVIEW ARTICLE
    Xiangmao MENG, Wenkai LI, Xiaoqing PENG, Yaohang LI, Min LI
    Frontiers of Computer Science, 2021, 15(6): 156902. https://doi.org/10.1007/s11704-020-8179-0

    In the post-genomic era, proteomics has achieved significant theoretical and practical advances with the development of high-throughput technologies. In particular, the rapid accumulation of protein-protein interactions (PPIs) provides a foundation for constructing protein interaction networks (PINs), which can furnish a new perspective for understanding cellular organizations, processes, and functions at the network level. In this paper, we present a comprehensive survey on three main characteristics of PINs: centrality, modularity, and dynamics. 1) Different centrality measures, which are used to calculate the importance of proteins, are summarized based on the structural characteristics of PINs or on the basis of their integrated biological information; 2) Different modularity definitions and various clustering algorithms for predicting protein complexes or identifying functional modules are introduced; 3) The dynamics of proteins, PPIs and sub-networks are discussed, respectively. Finally, the main applications of PINs in complex diseases are reviewed, and the challenges and future research directions are also discussed.
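
    As a concrete instance of a structure-based centrality measure, the sketch below computes normalized degree centrality on a tiny undirected network. Degree centrality is only one of the simplest such measures and is used here as an illustration, not as the survey's recommended method. The protein names `P1` to `P5` and the interaction list are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected interaction network."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {protein: 0.0 for protein in neighbors}
    # Divide by the maximum possible number of interaction partners, n - 1.
    return {protein: len(nbrs) / (n - 1) for protein, nbrs in neighbors.items()}


# Hypothetical protein-protein interactions.
ppi_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]

centrality = degree_centrality(ppi_edges)
ranked = sorted(centrality, key=centrality.get, reverse=True)
print(ranked[0], centrality[ranked[0]])
```

    In this toy network, `P1` interacts with three of the four other proteins and therefore ranks highest.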

  • RESEARCH ARTICLE
    Xiaobing SUN, Tianchi ZHOU, Rongcun WANG, Yucong DUAN, Lili BO, Jianming CHANG
    Frontiers of Computer Science, 2021, 15(6): 156212. https://doi.org/10.1007/s11704-020-9441-1

    Machine learning (ML) techniques and algorithms have been successfully and widely used in various areas, including software engineering tasks. As in other software projects, bugs are common in ML projects and libraries. To better understand the features related to bug fixing in ML projects, we conduct an empirical study of 939 bugs from five ML projects by manually examining the bug categories, fixing patterns, fixing scale, fixing duration, and types of maintenance. The results show that (1) there are commonly seven types of bugs in ML programs; (2) twelve fixing patterns are typically used to fix the bugs in ML programs; (3) 68.80% of the patches belong to micro-scale-fix and small-scale-fix; (4) 66.77% of the bugs in ML programs can be fixed within one month; (5) 45.90% of the bug fixes belong to corrective activity from the perspective of software maintenance. Moreover, we conduct a questionnaire survey of developers and users of ML projects to validate the results of our empirical study. The results of our empirical study are basically consistent with the feedback from developers. The findings from the empirical study provide useful guidance and insights for developers and users to effectively detect and fix bugs in ML projects.