Enhancers are short DNA cis-elements that can be bound by proteins (activators) to increase the likelihood that transcription of a particular gene will occur. Enhancers play a significant role in protein formation and in regulating the gene transcription process. Human diseases such as cancer, inflammatory bowel disease, Parkinson's disease, addiction, and schizophrenia have been linked to genetic variation in enhancers. In the current study, we have made an effort to build a more robust and novel bi-layered computational model. The representative feature vector was constructed as a linear combination of six features. The optimal hybrid feature vector was obtained via the novel Cascade Multi-Level Subset Feature Selection (CMSFS) algorithm. The first layer predicts whether a sequence is an enhancer, and the second layer predicts its subtype. The baseline model obtained 87.88% accuracy, 95.29% sensitivity, 80.47% specificity, an MCC of 0.766, and an ROC value of 0.9603 on layer-1. Similarly, the model obtained accuracy, sensitivity, specificity, MCC, and ROC values of 68.24%, 65.54%, 70.95%, 0.3654, and 0.7568, respectively, on layer-2. On an independent dataset, piEnPred secured 80.4% accuracy, 82.5% sensitivity, 78.4% specificity, and an MCC of 0.6099 on layer-1. Subsequently, the proposed predictor obtained 72.5% accuracy, 70.0% sensitivity, 75% specificity, and an MCC of 0.4506 on layer-2. The proposed method performed remarkably well in comparison with other state-of-the-art predictors. For the convenience of experimental scientists, a user-friendly and freely accessible web server has been developed at bienhancer.pythonanywhere.com.
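A minimal sketch of such a bi-layered prediction scheme, assuming scikit-learn; the synthetic data, layer sizes, and the use of univariate selection in place of the CMSFS algorithm are illustrative assumptions, not the authors' implementation:

    # Two-layer enhancer prediction sketch (illustrative only; univariate
    # selection stands in for CMSFS, and the data here is synthetic).
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    X = np.random.rand(200, 500)                     # placeholder feature vectors
    y_enh = np.random.randint(0, 2, 200)             # enhancer vs non-enhancer labels
    y_sub = np.random.randint(0, 2, 200)             # strong vs weak subtype labels
    layer1 = make_pipeline(SelectKBest(f_classif, k=100), SVC(probability=True)).fit(X, y_enh)
    layer2 = make_pipeline(SelectKBest(f_classif, k=100), SVC()).fit(X[y_enh == 1], y_sub[y_enh == 1])

    def predict(features):
        if layer1.predict_proba([features])[0, 1] < 0.5:   # layer-1: enhancer or not
            return "non-enhancer"
        return "strong" if layer2.predict([features])[0] == 1 else "weak"  # layer-2: subtype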
Mild cognitive impairment (MCI), a potential sign of serious cognitive decline, can be divided into two stages, i.e., late MCI (LMCI) and early MCI (EMCI). Although the different cognitive states in MCI progression have been clinically defined, effective and accurate identification of differences in neuroimaging data between these stages still needs further study. In this paper, a new method of clustering-evolutionary weighted support vector machine ensemble (CEWSVME) is presented to investigate the alterations from cognitively normal (CN) to EMCI to LMCI. CEWSVME mainly includes two steps. The first step is to build multiple SVM classifiers by randomly selecting samples and features. The second step is to introduce the idea of clustering evolution to eliminate inefficient and highly similar SVMs, thereby improving the final classification performance. Additionally, we extracted the optimal features to detect the differential brain regions in MCI progression and confirmed that these regions change dynamically with the development of MCI. More specifically, this study found that some brain regions, such as the parahippocampal gyrus, posterior cingulate gyrus, and amygdala, have only durative effects on MCI progression, while the superior temporal gyrus and the middle temporal gyrus have periodic effects. Our work contributes to understanding the pathogenesis of MCI and provides guidance for its timely diagnosis.
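A condensed sketch of the two-step ensemble idea, assuming scikit-learn; KMeans clustering over prediction vectors stands in here for the clustering-evolution step, and all sizes are illustrative rather than the authors' settings:

    # SVM ensemble with similarity-based pruning (illustrative sketch).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def build_ensemble(X, y, n_svm=50, feat_frac=0.6, keep=10, seed=0):
        rng = np.random.default_rng(seed)
        members = []
        for _ in range(n_svm):                       # step 1: random samples and features
            rows = rng.choice(len(X), size=len(X), replace=True)
            cols = rng.choice(X.shape[1], size=int(feat_frac * X.shape[1]), replace=False)
            members.append((SVC().fit(X[rows][:, cols], y[rows]), cols))
        preds = np.array([clf.predict(X[:, cols]) for clf, cols in members])
        acc = (preds == y).mean(axis=1)
        labels = KMeans(n_clusters=keep, n_init=10).fit_predict(preds)
        # step 2: keep the most accurate member per cluster, eliminating
        # inefficient and highly similar SVMs.
        return [members[max(np.flatnonzero(labels == c), key=lambda i: acc[i])] for c in range(keep)]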
In the post-genomic era, proteomics has achieved significant theoretical and practical advances with the development of high-throughput technologies. In particular, the rapid accumulation of protein-protein interactions (PPIs) provides a foundation for constructing protein interaction networks (PINs), which offer a new perspective for understanding cellular organizations, processes, and functions at the network level. In this paper, we present a comprehensive survey of three main characteristics of PINs: centrality, modularity, and dynamics. 1) Different centrality measures, which are used to calculate the importance of proteins, are summarized based on the structural characteristics of PINs or on the basis of integrated biological information; 2) different modularity definitions and various clustering algorithms for predicting protein complexes or identifying functional modules are introduced; 3) the dynamics of proteins, PPIs, and sub-networks are discussed, respectively. Finally, the main applications of PINs to complex diseases are reviewed, and the challenges and future research directions are discussed.
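As a brief illustration of the structure-based centrality measures such a survey covers, the following computes two common ones on a toy PIN using the networkx library (the interactions shown are made up for the example):

    # Degree and betweenness centrality on a toy protein interaction network.
    import networkx as nx

    G = nx.Graph([("P53", "MDM2"), ("P53", "BRCA1"), ("BRCA1", "RAD51"), ("MDM2", "UBE3A")])
    print(nx.degree_centrality(G))        # importance by number of interaction partners
    print(nx.betweenness_centrality(G))   # importance by mediation of shortest paths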
Wireless Sensor Networks (WSNs) used for monitoring applications such as pipelines carrying oil, water, and gas; perimeter surveillance; border monitoring; and subway tunnel monitoring form linear WSNs. Here, the infrastructure being monitored is inherently linear, with sensor nodes placed along a straight line; such WSNs are therefore called linear WSNs. These applications are security critical because the data being communicated can be used for malicious purposes. Contemporary research on WSN data security cannot be applied directly to linear WSNs, because an adversary can disrupt the entire service of a linear WSN by capturing only a few nodes. Therefore, we propose a data aggregation scheme that ensures the privacy, confidentiality, and integrity of data. In addition, the scheme is resilient against node capture and collusion attacks. Several existing schemes can detect malicious nodes; however, the proposed scheme identifies malicious nodes with lower key-storage requirements. Moreover, we provide an analysis of communication cost in terms of the number of messages communicated. To the best of our knowledge, the proposed data aggregation scheme is the first lightweight scheme that achieves privacy and verification of data, resistance against node capture and collusion attacks, and malicious node identification in linear WSNs.
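To make the privacy goal concrete, here is a minimal sketch of one generic approach to privacy-preserving additive aggregation along a line of nodes; it is not the paper's scheme, and the key derivation is an assumption for illustration. Each node masks its reading with a value derived from a key it shares with the sink, so intermediate nodes only ever see masked partial sums:

    # Masked hop-by-hop aggregation sketch (illustrative, not the proposed scheme).
    import hashlib

    def mask(node_key: bytes, epoch: int, modulus: int = 2**32) -> int:
        digest = hashlib.sha256(node_key + epoch.to_bytes(8, "big")).digest()
        return int.from_bytes(digest[:4], "big") % modulus

    def aggregate(readings, keys, epoch, modulus=2**32):
        partial = 0
        for r, k in zip(readings, keys):           # hop-by-hop along the linear topology
            partial = (partial + r + mask(k, epoch, modulus)) % modulus
        return partial                             # no hop learns any individual reading

    def sink_recover(partial, keys, epoch, modulus=2**32):
        total_mask = sum(mask(k, epoch, modulus) for k in keys) % modulus
        return (partial - total_mask) % modulus    # valid while the true sum < modulus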
Removing smog from digital images is a challenging pre-processing task in various imaging systems, and many smog removal (i.e., desmogging) models have been proposed to remove the effect of smog from images. Desmogging models are based on a physical imaging model, which means they require efficient estimation of the transmission map and atmospheric veil from a single smoggy image. Many prior-based restoration models have therefore been proposed in the literature to estimate the transmission map and atmospheric veil. However, these models rely on computationally expensive minimization of an energy function, and the existing restoration models suffer from various issues such as distortion of texture, edges, and colors. Therefore, in this paper, a convolutional neural network (CNN) is used to estimate the physical attributes of smoggy images, and an oblique gradient channel prior (OGCP) is utilized to restore them. Initially, a dataset of smoggy and sunny images is obtained. Thereafter, we train the CNN to estimate the smog gradient from smoggy images. Finally, based on the computed smog gradient, OGCP is utilized to restore the smoggy images. Performance analyses reveal that the proposed CNN-OGCP based desmogging model outperforms the existing desmogging models in terms of various performance metrics.
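A minimal sketch of the restoration step implied by such a physical model, assuming the standard atmospheric scattering formulation I = J·t + A·(1 − t); the transmission map and atmospheric light are treated as given here (in the paper they would come from the trained CNN and the OGCP prior):

    # Invert the atmospheric scattering model to recover the scene radiance.
    import numpy as np

    def restore(smoggy: np.ndarray, transmission: np.ndarray, airlight: float,
                t_min: float = 0.1) -> np.ndarray:
        t = np.clip(transmission, t_min, 1.0)[..., None]   # floor t to avoid amplifying noise
        scene = (smoggy - airlight) / t + airlight          # J = (I - A) / t + A
        return np.clip(scene, 0.0, 1.0)                     # images assumed normalized to [0, 1]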
Closely related to the safety and stability of power grids, stability analysis has long been a core topic in the electric industry. Conventional approaches employ computational simulation to make quantitative judgements of grid stability under different conditions. The lack of in-depth data analysis tools makes analytical tasks such as situation-aware analysis, instability reasoning, and pattern recognition difficult. To facilitate visual exploration and reasoning on the simulation data, we introduce WaveLines, a visual analysis approach that supports the supervisory control of multivariate simulation time series of power grids. We design and implement an interactive system that supports a set of analytical tasks proposed by domain experts and experienced operators. Experiments have been conducted with domain experts to illustrate the usability and effectiveness of WaveLines.
Many key-value stores use RDMA to optimize messaging and data transmission between the application layer and the storage layer, but most of them provide only point-wise operations. Skiplist-based stores can support both point operations and range queries, but their CPU-intensive access operations, combined with the high-speed network, easily cause the storage layer to hit CPU bottlenecks. The common solution to this problem is to offload some operations to the application layer and use RDMA to bypass the CPU and perform remote access directly, but this method has so far been used only in hash-table-based stores. In this paper, we present RS-store, a skiplist-based key-value store with RDMA, which overcomes the storage layer's CPU bottleneck by enabling two access modes: local access and remote access. In RS-store, we redesign a novel data structure, R-skiplist, to save communication cost in remote access, and implement a latch-free concurrency control mechanism to ensure correct concurrent execution across the two access modes. RS-store also supports client-active range queries, which reduce the storage layer's CPU consumption. Finally, we evaluate RS-store on an RDMA-capable cluster. Experimental results show that RS-store achieves up to 2x improvement over RDMA-enabled RocksDB in throughput and application scalability.
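For context, a plain skiplist search looks like the following sketch; this is the textbook structure, not the paper's R-skiplist, which redesigns it to cut communication cost on remote traversals:

    # Textbook skiplist search: descend from the top level, moving right
    # while the next key is still smaller than the target.
    class Node:
        def __init__(self, key, value, level):
            self.key, self.value = key, value
            self.forward = [None] * level        # forward[i] = next node at level i

    def search(head, key, max_level):
        node = head
        for lvl in range(max_level - 1, -1, -1): # top level down
            while node.forward[lvl] and node.forward[lvl].key < key:
                node = node.forward[lvl]
        node = node.forward[0]
        return node.value if node and node.key == key else None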
Network embedding, which aims to learn vector representations of vertices, has become a crucial issue in network analysis. However, considering the complex structures and heterogeneous attributes in real-world networks, existing methods may fail to handle the inconsistencies between structure topology and attribute proximity. Thus, more comprehensive techniques are urgently required to capture the highly non-linear network structure and resolve these inconsistencies while retaining more information. To that end, in this paper, we propose a heterogeneous-attributes enhancement deep framework (HEDF), which can better capture the non-linear structure and associated information in a deep-learning way and effectively combine the structural information of multiple views through a combining layer. Along this line, the inconsistencies are handled to some extent and more structural information is preserved through a semi-supervised mode. Extensive validation on several real-world datasets shows that our model outperforms the baselines, especially in sparse and inconsistent situations with less training data.
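A minimal sketch of the multi-view idea in PyTorch; this is not the authors' HEDF architecture, and all layer sizes are assumptions. Separate encoders embed the structure view (adjacency rows) and the attribute view, and a combining layer fuses them into one vertex embedding:

    # Two-view vertex embedding with a combining layer (illustrative sketch).
    import torch
    import torch.nn as nn

    class TwoViewEncoder(nn.Module):
        def __init__(self, n_nodes, n_attrs, dim=64):
            super().__init__()
            self.struct_enc = nn.Sequential(nn.Linear(n_nodes, 128), nn.ReLU(), nn.Linear(128, dim))
            self.attr_enc = nn.Sequential(nn.Linear(n_attrs, 128), nn.ReLU(), nn.Linear(128, dim))
            self.combine = nn.Linear(2 * dim, dim)   # combining layer over both views

        def forward(self, adj_row, attr_row):
            z = torch.cat([self.struct_enc(adj_row), self.attr_enc(attr_row)], dim=-1)
            return self.combine(z)                   # fused non-linear vertex embedding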
Wandering is a significant indicator in the clinical diagnosis of dementia and other related diseases in elders. Reliable long-term monitoring of continuous indoor movement for the detection of wandering is challenging, because most elders, due to declining memory, are prone to forget to carry or wear the sensors that collect motion information daily. Wi-Fi, as an emerging sensing modality, has been widely used to monitor human indoor movement in a noninvasive manner. In order to continuously monitor individuals' indoor motion and reliably identify wandering movement non-invasively, in this work we develop an LSTM-based deep classification method that is able to differentiate wandering-caused Wi-Fi signal changes from others. Specifically, we first use off-the-shelf Wi-Fi devices to capture a resident's indoor motion information, collecting a group of Wi-Fi signal streams that are split into variable-size segments. Second, the deep network LSTM is adopted to develop a wandering detection method that classifies every variable-size segment of Wi-Fi signals into categories according to the well-known wandering spatiotemporal patterns. Last, experimental evaluation conducted on a group of real-world Wi-Fi signal streams shows that our proposed LSTM-based detection method is workable and effective in identifying indoor wandering behavior, obtaining average values of 0.9286, 0.9618, 0.9634, and 0.9619 for accuracy, precision, recall, and F1 score, respectively.
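A minimal PyTorch sketch of an LSTM classifier over variable-length signal segments; the subcarrier count, hidden size, and number of pattern classes are assumptions, not the paper's configuration:

    # LSTM classifier for variable-length Wi-Fi signal segments (sketch).
    import torch
    import torch.nn as nn

    class WanderingLSTM(nn.Module):
        def __init__(self, n_subcarriers=30, hidden=64, n_classes=4):
            super().__init__()
            self.lstm = nn.LSTM(n_subcarriers, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, segment):                  # segment: (1, T, n_subcarriers), T varies
            _, (h_n, _) = self.lstm(segment)
            return self.head(h_n[-1])                # class logits from the final hidden state

    model = WanderingLSTM()
    logits = model(torch.randn(1, 120, 30))          # a 120-step segment works as-is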
Node order is one of the most important factors in learning the structure of a Bayesian network (BN) for probabilistic reasoning. To improve BN structure learning, we propose a node order learning algorithm based on the frequently used Bayesian information criterion (BIC) score function. The algorithm dramatically reduces the space of node orders and makes the results of BN learning more stable and effective. Specifically, we first find the most dependent node for each individual node, prove analytically that these dependencies are undirected, and then construct undirected subgraphs (UGs). Secondly, the UGs are examined and connected into a single undirected graph (UGC), and the relation between the number of subgraphs and the number of nodes is analyzed. Thirdly, we provide rules for orienting all edges in the UGC, which converts it into a directed acyclic graph (DAG). Further, we rank the DAG's topological order and describe the BIC-based node order learning algorithm. Its complexity analysis shows that the algorithm runs in linear time with respect to the number of samples and in polynomial time with respect to the number of variables. Finally, experimental results demonstrate significant performance improvement in comparison with other methods.
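Since the algorithm is driven by the decomposable BIC score, a minimal sketch of that score for one discrete node given its parents may help; the standard form is the log-likelihood penalized by (log N)/2 per free parameter, and the data layout assumed here (rows indexed by variable id) is illustrative:

    # BIC score of one discrete node given a parent set (standard form).
    import numpy as np
    from collections import Counter

    def bic_node(data, node, parents, arities):
        # data: list of tuples of discrete values, indexed by variable id.
        N = len(data)
        joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
        marg = Counter(tuple(row[p] for p in parents) for row in data)
        ll = sum(c * np.log(c / marg[pa]) for (pa, _), c in joint.items())
        n_params = (arities[node] - 1) * int(np.prod([arities[p] for p in parents]))
        return ll - 0.5 * np.log(N) * n_params      # higher is better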
Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, where exploration and exploitation also need to be well balanced. Therefore, derivative-free optimization deals with a core issue similar to that of reinforcement learning, and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently attracted increasing attention; however, a recent survey on this topic is still lacking. In this article, we summarize methods of derivative-free reinforcement learning to date and organize them in terms of parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article can bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.
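To make the sampling-and-updating framework concrete, here is a minimal sketch of one classic derivative-free approach, a simple evolution-strategy policy update; `evaluate_policy`, which returns an episode return for a parameter vector, is a hypothetical environment hook:

    # One evolution-strategy step: sample perturbations of the policy
    # parameters, evaluate returns, and update without any gradients.
    import numpy as np

    def es_step(theta, evaluate_policy, pop=20, sigma=0.1, lr=0.02):
        noise = np.random.randn(pop, theta.size)
        returns = np.array([evaluate_policy(theta + sigma * n) for n in noise])
        advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
        return theta + lr / (pop * sigma) * noise.T @ advantages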
Data stream processing frameworks process stream data based on event time to ensure that requests can be responded to in real time. In reality, streaming data usually arrives out of order due to factors such as network delay, and data stream processing frameworks commonly adopt the watermark mechanism to address this disorder. A watermark is a special timestamped record inserted into the data stream; it helps the framework decide whether received data is late and should thus be discarded. Traditional watermark generation strategies are periodic and cannot dynamically adjust watermark distribution to balance responsiveness and accuracy. This paper proposes an adaptive watermark generation mechanism based on a time series prediction model to address this limitation. The mechanism dynamically adjusts the frequency and timing of watermark distribution using the disordered-data ratio and other lateness properties of the data stream, improving system responsiveness while ensuring acceptable result accuracy. We implement the proposed mechanism on top of Flink and evaluate it with real-world datasets. The experimental results show that our mechanism is superior to the existing watermark distribution strategies in terms of both system responsiveness and result accuracy.
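A minimal sketch of an adaptive watermark generator; a simple quantile over recently observed lateness stands in for the paper's time-series prediction model, and all names and thresholds are illustrative:

    # Adaptive watermark: trail the max event time by a delay estimated
    # from the recent disorder of the stream (sketch, not the paper's model).
    import numpy as np
    from collections import deque

    class AdaptiveWatermark:
        def __init__(self, window=1000, quantile=0.99):
            self.lateness = deque(maxlen=window)     # recent (arrival - event time) gaps
            self.quantile = quantile
            self.max_event_time = 0.0

        def observe(self, event_time, arrival_time):
            self.max_event_time = max(self.max_event_time, event_time)
            self.lateness.append(arrival_time - event_time)

        def current(self):
            delay = np.quantile(self.lateness, self.quantile) if self.lateness else 0.0
            return self.max_event_time - delay       # smaller delay = faster, riskier watermark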
Machine learning (ML) techniques and algorithms have been successfully and widely used in various areas, including software engineering tasks. As in other software projects, bugs are also common in ML projects and libraries. To understand more deeply the features related to bug fixing in ML projects, we conduct an empirical study of 939 bugs from five ML projects by manually examining the bug categories, fixing patterns, fixing scale, fixing duration, and types of maintenance. The results show that (1) there are commonly seven types of bugs in ML programs; (2) twelve fixing patterns are typically used to fix the bugs in ML programs; (3) 68.80% of the patches belong to micro-scale and small-scale fixes; (4) 66.77% of the bugs in ML programs can be fixed within one month; (5) 45.90% of the bug fixes belong to corrective activity from the perspective of software maintenance. Moreover, we perform a questionnaire survey, sent to developers and users of ML projects, to validate the results of our empirical study; the results are largely consistent with the developers' feedback. The findings from the empirical study provide useful guidance and insights for developers and users to effectively detect and fix bugs in ML projects.
As the mean time between failures (MTBF) continues to decline with the increasing number of components in large-scale high performance computing (HPC) systems, program failures are likely to occur during execution. Ensuring the successful execution of HPC programs has therefore become an issue of concern for unprivileged users. From the user perspective, if a program failure cannot be detected and handled in time, it wastes resources and delays the progress of program execution. Unfortunately, unprivileged users are unable to perform program state checking, owing to execution control by the job management system and their limited privileges, and automated tools supporting user-level failure detection and auto-recovery of parallel programs in HPC systems are currently missing. This paper proposes an innovative method that allows an unprivileged user to achieve failure detection of job execution and automatic resubmission of failed jobs. The state checker in our method is encapsulated as an independent job to reduce interference with the user's jobs. In addition, we propose a dual-checker mechanism to improve the robustness of our approach. We implement the proposed method as a tool named automatic re-launcher (ARL) and evaluate it on the Tianhe-2 system. Experimental results show that ARL can detect execution failures effectively on Tianhe-2, and the communication and performance overhead it causes is negligible. The good scalability of ARL makes it applicable to large-scale HPC systems.
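A minimal sketch of the detect-and-resubmit loop such a checker job could run; `query_state` and `submit`, wrapping the site's job management commands, are hypothetical placeholders rather than ARL's actual interface:

    # User-level watcher: poll a job's state, resubmit on failure (sketch).
    import time

    def watch_and_relaunch(job_script, query_state, submit, poll_s=60, max_retries=3):
        job_id = submit(job_script)
        retries = 0
        while True:
            state = query_state(job_id)              # e.g. PENDING / RUNNING / FAILED / DONE
            if state == "DONE":
                return job_id
            if state == "FAILED":
                if retries >= max_retries:
                    raise RuntimeError("job keeps failing; giving up")
                retries += 1
                job_id = submit(job_script)          # automatic resubmission
            time.sleep(poll_s)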