Recent advances in AI-based synthesis of small molecules have produced extensive databases containing billions of compounds. At this scale, traditional quantum chemistry (QC) methods are too inefficient to determine the chemical and physical properties of every molecule. To address this challenge, we present MetaGIN, a lightweight deep learning framework designed for efficient and accurate molecular property prediction.
Traditional GNN models that use only 1-hop edges (i.e., covalent bonds) are sufficient for abstract graph representations but inadequate for capturing 3D molecular features. MetaGIN shows that including 2-hop and 3-hop edges (representing bond angles and torsion angles, respectively) is crucial for capturing the intricacies of 3D molecular structure. Moreover, MetaGIN is a streamlined model with fewer than 10 million parameters, small enough to fine-tune on a single GPU. It also adopts the widely used MetaFormer architecture, which has consistently achieved high accuracy across many computer vision tasks.
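A minimal sketch of how such higher-order edges could be enumerated from a bond list. It assumes (our assumption, not necessarily MetaGIN's exact rule) that a k-hop edge joins two atoms whose shortest path in the covalent-bond graph spans exactly k bonds; under this rule a 2-hop pair i-k is the two ends of a bond angle i-j-k, and a 3-hop pair i-l is the two ends of a torsion i-j-k-l. The function name `k_hop_edges` is hypothetical.

```python
from collections import deque


def k_hop_edges(num_atoms, bonds, max_hop=3):
    """Group atom pairs by shortest-path distance in the bond graph.

    Assumption for illustration: a k-hop edge joins two atoms whose
    shortest bond path has exactly k bonds (1 = bond, 2 = angle ends,
    3 = torsion ends).
    """
    # Undirected adjacency list built from the covalent-bond list.
    adj = [[] for _ in range(num_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)

    edges = {k: [] for k in range(1, max_hop + 1)}
    for src in range(num_atoms):
        # Breadth-first search from src, never expanding beyond max_hop bonds.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if dist[node] == max_hop:
                continue
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        # Record each unordered pair once (src < dst), bucketed by hop count.
        for dst, d in dist.items():
            if d >= 1 and src < dst:
                edges[d].append((src, dst))
    return edges


# Carbon skeleton of butane: C0-C1-C2-C3.
print(k_hop_edges(4, [(0, 1), (1, 2), (2, 3)]))
# -> {1: [(0, 1), (1, 2), (2, 3)], 2: [(0, 2), (1, 3)], 3: [(0, 3)]}
```

One caveat of the shortest-path rule: in small rings the ends of an angle can be closer than two bonds (in a three-membered ring every pair is directly bonded), so an implementation that needs every angle and torsion might enumerate bond paths rather than distances.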
In our experiments, MetaGIN achieved a mean absolute error (MAE) of 0.0851 on the PCQM4Mv2 dataset with only 8.87M parameters, and it outperformed leading techniques on several datasets in the MoleculeNet benchmark. These results demonstrate MetaGIN’s potential to substantially accelerate drug discovery by enabling rapid and accurate prediction of molecular properties across large-scale databases.
In recent decades, traditional drug research and development has faced challenges such as high costs, long timelines, and high risks. To address these issues, many computational drug repositioning approaches have been proposed that predict relationships between drugs and diseases, aiming to reduce the cost, development cycle, and risk of developing new drugs. Researchers have explored computational methods for predicting drug-disease associations as well as related associations, such as drug side effect-disease, drug-target, and miRNA-disease associations. In this comprehensive review, we focus on recent advances in drug-disease association prediction methods for drug repositioning. We first categorize these methods into several groups: neural network-based algorithms, matrix-based algorithms, recommendation algorithms, link-based reasoning algorithms, and text mining and semantic reasoning methods. We then compare the prediction performance of existing drug-disease association prediction algorithms. Lastly, we discuss current challenges and future perspectives in drug-disease association prediction.
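One common matrix-based formulation treats known drug-disease associations as a binary matrix and completes it through low-rank factorization. As a generic illustration only (not any specific method covered by the review), the sketch below factorizes a toy association matrix into drug and disease latent factors with stochastic gradient descent. It treats every unobserved pair as 0, a common simplifying assumption, and reads the fitted scores as a ranking signal. The function `factorize`, its hyperparameters, and the toy matrix are hypothetical.

```python
import random


def factorize(assoc, rank=2, lr=0.05, reg=0.01, epochs=2000, seed=0):
    """Fit R ~ U V^T to a binary drug x disease matrix with SGD.

    Assumption: unobserved pairs are treated as 0 during training.
    Returns the fitted score matrix and the loss after each epoch.
    """
    rng = random.Random(seed)
    n_drugs, n_dis = len(assoc), len(assoc[0])
    # Small random latent factors for drugs (U) and diseases (V).
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_drugs)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_dis)]

    def score(i, j):
        return sum(U[i][k] * V[j][k] for k in range(rank))

    def loss():
        return sum((assoc[i][j] - score(i, j)) ** 2
                   for i in range(n_drugs) for j in range(n_dis))

    history = [loss()]
    for _ in range(epochs):
        for i in range(n_drugs):
            for j in range(n_dis):
                err = assoc[i][j] - score(i, j)
                for k in range(rank):
                    u, v = U[i][k], V[j][k]
                    # Gradient step on squared error plus an L2 penalty.
                    U[i][k] += lr * (err * v - reg * u)
                    V[j][k] += lr * (err * u - reg * v)
        history.append(loss())
    scores = [[score(i, j) for j in range(n_dis)] for i in range(n_drugs)]
    return scores, history


# Toy matrix: rows are drugs, columns are diseases, 1 = known association.
assoc = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
scores, history = factorize(assoc)
print(round(history[0], 3), round(history[-1], 3))
```

For repositioning, one would rank a drug's zero entries by fitted score; real methods differ mainly in how they weight unobserved pairs and add drug and disease similarity information.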
Discovering new drugs is a complicated, time-consuming, costly, risky, and failure-prone process. Moreover, about 80% of the drugs approved so far act on protein targets, and 99% of these act only on specific proteins. As a result, a large number of protein targets are still considered “useless”. By exploring miRNAs as potential therapeutic targets, we can expand the range of target selection and improve the efficiency of drug development. Therefore, identifying potential miRNA-drug interactions (MDIs) with appropriate computational methods is of great significance. In this paper, we propose MDIDCN, a dual-channel network model based on a Temporal Convolutional Network (TCN) and Bi-directional Long Short-Term Memory (BiLSTM), to predict MDIs. Specifically, we first represented the known interactions between miRNAs and drugs as a bipartite network and applied the BiNE graph embedding technique to learn the topological features of both. Second, we used TCN to learn from the MACCS fingerprints of drugs and BiLSTM to learn from the k-mers of miRNAs, and concatenated the topological and structural features of each into fusion features. Finally, the fusion features of the miRNA and the drug were max-pooled and fed into a softmax layer to obtain predicted scores, from which potential miRNA-drug interaction pairs were identified. The prediction performance of the model was evaluated on three different datasets using 5-fold cross-validation, yielding average AUCs of 0.9567, 0.9365, and 0.8975, respectively. In addition, case studies on the drug Gemcitabine and the miRNA hsa-miR-155-5p showed that the model has high accuracy and reliability. In conclusion, MDIDCN can accurately and efficiently predict MDIs, which has important implications for drug development.
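The data handling around a model like MDIDCN can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' code. `kmer_tokens` splits a miRNA sequence into overlapping k-mers (k=3 is an assumed value) and maps them to integer ids that an embedding layer followed by a BiLSTM could consume. `kfold_indices` produces five shuffled, disjoint test folds. `roc_auc` computes AUC as the probability that a randomly chosen interacting pair is scored above a randomly chosen non-interacting one. All three function names are hypothetical, and the TCN, BiLSTM, BiNE, and softmax components are not reproduced.

```python
import random
from itertools import product


def kmer_tokens(seq, k=3, alphabet="ACGU"):
    """Split an RNA sequence into overlapping k-mers mapped to integer ids."""
    # Fixed vocabulary of all len(alphabet)**k k-mers; id 0 marks unknown k-mers.
    vocab = {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}
    seq = seq.upper().replace("T", "U")  # accept DNA-style T as U
    return [vocab.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1)]


def kfold_indices(n, k=5, seed=0):
    """Shuffle sample indices and split them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def roc_auc(labels, scores):
    """AUC = P(random positive outscores random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


print(kmer_tokens("ACGUA"))                          # -> [7, 28, 45]
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]))  # -> 0.75
```

In 5-fold cross-validation, each fold from `kfold_indices` serves once as the test set while the model is trained on the other four, and the reported figure is the mean of the five per-fold `roc_auc` values.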
Synthetic binding proteins (SBPs), which combine small size, high solubility and stability, and high affinity, are important for protein-based research, therapeutics, and diagnostics. Over the last several decades, the great majority of SBPs have been obtained through site-directed mutagenesis and directed evolution of privileged protein scaffolds. In recent years, deep learning (DL) has revolutionized protein structure prediction and design. Here, for the first time, the DL framework ProteinMPNN was applied to the de novo design of 7,245 new synthetic proteins spanning 55 different scaffolds, based on the original SBPs collected in our SYNBIP database. Comprehensive bioinformatics analysis indicated that, in addition to achieving high sequence recovery, the designed synthetic proteins show significantly improved solubility and thermal stability compared with currently known SBPs. Moreover, eight protein scaffolds particularly well suited to ProteinMPNN were identified, and the synthetic proteins designed from them showed good calculated binding ability to their corresponding protein targets. The DL-based framework therefore shows great potential for the target-directed de novo generation of high-quality synthetic protein libraries, which could help experimental biologists carry out rational protein engineering to discover novel functional protein binders.
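Sequence recovery is commonly computed as the fraction of positions at which a sequence designed for a fixed backbone reproduces the native residue. A minimal sketch, assuming the native and designed sequences are already aligned position by position (as they are when ProteinMPNN redesigns a given backbone); the function names and toy sequences are hypothetical.

```python
def sequence_recovery(native, designed):
    """Fraction of aligned positions where the designed residue matches the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must cover the same backbone positions")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(n == d for n, d in zip(native, designed))
    return matches / len(native)


def mean_recovery(native, designs):
    """Average recovery over several designs generated for the same scaffold."""
    return sum(sequence_recovery(native, d) for d in designs) / len(designs)


# Toy one-letter sequences: 4 of 6 positions match.
print(round(sequence_recovery("ACDEFG", "ACDKFW"), 3))  # -> 0.667
```

Averaging per scaffold, as `mean_recovery` does, is one way to compare how well different scaffolds suit a design model.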
Drug side effects have become a paramount concern in drug safety research, ranking as the fourth leading cause of mortality after cardiovascular diseases, cancer, and infectious diseases. At the same time, because many patients take multiple prescription and over-the-counter medications in their daily lives, side effects caused by drug-drug interactions (DDIs) have become more common. Traditionally, drug side effects were assessed through resource-intensive and time-consuming laboratory experiments. However, recent advances in bioinformatics and the rapid evolution of artificial intelligence technology have been accompanied by the accumulation of extensive biomedical data. Building on this foundation, researchers have developed diverse machine learning methods for discovering and detecting drug side effects. This paper provides a comprehensive overview of recent advances in predicting drug side effects, covering the full pipeline from biological data acquisition to the development of sophisticated machine learning models. The review begins by describing widely recognized datasets and web servers for drug side effect prediction. It then examines machine learning methods tailored to binary, multi-class, and multi-label classification tasks associated with drug side effects, and illustrates them through representative computational models for identifying side effects induced by single drugs and by DDIs. Finally, the review outlines the challenges of predicting drug side effects with machine learning and highlights important future research directions in this dynamic field.
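The three task types differ mainly in how a model's scores become predictions: a binary task answers one yes/no question, a multi-class task picks exactly one category, and a multi-label task decides each side effect independently. A minimal pure-Python sketch follows. The side-effect vocabulary, probabilities, function names, and the 0.5 threshold are hypothetical. Hamming loss is shown as one common multi-label metric, not as the metric of any particular reviewed model. The same encoding applies whether a sample is a single drug or a drug pair in the DDI setting.

```python
def multi_hot(terms, vocabulary):
    """Encode a set of side-effect terms as a 0/1 vector over a fixed vocabulary."""
    present = set(terms)
    return [1 if term in present else 0 for term in vocabulary]


def predict_binary(prob, threshold=0.5):
    """Binary task: does this drug (or drug pair) cause one given side effect?"""
    return 1 if prob >= threshold else 0


def predict_multiclass(probs):
    """Multi-class task: choose exactly one category (index of the highest score)."""
    return max(range(len(probs)), key=probs.__getitem__)


def predict_multilabel(probs, threshold=0.5):
    """Multi-label task: threshold every side-effect score independently."""
    return [predict_binary(p, threshold) for p in probs]


def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) slots that are predicted incorrectly."""
    wrong = sum(t != p for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / sum(len(row) for row in y_true)


vocab = ["nausea", "headache", "rash"]  # hypothetical side-effect vocabulary
y_true = [multi_hot(["nausea", "rash"], vocab), multi_hot(["headache"], vocab)]
y_pred = [predict_multilabel([0.7, 0.6, 0.8]), predict_multilabel([0.1, 0.9, 0.2])]
print(y_true)                        # -> [[1, 0, 1], [0, 1, 0]]
print(hamming_loss(y_true, y_pred))  # -> 0.16666666666666666
```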