Mar 2020, Volume 8 Issue 1

Cover illustration

  • The CRISPR/Cas9 system has shown great potential in functional genomic screening by introducing short indels in protein-coding genes. However, short indels are usually not sufficient to cause loss of function of non-coding genomic elements. In this issue, Tao et al. propose a strategy to construct a library of paired sgRNA-expressing plasmids that can be used to efficiently generate chromosomal deletions, providing a scalable method for the functional study of non-coding elements.

    Yadan Huang, Yimeng Ye, Jiang Zhang, Baizhu Chen, Zhuojun Dai

    Background: Synthetic biology has attracted enormous attention in recent years. A key focus of synthetic biology is to use modular biological building blocks to assemble cell-based circuits.

    Results: Scientists have programmed living organisms with these circuits to perform multiple, delicate and well-defined functions. By integrating tools and technologies from other disciplines, these rewired cells can achieve even more complex tasks.

    Conclusions: In this review, we focus on recent achievements in the assembly of new materials and devices, the development of next-generation therapeutics, and versatile manufacturing, all enabled by combining synthetic gene circuits with tools and technologies from multiple fields, such as printing technology, materials engineering and electronic engineering.

    Shilei Zhao, Hua Chen

    Background: The coronavirus disease 2019 (COVID-19) has been spreading rapidly in China and more than 30 countries over the last two months. COVID-19 has multiple characteristics distinct from other infectious diseases, including high infectivity during incubation, a time delay between the real dynamics and the daily observed number of confirmed cases, and the intervention effects of implemented quarantine and control measures.

    Methods: We develop a Susceptible, Un-quarantined infected, Quarantined infected, Confirmed infected (SUQC) model to characterize the dynamics of COVID-19 and explicitly parameterize the intervention effects of control measures, which makes it more suitable for this analysis than other existing epidemic models.

    Results: The SUQC model is applied to the daily released data on confirmed infections to analyze the outbreak of COVID-19 in Wuhan, Hubei (excluding Wuhan), China (excluding Hubei) and four first-tier cities of China. We found that, before January 30, 2020, all these regions except Beijing had a reproductive number $R > 1$, and after January 30, all regions had a reproductive number $R < 1$, indicating that the quarantine and control measures are effective in preventing the spread of COVID-19. The confirmation rate of Wuhan estimated by our model is 0.0643, substantially lower than that of Hubei excluding Wuhan (0.1914) and that of China excluding Hubei (0.2189), but it jumps to 0.3229 after February 12, when clinical evidence was adopted in new diagnosis guidelines. The number of un-quarantined infected cases in Wuhan is estimated to be 3,509 on February 12, 2020, declining to 334 on February 21, 2020. After fitting the model to data as of February 21, 2020, we predict that the COVID-19 epidemic will end around late March in Wuhan and Hubei, around mid-March in China excluding Hubei, and before early March 2020 in the four first-tier cities. A total of 80,511 individuals are estimated to be infected in China, of whom 49,510 are from Wuhan, 17,679 from Hubei (excluding Wuhan), and the remaining 13,322 from other regions of China (excluding Hubei). Note that these estimates come from a deterministic ODE model and should be interpreted with some uncertainty.

    Conclusions: We suggest that rigorous quarantine and control measures be maintained until early March in Beijing, Shanghai, Guangzhou and Shenzhen, and until late March in Hubei. The model can also be used to predict the trend of the epidemic and to provide quantitative guidance for other countries at high risk of an outbreak, such as South Korea, Japan, Italy and Iran.
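    The abstract does not state the SUQC equations, so the following is only a minimal sketch of a four-compartment model with the same shape, built on our own assumptions: susceptible people (S) are infected by un-quarantined cases (U) at a hypothetical rate `alpha`, U move to quarantine (Q) at rate `gamma`, Q are confirmed (C) at rate `beta`, and quarantined cases are assumed not to transmit. Under these assumptions the early-phase reproductive number is `alpha / gamma`, so the epidemic grows only when that ratio exceeds 1. The function names and rates are illustrative, not the authors' parameterization.

```python
def reproductive_number(alpha, gamma):
    """Early-phase R for this sketch: infections per un-quarantined case
    (rate alpha) times the mean time before quarantine (1 / gamma)."""
    return alpha / gamma


def simulate_suqc(n, u0, alpha, gamma, beta, days, dt=0.1):
    """Forward-Euler integration of a hypothetical S -> U -> Q -> C model.

    n: population size, u0: initial un-quarantined infected,
    alpha: transmission rate, gamma: quarantine rate, beta: confirmation rate.
    Returns the compartment sizes (s, u, q, c) after `days` days.
    """
    s, u, q, c = float(n - u0), float(u0), 0.0, 0.0
    for _ in range(int(round(days / dt))):
        new_infections = alpha * s * u / n
        ds = -new_infections
        du = new_infections - gamma * u  # quarantine removes U from circulation
        dq = gamma * u - beta * q
        dc = beta * q                    # only confirmed cases are observed
        s, u, q, c = s + ds * dt, u + du * dt, q + dq * dt, c + dc * dt
    return s, u, q, c
```

    Because the four derivatives sum to zero, the total population is conserved at every step. Fitting `alpha`, `gamma` and `beta` to daily confirmed counts, as the authors do with their own model, would additionally require an optimizer and the real data, which this sketch omits.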

    Bernard Mathey-Prevot, Bao-Tran Parker, Carolyn Im, Cierra Hong, Peng Dong, Guang Yao, Lingchong You

    Background: E2F1 protein, a major effector of the Rb/E2F pathway, plays a central role in regulating cell-fate decisions involved in proliferation, apoptosis, and differentiation. Its expression is highly dynamic and tightly modulated through a combination of transcriptional, translational and posttranslational controls. However, the mechanisms by which its expression and activity can promote different cellular outcomes remain to be fully elucidated. To better document E2F1 expression in live cells, we have engineered a series of fluorescent E2F1 protein reporters that quantitatively capture E2F1 protein dynamics.

    Methods: Reporter constructs, under the control of the mouse or human E2F1 proximal promoter, were designed to express an E2F1-Venus fusion protein incapable of binding DNA. In addition, constructs either included or excluded the 3′ untranslated region (3′UTR) of the E2F1 gene. These constructs were introduced into fibroblasts and epithelial cells, and expression of the fusion reporter protein was validated and quantified in single cells using live imaging.

    Results: In all cases, expression of the reporter protein effectively recapitulated the behavior of E2F1 under various conditions, including cell cycle progression and genotoxic stress. Little or no fluorescent signal from the reporter was detected in G0, but as the cycle progressed, expression of the reporter protein steadily increased in the nucleus, peaking a few hours before cell division but declining to baseline 2–3 h prior to the onset of mitosis. The absence of the E2F1 3′UTR in the constructs led to considerably higher steady-state levels of the fusion protein, which, although normally regulated, exhibited a slightly less complex dynamic profile during the cell cycle or genotoxic stress. Lastly, the presence or absence of Rb did not affect the overall detection and levels of the reporter proteins.

    Conclusions: Our validated E2F1 protein reporters nicely complement other reporters of the Rb/E2F pathway and provide a unique tool to follow the complex dynamics of E2F1 expression in real time in single cells.

    Minzhen Tao, Qiaochu Mu, Yurui Zhang, Zhen Xie

    Background: Derived from an adaptive bacterial immune system, the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated 9 (Cas9) system has shown great potential in high-throughput functional genomic screening, especially for protein-coding genes. However, it is still challenging to apply a similar strategy to study non-coding genomic elements such as long non-coding RNAs (lncRNAs) or clusters of microRNAs, because short insertions or deletions may not be sufficient to generate loss-of-function phenotypes.

    Methods: Here, we present a systematic strategy for designing a CRISPR-based paired-sgRNA library for high-throughput screening of non-coding regions. Given the abundance of lncRNAs and their diverse regulatory roles in vivo, we repurposed microarray datasets to select 600 highly expressed lncRNAs in non-small-cell lung cancer and designed two schemes for lncRNA deletion, with ~20 paired-sgRNAs for each lncRNA. Through Golden Gate assembly, we generated a pooled CRISPR-based library with a total of 12,878 sgRNA pairs.

    Results: Over 80% of paired-sgRNAs were recovered from the final pooled library with a relatively even distribution. The cleavage efficiency of sgRNA pairs was validated through transient transfection and viral infection experiments. Moreover, randomly selected paired-sgRNAs showed that efficient deletion of genomic DNA could be achieved for deletion sizes ranging from 500 to 3000 bp.

    Conclusions: In summary, we have demonstrated a strategy to design and construct a pooled paired-sgRNA library that generates genomic deletions in lncRNA regions, validated its deletion efficiency and explored how deletion efficiency varies with deletion size. This method would also be suitable for investigating other uncharacterized non-coding genomic regions in mammalian cells in an efficient and cost-effective manner.
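    The abstract does not describe the two deletion schemes in detail, so the following is only a hedged sketch of the general idea behind paired-sgRNA design: combine candidate cut sites upstream and downstream of a target region and keep pairs whose predicted deletion falls in the 500 to 3000 bp range the authors found to delete efficiently. The cut-site coordinates, the function name `pair_sgrnas`, and the ordering rule are hypothetical; real designs would also score on-target activity and off-target risk.

```python
from itertools import product


def pair_sgrnas(upstream_cuts, downstream_cuts,
                min_size=500, max_size=3000, max_pairs=20):
    """Combine hypothetical upstream and downstream sgRNA cut positions (bp)
    into pairs whose predicted deletion length lies in [min_size, max_size].

    Returns up to `max_pairs` tuples (upstream_cut, downstream_cut, size),
    ordered by deletion size (an arbitrary tie-break chosen for this sketch).
    """
    pairs = []
    for up, down in product(upstream_cuts, downstream_cuts):
        size = down - up  # bases removed between the two cut sites
        if min_size <= size <= max_size:
            pairs.append((up, down, size))
    pairs.sort(key=lambda p: p[2])
    return pairs[:max_pairs]


print(pair_sgrnas([100, 200], [900, 5000]))
# [(200, 900, 700), (100, 900, 800)]
```

    Capping at about 20 pairs per lncRNA across 600 lncRNAs gives a library on the order of 12,000 pairs, which is the scale of the 12,878-pair library reported above.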

    Jingxue Xin, Junjun Hao, Lang Chen, Tao Zhang, Lei Li, Luonan Chen, Wenmin Zhao, Xuemei Lu, Peng Shi, Yong Wang

    Background: The plateau zokor lives in sealed burrows at altitudes of 2,000 to 4,200 meters on the Qinghai-Tibet Plateau. This extreme living environment makes it a good model for studying animal adaptation to hypoxia, low temperature, and high carbon dioxide concentration.

    Methods: We provide an integrated resource, ZokorDB, for tissue-specific regulatory network annotation of the zokor. ZokorDB is based on a high-quality draft genome of a plateau zokor from 3,300 m and its transcriptional profiles in brain, heart, liver, kidney, and lung. The conserved non-coding elements of the zokor are annotated with their nearest genes and upstream transcription factor motif binding sites.

    Results: ZokorDB provides a general draft gene regulatory network (GRN), i.e., potential transcription factors (TFs) that bind to non-coding regulatory elements and regulate the expression of target genes (TGs). Furthermore, we refined the GRN by incorporating matched RNA-seq and DNase-seq data from the mouse ENCODE project and reconstructed five tissue-specific regulatory networks.

    Conclusions: A web-based, open-access database has been developed for easily searching, visualizing, and downloading the annotation and data. The pipeline for non-coding region annotation in the zokor will be useful for other non-model species. ZokorDB is freely available at the website (

    Jie Shi, Xiangrui Zeng, Rui Jiang, Tao Jiang, Min Xu

    Background: Cryo-electron microscopy (Cryo-EM) and tomography (Cryo-ET) have emerged as important imaging techniques for studying the structures of macromolecular complexes. In 3D reconstruction of large macromolecular complexes, many 2D projection images of macromolecular complex particles are usually acquired with a low signal-to-noise ratio (SNR). It is therefore useful to select multiple images containing the same structure in an identical orientation. The selected images are averaged to produce a higher-quality representation of the underlying structure with improved resolution. Existing approaches for selecting such images have limited accuracy and speed.

    Methods: We propose a simulated annealing-based algorithm (SA) to pick the homogeneous image set with the best average. Its performance is compared with two baseline methods on both 2D and 3D datasets. When tested on simulated and experimental 3D Cryo-ET images of the ribosome complex, SA sometimes stopped at a locally optimal solution. Restarting was applied to overcome this difficulty and significantly improved the performance of SA on 3D datasets.

    Results: On simulated and experimental 2D Cryo-EM images of ribosome complexes, with SNR = 10 and SNR = 0.5 respectively, our method outperformed the two baseline methods in F-measure, resolution score, and time cost. Additionally, SA remains superior as the proportion of homogeneous images decreases.

    Conclusions: SA is introduced for homogeneous image selection to achieve higher accuracy with faster processing. Experiments on both simulated and real 2D Cryo-EM and 3D Cryo-ET images demonstrated that SA achieved substantially better performance. This approach is an important step toward improving the resolution of structural recovery of macromolecular complexes captured by Cryo-EM and Cryo-ET.
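    The abstract does not give the objective or move set, so the following is a hedged sketch of simulated annealing with random restarts for picking a homogeneous subset of `k` images, not the authors' implementation. Assumptions: a subset is scored by the mean Pearson correlation between each selected image and the subset average (a hypothetical stand-in for their criterion), a move swaps one selected image for one unselected image, and the temperature decays geometrically. Restarting from fresh random subsets is the remedy for the local optima mentioned above. The names `subset_score` and `anneal_select` and all parameter values are illustrative.

```python
import numpy as np


def subset_score(images, idx):
    """Hypothetical objective: mean Pearson correlation between each selected
    image and the average of the selected images (higher = more homogeneous)."""
    sel = images[idx].reshape(len(idx), -1)
    sel_c = sel - sel.mean(axis=1, keepdims=True)
    avg = sel.mean(axis=0)
    avg_c = avg - avg.mean()
    denom = np.linalg.norm(sel_c, axis=1) * np.linalg.norm(avg_c) + 1e-12
    return float(np.mean((sel_c @ avg_c) / denom))


def anneal_select(images, k, n_iter=2000, t0=0.1, cooling=0.995,
                  restarts=5, seed=0):
    """Simulated annealing with random restarts over subsets of size k.
    Returns the sorted indices of the best subset seen and its score."""
    rng = np.random.default_rng(seed)
    n = len(images)
    best_idx, best_score = None, -np.inf
    for _ in range(restarts):
        # each restart begins from a fresh random subset
        idx = rng.choice(n, size=k, replace=False)
        score = subset_score(images, idx)
        t = t0
        for _ in range(n_iter):
            # move: swap one selected image for one currently unselected image
            new_idx = idx.copy()
            new_idx[rng.integers(k)] = rng.choice(np.setdiff1d(np.arange(n), idx))
            new_score = subset_score(images, new_idx)
            # Metropolis rule: always accept improvements; accept worse moves
            # with probability exp(delta / t), which shrinks as t cools
            if new_score >= score or rng.random() < np.exp((new_score - score) / t):
                idx, score = new_idx, new_score
            if score > best_score:
                best_idx, best_score = idx.copy(), score
            t *= cooling
    return np.sort(best_idx), best_score
```

    On synthetic data where ten images share a common signal and ten are pure noise, `anneal_select(images, k=10)` should recover the ten signal images. Keeping the best subset across all restarts, rather than the last one, is what lets restarts escape a poor local optimum.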

    Jie Ren, Kai Song, Chao Deng, Nathan A. Ahlgren, Jed A. Fuhrman, Yi Li, Xiaohui Xie, Ryan Poplin, Fengzhu Sun

    Background: The recent development of metagenomic sequencing makes it possible to massively sequence microbial genomes including viral genomes without the need for laboratory culture. Existing reference-based and gene homology-based methods are not efficient in identifying unknown viruses or short viral sequences from metagenomic data.

    Methods: Here we developed a reference-free and alignment-free machine learning method, DeepVirFinder, for identifying viral sequences in metagenomic data using deep learning.

    Results: Trained on viral RefSeq sequences discovered before May 2015 and evaluated on those discovered after that date, DeepVirFinder outperformed the state-of-the-art method VirFinder at all contig lengths, achieving AUROC values of 0.93, 0.95, 0.97, and 0.98 for 300, 500, 1000, and 3000 bp sequences, respectively. Enlarging the training data with millions of additional purified viral sequences from metavirome samples further improved the accuracy for identifying under-represented virus groups. Applying DeepVirFinder to real human gut metagenomic samples, we identified 51,138 viral sequences belonging to 175 bins in patients with colorectal carcinoma (CRC). Ten bins were found to be associated with cancer status, suggesting that viruses may play important roles in CRC.

    Conclusions: Powered by deep learning and high-throughput metagenomic sequencing data, DeepVirFinder significantly improved the accuracy of viral identification and will assist the study of viruses in the era of metagenomics.
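    To make the reported AUROC values concrete, the sketch below computes AUROC directly from classifier scores: it is the probability that a randomly chosen viral (positive) sequence receives a higher score than a randomly chosen non-viral (negative) one, with ties counted as half. This is the standard Mann-Whitney formulation of the metric, not code from DeepVirFinder; the function name and example scores are hypothetical, and the quadratic loop is chosen for clarity over speed.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a randomly chosen positive (e.g. viral)
    sequence scores higher than a randomly chosen negative (e.g. host) one.
    Ties count as half a win, matching the Mann-Whitney U formulation."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(auroc([0.9, 0.3], [0.5, 0.1]))  # 0.75: three of the four pairs are ranked correctly
```

    Read this way, an AUROC of 0.93 on 300 bp contigs means a viral contig outranks a non-viral contig in about 93% of such pairs.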

    Md. Bahadur Badsha, Rui Li, Boxiang Liu, Yang I. Li, Min Xian, Nicholas E. Banovich, Audrey Qiuyan Fu

    Background: Single-cell RNA-sequencing (scRNA-seq) is a rapidly evolving technology that enables measurement of gene expression levels at an unprecedented resolution. Despite the explosive growth in the number of cells that can be assayed by a single experiment, scRNA-seq still has several limitations, including high rates of dropouts, which result in a large number of genes having zero read count in the scRNA-seq data, and complicate downstream analyses.

    Methods: To overcome this problem, we treat zeros as missing values and develop nonparametric deep learning methods for imputation. Specifically, our LATE (Learning with AuToEncoder) method trains an autoencoder with random initial values of the parameters, whereas our TRANSLATE (TRANSfer learning with LATE) method further allows for the use of a reference gene expression data set to provide LATE with an initial set of parameter estimates.

    Results: On both simulated and real data, LATE and TRANSLATE outperform existing scRNA-seq imputation methods, achieving lower mean squared error in most cases, recovering nonlinear gene-gene relationships, and better separating cell types. They are also highly scalable and can efficiently process over 1 million cells in just a few hours on a GPU.

    Conclusions: We demonstrate that our nonparametric approach to imputation based on autoencoders is powerful and highly efficient.
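    The key idea in the Methods above is that zeros are treated as missing, so the reconstruction loss is computed only over observed (non-zero) entries, and the trained network's output fills in the zeros. The following is a deliberately simplified sketch of that idea, assuming a linear autoencoder fitted by plain gradient descent in place of LATE's deep, nonlinear network; passing `init` mimics how TRANSLATE starts from parameters learned on a reference data set. All function names, sizes and learning rates are hypothetical.

```python
import numpy as np


def masked_mse(x, recon):
    """Mean squared error over observed (non-zero) entries only; zeros are
    treated as missing values and contribute nothing to the loss."""
    mask = x != 0
    return float(np.sum(mask * (recon - x) ** 2) / mask.sum())


def train_linear_autoencoder(x, hidden=2, lr=0.01, steps=500, init=None, seed=0):
    """Fit a hypothetical linear autoencoder x -> x @ w1 @ w2 on the masked loss."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    if init is None:
        # LATE-style start: random initial parameter values
        w1 = 0.1 * rng.normal(size=(d, hidden))
        w2 = 0.1 * rng.normal(size=(hidden, d))
    else:
        # TRANSLATE-style start: parameters learned on a reference data set
        w1, w2 = (w.copy() for w in init)
    mask = (x != 0).astype(float)
    n_obs = mask.sum()
    for _ in range(steps):
        h = x @ w1
        recon = h @ w2
        # gradient of masked_mse w.r.t. recon; zero at missing entries
        grad_recon = 2.0 * mask * (recon - x) / n_obs
        grad_w2 = h.T @ grad_recon
        grad_w1 = x.T @ (grad_recon @ w2.T)
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2


def impute(x, w1, w2):
    """Keep observed values; replace zeros with the autoencoder's reconstruction."""
    return np.where(x != 0, x, x @ w1 @ w2)
```

    Because the masked loss never penalizes the reconstruction at zero entries, the network is not trained to reproduce dropouts, which is what allows its output there to serve as an imputed value.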