Introduction
Single-cell genomics has become a prosperous research field because of its numerous advantages over conventional methods. Traditional sequencing technologies typically involve heterogeneous populations, thus masking the differences among individual cells (
Geschwind and Konopka,2009). In the field of cancer biology, for example, differences among individual cells in a tumor tissue were confirmed decades ago (
Farago and Chester,1961), in particular with regard to genetic heterogeneity (
Torres et al.,2007;
Park et al.,2010). According to the cancer stem cell (CSC) theory (
Kleinsmith and Pierce,1964;
Hamburger and Salmon,1977;
Reya et al.,2001), only a small fraction of tumor stem cells lead to recurrence of the tumor after chemotherapy and/or radiotherapy (
Chaffer and Weinberg,2011). Study of this small fraction of cells by single-cell analysis would be of great clinical importance. Several recent studies have been focused on the individual differences among cells, including single nucleotide variant (SNV) (
Clark et al.,2010), gene copy number variation (CNV) (
Zong et al.,2012) and expression profiling (
Tan et al.,2013). In addition to tumor research, single-cell genomics has become a valuable tool for studying the biological differences between neighboring cells during embryonic development (
Xue et al.,2013;
Zandi et al.,2012), cell proliferation (
Lecault et al.,2011), differentiation (
Galán et al.,2010), reprogramming (
Buganim et al.,2012), and aging (
Frontera et al.,2012). In short, single-cell genomics has the capability to uncover enormous cellular heterogeneity and yield novel biologic mechanisms at a single cell level.
The amplification of nucleic acids is an essential step in single-cell genomics because a single cell contains only a limited amount of genetic material, which has to be amplified well over a thousand-fold; for example, a normal human somatic cell contains approximately 6 pg of genomic DNA and ~100 pg of total RNA, of which ~1 pg is mRNA. Conventional DNA microarray and deep sequencing technologies generally require micrograms of DNA/RNA as starting material. A significant problem is that the commonly used PCR amplification introduces a high error rate owing to the low fidelity of DNA polymerase and alters the ratios of the amplified products, which biases the sequencing results (
Aird et al.,2011). To address this problem in single-cell genomics, great progress has recently been made in the isolation and amplification of DNA/RNA from a single cell or a small number of cells, followed by whole-genome sequencing, RNA sequencing and epigenetic profiling. In this mini-review, we introduce the techniques and applications of single-cell genomics in biomedical research. We also discuss how single-cell analysis could revolutionize the study of cellular heterogeneity.
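The scale of the amplification problem can be made concrete with a short calculation. The sketch below uses the 6 pg of genomic DNA per cell quoted above; the 1 μg target is an assumed, illustrative input requirement ("micrograms" in the text), and the function name is hypothetical.

```python
def fold_amplification(input_pg: float, required_ug: float) -> float:
    """Fold amplification needed to go from input_pg picograms to required_ug micrograms."""
    required_pg = required_ug * 1_000_000  # 1 microgram = 1e6 picograms
    return required_pg / input_pg


genomic_dna_pg = 6.0  # genomic DNA in one human somatic cell (from the text)
target_ug = 1.0       # assumed library input requirement, for illustration only

fold = fold_amplification(genomic_dna_pg, target_ug)
print(f"{fold:,.0f}-fold")  # 166,667-fold
```

Under these assumptions the genome of a single cell must be amplified more than 10^5-fold, which is why even small per-cycle biases matter.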
Isolation of single cell
The methods of single-cell separation and nucleic acid extraction vary with the characteristics of the cell type. Some cells, e.g., sperm and blood cells, are naturally in suspension and are relatively easy to isolate and collect. Traditional enzymatic digestion methods were developed to isolate cells of interest from compact tissues. However, these methods tend to lose the positional information of a given cell, which is important in research areas such as oncology. To overcome this problem, laser capture microdissection (LCM) was introduced, since it does not alter or damage the morphology of the cell of interest or of its surrounding cells. LCM cuts out and separates cells from the adjacent tissue without contact, using a laser beam coupled into a microscope. Non-contact cutting ensures that the separated cells are free from instrument contamination, and the cutting accuracy reaches 1 μm (
Simone et al.,1998). LCM is an ideal method to collect selected cells for DNA, RNA and protein analyses. It can be performed on many different tissue samples including cell cultures, solid tissues, and frozen and paraffin embedded archival tissues. For example, Guo
et al developed a protocol of LCM-mediated single nucleus purification for Class III gastric cancer tissue sample. The successful detection of a point mutation on the
PSCA gene by PCR restriction fragment-length polymorphism (PCR-RFLP) demonstrated that this method can be effectively coupled with downstream single-nucleotide polymorphism (SNP) analysis (
Guo et al.,2012).
In addition to LCM, a variety of manual or mechanical microdissection techniques are widely used. In 1998, Lee
et al isolated tumor cells from adjacent tissue in paraffin embedded archival tissues without contamination, by developing a precise microdissection device that contains a disposable needle, a conventional microscope, and a micromanipulator (
Lee et al.,1998). Many other methods have been particularly designed to adapt to specific tissues. When studying colon cancer, Dalerba
et al separated single tumor cells by flow cytometry using fluorescence-activated cell sorting (FACS) after immuno-staining against the cell surface marker (
Dalerba et al.,2011). In analysis of cell-lineage commitment, blastomeres during human early embryonic development were separated by micromanipulation (
Galán et al.,2010). For the separation of single microbial cells, conventional methods such as microfluidics and serial dilution are used (
Schoenborn et al.,2004). Microfluidic methods can be used for high-throughput single-cell isolation, as demonstrated by the recent Fluidigm C1 system (Fluidigm Corporation) (
Wu et al.,2013), which uses a disposable 96-chamber collection disc to integrate all the upstream procedures for gene expression profiling, such as cell capture, washing, verification, cell lysis, reverse transcription and amplification. The Fluidigm C1 is based on the patented NanoFlex valves. These valves are fabricated by sealing different layers of elastomeric rubber through Multilayer Soft Lithography (MSL) (
Unger et al.,2000). Pressurized gas in one layer deflects the rubber of the adjacent layer, which switches the fluid flow in the disc on and off and thus enables the separation of single cells.
New sequencing technology
The first generation of sequencing technology includes the Maxam-Gilbert (chemical cleavage) method and the Sanger (dideoxy) method, of which the Sanger method dominated laboratories in the late 20th century because of its convenience at the bench. Since the turn of the millennium, next-generation sequencing (NGS) technology, also known as deep sequencing, has been developed to generate millions of sequencing reads in parallel. NGS comprises second-generation and, more recently, third-generation sequencing, reflecting the evolution of the technology. Second-generation sequencing refers to sequencing that combines optical detection with washes of nucleoside triphosphates (
Niedringhaus et al.,2011). It employs sequencing-by-synthesis or sequencing-by-ligation technology, and branches into three major commercial platforms i.e., Illumina HiSeq2000, Roche 454 and ABI SOLiD platform (
Mardis,2008;
Morozova and Marra,2008;
Shendure and Ji,2008;
Metzker,2010). Despite using different instruments and specific chemical reactions, all three platforms have the capability to achieve millions of sequencing reads in parallel efficiently. These three platforms have claimed their individual merit, e.g., HiSeq2000 is most cost-effective; 454 is capable of sequencing the longest fragments; SOLiD has claimed high precision. In practice, Illumina/HiSeq2000 is currently the most widely used platform among the three.
Third-generation sequencing introduces state-of-the-art technologies and aims to sequence a human genome for under $1000 (
Niedringhaus et al.,2011). Ion Torrent Personal Genome Machine (PGM) is a third-generation sequencing technology that sequences short-reads (about 100 bp in length) in a rapid and economical way (
Rothberg et al.,2011). The principle of this method is to measure, in the manner of a pH meter, the protons released during dNTP incorporation in DNA synthesis. In the DNA library construction step, DNA fragments with specific adapters are amplified onto beads. The DNA templates on the beads, along with sequencing primers and DNA polymerase, are then loaded into sensor wells to be sequenced. In the sequencing step, each dNTP is provided sequentially so that the proton release from each nucleotide-incorporation reaction is recorded by an ion sensor, with the signal strength proportional to the total number of incorporated nucleotides. A recent technical breakthrough, single-molecule real-time (SMRT) DNA sequencing, designed by Pacific Biosciences, is also considered a third-generation sequencing technology. SMRT is performed on a single DNA molecule and takes full advantage of the high catalytic activity and processivity of the DNA polymerase. The four types of nucleoside triphosphate are labeled with different fluorescent dyes on the γ-phosphate, so that a signal is generated when a given nucleoside triphosphate is incorporated along the template. The fluorescent label is then cleaved off by the polymerase during DNA extension. SMRT has many important advantages. For example, it has a low error rate and a high sequencing speed, and it can sequence long DNA molecules. In addition, by analyzing differences in the intervals between DNA synthesis steps, it can determine the secondary structure of a DNA molecule as well as certain modifications such as DNA methylation (
Feng et al.,2013). Overall, single-molecule sequencing technology such as SMRT promises great application prospects by avoiding the bias from PCR amplification (
Eid et al.,2009). However, it still remains a major hurdle to reach a high sequencing speed while optimizing accuracy and reducing the cost for all the third-generation sequencing platforms.
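The flow-based readout described above for the Ion Torrent PGM, in which each dNTP is supplied in turn and the signal scales with the number of bases incorporated, can be illustrated with an idealized, noise-free sketch. The cyclic flow order "TACG" and the function names are assumptions made for illustration and do not describe the instrument's actual software or flow order.

```python
FLOW_ORDER = "TACG"  # hypothetical cyclic flow order, chosen for illustration


def simulate_flows(read: str, n_flows: int, flow_order: str = FLOW_ORDER) -> list[int]:
    """Idealized signal per flow: how many bases of `read` are incorporated.

    `read` is the newly synthesized strand, written 5' to 3'.
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        count = 0
        # A homopolymer run is incorporated within a single flow,
        # so the signal is proportional to the run length.
        while pos < len(read) and read[pos] == base:
            count += 1
            pos += 1
        signals.append(count)
    return signals


def call_bases(signals: list[int], flow_order: str = FLOW_ORDER) -> str:
    """Convert per-flow signals back into a base sequence."""
    return "".join(flow_order[i % len(flow_order)] * n for i, n in enumerate(signals))


signals = simulate_flows("TTACGGGA", n_flows=8)
print(signals)              # [2, 1, 1, 3, 0, 1, 0, 0]
print(call_bases(signals))  # TTACGGGA
```

In real data the signals are noisy and not integers, so distinguishing, for example, a run of 5 from a run of 6 identical bases is harder than this idealized model suggests.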
DNA amplification
DNA library preparation for the major sequencing platforms shares similar procedures, including DNA fragmentation, end repair, 3′-end adenylation, ligation with specific adapters, ligation mediated-PCR, purification, quality control, etc. Single-cell genomics requires a specific step of whole-genome amplification (WGA) in DNA library preparation, which generates a sufficient amount of DNA from an individual cell for sequencing (
Hanson and Ballantyne,2005). PCR-based WGA is a popular approach nowadays. However, this method could generate amplification bias, leading to low genome coverage and skewed sequencing results.
Toward improving on traditional WGA, multiple displacement amplification (MDA) was developed using the strand-displacing Φ29 DNA polymerase (
Dean et al.,2002). The reaction is carried out under moderate isothermal conditions near room temperature and therefore does not require a thermocycler. The hexamer primers are usually thiophosphate-modified at the 3′ end to confer resistance to the 3′–5′ exonuclease activity of Φ29 DNA polymerase (
Luthra and Medeiros,2004;
Hutchison et al.,2005). During MDA, hyper-branched DNA structures are formed when primers continuously bind the newly formed DNA strands synthesized by strand displacement (
Paez et al.,2004). MDA has been widely used in analysis of SNV, CNV, sequence tagged sites (STS) and even methylation correlation detection (
Tzvetkov et al.,2005;
Corneveaux et al.,2007;
Hughes and Jones,2007;
Ling et al.,2012). Although it has many advantages over the traditional PCR-based WGA, MDA still has insurmountable problems such as significant non-specific amplification and sequence deviation.
A new approach, multiple annealing and looping-based amplification cycles (MALBAC) was recently reported to be a potentially superior method over MDA (
Lu et al.,2012;
Zong et al.,2012). The amplification is based on a pool of random primers that share a common 27-nucleotide sequence and carry 8 variable nucleotides. Such primers can anneal evenly across the template and ultimately yield quasi-linear pre-amplification instead of the exponential amplification of PCR. This quasi-linear pre-amplification of the genetic material from an individual cell is what allows MALBAC to achieve high genome coverage. However, it remains to be verified whether amplification bias is significantly reduced compared with current PCR-based WGA and MDA methods.
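Why quasi-linear pre-amplification might reduce bias can be seen in a toy model. The sketch below compares two loci that are copied with different per-cycle efficiencies: under exponential amplification every copy serves as a new template, whereas under linear amplification only the original template is copied each cycle. The efficiencies, the cycle number and the function names are illustrative assumptions, not values from the MALBAC papers.

```python
def exponential_yield(efficiency: float, cycles: int) -> float:
    """Copies per starting molecule when every copy is itself a template (PCR-like)."""
    return (1.0 + efficiency) ** cycles


def linear_yield(efficiency: float, cycles: int) -> float:
    """Copies per starting molecule when only the original template is copied."""
    return 1.0 + efficiency * cycles


cycles = 5                        # assumed number of pre-amplification cycles
well_primed, poorly_primed = 0.9, 0.7  # assumed per-cycle efficiencies of two loci

exp_bias = exponential_yield(well_primed, cycles) / exponential_yield(poorly_primed, cycles)
lin_bias = linear_yield(well_primed, cycles) / linear_yield(poorly_primed, cycles)

print(f"exponential bias between loci: {exp_bias:.2f}x")
print(f"linear bias between loci:      {lin_bias:.2f}x")
```

In this model the ratio between the two loci compounds with every exponential cycle but stays close to the ratio of the efficiencies under linear amplification. Whether real MALBAC reactions behave this way is the open question raised in the text.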
Single cell analysis of SNV and point mutation
Using the MALBAC method, Zong
et al detected CNVs of an individual cancer cell and identified SNVs within three kindred cells, which showed higher resolution than MDA (
Zong et al.,2012). They also mapped recombination events in 99 sperm from an Asian male by using MALBAC (
Lu et al.,2012). With SNPs as markers to map the positions of crossovers in each sperm, they found that the reduction of recombination rate near transcription start sites (TSSs) results from the intrinsic mechanism during meiosis rather than natural selection (
Lu et al.,2012). The sequencing results also showed aneuploidy, which can cause serious congenital birth defects, in 5% of the sperm (
Lu et al.,2012). Similarly, Wang
et al carried out microfluidic single-sperm WGA and studied the recombination hotspots in 91 human sperm (
Wang et al.,2012). They found that there are on average 30 mutations per million base pairs between generations, which is slightly higher than reported previously (
McVean et al.,2004).
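The idea of using SNPs as markers to locate crossovers in a single sperm can be sketched as follows. The example assumes that each heterozygous SNP along a chromosome has already been assigned to one of the two parental haplotypes of the donor, labeled "A" or "B", with "-" for markers that were not called. The labels and the function name are hypothetical, and published pipelines use statistical models that tolerate genotyping errors rather than this simple scan.

```python
def find_crossovers(haplotype_calls: str) -> list[tuple[int, int]]:
    """Return marker-index intervals in which the parental haplotype switches.

    haplotype_calls: one character per SNP marker, in chromosomal order;
    'A' or 'B' for the two parental haplotypes, anything else = no call.
    """
    crossovers = []
    last_idx, last_hap = None, None
    for idx, hap in enumerate(haplotype_calls):
        if hap not in ("A", "B"):
            continue  # uncalled marker, e.g. due to allele dropout
        if last_hap is not None and hap != last_hap:
            # The crossover lies somewhere between the two flanking markers.
            crossovers.append((last_idx, idx))
        last_idx, last_hap = idx, hap
    return crossovers


print(find_crossovers("AAAA-BBBAA"))  # [(3, 5), (7, 8)]
```

The resolution of each crossover is limited by the distance between the two flanking informative markers, which is why high genome coverage per sperm matters.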
The rapidly developing field of single-cell genomics has already provided guidance for the early diagnosis and personalized treatment of cancer. Since genomic variation is assumed to be the fundamental cause of tumors, an accurate and in-depth analysis of tumor cells by single-cell genomics will allow researchers to locate cancer gene mutations and to identify the genotypes of particular cancer cell types. SNPs, the most common type of genomic variation, are considered risk markers correlated with cancer. Single-cell genomics provides much more detailed information than conventional PCR and microarray data. By using single-nucleus sequencing, Navin
et al analyzed 100 cells from breast cancer, categorizing three clonal sub-populations, and tracing the cell type that seeded a liver metastasis (
Navin et al.,2011). In addition, exome sequencing can access information on single-nucleotide mutation at single-cell level. By single-cell exome-sequencing, for example, Xu
et al compared the genetic information of the individual tumor cells with their surrounding normal cells in the clear cell renal cell carcinoma (ccRCC) patient (
Xu et al.,2012). They were able to pinpoint several potential causative genes for ccRCC and revealed different mutation mechanisms in these genes (
Xu et al.,2012). Meanwhile, another exome-sequencing study on essential thrombocythemia (ET) was performed by Hou
et al (
Hou et al.,2012), in which they identified the origin of the 58 single cells from the patients and characterized mutations on
SESN2 and
NTRK1 with a combination of MDA, HiSeq2000 sequencing and SNP calling. Recently, Voet
et al developed single-cell paired-end sequencing method and analyzed the genomic variations in a variety of samples, including breast cancer cells, human zygote and blastomeres (
Voet et al.,2013). In accordance with previous findings, their data supported the view that MDA followed by sequencing is still a robust tool for SNP calling, although bias is inevitably introduced by the DNA amplification step. Another important application of SNP analysis by single-cell genomics is to help distinguish transcripts that are paternally and maternally inherited, i.e., haplotyping. Fan
et al developed a method called direct deterministic phasing (DDP) to separate single chromosome at metaphase from a single cell in a microfluidic disc (
Fan et al.,2011). The authors performed whole-genome haplotyping in four individuals, implying its clinical application in autoimmune disease and transplantation through the determination of human leukocyte antigen (HLA) haplotypes.
Transcriptome analysis
Individual cells have their unique transcriptomes that reflect expression of a subset of genes, although they share the same genotype. Theoretically, homogeneous cells with the same DNA may have very different transcriptomes even if they appear to be identical physiologically and morphologically (
Li and Clevers,2010). Cells and their neighboring cells have been shown to contain differentiated RNA transcripts, let alone the cells from different tissues, organs and systems.
The newly developed techniques, e.g., single-cell cDNA microarray and RNA-seq, attempt to detect both known and unknown transcripts with relatively precise quantitation in a single cell. Cell fate commitment and embryonic genome activation were analyzed by single-cell cDNA microarray using 5- to 8-cell human embryos (
Galán et al.,2010). By coupling the SOLiD sequencing system with improved amplification methods, mRNA-seq became a convenient platform for transcriptome analysis, especially in the field of early embryonic development (
Tang et al.,2009). However, it is well known that many RNA species, such as mRNA, rRNA, tRNA and small nuclear RNA, exist as multiple variants owing to alternative RNA processing (
He,2010). Thus, new single-cell transcriptome methods have been developed to improve the efficiency of detecting different transcript variants which are derived from different alleles and/or different RNA processing (
Ramsköld et al.,2012). Single-cell tagged reverse transcription (STRT) (
Islam et al.,2011) and switching mechanism at the 5′end of the RNA template (SMART) (
Ramsköld et al.,2012) take advantage of the “template switching” property of Moloney murine leukemia virus (MMLV) reverse transcriptase. After priming with oligo(dT), MMLV reverse transcriptase adds a tail of cytosines to the end of the first-strand cDNA. A helper oligonucleotide with a guanine motif can anneal to the cytosine tail. MMLV reverse transcriptase then switches template and copies the helper oligo, which places arbitrary, user-chosen sequences at both ends of the cDNA and allows downstream DNA amplification that covers full-length transcripts. An alternative approach is cell expression by linear amplification and sequencing (CEL-seq) (
Hashimshony et al.,2012), which theoretically aims to reduce the bias in the amplification methods such as PCR by inserting an in vitro transcription step.
To generate single-cell transcriptome after single-cell isolation, the protocol by Tang
et al is to convert mRNAs into first-strand cDNA using a universal primer with oligo(dT). After reverse transcription, a poly(A) tail is added to the 3′ ends of the first-strand cDNA so that the second-strand cDNA is primed by another universal primer with oligo(dT). The cDNA pool is then amplified by PCR with both universal primers prior to sequencing and further analysis (
Tang et al.,2011). However, the amplification steps often generate a biased representation of mRNA and lead to skewed sequencing results. To solve this problem, Pan
et al developed two RNA amplification procedures to obtain full-length coverage of expressed mRNA from small numbers of cells or even a single cell. These two methods showed neither the 3′ bias nor the variable transcript representation seen with previous methods (
Pan et al.,2013). Briefly, one method uses Φ29 DNA polymerase-based mRNA transcriptome amplification (PMA) to potentially capture all the end sequences of cDNA by intramolecular ligation, and is adapted from the previous whole DNA-pool amplification procedure (
Pan et al.,2008). The authors also developed semi-random primed PCR-based mRNA transcriptome amplification [SRP-transcriptome mRNA amplification (SMA)], adapted from nano-ChIP-seq. The characteristic feature of SMA is the use of hairpin-containing semi-random primers for the initial library construction following cDNA reverse transcription. Both PMA and SMA were demonstrated to have high applicability in cDNA sequencing coupled with high-throughput sequencing technology. Although both methods are able to retrieve full-length coverage, SMA is considered superior to PMA in many respects. More recently, Quartz-seq optimized the conditions for PCR-based single-cell RNA-seq by eliminating byproduct contamination and using a robust polymerase in a one-tube reaction (
Sasagawa et al.,2013).
Single-cell genomics holds great promise for analysis of gene expression variation at single-cell resolution in many research fields, such as immune response and preimplantation embryo development. By exploring the response of bone-marrow-derived dendritic cells to lipopolysaccharide (LPS), Shalek
et al recently elucidated the extensive bimodal expression of immune genes and differentiated mRNA splicing patterns via single-cell RNA-seq (
Shalek et al.,2013). Our group recently performed a comprehensive analysis of the transcriptome during human and mouse preimplantation by taking advantage of single-cell RNA-seq (
Xue et al.,2013). This work showed that the genetic programs in mammalian preimplantation development are evolutionarily conserved between the species, with the exception of developmental timing. By analyzing the allelic patterns of paternal and maternal gene transcripts, our analysis also revealed single nucleotide variants as well as synonymous and deleterious variants in a subset of protein coding genes. Our study paved the way for using single-cell RNA-seq in preimplantation genetic diagnosis and IVF technology.
DNA methylation
DNA methylation refers to the addition of a methyl group to the cytosine of DNA in vertebrates, which plays an important role in many cellular events. The bases 5-hydroxymethylcytosine (5hmC) and 5-methylcytosine (5mC) are both important modifications involved in many cellular functions, especially during early development (
Tahiliani et al.,2009;
Bhutani et al.,2011). Site-specific methylation plays an important regulatory role during human embryonic development (
Quenneville et al.,2012;
Zhao et al.,2013). It is well known that cancer cells possess genome-wide hypomethylation and site-specific hypermethylation (
Esteller,2007;
Jones and Baylin,2007). Analysis of these epigenetic markers including site-specific methylation will enhance our understanding of cellular regulation mechanisms. Song
et al developed a novel method that selectively labels 5hmC with a biotin tag, followed by SMRT sequencing. Based on the change in incorporation time at modified nucleotides, different DNA methylation states were directly detected by monitoring the interpulse duration (IPD) (
Song et al.,2012). This method highlighted the bright prospects of SMRT as a third-generation sequencing technology, not only in genomic analysis but also in epigenetic analysis.
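The principle of detecting modified bases from the interpulse duration can be sketched as a per-position comparison between a native sample and an unmodified control. The data layout, the threshold of 2.0 and the function names below are hypothetical choices for illustration; they are not the procedure of Song et al. or of any vendor software.

```python
from statistics import mean


def ipd_ratios(native_ipds: list[list[float]], control_ipds: list[list[float]]) -> list[float]:
    """Mean native IPD divided by mean control IPD, for each template position.

    Each inner list holds the IPDs (in seconds) observed at one position
    across several reads or passes.
    """
    return [mean(n) / mean(c) for n, c in zip(native_ipds, control_ipds)]


def flag_modified(ratios: list[float], threshold: float = 2.0) -> list[int]:
    """Positions where the polymerase paused markedly longer than in the control."""
    return [pos for pos, ratio in enumerate(ratios) if ratio >= threshold]


native = [[0.5, 0.6], [2.0, 2.4], [0.4, 0.5]]   # made-up IPDs, native DNA
control = [[0.5, 0.5], [0.5, 0.6], [0.5, 0.4]]  # made-up IPDs, unmodified control

print(flag_modified(ipd_ratios(native, control)))  # [1]
```

A bulky tag on the modified base, such as the biotin label described above, is expected to enlarge this ratio and make the modified position easier to distinguish from background variation.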
Drug candidate screening and biomarker discovery
As stated earlier in this review, cellular heterogeneity generates “noise” when a cell population is studied. Before the era of single-cell genomics, all results were obtained from mixed cell populations, which obscured the unique properties of individual cells. Single-cell genomics is now making personalized medicine a reality. First, single-cell genomics requires a very small sample and is thus suitable for detecting rare cell types and cells that can be obtained from patients only in very limited amounts. Second, genomic DNA sequencing, transcriptome analysis and DNA modification analysis allow the efficacy of drug candidates to be evaluated at different levels in the targeted cell compared with its neighbors. Last, high-throughput advances allow single-cell genomics to retrieve more comprehensive information and to provide new biomarkers of human diseases. Pioneering investment in single-cell genomics has already been made in research on different cancer types (
Dalerba et al.,2011;
Navin et al.,2011;
Hou et al.,2012;
Xu et al.,2012;
Heitzer et al.,2013;
Voet et al.,2013) and preimplantation diagnosis (
Xue et al.,2013;
Yan et al.,2013). Its application has also been put on the agenda for other areas, such as neuronal diseases (
Iourov et al.,2012). Interestingly, single-cell genomics has also been rapidly adopted for the study of uncultured pathogenic microbes, including bacteria (
McLean et al.,2013) and viruses (
Allen et al.,2011;
McWilliam Leitch and McLauchlan,2013).
In summary, single-cell analysis has been and will be widely used in all aspects of functional genomics, such as the profiling of mRNA and non-coding RNA expression, DNA methylation and histone modification patterns. It makes it possible to reveal the cellular and molecular differences between individual cells. To date, single-cell analysis has already shown promising potential in translational medicine, for example in cancer diagnosis and in vitro fertilization (IVF). The rapid development of technology will permit single-cell genomic analysis with higher speed, wider coverage, higher accuracy, more convenience, and lower cost. It will certainly become a powerful tool for basic biomedical research, clinical diagnosis, and drug discovery in the near future.
Higher Education Press and Springer-Verlag Berlin Heidelberg