Importance of genetic manipulation in human pluripotent stem cells
Human pluripotent stem cells (hPSCs) harbor great promise to revolutionize regenerative medicine as well as advance our knowledge of basic biology (Yamanaka and Blau, 2010). In particular, human embryonic stem cells (hESCs) isolated from preimplantation embryos, such as morulae and blastocysts, can provide a unique glimpse into early developmental processes (
Yu and Thomson, 2008). To model congenital disorders, scientists have used preimplantation genetically diagnosed (PGD) embryos to isolate disease-specific hESCs, but the efficiency and availability of such embryos are limited (
Sermon et al., 2009;
Stephenson et al., 2009). To overcome these hurdles, somatic cell nuclear transfer (SCNT) has been proposed as a source of patient-matched pluripotent cells, and only recently has it been successfully demonstrated in human cells (
Tachibana et al., 2013). SCNT still faces technical and ethical barriers to immediate adoption, but it will likely play a greater role in the stem cell field as more laboratories implement the technology. Meanwhile, an alternative approach to changing cellular fate by ‘reprogramming’ has received much attention. The resulting human induced pluripotent stem cells (hiPSCs) are simply derived from human somatic cells via ectopic expression of several transcription factors. Because hiPSCs comprise a supply of potentially unlimited cells for human disease modeling, drug screening and autologous transplantation, they are currently the source of pluripotent human cells regarded to have the greatest promise (
Lengner et al., 2010). Numerous laboratories have already established the potential of hiPSCs for modeling and therapeutics, yet one major unresolved issue remains: hiPSC modeling of congenital disorders resulting from genetic mutations. Correcting such genetic deficits is exceedingly difficult in hESCs and hiPSCs, which, unlike mouse embryonic stem cells, are thought to have a very low efficiency of homologous recombination. In this review, we address the importance of site-specific genome modification and the technology that allows such repair in human pluripotent stem cells.
While precise genetic manipulation is crucial in most applications of hPSCs, the purpose of such manipulation can vary in the context of hESCs versus hiPSCs. We often use hESCs to understand developmental events, such as how cell types of interest obtain their cellular fate during embryonic and fetal development, by dissecting molecular pathways and/or uncovering new transcription factors that play roles in different temporal and/or spatial windows. Genetic manipulation, in this context, can provide a powerful experimental tool through overexpression or downregulation of gene(s) of interest. In certain cases, simple transient transfection of a plasmid vector or transduction of a viral vector may suffice. However, it is often preferable to have a faithful and stable genetic reporter line to track gene expression or understand protein interactions during long-term hPSC differentiation, when epigenetically dynamic events occur. As shown in mouse embryonic stem cell studies, precise gene targeting approaches allow real-time monitoring of gene expression or the purification of homogeneous cell populations for biochemical analysis. This level of genetic manipulation requires precise integration of donor vectors into a target locus, which is only possible with homologous recombination-mediated gene targeting.
Much of the excitement regarding hiPSCs is based upon the idea that autologous cells can be genetically corrected and transplanted into patients with genetic disorders. Certain loss-of-function mutations may be rescued by simple gene delivery with random integration, but many mutations require a more precise method to eliminate the disease allele(s) and ensure a persistent and safe genetic correction. As seen in prior gene therapy cases, some gene delivery procedures, especially virus-mediated delivery, can yield long-lasting results but come with a significant safety risk, such as intractable oncogenesis. There is a clear need for the development of efficient and precise gene targeting methods to restore gene function, particularly in patient-specific hPSC lines.
Previous efforts toward genetic manipulation in hPSCs
Genetic manipulation in hPSCs can be accomplished through several different approaches including transfection of plasmids and transduction of viral vectors. Plasmid transfection can be classified by two criteria, the transfection method and the transfected material (usually DNA, in this case). Relatively short plasmids, very large plasmids like Bacterial Artificial Chromosomes (BAC) (
Tomishima et al., 2007), and small hairpin structures are often introduced into hESCs or hiPSCs by electroporation (including nucleofection) or liposome/nanoparticle-mediated transfection. Transgene expression is often transient with these methods because the DNA rarely integrates into the host cell’s chromosome. As such, drug selection is usually employed to select for the rare cells that have integrated the transgene into the genome; in the absence of selection, the transfected nucleic acid is turned over and degraded, and cell division dilutes out any remaining DNA. In contrast, virus-mediated gene delivery (including retroviral, lentiviral, and adenoviral vectors) exhibits high transduction efficiency in hPSCs, yet these methods have their own caveats. Retro- and lentiviral vectors exhibit long-term expression by inserting their payload into the host cell’s genome, assuring transgene segregation into daughter cells after division. But their random integration can lead to transgene silencing, which is often triggered by drastic changes in the epigenetic landscape during the hPSC differentiation process. Another issue is insertional mutagenesis that could, in principle, have an effect on the experiment in question. This is of particular concern for prospective transplantation studies, for example in cell therapy with patient-hiPSC derivatives. Adenoviral vectors have the advantage of high transduction efficiency and largely remain episomal after gene delivery: that is, the vectors do not integrate into the host cell’s genome. Consequently, gene expression is usually transient in a dividing population, and the viral DNA can still randomly insert into the host cell’s genome, with the same deleterious consequences, such as deregulated gene expression.
The disadvantages of random transgene integration drove the development of targeted approaches that precisely tailor the genome. The tradeoff with precise genome tailoring is targeting efficiency: even in mouse ESCs where such technology has been used for decades, there can be a 100-fold difference in the number of clones that undergo non-homologous versus homologous recombination, and the overall efficiency in human stem cells is even lower than in mouse (
Doetschman et al., 1987). Many different approaches have been used to facilitate homologous recombination in human pluripotent stem cells including adeno-, AAV and integration-deficient lentiviral based transgene delivery. Despite some successes, most of the field is converging on nuclease-mediated gene correction for hESCs or hiPSCs. Therefore we will focus on the rapidly evolving field of nuclease-mediated gene correction.
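The practical cost of a low targeting frequency can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that drug-resistant clones are independent and that a fixed fraction of them are correctly targeted; the function name and the 95% confidence threshold are hypothetical choices, not values taken from the cited studies.

```python
import math


def clones_to_screen(targeted_fraction: float, confidence: float = 0.95) -> int:
    """Smallest number of clones n such that P(at least one targeted clone) >= confidence.

    Assumes each clone is independently targeted with probability
    `targeted_fraction`, which is an idealization of a real screen.
    """
    if not 0.0 < targeted_fraction < 1.0:
        raise ValueError("targeted_fraction must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # P(no targeted clone among n) = (1 - f)**n, so solve (1 - f)**n <= 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - targeted_fraction))


# A 100:1 ratio of random to homologous integrants means about 1 clone in 101 is targeted.
n_clones = clones_to_screen(1 / 101)
```

Under these assumptions, a 100-fold excess of non-homologous integrants implies screening on the order of 300 clones to have a 95% chance of recovering a single targeted clone, which illustrates why methods that raise the targeted fraction are so valuable.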
Precise gene editing in pluripotent stem cells
Gene targeting is mediated by homologous recombination of donor vectors containing homology arms; it can be stimulated by double-stranded breaks at or around the target locus as shown by Maria Jasin’s group (
Smih et al., 1995). Initially, such DNA breaks were introduced by engineering the genome to contain a recognition site for I-SceI, a yeast mitochondrial nuclease. Creating a double-stranded break in mouse ESCs stimulated homologous recombination at least 50-fold. In the early 2000s, the Carroll group and others made this strategy more practical by removing the need to engineer a specific site into the genome. They designed an artificial nuclease (zinc finger nuclease, or ZFN) that fused a sequence-specific DNA-binding domain to the cleavage domain of the restriction enzyme FokI. The initial chimeric enzyme was first used to recognize a unique site in the yellow (y) gene in Drosophila (
Bibikova et al., 2002), and later work demonstrated the strategy’s utility in mammalian cells (
Bibikova et al., 2003;
Porteus and Baltimore, 2003;
Durai et al., 2005).
In parallel, a competing technology based on meganucleases was being developed toward the same goal. Meganucleases are often encoded by bacteria to protect their genome from intruding phages or plasmids. The Seligman group discovered that introducing small variations in the amino acid sequences of meganucleases, especially homing endonucleases, can drive specific elimination of selected DNA targets (
Sussman et al., 2004). Both ZFNs and meganucleases ushered in a new era for genome editing. About ten years later another class of engineered nucleases, transcription activator-like (TAL) effector nucleases (TALENs) (
Christian et al., 2010;
Cermak et al., 2011), was reported and, another two years later, the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas system was also introduced into the gene editing field (
Cho et al., 2013;
Cong et al., 2013;
Hwang et al., 2013;
Mali et al., 2013;
Xiao et al., 2013).
The modular nature of zinc finger proteins and TALs facilitates their engineering into synthetic nucleases by grafting the catalytic domain of a restriction endonuclease (FokI) onto their DNA binding domain (Table 1). The DNA binding domain directs the nuclease to the desired locus, yielding a targeted double-stranded break. Because these nucleases act as a dimer, two DNA binding modules bind to target DNA sequences, separated by a spacer region where the nuclease catalytic domain dimerizes to cut the target DNA (
Urnov et al., 2010;
Carroll, 2011). Therefore, for ZFNs and TALENs to cleave a given locus, two different plasmids must be generated, each comprising half of the dimer. In contrast, the CRISPR/Cas system requires only a single vector construct to cleave a gene of interest. The CRISPR/Cas system has evolved in bacteria as a type of acquired immunity against phages and plasmids. The CRISPR system’s short palindromic repeats are separated by “spacer” regions, short segments of DNA thought to remain from past invasions. These spacer DNAs are transcribed into RNAs that are processed by the Cas proteins, which then direct the degradation of matching foreign nucleic acids upon subsequent invasions. Components of this bacterial immunity have since been used to modify the genome of pluripotent stem cells (
Jinek et al., 2012;
Cong et al., 2013;
Mali et al., 2013). Specifically, in this system, a guide RNA with a G (N21)GG sequence complementary to the target genomic locus (expressed under the human U6 pol III promoter) is complexed with an engineered Streptococcus pyogenes Cas9 protein. An SV40 nuclear localization signal targets the Cas9/gRNA complex to the nucleus, and double-stranded breaks are made in the genomic locus complementary to the gRNA.
When selecting a gene editing system, it is important to take into account the targeting efficiency and potential off-target effects of each system. The endogenous AAVS1 locus has successfully been targeted by the CRISPR/Cas system in human 293T cells, K562 cells, and induced pluripotent stem (iPS) cells, as independently shown by the Church, Zhang and Charpentier laboratories (
Doetschman et al., 1987;
Porteus and Baltimore, 2003). In a comparison of all three genome editing methods, the Church group demonstrated that depending on the gRNA sequence, the CRISPR/Cas system can be more efficient than ZFNs or TALENs (
Mali et al., 2013). However, this study was conducted by one group examining a single locus (AAVS1), and similar studies need to be performed by other laboratories at additional loci. Regarding off-target effects, one potential shortcoming of the CRISPR/Cas system is that the target sequence must be a fixed size (19–21 bp), which might limit site specificity. Similarly, studies of ZFN specificity demonstrate that a 38 bp target sequence minimizes off-target effects, yet most vectors only accommodate an 18–24 bp target sequence (
Mussolino and Cathomen, 2011). In contrast to ZFNs and CRISPR/Cas, TALENs can be engineered to recognize relatively longer target sequences (
Miller et al., 2011). Even so, the relative targeting specificity of these three methods still needs to be systematically examined by high-resolution DNA sequencing using multiple samples from independent laboratories.
Additional factors affecting overall genome editing efficiency include chromatin structure and homologous recombination efficiency. In hPSC lines, these nuclease systems can have difficulty accessing loci with less permissive chromatin configurations, which may be overcome by transient treatment with epigenetic modulators, such as valproic acid and/or 5-aza-cytidine.
After the nuclease induces a double-stranded DNA break (DSB), the DNA may be repaired through non-homologous end joining in the absence of a donor DNA template or through homologous recombination in the presence of a donor DNA template. The latter is arguably more desirable in the context of precise genome editing of hPSCs. In the absence of any targeted nuclease system, the donor DNA template must have homology arms of over 3 kb for efficient recombination, whereas with the ZFN, TALEN or Cas9 systems, homology arms of ~1 kb are sufficient. Another important issue in homology arm cloning is the isogenic source of the DNA used for the arms. hESC and hiPSC lines are known to vary somewhat in their genomic DNA, so it is suggested that genomic DNA from the ‘isogenic’ hESC/hiPSC line be used for cloning homology arms. Additionally, homologous recombination efficiency among these systems (ZFN, TALEN or CRISPR/Cas) seems to be comparable (
Cong et al., 2013;
Mali et al., 2013; and unpublished data from the Lee laboratory), at least at the AAVS1 locus.
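The donor layout described above, two homology arms flanking the sequence to be inserted, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name, its parameters, and the idea of passing the locus as a plain string are hypothetical, the 1,000 bp default simply mirrors the ~1 kb arms mentioned in the text, and the locus sequence is assumed to come from the same (isogenic) cell line that will be targeted.

```python
def build_donor(locus: str, cut_index: int, insert: str, arm_length: int = 1000) -> str:
    """Return left homology arm + insert + right homology arm.

    `locus` is the genomic sequence around the target site (hypothetical input),
    and the nuclease cut is assumed to fall between locus[cut_index - 1] and
    locus[cut_index].
    """
    if arm_length <= 0:
        raise ValueError("arm_length must be positive")
    if cut_index - arm_length < 0 or cut_index + arm_length > len(locus):
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = locus[cut_index - arm_length:cut_index]    # sequence upstream of the cut
    right_arm = locus[cut_index:cut_index + arm_length]   # sequence downstream of the cut
    return left_arm + insert + right_arm


# Toy example with 3 bp arms so the result is easy to read.
toy_donor = build_donor("AAAAACCCCC", cut_index=5, insert="GG", arm_length=3)
```

A real donor vector would also carry a selection cassette and a plasmid backbone; the sketch only captures how arm length and cut position determine which genomic sequence needs to be cloned.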
It has been reported that ZFNs, TALENs, and CRISPR/Cas all have the capability to edit the genome of hPSCs (
Brunet et al., 2009;
Hockemeyer et al., 2009;
Hockemeyer et al., 2011;
Cong et al., 2013;
Mali et al., 2013), generate disease-relevant mutations (
Ding et al., 2013), and even rescue such mutations (
Soldner et al., 2011;
Zou et al., 2011). For example, isogenic ‘apparently healthy’ hiPSC lines have been generated by ZFN editing (
Soldner et al., 2011) to repair the α-synuclein point mutation known to cause Parkinson’s disease. This approach is gaining increasing attention in hiPSC research because it can generate an ideal “control” cell type with otherwise the same genetic background as the original disease hiPSC lines. The significance of this approach cannot be overstated. Because humans have incredible genetic diversity, in contrast to inbred laboratory mouse strains, it is essential to prove that a given phenotype is caused by the specific mutation in question. Such otherwise isogenic hiPSC lines are the best control for modeling human monogenic diseases (
Zou et al., 2009,
2011;
Yusa et al., 2011;
Chang and Bouhassira, 2012;
Wang et al., 2012) as well as performing drug validation studies (
Liu et al., 2010). However, it is important to bear in mind that each engineering step necessitates a clonal bottleneck, and these clones can accumulate aberrant genetic or epigenetic events, especially in the context of highly proliferating hPSCs. As a result, it remains prudent to study multiple clones of isogenic controls, even though each clone should be, in principle, genetically identical.
Applications of genetically manipulated hPSCs
In addition to editing mutations in hPSCs, these three systems (ZFN, TALEN and CRISPR/Cas) can also be used to generate genetic reporter systems in hPSCs by inserting a marker protein (like eGFP) and an antibiotic selection cassette after the stop codon of a gene of interest (
Hockemeyer et al., 2009;
Hockemeyer et al., 2011;
Cong et al., 2013;
Mali et al., 2013). As discussed earlier, previous methods to generate genetic reporter hESC lines were dependent on random integration that could cause transgene silencing, clone-to-clone variation, and unwanted activation of gene expression by insertional mutagenesis. In addition to reporting gene expression in real time, these new methods can also allow insertion of a protein tag (e.g., a FLAG or His tag) near the GFP sequence, permitting purification of the endogenous protein of interest for biochemical analysis (
Doyle et al., 2008;
Heiman et al., 2008;
Nishiyama et al., 2009). Another use for targeting nucleases is to engineer translocations frequently found in cancer. The use of pluripotent stem cells permits the engineering of cancer translocations in the correct cellular context so that one can study the early events of different cancer types (
Brunet et al., 2009;
Piganeau et al., 2013).
From a practical standpoint, the CRISPR/Cas system seems relatively easy to implement since it has only one cloning step and requires only one plasmid to be constructed for the gene targeting process, whereas TALENs have multiple cloning steps (taking several days or even weeks) and require a pair of plasmids. However, this can be counterbalanced by the availability of a genome-wide library for each nuclease system built with a high-throughput genome-scale cloning system. For general users, the Kim group has built an online ordering system to distribute a large-scale collection of TALEN plasmids targeting over 18,740 protein-coding genes (
Cong et al., 2013) and the Church group has also developed a genome-wide database of ~190,000 unique gRNAs targeting ~40.5% of human exons (
Mali et al., 2013), demonstrating the feasibility of fabricating a genome-wide gRNA library. Both efforts will free end users from the cumbersome process of constructing TALEN or gRNA vectors, but construction of donor vectors still remains a challenge. For certain genes, it is likely that a constellation of nucleases will be needed to repair all mutations found in human disease, and the CRISPR/Cas system can be easily multiplexed through the use of multiple gRNAs (
Cong et al., 2013;
Wang et al., 2013). In fact, the Jaenisch group has reported the generation of transgenic mice in which two loci were targeted simultaneously using the CRISPR/Cas system. As such, multiple practical issues must be considered beyond the availability of genome-wide libraries.
Regarding implementation of the CRISPR/Cas9 genome editing system, the Church group has made available a practical guide on their laboratory’s Addgene site (http://www.addgene.org/crispr/church/). Specifically, one should examine all 23 bp sequences of the form 5′-N20-NGG-3′ near the target site, and select the sequence that is least repeated elsewhere in the genome. The target sequence can then be cloned into the U6-driven gRNA target expression vector, available from the Church group. The gRNA expression vector and Cas9 expression vector are then cotransfected to initiate double-stranded breaks, and a donor template may be provided to create the desired indel or mutation.
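The site selection step described above (enumerate every 5′-N20-NGG-3′ sequence near the target, then prefer the one that recurs least in the genome) can be sketched as follows. This is an illustrative sketch only, not the Church group's tool: all function names are hypothetical, sequences are assumed to be plain A/C/G/T strings, and only exact matches are counted, whereas a real specificity check would also need to consider near-matches with mismatches.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_candidate_sites(region: str) -> list:
    """All 23-nt sequences of the form N20-NGG on either strand, written 5'->3'."""
    # The lookahead keeps overlapping candidates that a plain match would skip.
    pattern = re.compile(r"(?=([ACGT]{21}GG))")
    region = region.upper()
    sites = []
    for strand in (region, reverse_complement(region)):
        sites.extend(match.group(1) for match in pattern.finditer(strand))
    return sites


def count_occurrences(site: str, genome: str) -> int:
    """Exact-match occurrences of `site` on both strands of `genome`."""
    genome = genome.upper()
    total = 0
    for query in {site, reverse_complement(site)}:
        total += len(re.findall("(?=" + re.escape(query) + ")", genome))
    return total


def rank_sites(region: str, genome: str) -> list:
    """(count, site) pairs sorted so the least-repeated candidate comes first."""
    candidates = set(find_candidate_sites(region))
    return sorted((count_occurrences(site, genome), site) for site in candidates)
```

Calling `rank_sites(region, genome)` with the sequence around the intended cut site and a reference sequence returns the candidates in ascending order of exact-match count, so the first entry is the least repeated. Scanning a full human genome this way would be slow; an indexed aligner would be the more realistic choice at that scale.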
Although we have focused on their nuclease activity thus far, these systems can have additional molecular functions, such as activating or inhibiting gene expression or modifying DNA methylation status in a specific locus, extending their utility beyond genome editing systems. For example, TAL effector proteins are transcription factors from a bacterial plant pathogen that recognize DNA sequences in a predictable modular fashion, making them ideal candidates for controlling transcription of a gene of interest (
Bogdanove and Voytas, 2011). In a process known as CRISPR interference (CRISPRi) (
Qi et al., 2013), a catalytically mutant Cas9 lacking endonuclease activity, when co-expressed with a guide RNA, interferes with RNA polymerase binding, transcription factor binding, or transcriptional elongation of the target sequence. Another interesting application of these site-specific nucleases is to combine them with functional enzymes such as methyltransferase (dimerized endogenous form). In one experiment, a target locus was artificially methylated, allowing for site-specific epigenetic changes with high specificity (
Chaikind et al., 2012). In addition, the Church group has recently reported the development of surface zinc fingers (sZFs) to barcode cells through surface expression of programmable zinc-finger DNA binding domains; these can interact with specific extracellular DNA probes and can potentially comprise a highly specific cell capturing system or facilitate targeted cellular delivery of functional recombinant virus (
Mali et al., 2013).
Conclusion
Technological advances in human pluripotent stem cell biology have lagged far behind those in mouse stem cells, in large part due to the inability to efficiently perform homologous recombination. Randomly integrated transgenes were often adequate for reporting cell fates in the early years of human pluripotent stem cell research, but the advent of induced pluripotent stem cell technology has made site-specific changes more important. The increasing availability of site-specific nucleases is making homologous recombination much more efficient. These technologies have evolved from systems in which long-term, expensive, large-scale screens were once needed to identify DNA binding domains, into rationally designed systems that only require the ordering of a few oligonucleotides. Our ability to engineer changes in specific base pairs in hPSCs should close the chasm between mouse and human pluripotent stem cells and allow us to probe the vast genetic diversity and complexity of human developmental biology and disease pathogenesis. In conclusion, we believe that recent technological advances in the gene editing field can synergize with the power of hPSCs, ultimately maximizing their potential for advancing clinical medicine as well as understanding human biology.
Higher Education Press and Springer-Verlag Berlin Heidelberg