Advances in genetic engineering of domestic animals

The global population is projected to exceed nine billion by 2050, with demand for meat and milk expected to double. To overcome this challenge, it is necessary to breed highly efficient and productive livestock. Furthermore, livestock are also excellent models for human diseases and ideal bioreactors to produce pharmaceutical proteins. Thus, genetic engineering of domestic animals presents a critical and valuable tool to address these agricultural and biomedical applications. Overall, genetic engineering has evolved through three stages in history: transgenesis, gene targeting, and gene editing. Since the birth of the first transgenic pig, genetic engineering in livestock has been advancing slowly due to inherent technical limitations. A major breakthrough has been the advent of somatic cell nuclear transfer, which, for the first time, provided the technical ability to produce site-specific genome-modified domestic animals. However, the low efficiency of gene targeting events in somatic cells prohibits its wide use in agricultural and biomedical applications. Recently, rapid progress in tools and methods of genome engineering has been made, allowing genetic editing from mutation of a single base pair to the deletion of entire chromosomes. Here, we review the major advances of genetic engineering in domestic animals with emphasis placed on the introduction of the latest designer nucleases.


Introduction
There is an old Chinese saying: food is the paramount necessity of the people (in other words, hunger breeds discontentment), indicating the importance of food security. Worldwide, the population is still growing, with an estimate of nine billion by 2050, which means a requirement for a 50% increase in food supply (data from the Population Division of the United Nations). In particular, given the rapid economic development and urbanization in developing countries, including China, food composition is changing, with the demand for animal protein increasing dramatically. Specifically, the FAO (the Food and Agriculture Organization of the United Nations) predicts that global demand for meat and milk will double by 2050. Thus, highly efficient and productive livestock are needed to meet this increased demand. Nonetheless, traditional breeding of livestock is limited by the length of the process, the slow progress of genetic improvements and the inability to isolate desired traits from undesirable traits [1].
To overcome these limitations, it is necessary to take advantage of advances in biotechnology, such as gene engineering (Fig. 1). For instance, expression of lysostaphin specifically in the bovine mammary gland makes the cow highly resistant to infection by Staphylococcus aureus. Extensive efforts in our and other laboratories have been made to improve meat production [4,5] and to optimize milk composition with improved nutritional value [6][7][8].
Domestic animals also present excellent models in biomedical research (Fig. 1). For example, pigs are widely used as models for human diseases, including diabetes, Alzheimer's disease, cystic fibrosis and cardiovascular diseases [9] . Furthermore, pigs are proposed as potential sources of organs to address the shortage of donated organs for transplantation in humans [10] . Cows have also been genetically modified as bioreactors to produce pharmaceutical proteins [6] . Currently, the most popular method to produce pharmaceutical proteins is to use a mammalian cell culture system, which is expensive and limited by its capacity, whereas the use of bovine mammary gland for the expression of human genes promises to be a cost-effective method to produce valuable pharmaceutical proteins [11] .
Advances in animal genetics, reproduction and nutrition have resulted in a significant increase in the productivity of modern livestock. However, further development of these fields requires a better understanding of the identity and function of critical genes affecting important economic traits. Meanwhile, with the completion of whole genome sequencing for various mammals, functional annotation of their genomes remains elusive, especially in livestock due to limitations in "loss of function" tools. Indeed, mouse genetics has been greatly enhanced by the use of site-specific genome mutagenesis in embryonic stem cells (ESCs) followed by production of knockout animals. However, reliable ESCs have yet to be established for livestock.
To fulfill the demand for agricultural and biomedical applications as well as fundamental research in animal genetics, extensive efforts have been made to advance the genome modification technologies in livestock (Fig. 2). From the middle 1980s to the early 1990s, pronuclear DNA microinjection was the most popular approach to generate genome modification, albeit in a random-integration manner [12,13] . Thanks to the birth of the cloned sheep, Dolly, gene targeted animals could be produced by performing site-specific genome modification in somatic cells followed by nuclear transfer [14] . This strategy is technically challenging, however, preventing its wide use in functional characterization of livestock genomes. Recently, application of designer nucleases, particularly the clustered regularly interspaced short palindromic repeats-associated protein 9 (CRISPR-Cas9) system, in site-specific genetic engineering provides a revolutionary tool for animal biotechnology and functional genomics in domestic animals [15,16] .
In this review, we summarize the major advances in genome engineering of livestock over the last three decades. Emphasis is placed on the recent introduction and applications of designer nucleases, including zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and CRISPR-Cas9, in livestock.

History of genetic engineering in animals

Phase one: random-integrated transgenesis
In the 1970s, the earliest trials of genomic modification in mammals were conducted in mice after the first application of genetic engineering in bacteria [17]. The major advances in the field of molecular biology (e.g., recombinant DNA) and the feasibility of culturing and manipulating mouse embryos made it possible to perform mouse transgenesis during that period. Several groups independently generated transgenic mice during the late 1970s and the early 1980s [13,[18][19][20]. Among them, the first transgenic mouse that resulted in changes in phenotype expressed rat growth hormone (GH) [13]. Pronuclear DNA microinjection was adopted as the most popular method to deliver a foreign gene to the host genome [18]. The first report of transgenesis in livestock animals involved microinjection of the gene encoding human growth hormone into the pronuclei or nuclei of eggs from super-ovulated rabbits, sheep and pigs [12]. Integration of the gene into the host genome was demonstrated in all three species. Six years later, the first transgenic cattle were generated via a similar approach [21]. This technique had low efficiency, however, with only 1% of injected zygotes producing transgenic animals [22].
Meanwhile, several alternative delivery methods were explored. Retrovirus infection: genetically modified animals could be produced by infecting early embryos with a replication-deficient virus carrying the transgene. Indeed, employing this approach in pigs dramatically enhanced the proportion of transgenic offspring obtained [23,24]. Sperm-mediated DNA transfer: sperm are capable of integrating foreign DNA during fertilization, thus allowing highly efficient production of transgenic animals [25]. These two methods are technically easy and relatively cheap ways to deliver a foreign gene into the host genome. Nonetheless, disadvantages include the difficulty in screening embryos for transgene integration before embryo transfer, the lack of specificity of expression because of random integration of exogenous DNA, and the inability to perform types of gene editing (e.g., deletion) other than transgene insertion.
The integration of exogenous genes into the host genome is random due to its reliance on double-stranded DNA break repair machinery. As a result, some embryos failed to develop when the foreign gene was inserted into sites that interfere with the expression of critical genes. Moreover, when transgene integration occurred within heterochromatin, expression of the transgene could be suppressed and modulated by those adjacent regulatory elements. With approximately 15% of microinjected livestock embryos resulting in live offspring, and a success rate of less than 1% of those embryos resulting in transgenic offspring, embryo survival and transgene integration have been major hurdles in the advancement of transgenic animal production [26] . These inherent problems limit its extensive use as an important tool in addressing fundamental questions in life science as well as agricultural and biomedical applications.

Phase two: homologous recombination-directed gene targeting
The principle of homologous recombination (HR) was first documented in bacteria over 50 years ago. In the 1970s, similar machinery was also identified in eukaryotes [27]. It was subsequently shown that HR-mediated recombination events occur in mammalian cells [28]. This endogenous mechanism was then exploited to deliver a plasmid DNA sequence into the genome of a human cell [29]. Another milestone in the history of mammalian genetic engineering was the successful culture and manipulation of mouse ESCs in vitro during the early 1980s [30]. ESCs were soon adapted to create site-specific genetically engineered animals by injecting the genetically targeted cells into blastocysts [31,32].
To date, ESCs have not been established for livestock, despite extensive efforts, which has prevented large-scale application of ESC-based gene targeting in livestock. Fortunately, a revolutionary tool, somatic cell nuclear transfer (SCNT), was successfully implemented with the birth of Dolly in the mid-1990s [33]. For the first time, SCNT presented a methodological basis for performing site-specific/targeted genetic modifications in livestock animals [14,34]. Nonetheless, in comparison with mouse ESCs, somatic cells have a limited life span in cell culture and low HR activity (one desired recombinant event per 10^6 cells) [35,36], which presents an enormous technical challenge for conducting targeted genetic modifications in somatic cells. Despite this low efficiency, Prather and his colleagues obtained the first pig carrying site-specific genetic modifications using this approach [14]. Shortly afterwards, the first cattle with site-specific genetic modifications were produced [37].
A variety of strategies have also been developed to improve the efficiency of gene targeting via HR. These methods include positive and negative selection [38], and gene trapping [14]. Furthermore, the flanking homologous arms were extended via BAC vectors to improve HR efficiency. For instance, the use of modified BAC vectors in targeting CFTR and DMD [39,40] in porcine kidney cells led to increased targeting efficiency (above 1%).

Phase three: designer nuclease-mediated gene editing
A breakthrough in the efforts to identify approaches to enhance HR was the discovery that the efficiency of HR-mediated gene targeting in mouse ESCs can be improved significantly by the targeted introduction of double-strand breaks (DSBs) created by the rare-cutting endonuclease I-SceI [41][42][43]. Eventually, experiments with zinc finger proteins demonstrated their potential as designer nucleases to generate a DSB at a site of interest [44]. So far, three major designer nucleases have been identified, namely, ZFNs, TALENs and RNA-guided endonucleases derived from the bacterial CRISPR-Cas system.
All these nucleases share similar mechanisms in mediating site-specific gene editing. They produce site-specific DNA DSBs, which elicit the endogenous DNA repair system. Generally, there are two major repair mechanisms: non-homologous end-joining (NHEJ) and homology-directed repair (HDR), both of which can be used for targeted editing as defined below [45] (Fig. 3).
Gene disruption: In the absence of homology arms, the error-prone NHEJ pathway can introduce short insertions and deletions (indels) in the target site, eventually resulting in frame-shift mutations and gene disruption.
Gene insertion: Editing can be achieved by delivering designer nucleases along with a targeting vector carrying the genetic segment to be inserted and flanked by homology arms.
Single nucleotide mutagenesis: Single nucleotide/point mutations can be corrected or introduced in the genome via delivery of both designer nucleases and targeting vectors or single-stranded oligodeoxynucleotides.
Rearrangements of chromosomes: DSBs stimulated by designer nucleases can also trigger chromosomal rearrangements, such as deletions, duplications and inversions.
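The gene-disruption outcome described above depends on whether an indel shifts the reading frame. The following minimal Python sketch illustrates that arithmetic only: it assumes the edited region lies within a coding sequence and compares sequence lengths rather than performing an alignment. The function name and the example sequences are hypothetical and are not taken from the cited studies.

```python
def classify_indel(reference: str, edited: str) -> str:
    """Classify an edit by its net length change relative to the reference.

    Assumption: both sequences cover the same coding region, so a net
    length change that is not a multiple of 3 shifts the reading frame.
    """
    delta = len(edited) - len(reference)
    if delta == 0:
        return "no length change"
    kind = "insertion" if delta > 0 else "deletion"
    frame = "in-frame" if delta % 3 == 0 else "frameshift"
    return f"{abs(delta)}-bp {kind} ({frame})"


reference = "ATGGCCAAGTTC"          # hypothetical coding fragment
one_bp_deletion = "ATGGCAAGTTC"     # one C removed
three_bp_deletion = "ATGAAGTTC"     # codon GCC removed

print(classify_indel(reference, one_bp_deletion))
print(classify_indel(reference, three_bp_deletion))
```

In-frame indels may still impair the protein, so this check is only a first approximation of whether a given NHEJ product disrupts the gene.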

Zinc finger nucleases
ZFNs have two components: a nonspecific DNA cleavage domain derived from the FokI endonuclease, and site-specific zinc finger proteins, which occur naturally and are the most abundant DNA binding domains in humans. Each zinc finger domain can recognize 3 bp of DNA. Two decades ago, the first crystal structure of zinc finger proteins was disclosed, showing the simplicity of their interaction with DNA [46]. Since then, extensive efforts have been made to explore methods to engineer these proteins with sequence specificity. Zinc finger proteins can now be assembled with user-defined specificity, although this is time-consuming and expensive. Up to now, ZFN-induced gene editing has been achieved in a variety of organisms, including Drosophila [47], Caenorhabditis elegans [48], zebrafish [49], rats [50], cattle [5,51] and pigs [52,53].
In 2011, we successfully obtained β-lactoglobulin knockout cattle via a ZFN-induced gene-targeting strategy with high efficiency [51]. β-lactoglobulin is a major whey protein in bovine milk and has been identified as a major milk allergen. Eight cattle were produced from the bi-allelically targeted donor cells. However, only one calf survived because of the high frequency of calf loss resulting from animal cloning. The high rate of induced bi-allelic mutations greatly reduces the time to produce null mutant livestock animals and was a great stride for animal genome modification.
In our laboratory, ZFNs were also assembled to disrupt the bovine MSTN gene that encodes the myostatin protein [5]. Myostatin belongs to the TGF-β superfamily and inhibits the growth of muscle cells. Using a ZFN-induced gene targeting strategy, a mutation frequency of around 20% and a bi-allelic mutation efficiency of 8% were achieved. By performing SCNT, we successfully obtained cloned cattle from the MSTN-targeted fibroblast cells derived from Chinese yellow cattle [5]. As expected, these cattle exhibit a double muscling phenotype as early as one month old, and muscle fiber hypertrophy is also noted.
Although the application of ZFNs opened a new era of gene targeting, it has not been widely used due to the difficulty in engineering active ZFNs and validating them for target sites of interest. Another disadvantage is its inability to target DNA sites at a resolution of a single nucleotide (e.g., SNPs and active sites of enzymes) due to its characteristic of context-dependent specificity in recognizing DNA.

Transcription activator-like effector nucleases
Transcription activator-like effector (TALE) proteins exist naturally only in Xanthomonas bacteria [54,55]. TALENs, like ZFNs, are artificial fusion proteins that contain a sequence-specific DNA binding motif (TALE domain) and the nonspecific FokI endonuclease. They have become a valuable tool in genome engineering since the TALE-DNA binding code was determined in 2009 [55,56]. TALEs, like zinc finger proteins, consist of an array of module repeats, each having 34 amino acids versus 30 for zinc fingers. Each TALE module is able to recognize one nucleotide of DNA. The use of TALENs in genome engineering was first reported in 2010 [57] and by 2012 they were used to produce genetically modified livestock [58].
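The module sizes quoted above (3 bp per 30-residue zinc finger, 1 bp per 34-residue TALE repeat) imply very different array sizes for the same target length. The short Python sketch below works through that arithmetic; the 18-bp target length is a hypothetical example, and linkers and the FokI domain are deliberately ignored.

```python
import math

# Values taken from the text: bp recognized per module and residues per module.
ZINC_FINGER = {"bp_per_module": 3, "aa_per_module": 30}
TALE = {"bp_per_module": 1, "aa_per_module": 34}


def array_size(target_bp: int, module: dict) -> tuple[int, int]:
    """Return (number of modules, total residues) needed to cover target_bp."""
    modules = math.ceil(target_bp / module["bp_per_module"])
    return modules, modules * module["aa_per_module"]


target_bp = 18  # hypothetical binding-site length
print("zinc finger array:", array_size(target_bp, ZINC_FINGER))
print("TALE array:", array_size(target_bp, TALE))
```

Under these assumptions a TALE array is several times larger than a zinc finger array for the same site, but each TALE repeat maps to a single nucleotide, which is what makes the binding code straightforward to apply.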
As one of the first studies in using TALENs to modify the genomes of livestock, Carson et al. demonstrated that TALENs are very efficient in making targeted DNA modifications in the bovine genome in both fibroblast cells and preimplantation embryos [58] . When fibroblast cells were co-transfected with two TALEN pairs targeting the same chromosome in pigs, large deletions and inversions of chromosomal segments were found to occur [58] . Also, MSTN-targeted cattle can be made simply through microinjection of TALEN mRNAs to bovine zygotes, greatly reducing the time needed to produce homologous gene-targeted livestock.
Compared with ZFNs, TALENs are easier to design and relatively cheap to obtain commercially (Table 1). However, the design, synthesis and validation of TALEN proteins still require considerable effort, which remains a major challenge limiting their widespread application in genome engineering.

Clustered regularly interspaced short palindromic repeats
First described in 1987, CRISPRs consist of an array of short repetitive sequences (29 nt direct repeats) interspaced by short sequences (32 nt) in the Escherichia coli genome [60] . It is noteworthy that similar repeat elements have also been found in numerous other bacterial and archaeal species. These sequences were later classified as unique repeat elements existing in more than 40% of sequenced bacteria and the majority of archaea [61] . The term CRISPR was coined later and Cas genes were identified located adjacent to each CRISPR locus [62] . However, the functional role of CRISPR and Cas9 in prokaryotes remained unknown.
A breakthrough observation was published in 2005, indicating that the spacer sequences separating the direct repeats within CRISPRs originated from plasmid and phage-associated sources [63][64][65]. Previous studies also demonstrated that (1) CRISPR loci are transcribed [66], (2) Cas proteins carry putative nuclease and helicase domains [62], and (3) viruses cannot infect archaeal cells that contain spacer sequences matching snippets of the viral DNA [63]. Based on this evidence, it was proposed that CRISPR-Cas is an adaptive immunity system that defends prokaryotic cells against phage infection [63,64]. This hypothesis was confirmed in 2007 by the infection of the lactic acid bacterium Streptococcus thermophilus with lytic phages [67]. In 2008, mature CRISPR RNAs (crRNAs) were shown to serve as guides in a complex with Cas proteins to interfere with virus infection in E. coli [68].
It was originally presumed that spacers within CRISPR interfere with viral gene expression in a similar manner to RNAi. In 2008, it was demonstrated in Staphylococcus epidermidis that the target of Cas enzyme activity is DNA, not RNA, ruling out the possibility of an RNAi-like mechanism for the CRISPR-Cas system [69] . Two years later, Moineau and colleagues discovered that the CRISPR-Cas9 system creates DSBs in the exact position of the target DNA 3 base pairs upstream of the protospacer adjacent motif (PAM) sequence, highlighting the critical role of the PAM sequence [70] . Moreover, Cas9 is the only protein within the Cas gene cluster that is required for targeted DNA cleaving in the CRISPR-Cas9 system [70] .
Clarifying the final uncertainty about the mechanism of CRISPR-Cas9-mediated immunity system, Charpentier and colleagues observed a second non-coding RNA called trans-activating CRISPR RNA (tracrRNA) distinct from crRNA as determined by small RNA sequencing of Streptococcus pyogenes [71] . In addition, they found that tracrRNA hybridizes with crRNA to form a duplex, which guides Cas9 to its targets [71] .

Applications of CRISPR-Cas9-induced genome editing
In 2013, the CRISPR-Cas9 system was adopted for genome editing in mammalian cells. Zhang and colleagues constructed two different Cas9 orthologs and revealed targeted DNA cleavage in both mouse and human cells [15]. It has also been shown that this system could be harnessed to target multiple DNA loci and mediate HDR [15]. At the same time, another laboratory from Harvard University reported similar findings [16]. Since then, CRISPR-Cas9 has been widely employed as a preferred tool to perform genome editing in a variety of cells and organisms, including mice [72], rats [73], pigs [74] and monkeys [75], because of its simplicity in design and synthesis (Fig. 4).
Table 1 (fragment) Commercial price [59] ($/target): 4000-7000; 3600-5000; 500. Note: 1, includes wildtype Cas9 and its variants; 2, defined as the proportion of NHEJ-mediated indels created at the target DNA sites.
Typically, there are two essential components of a customized CRISPR-Cas9 system: the non-specific Cas9 enzyme and the sgRNA (single guide RNA), a designed RNA that mimics the crRNA-tracrRNA hybrid and guides Cas9 to target sites. A prerequisite for selecting target sites is the presence of a PAM immediately downstream of the target site because of the essential role of the PAM in mediating the DNA cleavage activity of Cas9. Since the initial report of the use of CRISPR-Cas9 in genome editing, extensive efforts have been made to optimize the system. For example, codon optimization of Cas9 and the addition of a nuclear localization signal have led to improved gene editing activity in mammalian cells and organisms, e.g., human cells, mice, and rats.
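The PAM requirement for target-site selection can be illustrated with a small search routine. The Python sketch below is a simplified illustration rather than a design tool: it assumes the 5'-NGG-3' PAM of Streptococcus pyogenes Cas9 and a 20-nt guide (neither value is specified in the text above), scans the forward strand only, and places the expected break 3 bp upstream of the PAM as described earlier. The function name and example sequence are hypothetical.

```python
import re


def find_cas9_sites(seq: str, guide_len: int = 20) -> list[dict]:
    """Return forward-strand target sites that are followed by an NGG PAM.

    Assumptions (not from the source text): NGG PAM, 20-nt guide,
    forward strand only, no off-target scoring.
    """
    seq = seq.upper()
    sites = []
    # A lookahead lets overlapping PAM candidates (e.g. within "AGGG") be found.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < guide_len:
            continue  # not enough upstream sequence for a full-length guide
        sites.append({
            "protospacer": seq[pam_start - guide_len:pam_start],
            "pam": match.group(1),
            # The break falls between indices cut_index - 1 and cut_index.
            "cut_index": pam_start - 3,
        })
    return sites


example = "GATTACAGATTACAGATTAC" + "AGG" + "TT"  # hypothetical sequence
for site in find_cas9_sites(example):
    print(site)
```

A practical design workflow would also scan the reverse complement and screen each candidate against the whole genome for similar sequences.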
One distinguishing feature of the CRISPR-Cas9 system is the capacity to simultaneously cleave multiple genes. This can be achieved by co-expression of a CRISPR array carrying multiple spacers or of multiple sgRNAs that target distinct sequences. With this strategy, efficient gene editing at multiple DNA sites has been demonstrated in mammalian cells and early embryos [16,72]. Furthermore, deletion and inversion of large chromosomal segments can be achieved by this approach.
Cas9 protein has also been engineered to destroy its DNA cleavage activity while maintaining its DNA binding activity. This modified type of Cas9, termed dead Cas9 (dCas9), is being explored for multiple purposes in molecular biology as an RNA-guided DNA binding protein. For instance, dCas9 can be used to modulate transcription [76][77][78]. Indeed, transcription initiation or elongation can be inhibited when dCas9 is guided to bind the promoter region or open reading frame in human cells [76,79]. Fusion of dCas9 with the transcriptional repression domain of KRAB leads to significantly reduced transcription in either yeast or human cells [76,80]. Moreover, tagging dCas9 with fluorescent proteins can be used to visualize specific DNA loci in live cells [78]. For example, Chen and coworkers successfully observed the spatiotemporal dynamics of DNA sequences of interest in live cells by fusing dCas9 with EGFP [81].
Using the CRISPR-Cas9 system, knockout pigs have been successfully produced either by SCNT [82] or directly by microinjection of zygotes [74,82]. Zhou and his colleagues first demonstrated the feasibility of efficiently producing bi-allelic knockout pigs simply through injection of Cas9 mRNA and sgRNA into the zygote cytoplasm [74]. Sixteen piglets were generated, including six bi-allelic mutants and five mono-allelic mutants, indicating a high efficiency of gene targeting. They also showed that the genome manipulation had low toxicity to porcine embryonic development, as evidenced by the high birth and survival rates [74].
The generation of CD163 and CD1D knockout pigs using the CRISPR-Cas9 system has also been reported [82]. Astonishingly, 100% efficiency of Cas9-induced targeted gene modification was achieved in injected zygotes. As in mouse and human cells, it was also feasible to disrupt multiple genes in parallel by delivering CRISPRs targeting multiple genes into zygotes produced in vitro [82].
Fig. 4 Overview of producing genome-modified animals with CRISPR-Cas9 technology
By comparison with ZFNs and TALENs, the CRISPR-Cas9 system holds similar specificity and, more importantly, it is relatively easy to design and synthesize with minimal cost [83] (Table 1). The advent of CRISPR-Cas9 tools has indeed revolutionized the creation of targeted gene-modified animals. Previously, it typically took years to obtain pigs carrying bi-allelic mutations, but now it takes only 6 months.

Specificity of CRISPR-Cas9 technology
Off-target effects are a major concern for the application of the CRISPR-Cas9 system, since off-target events may lead to unwanted genome modifications. To overcome this challenge, a variety of methods (detailed below) have been developed to improve the targeting specificity of the CRISPR-Cas9 system.

Selection of target sites
Multiple factors related to selecting target sites and designing sgRNAs have been found to affect the frequency of off-target events. For instance, high GC content of the target site is recommended to enhance hybridization and accommodate more mismatches, because low GC content has been associated with a high frequency of off-target events [84].

Selecting delivery content (DNA/RNA/protein)
Delivering Cas9 mRNA/protein and sgRNA, rather than Cas9 and sgRNA expression plasmids, helps to reduce off-target effects. mRNA and protein act only transiently, whereas plasmids persist and express Cas9 and sgRNA for a relatively long time, which increases the frequency of off-target events and leads to more random integration of CRISPR-Cas9 sequences into the host genome [85].
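The GC-content consideration noted above for target-site selection is simple to compute for a candidate site. The Python sketch below shows one way to do so; the function name and example sequence are hypothetical, and no cut-off value is implied because the text does not specify one.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


candidate = "GATTACAGATTACAGATTAC"  # hypothetical 20-nt target site
print(f"GC content: {gc_content(candidate):.2f}")
```

Such a value would normally be one of several criteria when ranking candidate sites, alongside a genome-wide search for similar sequences.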

Using double-nicking approach
Cas9 enzymes cut DNA by using their HNH and RuvC nuclease domains, each one cleaving a strand of DNA to produce a blunt-ended DSB. Cas9 can be engineered into a nickase that only generates a single-stranded break (SSB) by introducing a mutation (D10A or H840A) that inactivates one of the two nuclease domains. DNA SSBs are typically repaired through the base excision repair (BER) pathway, which exhibits high fidelity. Thus, paired sgRNAs and Cas9 variants with the D10A or H840A mutation can be harnessed to induce DSBs and produce efficient indel formation. This strategy greatly enhances specificity compared with wildtype Cas9 because off-target nick sites can be precisely repaired through the BER pathway [86].

Truncated sgRNAs
It has been reported that truncated sgRNAs with fewer than 20 nt of complementary sequence significantly reduce the chance of undesired mutagenesis at off-target sites [87]. Moreover, if this strategy is used in conjunction with a double-nicking approach, the off-target effects can be further reduced.
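As an illustration of guide truncation, the Python sketch below shortens a full-length guide sequence. It assumes that bases are removed from the 5' (PAM-distal) end and that 17 nt is the target length; the text above states only that fewer than 20 nt are used, so both choices, along with the function name and example sequence, are assumptions made for illustration.

```python
def truncate_guide(guide: str, length: int = 17) -> str:
    """Shorten a guide by removing bases from its 5' end.

    Assumption: the 3' (PAM-proximal) end is kept intact, so the
    last `length` bases of the original guide are returned.
    """
    if not 1 <= length < len(guide):
        raise ValueError("length must be shorter than the original guide")
    return guide[-length:]


full_guide = "GATTACAGATTACAGATTAC"  # hypothetical 20-nt guide
print(truncate_guide(full_guide))
```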

Conclusions
Before the advent of designer nucleases as tools for genome editing, the dissection of gene function was restricted in higher organisms by a lack of efficient tools for targeted genome modifications. During the past 5 years alone, a substantial number of different targeted genome-modified animals, including livestock and primates, have been produced to address issues in animal production or human medicine. It is reasonable to expect that our knowledge of functional genomics and applications of these tools will be greatly increased in the near future.
Among these designer nucleases, the CRISPR-Cas9 system is leading this advance and is currently the most convenient and cost-effective gene editing tool. Future investigation of the biochemistry and structure of the CRISPR-Cas system will help us design Cas9 mutants with enhanced performance, including improved specificity. Indeed, the recent introduction of Cas9 with enhanced specificity was based on the elucidation of the crystal structure of the sgRNA-Cas9-target DNA complex [88].