About thirty years ago, when the Human Genome Project (HGP) was first proposed, there was widespread disagreement over whether it was worth spending $3 billion to sequence a genome with only 3% coding sequence, one that might also contain many errors given the technical limitations of the time [1]. However, the project turned out to be successful, generating a return on investment to the US economy estimated at more than $796 billion [2]. In addition, it triggered a wave of biotechnology innovations, including ultra-high-throughput sequencing, long-read sequencing, and new computational methods, none of which were foreseen at the project's inception. Ultimately, it led to a better understanding of the fundamentals of the life sciences and promises enormous benefits for human health. Synthesis of a complete genome has met with similar criticism, especially in light of emerging techniques that could potentially be used to edit whole genomes [3–6]. We envision that the impact of writing genomes will be even larger than that of reading them. The societal implications will also be more complicated and will need to be handled with greater care [7].
Why synthesize a complete genome? First, it has long been an intriguing question whether a living cell (and not only its genome) could be made from scratch in the lab, a goal that may not be realized anytime soon [8]. Making an organism's genome, which contains all the information a living organism requires, is therefore a good alternative. Second, although recent advances in genome-editing technologies have made it easier to modify a genome, it still takes years to make a small but pervasive change throughout a genome, such as eliminating one codon from E. coli [9–14]. Third, many experiments may be difficult, if not impossible, to perform without a designer genome bearing such pervasive changes. For example, how else could one create a multitude of randomly rearranged genomes in which the positions and copy numbers of large numbers of genes have changed? And what are the functions of the repetitive sequences, which occupy over 50% of the human genome and even larger fractions of some plant and other animal genomes?
Efforts to synthesize a whole genome date back to the time when the HGP was nearing completion. In 2002, by synthesizing a full-length poliovirus cDNA, Eckard Wimmer and his colleagues were able, for the first time, to assemble a poliovirus genome starting from oligonucleotides [15]. In 2003, a paper from Hamilton O. Smith and Craig Venter reported the synthesis of the 5.4 kb genome of bacteriophage ΦX174 [16], the first DNA genome to be sequenced, by Fred Sanger in 1978 [17]. It took two weeks to go from the design of the chemically synthesized sequences to the assembly of a living virus. Five years later, the Mycoplasma genitalium genome, about 100 times larger than that of ΦX174, was made from scratch [18]. Unfortunately, the genome failed to boot up a living cell because of mutations. It was not until 2010, after synthesizing the somewhat larger Mycoplasma mycoides genome, that the group was able to construct the first self-replicating cell powered by a man-made genome [19]. This feat depended critically on the Venter group's earlier demonstration that DNA molecules the size of an entire genome could be transferred from M. mycoides to a closely related species, M. capricolum, a process referred to as genome transplantation [20]. Building on the synthetic M. mycoides genome, the same group recently produced a bacterial cell with a minimal genome by reducing its size by more than 50% [21]. However, the aforementioned genomes of goat and cattle pathogens are all relatively small, the sequences constructed were essentially wild type (i.e., not extensively engineered), and the genomes in question all come from prokaryotic cells. Has the time arrived to tackle much larger and more complex eukaryotic genomes?
In the past decade, high-throughput sequencing technology has completely transformed our knowledge of genomic information. We are now able to read about 15 petabases per year [22]. This trove of sequencing data allows us to identify functional proteins and pathways that were previously unknown. It also makes it possible to compare the genomes of related species and thereby guide more rational genome redesign. In 2006, when we first designed the synthetic yeast project, DNA synthesis cost around $1 per base pair and typical synthesis projects were 1 kb in length. Today, the price of obtaining DNA fragments of multiple kilobases has dropped substantially (about 20-fold), making genome building more affordable, although price remains a major obstacle. New DNA synthesis technologies, including methods that exploit pools of oligonucleotides synthesized on microarrays [23, 24] and other approaches [25], are beginning to come online. If the price of gene synthesis continues to drop as expected, it may be possible to synthesize a 10-million-bp genome for less than 1 million USD within five years. Finally, several technologies for assembling large DNA fragments from smaller ones have been invented, such as Golden Gate cloning and in vitro and in vivo recombination-based techniques, enabling "scarless" assembly of designer DNA [26–31]. Together, these advances provide the knowledge and tools needed to consider designing and synthesizing larger genomes, such as that of S. cerevisiae (Sc2.0, Figure 1).
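The cost figures above can be checked with simple arithmetic. The Python sketch below uses only the numbers quoted in the text and assumes cost scales linearly with length at a flat per-base price; it ignores assembly, sequence verification, and labor, so its outputs are rough lower bounds, not project budgets. The function names are hypothetical.

```python
# Back-of-the-envelope DNA synthesis cost, using only figures quoted in the text.
# Assumption: cost is linear in length at a flat per-base price; assembly,
# sequence verification and labor are ignored, so results are lower bounds.

PRICE_2006_USD_PER_BP = 1.00  # approximate 2006 price quoted in the text
FOLD_DROP = 20                # approximate price reduction quoted in the text
GENOME_BP = 10_000_000        # the 10-million-bp genome discussed in the text
BUDGET_USD = 1_000_000        # the "less than 1 million USD" target


def synthesis_cost(length_bp: int, usd_per_bp: float) -> float:
    """Naive cost of synthesizing length_bp bases at a flat per-base price."""
    return length_bp * usd_per_bp


def break_even_price(length_bp: int, budget_usd: float) -> float:
    """Highest flat per-base price at which length_bp fits within budget_usd."""
    return budget_usd / length_bp


reduced_price = PRICE_2006_USD_PER_BP / FOLD_DROP
print(f"Price after a {FOLD_DROP}-fold drop: ${reduced_price:.2f}/bp")
print(f"10 Mb at the 2006 price: ${synthesis_cost(GENOME_BP, PRICE_2006_USD_PER_BP):,.0f}")
print(f"Per-base price needed for $1M: ${break_even_price(GENOME_BP, BUDGET_USD):.2f}/bp")
```

At the 2006 price, a 10 Mb genome would cost about $10 million for synthesis alone; meeting the $1 million target requires a flat price of at most $0.10 per base.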
The aim of the Sc2.0 project is to build a designer genome that will allow us not only to test our ability to construct a yeast with multiple synthetic chromosomes, but also to answer a series of important biological questions. Therefore, besides testing our ability to synthesize a ~12-million-bp yeast genome, we also incorporated many engineered features into the synthetic genome [32], such as: i) Genome reduction, by systematically removing all retrotransposons and subtelomeric repeats. In addition, most introns were removed, and the tRNA genes were relocated to a specialized chromosome. ii) Genome recoding, to incorporate trackable watermarks (PCRTags) without changing amino acid sequences. In addition, the least abundant codon, the TAG stop codon, was replaced with TAA, allowing future manipulation of the genetic code. iii) Genome expansion, by adding symmetrical loxP sites, which led to the development of the "SCRaMbLE" system to facilitate future genome manipulation [33]. These design features are not "fixed", and new ones could be included in the future. Recently, two papers were published, one on the design and synthesis of a chromosome arm and the other on the first completely synthetic chromosome [32, 34]. Excitingly, seven papers covering five synthetic chromosomes were published in a recent issue of Science, setting another milestone in the journey toward eukaryotic genome synthesis [35–41]. Work on the synthesis of the remaining yeast chromosomes is ongoing around the world.
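The TAG-to-TAA swap in feature ii) can be illustrated with a short sketch. The Python below is not the Sc2.0 design software: the function names are hypothetical, and it assumes each coding sequence (CDS) is a clean, in-frame open reading frame written 5'→3'. It ignores complications a real genome design may have to handle, such as stop codons that overlap other features.

```python
from typing import List

# Hypothetical helper illustrating the TAG -> TAA stop-codon swap.
# Assumption: each CDS is one open reading frame written 5'->3' on the coding
# strand, starting at its start codon and ending with its stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(cds: str) -> List[str]:
    """Split an in-frame CDS into consecutive triplets."""
    if not cds or len(cds) % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of 3")
    return [cds[k:k + 3] for k in range(0, len(cds), 3)]


def recode_tag_stop(cds: str) -> str:
    """Return the CDS with a terminal TAG stop codon replaced by TAA.

    Only the stop codon changes, so the encoded amino acid sequence is untouched.
    CDSs that already end in TAA or TGA are returned unchanged (upper-cased).
    """
    triplets = codons(cds.upper())
    if triplets[-1] not in STOP_CODONS:
        raise ValueError(f"CDS does not end in a stop codon: {triplets[-1]}")
    if any(c in STOP_CODONS for c in triplets[:-1]):
        raise ValueError("CDS contains an internal in-frame stop codon")
    if triplets[-1] == "TAG":
        triplets[-1] = "TAA"
    return "".join(triplets)


genes = {"geneA": "ATGGCTTAG", "geneB": "ATGAAATGA"}  # toy sequences, not real yeast genes
recoded = {name: recode_tag_stop(seq) for name, seq in genes.items()}
print(recoded)
```

Because TAG and TAA are both stop codons, the swap changes the DNA sequence without changing any protein, which is what leaves TAG available for future genetic code manipulation.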
What can the synthetic yeast be used for? The project's initial design allows us not only to generate an organism with a completely synthetic genome, but also to address several important biological questions, such as the functions of pervasive retrotransposons, repetitive sequences, and introns. Ultimately, we hope that the synthetic yeast can also serve as a platform for the biotechnology industry. For example, with the SCRaMbLE system incorporated, the synthetic yeast could be used to produce new strains with improved fermentation efficiency and ethanol tolerance, making them better suited to winemaking or biofuel production. The strains could also serve as cell factories to manufacture valuable products such as fine chemicals, drugs, antibiotics, or vaccines, by shuffling exogenous pathways into the synthetic genome. More applications could be developed in the future through the "open source" Sc2.0 project.
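How SCRaMbLE turns one synthetic chromosome into many different strains can be pictured with a toy simulation. The Python below is only a sketch under stated assumptions: a loxP site flanks every segment; only two outcomes are modeled (deletion or inversion of the segments between two randomly chosen sites, with equal probability), while duplications and other rearrangements are omitted; and any deletion that would remove a hypothetical "essential" gene is treated as lethal, so that event is not recovered. Gene names are placeholders.

```python
import random
from typing import List, Set, Tuple

# A segment is (gene name, orientation), with orientation +1 (forward) or -1 (reverse).
Segment = Tuple[str, int]


def scramble(chrom: List[Segment], n_events: int, essential: Set[str],
             rng: random.Random) -> List[Segment]:
    """Toy SCRaMbLE model: random deletions/inversions between loxP sites.

    Assumption: site k sits just before chrom[k] (and site len(chrom) after the
    last segment), so any two sites i < j delimit the segments chrom[i:j].
    """
    chrom = list(chrom)  # work on a copy; the input is left unchanged
    for _ in range(n_events):
        if not chrom:
            break
        i, j = sorted(rng.sample(range(len(chrom) + 1), 2))
        block = chrom[i:j]
        if rng.random() < 0.5:
            # Deletion: skipped if it would remove an essential gene (treated as lethal).
            if not any(gene in essential for gene, _ in block):
                del chrom[i:j]
        else:
            # Inversion: reverse segment order and flip each segment's orientation.
            chrom[i:j] = [(gene, -orient) for gene, orient in reversed(block)]
    return chrom


start = [(f"gene{k}", 1) for k in range(1, 9)]  # hypothetical 8-gene chromosome
for seed in range(3):
    print(scramble(start, n_events=5, essential={"gene1", "gene5"},
                   rng=random.Random(seed)))
```

Each seed yields a differently rearranged chromosome from the same starting sequence, a miniature version of generating a diverse population of strains from which improved variants could be screened.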
Beyond science, the Sc2.0 project has become a great platform for international collaboration (Figure 1). Since the first synthetic yeast genome meeting, organized by BGI and Tsinghua University in 2012, sparked the partnership, the Sc2.0 consortium has grown to over ten research groups across four continents, as well as industrial participants. In addition, the Build-A-Genome (BAG) course [42], designed as the educational component of Sc2.0, has been a real success. Since it was first taught at Johns Hopkins University in 2011, three more US universities and colleges have begun piloting or offering the course. Outside the US, Tianjin University became the first institution to offer the course to both postgraduates and undergraduates, and has finished making two chromosomes in the past two years. The BAG course has served as an excellent teaching vehicle, giving young students the chance to experience real scientific research and synthetic biology. To date, through this collaboration, the consortium has completely synthesized ChrII, ChrV, ChrVI, ChrX, and ChrXII [36, 38–41], while several other chromosomes are in progress or near completion. In addition, several tools and technologies have been developed in the course of Sc2.0 [26, 43, 44]. Finally, it is worth mentioning that in this international collaboration, the three teams in China will together contribute five chromosomes, covering over 40% of the yeast genome, far more than China's contribution to the Human Genome Project (1%).
One concern regarding Sc2.0 and many other engineered organisms is the worry that artificial organisms could contaminate the environment, endanger natural species, or somehow be used nefariously. It may be hard to turn the synthetic yeast into a bioweapon, but it does have the theoretical potential, however slim, to become a threat to natural species if beneficial mutations keep accumulating. Although there is no plan to release Sc2.0 into natural environments, and ongoing studies suggest that the designer changes tend to make it less fit rather than more so, there is a general need for safeguard mechanisms for engineered microbes, some of which are intended to be deployed in the natural environment. To address this potential problem, a "genome safeguard" system has been designed and will eventually be incorporated into the synthetic yeast. In this system, two orthogonal classes of switches were constructed, allowing the synthetic strains to survive only in the presence of small molecules absent from the native environment [45]. Recently, safeguards for CRISPR-Cas9 gene drive systems were designed to prevent unintended genome editing should laboratory strains escape [46]. In bacterial systems, genome safeguards have been developed based on TAG codon-engineered strains (rE.coli) and orthogonal tRNA synthetase systems [12, 13], and on riboregulation of essential genes [47].
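Stacking two orthogonal switches is attractive because, if an escapee must defeat both and the switches fail independently, their escape frequencies multiply. Independence is an assumption here (orthogonal designs aim for it but cannot guarantee it), and the per-switch frequencies in this Python sketch are hypothetical placeholders, not measured values.

```python
from math import prod
from typing import Sequence


def combined_escape_frequency(per_switch: Sequence[float]) -> float:
    """Escape frequency when a cell must independently defeat every switch.

    Assumes escape events at different switches are statistically independent.
    """
    for f in per_switch:
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"frequency out of range: {f}")
    return prod(per_switch)


# Hypothetical per-switch escape frequencies (escapees per cell plated).
single = combined_escape_frequency([1e-6])
double = combined_escape_frequency([1e-6, 1e-6])
print(f"one switch: {single:.0e}, two switches: {double:.0e}")
```

A correlated failure mode, such as a single mutation that disables both switches, would make the real escape frequency higher than this estimate.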
The consortium of investigators leading the Sc2.0 project also came together around a "Statement of Ethics and Governance" addressing the bioethical and biosafety aspects of the project. This statement, posted on the project's website (syntheticyeast.org), and the process by which it was collectively drafted are described in a recently submitted publication [7].
Given the increasing capacity to synthesize larger DNA fragments and the decreasing price of gene synthesis, we are optimistic about completing the DNA synthesis phase of the Sc2.0 project by 2017. It will become the first eukaryote with a genome of over 10 million bases to be completely synthesized. However, challenges remain: although yeast strains bearing a single synthetic chromosome appear to grow normally, how the final strain containing all 16 synthetic chromosomes will behave remains to be determined. It is quite possible that the creeping accumulation of many tiny fitness defects distributed throughout the chromosomes will conspire to thwart final assembly into a single strain.
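The worry about creeping fitness defects can be made concrete with a simple multiplicative model, in which each small defect s scales fitness by (1 − s). This Python sketch assumes no epistasis (defects neither buffer nor amplify one another), and the 2% per-chromosome defect is a hypothetical number chosen for illustration, not a measurement.

```python
from math import prod
from typing import Sequence


def combined_fitness(defects: Sequence[float]) -> float:
    """Relative fitness when independent defects s_i combine as prod(1 - s_i).

    Assumes a multiplicative model with no epistasis; 1.0 means wild-type fitness.
    """
    for s in defects:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"defect out of range: {s}")
    return prod(1.0 - s for s in defects)


# Hypothetical: each of 16 synthetic chromosomes carries a 2% defect.
one_chromosome = combined_fitness([0.02])
all_sixteen = combined_fitness([0.02] * 16)
print(f"one: {one_chromosome:.3f}, sixteen: {all_sixteen:.3f}")
```

Under these assumptions, a defect too small to notice in any single-chromosome strain compounds to roughly a 28% fitness loss once all 16 chromosomes are combined.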
In the meantime, it makes sense to consider which genome could be next in the pipeline after Sc2.0. There are several microbial candidates, such as the bacterium E. coli, the model prokaryote widely used in metabolic engineering and synthetic biology. Indeed, recent work from George Church's lab demonstrated that seven codons could be eliminated from the E. coli genome after design and reconstruction [14]. On the eukaryotic side, the worm C. elegans, with its relatively small genome (97 million bases), would allow us to test whether we have sufficient knowledge and technology to design a synthetic genome capable of directing normal differentiation. Excitingly, and also controversially, the Human Genome Project-Write (HGP-Write) was recently proposed, promoting not only the ability to design and synthesize ultra-large genomes but also an ethical framework for doing so [48]. However, many technical challenges remain: none of these organisms has the high intrinsic efficiency of mitotic recombination offered by budding yeast, so the step-wise genome replacement methods used in Sc2.0 are not practical for them. Recent advances in CRISPR/Cas9-mediated genome editing could potentially provide a good solution [3, 4]. Alternatively, assembly in S. cerevisiae followed by genome or chromosome transplantation could be used.
Higher Education Press and Springer-Verlag Berlin Heidelberg