Introduction
Monogenic diseases (also called Mendelian diseases) are caused by variants in a single gene and follow the laws of Mendelian inheritance. More than 6000 Mendelian diseases have been catalogued in Online Mendelian Inheritance in Man (OMIM); however, the causative genes for about half of these diseases remain unknown (http://www.ncbi.nlm.nih.gov/omim, 2011).
Next-generation sequencing (NGS) technology, introduced in 2005, is a powerful tool for detecting genetic variation ranging from single-nucleotide changes to larger structural rearrangements. Whole-exome sequencing (WES) applies NGS to exon-enriched samples to identify protein-coding mutations, including missense, nonsense, splice-site, and small insertion/deletion mutations. Applying WES to Mendelian disorders requires only a small number of individuals from affected families. The application of WES represents revolutionary progress in Mendelian disease research [1, 2].
Traditional approaches of disease gene identification
Different strategies, including candidate gene studies and linkage studies combined with positional cloning, have been applied to identify causative genes for several decades [3]. Candidate gene approaches depend on a set of candidate disease genes prioritized by bioinformatic tools, such as the gene functional similarity search tool (GFSST) and CANDID, a flexible method for prioritizing candidate genes for complex human traits [4, 5]. However, the genes underlying most rare Mendelian diseases remained unknown, and only a few candidate genes could be validated in affected families. Linkage study appeared next and became the main tool for elucidating the genetics of Mendelian disorders. It relies on the principle that the causal variant segregates perfectly with the disorder, as in consanguineous families. Linkage study combined with follow-up positional cloning has identified one-third to one-half (~3000) of the genes underlying Mendelian disorders [6]. However, these conventional strategies are inefficient for extremely rare Mendelian disorders and for de novo variants in sporadic cases.
WES technology
With the recent developments in NGS technologies and commercial whole-exome enrichment kits [7], exome sequencing has been widely used to identify mutations or genes for Mendelian diseases [8]. Compared with traditional approaches and genome-wide association studies, this technology sequences only the approximately 1% of the genome that is protein-coding and therefore produces a modest volume of data, yet almost 85% of the mutations that cause Mendelian diseases lie in coding regions [3]. Moreover, this technology reduces the time and cost of sequencing and detects the causative mutations directly. With this technology, researchers can use small family samples to find disease-causing variants. It thereby overcomes major barriers to research on rare Mendelian diseases and opens a new phase in the study of these diseases. In 2009, researchers used targeted capture and massively parallel sequencing to analyze the exomes of 12 humans (eight HapMap individuals and four individuals with Freeman-Sheldon syndrome) and showed that exome sequencing of a small number of unrelated affected individuals can identify candidate genes for Mendelian diseases [1].
Currently available NGS platforms, such as Illumina HiSeq™, Life Technologies™ SOLiD4™, and Roche 454 GS FLX, generate hundreds of millions of short sequence reads with an average length of 50 bp to 125 bp. Because these short-read methods have higher error rates than Sanger sequencing, further validation by Sanger sequencing is essential [9]. The weaknesses of these technologies are addressed by third-generation sequencing technologies, which omit the emulsion and bridge amplification steps and thereby avoid errors introduced during polymerase-mediated amplification and “dephasing”. Helicos Biosciences and Pacific Biosciences pioneered this type of sequencing [10, 11]. All of these newer sequencers generate large amounts of base-call data. To interpret these data, statistical strategies estimate the average frequency of de novo variants and compare it with the common variation recorded in dbSNP. After Bonferroni correction for multiple testing, true variants are distinguished as those that pass the resulting significance threshold [2].
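The Bonferroni step mentioned above can be illustrated with a minimal sketch. It assumes each candidate gene has already been assigned a p-value by some upstream test; the function names, gene labels, and p-values are hypothetical and are not taken from the cited study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_genes(p_values, alpha=0.05):
    """Return the genes whose p-value passes the Bonferroni threshold.

    p_values maps a (hypothetical) gene label to its p-value; the number
    of tests is taken to be the number of genes supplied.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(gene for gene, p in p_values.items() if p < threshold)


# Illustrative values only: three hypothetical genes, so the corrected
# threshold is 0.05 / 3.
toy_p_values = {"GENE_A": 1e-7, "GENE_B": 0.01, "GENE_C": 0.04}
print(significant_genes(toy_p_values))
```

With roughly 20 000 genes tested at a family-wise level of 0.05, the same formula gives a per-gene threshold of 2.5 × 10⁻⁶, which is why only very strongly supported signals survive the correction.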
Typical samples used for WES
Earlier gene-discovery methods required large families and could not be applied to small families or sporadic cases. WES has emerged as a personalized medicine strategy for identifying disease-causing genes. Consequently, the genetic defects in small Mendelian families or even a single sporadic case can now be determined [12].
Design in terms of certain mode of inheritance
Three strategies for selecting samples for sequencing, namely, the overlap-based, linkage-based, and de novo-based strategies, are commonly distinguished according to the pedigree information available [13]. The overlap-based strategy is mainly used in the absence of genetic heterogeneity. The number of candidate causative genes can be rapidly reduced to a small set by sequencing multiple unrelated patients with the same phenotype. The linkage-based strategy is appropriate when a large family with a prominent phenotype has been collected. Distantly related individuals share only a small fraction of the genome, and unaffected individuals within the family are also included for exome sequencing. The causative variant can then be rapidly located within the linkage region. The de novo-based strategy is a highly effective approach for identifying causative genes in parent-child trios and is applicable to sporadic patients. The unaffected parents and the affected child are sequenced to detect de novo mutations that are responsible for rare Mendelian disorders [12].
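Two of these strategies reduce to simple set operations, sketched below under stated assumptions: each patient's candidates are already available as a set of gene labels (overlap-based) or as a set of (chromosome, position, reference, alternate) tuples (de novo-based). The function names and all data are hypothetical, and real trio analysis would also have to account for sequencing errors and coverage gaps in the parents.

```python
def overlap_candidates(per_patient_genes):
    """Overlap-based strategy: genes carrying a candidate variant in every
    unrelated patient with the same phenotype."""
    gene_sets = [set(genes) for genes in per_patient_genes]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def de_novo_candidates(child, father, mother):
    """De novo-based strategy: variants seen in the affected child but in
    neither unaffected parent."""
    return set(child) - set(father) - set(mother)


# Hypothetical gene labels for three unrelated patients.
patients = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_B", "GENE_C", "GENE_D"},
    {"GENE_C", "GENE_E"},
]
print(overlap_candidates(patients))

# Hypothetical variants for one parent-child trio.
child = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")}
father = {("chr1", 100, "A", "G")}
mother = {("chr2", 200, "C", "T")}
print(de_novo_candidates(child, father, mother))
```

The intersection shrinks as more unrelated patients are added, which is why the overlap-based strategy works best when a single gene underlies the phenotype.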
Exome capture pipeline
The three main steps of exome sequencing are exome enrichment, high-throughput DNA sequencing, and biological interpretation. Exome enrichment is the basis for exome sequencing. Genomic DNA is randomly sheared and used to construct a shotgun library. The exonic regions are enriched by hybridization capture, in which oligonucleotide probes hybridize to the fragments of interest. The noncoding DNA sequences are then removed. Agilent, NimbleGen, Illumina, and other companies offer exome-enrichment kits for exome capture [14, 15]. These kits target not only protein-coding regions and splice sites but also sequences adjacent to the exons, such as exon-flanking regions, and predicted miRNA fragments. High-throughput methods can isolate exons more effectively than traditional methods.
Typical analysis pipeline
After sample preparation and sequencing, between 20 000 and 50 000 variants are identified per sequenced exome. The search for disease-causing variants should focus on nonsynonymous (NS) variants, splice acceptor-site or donor-site mutations, and insertions/deletions. Candidate variants should also be absent from the major known variant databases, such as dbSNP and the 1000 Genomes Project, as well as from other published studies and in-house databases. This step reduces the number of potential candidate mutations by 90% to 95% [16]. Variants that cosegregate with the phenotype in the family are then selected. The remaining candidates can be annotated and their functional impact predicted with tools such as ANNOVAR or SIFT. Finally, the variants predicted to be damaging are examined by traditional Sanger sequencing to confirm the true causative variant. The detailed analytic flowchart is shown in Fig. 1.
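The filtering cascade described above can be sketched as a sequence of predicates applied in order. This is an illustrative sketch only: the `Variant` fields, effect labels, and example records are hypothetical, and in practice each flag would come from annotation tools and database lookups rather than being set by hand.

```python
from dataclasses import dataclass

# Effect classes retained in the first step (labels are assumptions).
CANDIDATE_EFFECTS = {"nonsynonymous", "splice_site", "indel"}


@dataclass(frozen=True)
class Variant:
    name: str
    effect: str               # predicted effect class
    in_known_database: bool   # e.g. seen in dbSNP, 1000 Genomes, or in-house data
    cosegregates: bool        # genotype matches phenotype across the family
    predicted_damaging: bool  # e.g. flagged by a prediction tool


def prioritize(variants):
    """Apply the filters in order; return survivors and per-stage counts."""
    stages = [
        ("functional class", lambda v: v.effect in CANDIDATE_EFFECTS),
        ("novel", lambda v: not v.in_known_database),
        ("cosegregating", lambda v: v.cosegregates),
        ("predicted damaging", lambda v: v.predicted_damaging),
    ]
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for label, keep in stages:
        remaining = [v for v in remaining if keep(v)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Hypothetical toy input: each record fails at a different stage except v1.
example = [
    Variant("v1", "nonsynonymous", False, True, True),
    Variant("v2", "synonymous", False, True, False),
    Variant("v3", "nonsynonymous", True, True, True),
    Variant("v4", "splice_site", False, False, True),
    Variant("v5", "indel", False, True, False),
]
final, counts = prioritize(example)
for label, n in counts:
    print(f"{label}: {n}")
```

The variants left in `final` are the ones that would be taken forward to Sanger confirmation. Recording the count after each stage makes it easy to see which filter removes the most candidates.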
Application of WES in Mendelian diseases
An era of rapid genetic exploration began with the release of the first draft of the human genome sequence in 2001. The human genome sequence was completed in 2003. In the following year, next-generation technologies were introduced and their theoretical framework was established. The first NGS platform was released on the market in 2005, which laid the foundation for commercial applications. Ng et al. performed WES in Miller syndrome in 2010 [2]. They were the first to apply exome sequencing successfully to a rare Mendelian disease. Since then, exome sequencing has been widely used to identify mutations for dominant or recessive diseases and has made great progress.
By January 2013, approximately 150 Mendelian disorders had been studied by WES, covering every physiological system (Table 1).
Development of technical research
The extensive commercial use of high-throughput exome-enrichment kits has markedly decreased the cost of WES while achieving adequate sequencing depth. As a result, WES has been used both to investigate biological function and to conduct clinical translational research. In 2009, Choi et al. [167] reported a patient who presented with dehydration and was evaluated for failure to thrive. The case was diagnosed as Bartter syndrome, although other diseases could not be excluded. A missense variant in SLC26A3 was identified through exome sequencing analysis. Strom et al. [153] performed WES in nine putative Stargardt disease probands and identified seven disease-causing variants across four retinal or macular dystrophy genes. This study provided the basis for a definite diagnosis in six previously uncharacterized individuals. Joubert syndrome, a genetically heterogeneous disease associated with 17 genes, is characterized clinically by the “molar tooth sign” on brain imaging, accompanied by neurological symptoms including episodic hyperpnea. Tsurusaki et al. [168] applied WES to five pedigrees with Joubert syndrome and related disorders (JSRD) and found the causative mutations in all families. These results support the advantages of using WES in molecular diagnosis.
The disease-causing variants and genes identified by WES can help researchers discover new drug targets. Identifying the genetic bases and mechanisms of Mendelian disorders is necessary to realize accurate diagnosis and treatment of these diseases based on etiology. Thus, medical genomics has become a reality with exome sequencing [169].
Technical limitation
Many causative genes for Mendelian diseases have been detected by exome sequencing. However, several limitations impede a complete explanation of their genetic bases. First, although WES has made great progress in the diagnosis of monogenic diseases, the human genome is highly variable, and rare variants outside coding exons can also cause disease. Approximately 85% of disease-causing variants lie in coding exons, and the remaining ~15%, including single-nucleotide variants (SNVs) and copy-number variants (CNVs), lie outside the exons in noncoding, conserved, or regulatory regions. These variants cannot be assayed with WES [170]. Second, shared databases such as dbSNP or HapMap cannot effectively exclude all common and irrelevant variants. Exome sequencing usually generates >10 000 variants per sample; thus, classifying variants by frequency and narrowing down candidate genes in a small data set remain challenges in diagnosing some disorders [171], especially extremely rare ones. Third, some GC-rich sequences lead to uneven coverage and incomplete capture, an inherent weakness of WES technology. Studies have shown that 80% to 90% of the targeted regions are covered at more than 10×, which can still leave 4 Mb to 8 Mb (or 1000 to 2000 genes) without sufficient coverage for variant detection [172]. Furthermore, studies are restricted by the effectiveness of variant filters and by segmental duplications in the reference sequence, which can prevent accurate read mapping. Statistical metrics are necessary to distinguish false positives from true positives [169].
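The coverage limitation can be made concrete with a small sketch that computes the fraction of targeted bases reaching a minimum depth. It assumes per-base depths over the target regions are already available as a list of integers; the function names, the toy depths, and the choice of "at least 10×" as the cutoff are assumptions for illustration.

```python
def fraction_covered(depths, min_depth=10):
    """Fraction of targeted bases with read depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def undercovered_bases(depths, min_depth=10):
    """Number of targeted bases with read depth below min_depth."""
    return sum(1 for d in depths if d < min_depth)


# Hypothetical per-base depths for a ten-base target.
toy_depths = [0, 4, 12, 30, 45, 9, 10, 60, 25, 18]
print(fraction_covered(toy_depths))
print(undercovered_bases(toy_depths))
```

Reporting the under-covered bases alongside the covered fraction highlights the regions in which a causative variant could be missed simply because too few reads were available.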
Conclusions and future directions
The exome of an individual generally contains 20 000 to 25 000 coding variants, of which 9000 to 11 000 are NS and may include the genetic factors that cause a Mendelian disease [170]. An individual with phenotypic manifestations would likely carry 40 to 100 deleterious variants simultaneously. Several variation patterns and their constituent haplotypes are believed to exist in individual human genomes and to be transmitted through “father-to-son” pedigrees. Although several diploid individual human genomes have been sequenced, the haplotypes in these genomes remain unclear [173]. All the variants in affected individuals must be detected, and the human haplotype project must be continually improved. Moreover, the mutated gene can differ among individuals affected by the same Mendelian disease, and more than one gene can influence disease onset. The mechanism by which different genes trigger the same disease remains unknown. The functional effects of the responsible genes should be determined. The functions of approximately 20 000 genes have not been assigned, and gene interactions need to be elucidated [170].
The success rate of the present technology in identifying the genes responsible for Mendelian disorders ranges from 60% to 80% and could be further improved by adequate sequencing depth. As sequencing costs fall, whole-genome sequencing can be used to survey all abnormal sequence information and reveal how the abnormalities are related. This technology is automatable and removes the capture step, thereby reducing the false-positive rate, and it detects approximately 3.5 million variants per genome, covering all types of genomic variation (SNVs and CNVs). It is especially suitable for identifying genes for complex traits and for sporadic cases caused by de novo SNVs or CNVs [174].
What constitutes a medically actionable variant has yet to be defined. Guidelines for the clinical interpretation of exome sequencing are beginning to be established. WES is expected to become the most commonly used tool for identifying the causative genes in Mendelian disorders. More variants are likely to be detected in the future, and diagnostic molecular chips may come to market for clinical service applications.
Compliance with ethics guidelines
Xuejun Zhang declares no conflict of interest. This manuscript is a review article and does not involve a research protocol that requires approval by relevant institutional review board or ethics committee.
Higher Education Press and Springer-Verlag Berlin Heidelberg