HISTORY OF NGS AND ITS APPLICATION IN TRANSCRIPTOME STUDY
First generation versus second generation sequencing technologies
Introduced in 1977, the Sanger method [
1] and the subsequent capillary-based automated sequencing technology [
2] represent the first generation of sequencing technology, which contributed to a series of breakthrough discoveries, including the completion of the Human Genome Project [
3]. Despite its many successes, the impact of first generation technology was limited by its low throughput, high cost, and sample requirements. To overcome these limitations, a number of so-called “second generation sequencing” or “next generation sequencing” (NGS) technologies emerged a decade ago, all employing a massively parallel sequencing scheme. Compared to first generation sequencing, NGS technologies are characterized by high throughput but short sequence reads. Whereas the traditional Sanger method can obtain sequence reads of 800–1000 bp, the read length of NGS technologies typically varies from 35 bp (SOLiD) to 700 bp (Roche 454), depending on the platform used [
4–
6]. Read length has a serious impact on the reconstruction of the transcriptome from RNA-seq data. The Roche 454 platform uses emulsion PCR for DNA fragment isolation and amplification, and a pyrophosphate-based single-nucleotide addition sequencing method on a micro-fabricated array of picoliter-scale wells [
4]. Initially, its average read length was 108 bases [
4], but it has gradually increased to more than 300 bp [
7,
8]. In the Illumina/Solexa platform, template amplification is achieved through bridge PCR, followed by four-color cyclic reversible termination steps during the sequencing and imaging process [
5]. The read length of the Illumina/Solexa platform is usually 35–150 bp, shorter than that of the 454 platform. The reversible terminator chemistry gives the Illumina/Solexa platform higher throughput and lower cost [
5,
7,
9]. The LifeTech SOLiD platform uses DNA ligase, rather than polymerase, to drive sequencing by synthesis [
6]. These three platforms are the most popular commercially available technologies and are widely used in genomics and transcriptomics studies because of their lower cost and higher throughput compared to first generation sequencing [
4,
5,
7,
9,
10].
Second generation sequencing versus other technologies in transcriptomics studies
In multicellular organisms, the differentiation of cell types and their functions is defined by the composition and quantity of transcripts, the so-called transcriptome, derived from an identical genetic make-up. Understanding the transcriptome is essential for interpreting the function and regulation of genes, offering insights into the mechanisms of development and disease [
11–
15]. The goals of transcriptomics studies are: (i) reconstructing/assembling all transcripts, including mRNA and noncoding RNA [
16–
18], small RNA [
19], etc.; (ii) identifying transcript structures, e.g., transcript start/end sites [
15], exon-intron structure [
20], alternative splicing [
21–
23], etc.; and (iii) quantifying the expression levels of transcripts under specific biological conditions, e.g., development and stress [
24].
Before the development of NGS and its application to transcriptome studies (RNA-seq), a variety of technologies had been used to study the transcriptome, mostly based on Sanger sequencing or hybridization (Table 1). These traditional methods were often designed for a specific aspect of the transcriptome and had severe limitations. The high throughput and low cost offered by RNA-seq technology make it feasible to fully assess the transcriptome of an organism, with or without its genome sequence.
Expressed sequence tags (ESTs), which are derived from cDNA libraries, have proven useful for identifying expressed genes [
25,
26] and for determining gene structure [
27,
28]. Because EST production relies on Sanger sequencing and is generally low-throughput and costly, ESTs can rarely be used to quantify transcripts or to discriminate gene expression between tissues or developmental samples. Notably, tag-based methods were developed for these purposes. The serial analysis of gene expression (SAGE) method [
29], which counts sequence tags flanking the restriction sites of endonucleases to quantify gene expression, has been used in studies of cancers and other human diseases [
30–
32]. However, SAGE was also noted to be unable to detect changes in the regulatory regions of transcripts. Another tag-based approach, cap analysis of gene expression (CAGE), captures and counts the 5′-cap regions of full-length cDNAs [
33] and uses a similar sequencing-based protocol to quantify gene expression, in analogy to the SAGE method [
33,
34]. Although CAGE is advantageous for both gene expression analysis and transcription start site identification [
35,
36], it also fails to reveal variation within gene transcripts. Massively parallel signature sequencing (MPSS) was developed as another approach to gene expression quantification; it isolates template fragments through digestion with a type IIS restriction endonuclease and determines their sequences by ligation after fixing them on microbead arrays [
37]. Although the tag-based approaches are relatively high-throughput compared to EST technology, their disadvantages, including short read length (less than 20 bp), cost, and dependence on restriction endonuclease recognition sites, are also obvious [
34,
38,
39]. Hybridization-based microarray technology has been used for years to analyze gene expression using gene probes fixed on glass or silicon chip surfaces [
40]. Although microarray technology is highly advantageous in throughput, its requirement for gene probes and predetermined gene sequences often limits its application to model organisms. Its high background noise and limited dynamic range of expression measurement also restrict its use in transcriptomics studies [
39,
41].
NGS technology has revolutionized the field of transcriptomics. In contrast to traditional methods, RNA-seq provides a comprehensive solution for studying the transcriptome across its full spectrum. Firstly, because RNA-seq does not rely on existing gene annotation, it can catalog more transcripts. Not only has RNA-seq been used to identify novel genes such as lincRNAs [
42], it has also enabled the discovery of more isoforms (alternative splicing). For instance, more than 94% of genes in human and 61% of genes in
Arabidopsis were found to undergo alternative splicing in RNA-seq studies, compared to 35% and 3% with previous methods [
20,
43–
45]. The strand-specific RNA-seq technique offers a unique approach to studying and distinguishing sense and antisense transcripts [
46], one of the novel aspects of the transcriptome discovered in recent years [
47]. Secondly, RNA-seq has been applied to the definition of gene structures, including transcriptional start sites [
48], regulatory elements [
49] and polyadenylation [
50]. Thirdly, compared to the limited magnitude of gene expression change that can be detected by microarray, the dynamic range of gene expression that can be analyzed by RNA-seq is unprecedented [
51]. Moreover, RNA-seq has been used to characterize single nucleotide variation (SNV) [
52,
53] and RNA-editing activity in gene transcripts [
54,
55]. Notably, for non-model organisms that lack genome sequences, RNA-seq technology is an extremely valuable tool. An increasing number of non-model species have undergone transcriptomic studies before their genome sequences were determined, especially polyploid crops [
13,
56,
57].
RNA-seq has been applied to transcriptomics studies in many plant and animal species. It has led to new discoveries in alternative splicing and gene structure in model organisms, including human [
51,
58],
Drosophila [
11],
Arabidopsis [
44], and rice [
59]. It has proven to be a powerful tool for studying unusual
trans-splicing genes [
21,
60–
62]. Long noncoding RNA [
63], new ORFs within UTR [
15], antisense transcripts [
47], and gene fusion [
64] are some of the new fronts opened by RNA-seq. Currently, the amount of RNA-seq data submitted to public databases increases exponentially each year [
65]. More than 60 plant species [
66] and a large number of animal species, including insects [
67–
69], fishes [
42], birds [
22], and mammals [
16,
70], have been subjected to
de novo transcriptome studies by RNA-seq, yielding insights into the mechanisms of development and gene regulation.
De novo assembly of the transcriptome is a major challenge in using RNA-seq technology for transcriptomics studies. Another strategy, reference-based assembly, which relies on a reference genome to first align all reads to the genome and then cluster overlapping reads into transcripts (as in Cufflinks [
71]), will be covered in detail in a separate chapter. In this review, we focus on the development of algorithms and the computational details of
de novo transcriptome assembly.
De novo ASSEMBLY OF TRANSCRIPTOME
Application of De Bruijn graph in de novo assembly of short-sequence reads
The use of the
De Bruijn graph (DBG) in assembly of short-sequence reads was first implemented in the EULER assembler [
72]. In contrast to the overlap-layout-consensus approach [
73], sequencing reads are dissected into
k-mers (words of
k nucleotides) and organized into a graph consisting of paths, which are formed by
k-mers in a certain order. Utilities that assemble short-sequence reads have been developed only more recently [
74–
80], after the emergence of next-generation sequencing technologies. These DBG assemblers often consist of several programs that perform error correction, sequence merging, path building, repeat resolution, path separation, and scaffolding with paired-end/mate-pair reads. The major
de novo transcriptome assemblers and software packages are summarized in Table 2.
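To make the k-mer dissection step concrete, the following minimal Python sketch (a hypothetical illustration, not code from EULER or any assembler in Table 2) builds a De Bruijn graph whose nodes are (k−1)-mers and whose edges are the k-mers observed in the reads, then collapses unbranched paths into contigs; real assemblers add error correction, graph simplification, and scaffolding on top of this core idea.

```python
from collections import defaultdict

def build_de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; a directed edge records each k-mer seen in the reads,
    weighted by how many times it was observed (its coverage)."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph

def unbranched_contigs(graph, max_steps=10000):
    """Greedily extend each edge through unbranched nodes to form contigs.
    This naive traversal ignores errors, repeats, and isoform branches."""
    contigs = []
    for node, successors in graph.items():
        for nxt in successors:
            contig, current, steps = node + nxt[-1], nxt, 0
            while len(graph.get(current, {})) == 1 and steps < max_steps:
                current = next(iter(graph[current]))
                contig += current[-1]
                steps += 1
            contigs.append(contig)
    return contigs

reads = ["ATGGCGTGCA", "GGCGTGCAAT", "GTGCAATCCT"]
dbg = build_de_bruijn_graph(reads, k=5)
print(unbranched_contigs(dbg))
```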
The challenge of de novo transcriptome assembly
Reconstruction of the full transcriptome by
de novo assembly from sequence reads helps catalog expressed genes, identify splicing isoforms, and capture the expression details of all transcripts. Without a reference genome, the
de novo transcriptome assembly approach is considered more difficult than
de novo genome assembly using short sequence reads [
86]. In comparison to
de novo genome assembly,
de novo transcriptome assembly faces many unique challenges [
87]. Among them, transcripts are expressed at low, medium, and high levels, covering a dynamic range of gene expression that spans several orders of magnitude [
78]. In plants, the range of gene expression in leaves was reported to span more than five orders of magnitude [
88,
89]. The NGS platforms have associated biases, e.g., read redundancy and error-rate tendencies, which can further skew the transcript data [
90]. Transcript isoforms arising from alternative splicing, which pertain only to eukaryotic gene transcripts, are another critical issue that
de novo assembly has to address [
81]. Forming contigs by isolating paths can be hindered by sequence repeats and nucleotide variations, as well as by alternatively spliced isoforms. Artifacts introduced during reverse transcription are another serious concern, as this noise, unique to the RNA-seq experimental process, is not encountered in genome studies. Taken together, these accompanying factors grossly compound the difficulty of
de novo transcriptome assembly.
Strategies for preprocessing and filtering sequence reads
High-throughput sequencing data are often contaminated with artificial elements generated during library construction and/or PCR amplification. Thus, preprocessing sequence reads to remove artifacts from RNA-seq data sets before assembly is essential to improve computational efficiency and ensure the accuracy of assemblies. For RNA-seq data, this step mainly targets four types of sequences: adaptors [
91,
92], low-complexity reads [
91], PCR products of non-biological origin [
93], and rRNA. Tools recommended for these tasks range from hardwired solutions [
94] to programmable tools [
91,
92,
95].
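As a minimal illustration of this filtering step (a hypothetical sketch rather than any of the cited tools; the adaptor sequence and the low-complexity heuristic are assumptions), the snippet below discards reads that contain a known adaptor or that are dominated by a single base. Production tools typically trim the adaptor rather than discard the whole read.

```python
from collections import Counter

ADAPTOR = "AGATCGGAAGAGC"   # example adaptor prefix (assumed for illustration)

def is_low_complexity(seq, threshold=0.8):
    """Flag reads dominated by a single base (a crude low-complexity test)."""
    most_common = Counter(seq).most_common(1)[0][1]
    return most_common / len(seq) >= threshold

def filter_reads(reads):
    """Keep reads that carry no adaptor and are not low-complexity."""
    kept = []
    for seq in reads:
        if ADAPTOR in seq:
            continue          # discard adaptor-contaminated read
        if is_low_complexity(seq):
            continue          # discard low-complexity read
        kept.append(seq)
    return kept

reads = ["ACGTACGTACGTACGT", "AAAAAAAAAAAAAAAA", "ACGTAGATCGGAAGAGCACA"]
print(filter_reads(reads))    # only the first read survives
```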
High-throughput sequencing (NGS) data contain sequence errors characteristic of the new technologies. For example, the Illumina platform generally has an error rate between 1% and 3%, with substitution errors overwhelmingly predominant [
96]. These errors are distributed non-randomly, with the error rate increasing from the 5′- to the 3′-end [
97]. Other NGS technologies display similar characteristics [
90]. Sequencing errors are a serious compounding factor affecting the performance of the
De Bruijn graph, increasing its size and complexity and demanding more memory and processing power [
98,
99]. For RNA-seq data, there is the added difficulty of separating sequence errors from genuine variation, due to the complications of post-transcriptional RNA processing. Thus, sequencing error correction becomes an essential step in preparing for transcriptome assembly.
Removal of sequences with low quality scores is a common practice in filtering sequencing reads (see the useful review by Yang
et al. [
96]). Two approaches are typically employed: low-quality nucleotides are trimmed from one or both ends of the sequencing reads; alternatively, an average quality score is computed over a sliding window of a fixed number of bases, and sequence regions with scores below a certain threshold are removed. Many tools have been developed for such tasks, namely FASTX TOOLKIT [
100], Sickle [
101], FastQC [
102], TRIMMOMATIC [
103], BIO-PIECES [
104], and UrQt [
105]. Although aggressive quality filtering is often employed to ensure the quality of data used in follow-up analyses, it can sometimes discard a substantial portion of the sequence data and may thus disproportionately affect transcripts with biased nucleotide content or lower expression levels [
106,
107]. To determine the optimal approach to filtering sequence reads, especially for RNA-seq data, MacManes and Eisen performed carefully designed studies [
108,
109]. Their results indicated that applying the error correction process significantly improved assembly accuracy [
109]. However, they noted that stringent trimming of nucleotides with quality scores ≤ 20 produced poorer transcriptome assemblies, as measured with several different metrics [
108]. Thus, researchers interested in
de novo transcriptome assembly are advised to use a gentler quality trimming scheme, or no trimming at all, to achieve the most favorable results [
108,
110].
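For illustration, the sliding-window approach described above can be sketched as follows (a hypothetical example, not the algorithm of any specific tool cited here); the read is truncated at the first window whose mean Phred quality falls below the chosen threshold.

```python
def sliding_window_trim(seq, quals, window=4, min_mean_q=20):
    """Trim a read at the first window whose mean quality falls below min_mean_q.
    seq: nucleotide string; quals: list of Phred quality scores (same length)."""
    for start in range(0, len(seq) - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < min_mean_q:
            return seq[:start], quals[:start]   # keep only the 5'-side of the read
    return seq, quals                            # no low-quality window found

# toy example: quality drops toward the 3'-end, as is typical for Illumina reads
seq = "ACGTACGTACGTACGT"
quals = [38, 37, 36, 35, 34, 33, 30, 28, 25, 22, 18, 15, 12, 10, 8, 5]
trimmed_seq, trimmed_quals = sliding_window_trim(seq, quals)
print(trimmed_seq)   # read truncated where quality degrades
```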
The efficiency of different k-mer lengths
The
k-mer length is a critical parameter in assembly using the
De Bruijn graph, even more so for transcriptome assembly. The assembly quality of a
De Bruijn graph is highly variable depending on the
k-mer length, which defines the sequence overlap between two contiguous reads. For genome assembly, read coverage is generally uniform across the genome, so the optimal
k-mer length can be determined as a function of sequencing depth [
111]. However, for transcriptome assembly, the system is complicated by the complexity of an organism’s gene content, highly variable gene expression levels, and multiple isoforms from alternative splicing of the same gene, in addition to variables such as sequencing depth and error rate [
78,
83]. On top of these, the widespread isoforms of transcribed genes in animals and plants [
112–
114] prevent the use of coverage depth for resolving repeated motifs. Hence, the optimal
k-mer length in transcriptome assembly is affected by many more factors [
79]. In practice, the
k-mer value for transcriptome assembly is sometimes determined based on the goals of the particular study. When a more contiguous assembly is the primary goal and the loss of lowly expressed transcripts is not a concern, a large
k-mer length is preferred. On the other hand, a small
k-mer length is often used to capture poorly expressed transcripts, at the cost of producing more fragmented and diverse transcripts. This theoretical scheme and the interdependency of variables in transcriptome assembly have been tested and best defined in experimental or simulated studies using model organisms [
87,
115]. In common practice, the length of
k-mer is often arbitrarily set to an intermediate value, as a compromise between the conflicting goals of transcript diversity and transcript contiguity. Gruenheit
et al. noted the close correlation between the
k-mer size and the coverage depth cutoff of a transcriptome assembly, and that both parameters need to be optimized in a balanced approach for the best outcome [
116]. Zhao
et al. extensively analyzed the efficiency of
k-mer size from small to large in capturing transcripts at different expression quantiles [
87]. Interestingly, they have shown that with the same
k-mer length, the efficiency of capturing transcripts at different expression quantiles varied greatly. In single-
k-mer settings, when measured by the percentage of full-length transcripts constructed, Trinity [
71] performed well across the full spectrum of gene expression levels, whereas SOAPdenovo had the worst outcome at both low and high expression quantiles. The outstanding performance of Trinity is due in part to its implementation of an enumeration algorithm after construction of the
De Bruijn graph from RNA-seq data. The algorithm scores all possible paths and branches, recovers paths supported by actual reads, and removes ambiguous/erroneous edges so as to retain plausible ones. Its broad applicability has been demonstrated by recovering full-length transcripts and isoforms in yeast, mouse, whitefly, and other non-model organisms [
71,
117]. From the above analysis [
87,
116], it was suggested that extensive pre-testing and evaluation of
k-mer length for different transcriptome assemblers is needed in each individual case to determine an appropriate
k-mer size. Even so, there is no guarantee that an optimal outcome can be achieved for a given organism. Researchers are advised to use an approach with multiple
k-mer sizes, which will be the focus of the next section.
De novo transcriptome assembly taking advantage of multiple
k-mer lengths is highly desirable.
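In practice, such pre-testing often takes the form of a small parameter sweep: the same read set is assembled with several candidate k values and basic contiguity statistics are compared. The sketch below is hypothetical; the assembler command (here a placeholder `assembler --kmer ...`) and output file names are assumptions that would need to be replaced with those of the chosen tool.

```python
import subprocess

def contig_lengths(fasta_path):
    """Return the lengths of all contigs in a FASTA file."""
    lengths, current = [], 0
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                if current:
                    lengths.append(current)
                current = 0
            else:
                current += len(line.strip())
    if current:
        lengths.append(current)
    return lengths

def n50(lengths):
    """N50: contig length at which half of the total assembled bases is reached."""
    total, running = sum(lengths), 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length
    return 0

for k in (21, 25, 31, 41, 51):
    outdir = f"assembly_k{k}"
    # placeholder command; substitute the real assembler and its options
    subprocess.run(["assembler", "--kmer", str(k), "--reads", "reads.fq",
                    "--out", outdir], check=True)
    lengths = contig_lengths(f"{outdir}/contigs.fa")
    print(f"k={k}: {len(lengths)} contigs, N50={n50(lengths)}")
```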
The multi-k-mer strategy balancing efficiency and sensitivity
The strategy of using multiple
k-mers of different lengths was initially proposed by Robertson
et al. [
83], and later by Surget-Groba and Montoya-Burgos [
81]. The principle of the multi-
k-mer approach is to first assemble the transcriptome with various
k-mer lengths. The outputs of this first step are then merged to form a final assembly. Using a test RNA-seq data set from
A. aegypti [
115], the assembly with single
k-mer (k = 21) gave a good compromise between the number of contigs and their average length (measured by the N50 value), as determined by comparison to the reference transcriptome from the work of Gibbons
et al. [
115]. Impressively, the final assembly from the multi-
k-mer approach vastly outperformed each single-
k-mer assembly, marked by a substantial improvement in contiguity and an increased number of contigs longer than 100 bp [
81]. The authors also noted that the multi-
k-mer result achieved the highest coverage of the reference transcriptome, with the base coverage of reference transcripts doubled compared to the single-
k-mer assemblies.
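The merging step can be sketched as follows (a simplified, hypothetical example rather than the published protocol): contigs produced with several k values are pooled, and those that are exact substrings of longer contigs are discarded as redundant. In practice, dedicated clustering tools such as CD-HIT-EST are commonly used for this redundancy-removal step.

```python
def merge_multi_k_assemblies(assemblies):
    """Pool contigs from assemblies built with different k-mer lengths and
    drop contigs fully contained in a longer contig (naive redundancy removal)."""
    contigs = sorted(
        (c for contig_set in assemblies.values() for c in contig_set),
        key=len, reverse=True)
    merged = []
    for c in contigs:
        if not any(c in kept for kept in merged):  # skip exact substrings
            merged.append(c)
    return merged

# assemblies keyed by the k-mer length used to build them (toy data)
assemblies = {
    21: ["ATGGCGTGCAATCCT", "TTGACCA"],
    31: ["ATGGCGTGCA", "GGGTTTCCCAAA"],
    41: ["GGGTTTCCC"],
}
print(merge_multi_k_assemblies(assemblies))
```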
The multi-
k-mer approach has been applied to a number of different assemblers, including Trans-ABySS [
83], SOAPdenovo-MK [
87], and Oases-MK [
87]. Robertson
et al. observed that transcripts with lower or higher read depth were represented more effectively with smaller or larger
k-mer values, respectively. They concluded that assembly across a range of
k-mer values may be essential to recover transcripts with very different expression levels [
83]. Their multi-
k-mer version of the ABySS assembler, Trans-ABySS, reported numbers of transcripts comparable to those produced by Cufflinks [
118], which uses the output of the read aligner TopHat [
119] to reconstruct transcripts. They believed that
de novo assembly using multi-
k-mer approach offered a sensitive and effective method for addressing the issues of variable expression levels and multiple transcript isoforms. They noted that for genes with a contig-to-exon coverage ratio ≥ 0.8, Trans-ABySS and ALEXA-seq [
120] (a tool for expression analysis by sequencing) had well-correlated expression estimates (Pearson’s correlation coefficient = 0.921) [
83]. To evaluate the efficiency and performance of single-
k-mer versus multi-
k-mer conditions, Zhao
et al. built a performance matrix including the number of transcripts > 100 bp, N50 value, total number of transcripts, total transcript length, and the number of full-length transcripts captured at different expression quantiles [
87]. Using the RNA-seq data sets from yeast and
Drosophila, they observed that, for all tested assemblers, the multi-
k-mer method offered a significant improvement over its single-
k-mer counterpart across the full range of coverage depths. This held true for both
S. pombe (~6,000 genes) and
D. melanogaster (more complex; ~30,000 genes) transcriptomes. Impressively, their work illustrated the efficiency of assembly in capturing transcripts across the full spectrum of expression quantiles [
87]. They showed that transcripts at both ends (low and high expression quantiles) were not efficiently recovered by any single
k-mer length. The major improvement of the multi-
k-mer approach over its single-
k-mer counterparts was observed in the high-quantile range, but was less significant in the low quantiles [
87]. However, the multi-
k-mer method is hindered by its computational complexity. Melicher
et al. used cloud computing to build a bioinformatics pipeline to assemble and analyze the transcriptome with off-site data management and processing [
121]. They employed Velvet-Oases (using various
k-mer lengths) or Trinity (
k-mer = 21) for the initial assembly and performed a secondary assembly with CAP3. By reconstructing transcriptomes from three non-model organisms, they demonstrated that their pipeline and the multi-
k-mer method can be broadly used to assemble higher-quality transcriptomes than any single-
k-mer approach [
121].
The coverage depth of RNA-seq reads
While genomic sequencing coverage is generally uniform across the genome, transcriptome sequencing coverage is highly variable, which prevents the use of coverage information to resolve repeated motifs [
78]. Zhao
et al. showed that with increasing coverage depth, a larger number of transcripts and more total bases were generally assembled. However, the mean transcript length and N50, after an initial increase, peaked at a threshold and then started to decrease [
87]. On the other hand, the percentage of RMBT (reads mapped back to assembled transcripts) was inversely correlated with increasing coverage depth for all assemblers except Trinity. The percentage of RMBT is an important benchmark for evaluating the performance of an assembler: an optimal program should use as many reads as possible to reconstruct a high-quality transcriptome. Trinity reached almost 90% RMBT, which may be attributed to its greedy
k-mer-based approach at the Inchworm step. Oases-MK came in second by this measure. Given its lower RMBT values, the performance of SOAPdenovo was not satisfactory [
87]. The peak of the mean transcript length and N50 seems to be correlated with the complexity of a species’ genome. A similar pattern was observed for the number of constructed full-length transcripts: a peak was reached after an initial increase with coverage depth, before the number of full-length transcripts started to drop. The turning points appeared to be related to genome complexity, occurring at 3 Gb of sequencing data for fruit fly and between 1 and 3 Gb for fission yeast [
87]. Others have found quantitative differences between assemblies using whole-animal RNA-seq data and those using tissue-specific data [
122]. In assemblies from whole-animal data, increasing the number of reads led to a rapid increase in short transcripts and in the discovery of conserved genes, whereas single-tissue assemblies showed slower discovery of conserved genes but often produced longer transcripts. An additional study showed that, in the mouse assemblies, more reads also led to more frequent assembly errors, which had to be mitigated using more stringent parameters [
122]. Gruenheit
et al. noted that
k-mer size and read coverage depth are interacting factors that need to be considered simultaneously [
116]. Their analysis showed that varying the
k-mer length together with the coverage cutoff had a significant impact on the success of gene assembly, and that both parameters,
k-mer length and read coverage cutoff, need to be optimized together for the best outcomes.
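The RMBT benchmark mentioned above can be computed from an alignment of the reads against the assembled transcripts. A minimal sketch (assuming a SAM file produced by any short-read aligner) counts the fraction of reads whose primary records are not flagged as unmapped.

```python
def compute_rmbt(sam_path):
    """Compute the percentage of reads mapped back to assembled transcripts (RMBT)
    from a SAM alignment file; flag bit 0x4 marks an unmapped read."""
    total = mapped = 0
    with open(sam_path) as sam:
        for line in sam:
            if line.startswith("@"):          # skip header lines
                continue
            flag = int(line.split("\t")[1])
            if flag & 0x100 or flag & 0x800:  # ignore secondary/supplementary records
                continue
            total += 1
            if not flag & 0x4:                # bit 0x4 unset => read is mapped
                mapped += 1
    return 100.0 * mapped / total if total else 0.0

# e.g., after aligning reads to the assembled transcripts:
# print(f"RMBT = {compute_rmbt('reads_vs_transcripts.sam'):.1f}%")
```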
Other considerations and future direction
De novo transcriptome assembly facilitates the study of organisms whose genome sequences are not available. However, such tasks also create new challenges for accurately assessing the quality of an assembly. Commonly, many parameters used in genome assembly are carried over to transcriptome assemblies, such as median contig length, number of contigs, and N50 [
123,
124]. However, these measures have proven insufficient and unreliable [
125,
126]. When a reference genome is available, the reference-based approach is helpful for estimating the accuracy and completeness of an assembly. By comparing assembled transcripts to a reference transcript set, the fraction of assemblies matching a reference, the fraction of the reference being matched, and the fraction of assemblies containing a complete CDS can be estimated with high confidence [
87,
123,
125,
127]. More recently, methods for evaluating
de novo transcriptome assemblies without relying on a reference genome have been developed [
128,
129]. Instead, they use a probabilistic model to assess an assembly and its underlying sequencing read data. Although these statistical-model-based methods are powerful tools and have been shown to accurately reflect assembly quality in many tested cases, care must be taken, as discrepancies have also been noted when compared to traditional measures or the reference-based approach.
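As an illustration of such reference-based metrics, the sketch below (hypothetical; it assumes a pre-computed hit table of contig-to-reference alignments, e.g., parsed from BLAST output) computes the fraction of assembled contigs matching a reference and the fraction of reference transcripts recovered above a coverage threshold.

```python
def reference_based_metrics(hits, n_contigs, n_references, min_cov=0.8):
    """hits: list of (contig_id, reference_id, ref_coverage) tuples.
    Returns the fraction of contigs matching a reference and the fraction
    of reference transcripts matched at >= min_cov coverage."""
    matched_contigs = {c for c, r, cov in hits}
    matched_refs = {r for c, r, cov in hits if cov >= min_cov}
    return (len(matched_contigs) / n_contigs,
            len(matched_refs) / n_references)

# toy example: 3 of 5 contigs hit the reference set; 2 of 4 reference
# transcripts are covered over >= 80% of their length
hits = [("c1", "ref1", 0.95), ("c2", "ref1", 0.40), ("c3", "ref2", 0.85)]
frac_contigs, frac_refs = reference_based_metrics(hits, n_contigs=5, n_references=4)
print(f"{frac_contigs:.2f} of contigs match a reference; "
      f"{frac_refs:.2f} of reference transcripts recovered")
```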
The recent advances in
de novo transcriptome assembly have enabled the expansion of RNA-seq studies to many organisms, with or without a high-quality reference genome available. In light of such broad application of RNA-seq technology, there are other factors warranting consideration. While resource usage is often critical in the assembly of large genomes, it bears equal importance for transcriptome assembly for practical reasons. Zhao
et al. recorded dramatic differences in performance among Oases, Trinity, ABySS, and SOAPdenovo when the same
Drosophila RNA-seq data set was used for
de novo assembly [
87]. They noted that Oases was the most sensitive, and ABySS the least sensitive, in response to increasing data size, although memory usage generally correlated well with data size. The
k-mer length also had a great impact on both memory usage and runtime. While runtimes for ABySS, Oases, and SOAPdenovo were inversely correlated with
k-mer length, memory usage remained almost constant for SOAPdenovo and ABySS, whereas the memory usage of Oases was inversely correlated with
k-mer length. They also found that processing a large data set with Trinity can exceed reasonable execution times and hence become impractical [
87]. Trinity was initially built for reconstruction of full-length transcripts with maximum sensitivity [
71]. Its efficiency was later improved by halving memory requirements and increasing processing speed via parallelization [
130]. Currently, its newer release is recommended to have ~1 GB of memory per 1 million paired-end reads; a common multi-core server with 256 GB to 1 TB of memory would be sufficient for a set-up at a departmental core facility [
117]. Recently, new
de novo transcriptome assemblers were developed. For instance, SOAPdenovo-Trans [
131] took advantage of the error-removal model from Trinity [
85] and the robust heuristic graph traversal method from Oases [
84]. Bridger [
132] incorporated the key ideas of the
de novo assembler Trinity [
85] and the reference-based assembler Cufflinks [
118] to construct splicing graphs and full-length transcripts.
The computational resources required for assembling large transcriptome data sets can be mitigated with high-performance cluster computing. To take advantage of high-performance computing with thousands of CPU cores, many transcriptome assemblers, such as Trinity [
117], Oases [
84], Trans-ABySS [
83], Rnnotator [
93], and SOAPdenovo [
133], employ parallel computing methods at different levels. More recently, cloud computing (see the review [
134]) has become increasingly popular in the bioinformatics community, as resources are rented as a service according to a user’s needs. Hadoop-based projects built on the MapReduce programming paradigm are underway as a community effort, owing to the effectiveness of MapReduce in parallelizing bioinformatics algorithms, particularly the leading applications in NGS data analysis [
135]. To solve the large transcriptome assembly problem, a scalable cloud-based solution is seen as the way to meet future computational needs.
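To illustrate why the MapReduce paradigm maps naturally onto NGS workloads, the toy Python sketch below (not tied to Hadoop or any cited pipeline) counts k-mers with explicit map, shuffle, and reduce phases; on a real cluster, the map and reduce steps would be distributed across nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(read, k=5):
    """Map: emit (k-mer, 1) pairs from one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]

def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (k-mer)."""
    groups = defaultdict(list)
    for kmer, count in pairs:
        groups[kmer].append(count)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each k-mer."""
    return {kmer: sum(counts) for kmer, counts in groups.items()}

reads = ["ATGGCGTGCA", "GGCGTGCAAT", "GTGCAATCCT"]
pairs = chain.from_iterable(map_phase(r) for r in reads)   # map over all reads
kmer_counts = reduce_phase(shuffle_phase(pairs))           # shuffle + reduce
print(kmer_counts)
```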
Transcriptome analysis has seen a transition from microarray technology to high-throughput NGS technologies. The RNA-seq approach provides transcriptome profiling and analysis as a “comprehensive” solution superior to the other methods mentioned in the introductory section. Meanwhile, as RNA-seq technology and experimental protocols continue to evolve, we foresee the emergence of new challenges for bioinformaticians. Many groups and commercial vendors are currently developing different flavors of third generation sequencing technology [
136,
137], which are characterized by longer reads, single-molecule sequencing, or real-time data. For example, RNA-seq reads from PacBio [
138] are much longer (several kilobases), enabling a single transcript to be sequenced to its full length, but are accompanied by high error rates (~15%). PacBio’s long reads become more advantageous when the error rate issue is mitigated with the Circular Consensus Sequencing mode [
139]. PacBio technology has been applied to updating genome sequences [
140] and to evaluating the efficiency of
de novo assembly [
141]. In such scenarios, bioinformatics tools that resolve sequence errors by combining second and third generation sequencing data would become most valuable in transcriptomic profiling and analysis.