Introduction
Developments in genomics have been creating new paradigms in biological investigation. Two advancing fronts, technical advances in the various omics and functional discoveries and confirmations of individual genes, have moved towards each other, generating new insights and exciting syntheses in biology. One important extension of these syntheses is the search for the functional significance of genetic polymorphisms. Although this effort is still in its infancy, it holds great promise for integrating many branches of biology because of its potential impact on biochemistry, physiology, ecology, and evolutionary biology. Genetic polymorphisms encompass all kinds of variation present among homologous DNA sequences, taking the forms of single nucleotide polymorphisms (SNPs), nucleotide insertions or deletions (indels), variable tandem repeats, gene duplications, sequence rearrangements, and the presence or absence of transposable elements. The proportion of a genome carrying these variations may not be large, but much functional diversity and adaptation may well be founded on them. This review summarizes some of the advances made in recent years.
Prevalent polymorphisms and functional consequences
Genetic polymorphisms are a common feature of genomes. Over one million SNPs were found among the 20 accessions of the
Arabidopsis thaliana genome (
Clark et al., 2007), a level of natural polymorphism that is probably not the upper extreme of the range found within species. Also in
A. thaliana, a cDNA survey of single-feature polymorphisms (SFPs) revealed an uneven distribution, with a significant elevation of SFPs around centromeres and the heterochromatic knob region (
Borevitz et al., 2007). At individual loci, genetic polymorphisms span all kinds of systems but are particularly prevalent in various recognition systems (
Hedrick, 2006;
Clark et al., 2007;
Emerson et al., 2008). For instance, extreme allelic polymorphism has been observed at the self-incompatibility locus (the
S locus) in plants (
Richman, 2000;
Lu, 2002) and the major histocompatibility complex (MHC) loci in animals (
Klein et al., 2007;
Solberg et al., 2008). Although these extreme polymorphisms have been typically ascribed to balancing selection, additional influencing factors have been implicated by empirical data. For instance, in two self-incompatible plants,
Solanum carolinense and
Physalis longifolia, the former species maintains fewer and more divergent alleles at the
S locus than the latter (
Richman et al., 1995;
Lu, 2002). In addition to balancing selection, seed dispersal mechanisms (
Lu, 2006) and an ancient bottleneck (
Paape et al., 2008) may further contribute to differences in allele number and allelic divergence between species. The functional significance of polymorphisms has been explored more extensively in the MHC system of animals, particularly in humans (
Milinski, 2006;
Traherne, 2008). Heterozygosity or specific alleles at MHC class II loci may increase pathogen resistance (reviewed by
Bernatchez and Landry, 2003). The three-spined sticklebacks most likely to survive heat stress are those carrying a moderate number of MHC class II alleles (
Wegner et al., 2008), implying that selective processes additional to balancing selection may modify genetic polymorphism at the recognition loci.
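The SNP counts and the comparisons of allele number and allelic divergence discussed in this section all rest on comparing aligned homologous sequences. The following is a minimal Python sketch of that bookkeeping, not a method taken from any of the cited studies. The function names, the use of "-" as the gap character, and the toy sequences are assumptions made purely for illustration; real surveys rely on dedicated alignment and variant-calling tools.

```python
from itertools import combinations


def classify_columns(aligned):
    """Count SNP columns and indel columns in equal-length aligned sequences.

    A column containing a gap ('-') alongside at least one base is counted
    as an indel column; a gap-free column with more than one base is
    counted as a SNP column.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    snp_columns = 0
    indel_columns = 0
    for column in zip(*aligned):
        bases = {base for base in column if base != "-"}
        if "-" in column and bases:
            indel_columns += 1
        elif len(bases) > 1:
            snp_columns += 1
    return snp_columns, indel_columns


def pairwise_divergence(seq_a, seq_b):
    """Proportion of differing sites between two sequences, skipping gapped sites."""
    compared = 0
    differing = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a != base_b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def mean_pairwise_divergence(aligned):
    """Average pairwise divergence over all pairs of sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_divergence(a, b) for a, b in pairs) / len(pairs)


def count_alleles(aligned):
    """Number of distinct sequences (alleles) in the sample."""
    return len(set(aligned))


# Hypothetical toy alignment: one substitution and one single-base gap.
toy_alleles = ["ACGTACGT", "ACGAACGT", "ACGTAC-T"]
print(classify_columns(toy_alleles))  # (1, 1)
print(count_alleles(toy_alleles))
print(mean_pairwise_divergence(toy_alleles))
```

In these terms, a locus with few but highly divergent alleles, as described for the S locus of Solanum carolinense, would show a low allele count together with a high mean pairwise divergence.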
Despite the various mechanisms acting on genetic polymorphisms, data (
Smith and Eyre-Walker, 2002) also suggest that some polymorphisms may remain effectively neutral for a period after they arise. Under certain circumstances, the functional potential of these provisionally neutral polymorphisms may be exploited to benefit their carriers, increasing their chances of being fixed in the species. This kind of scenario (Fig. 1) might explain the following cases.
The DNA repair protein O6-methylguanine DNA-methyltransferase (MGMT) has polymorphic variants involving at least nine amino-acid substitutions in human populations, but the effects of some of these polymorphisms (e.g., Leu84Phe or Ile143Val) are subtle (
Bugni et al., 2007). Nonetheless, other MGMT polymorphisms (Ile143Val and Lys178Arg) may differ in expression level (
Margison et al., 2005) and, further, patients carrying the Lys178Arg polymorphism show a reduced risk of lung cancer (
Crosbie et al., 2008), suggesting that individuals carrying different MGMT polymorphisms develop cancers at different rates. These data highlight that not all polymorphisms have the same functional effect, and that some may remain more or less neutral.
Besides the expression level, the spatial expression pattern may also be altered by polymorphisms. During photosynthesis, the ultimate CO2 fixation takes place in the mesophyll cells of C3 plants, while C4 plants first capture CO2 in the mesophyll cells and then fix it at a high concentration in the bundle sheath cells to improve the photosynthetic rate (Fig. 2). This divergence in CO2 fixation is supported by differential gene expression between mesophyll and bundle sheath cells (
Nelson and Dengler, 1992).
The mesophyll-specific expression of phosphoenolpyruvate carboxylase is determined by the gene’s promoter module, mesophyll expression module 1 (MEM1), in
Flaveria. It has been shown that exchanges of the two MEM1 polymorphisms between the C4 plant
F. trinervia and the C3 plant
F. pringlei were necessary and sufficient for switching the photosynthetic phenotypes between the species (
Akyildiz et al., 2007). This suggests that at least one of the two polymorphisms (an A-to-G substitution and a 4-nucleotide insertion) might not have caused a functional change until the second polymorphism arose.
Single-locus DNA polymorphisms are also known to cause about 200 diseases in common livestock (
Ibeagha-Awemu et al., 2008), providing numerous cases of how genetic polymorphism affects cellular functions to varying degrees. Since functional assessments of genetic polymorphisms add an important dimension to the understanding of the natural distribution of genetic polymorphisms, more cases are expected to be documented in the future, providing a clearer view of how genetic polymorphisms fare in various molecular systems.
Pathway gene polymorphisms and patterns
Polymorphisms are more revealing when viewed in the context of pathways or networks. Much as the internet transmits digital information, pathways carry flows of substances within and among the molecular systems encoded by a genome. In a broad sense, a metabolic pathway is a series of steps that generate and break down the substances needed for cellular activities, including those for signal transduction and regulatory circuits. In a narrow sense, metabolic pathways are those of primary and secondary metabolism, providing basic flows of substances such as sugars, amino acids, lipids, and a remarkable array of secondary compounds. Regulating substance flow is intuitively a major task for a genome coping with environmental fluctuations, as seen in
Pyrococcus furiosus (
Trauger et al., 2008).
Polymorphisms of multiple genes in the glycolytic pathway are well known in
Drosophila melanogaster (
Eanes, 1999;
Flowers et al., 2007), and some of the enzyme activities are correlated across the pathway (
Pecsenye et al., 2004). Also in
Drosophila, RNAi pathway genes (
R2D2,
Ago2, and
Dcr2) participating in antiviral function displayed high levels of genetic polymorphism (
Obbard et al., 2006). Polymorphisms across pathway genes are also present in plants (
Hanson et al., 1996;
Aguade, 2001;
Whitt et al., 2002;
Rausher et al., 2008). Still, relevant data are insufficient to draw a general conclusion. We recently surveyed all major genes of the anthocyanin pathway in
Ipomoea purpurea and found polymorphisms at all ten loci, including the regulatory and structural genes of the pathway. Further analysis of the functional significance of the polymorphisms is underway. How combinations of multiple polymorphisms affect their carriers remains obscure.
As with SNPs, human systems have so far provided the best-studied examples for understanding the functionality of pathway polymorphisms. Insulin-like growth factor-I (IGF-I) is a polypeptide hormone promoting cellular growth and normal development. The insulin/IGF-I signal response pathway may influence longevity (
Tatar et al., 2003). In humans, polymorphisms of the insulin-like growth factor type 1 receptor (IGF-IR) and phosphoinositide 3-kinase genes (PI3KCB) affect the plasma levels of IGF-I (
Bonafè et al., 2003). The analysis of a G-to-A transition in the IGF-IR coding region and a T-to-C transition in the promoter region of PI3KCB showed that individuals bearing at least one A allele at the IGF-IR locus (IGF-IR A+) had lower plasma IGF-I levels than other individuals, a pattern observed in people over 85 years of age. Furthermore, the proportion of IGF-IR/PI3KCB-A+/T+ carriers is significantly higher among long-lived people (
Franceschi et al., 2005). Women homozygous for the IGF-IR A allele have about half the risk of developing low bone mineral density after menopause (
Lee et al., 2008). Knowing the functional consequences of human genetic polymorphisms may help develop unprecedented personalized health care that could transform medical practice.
Investigations of genetic diversity in pathway genes may lay an important foundation for revealing and understanding how substance flows are directed and redirected in cells under various circumstances, particularly during the evolutionary processes of species, to respond to the ever-changing environment. Depending on the context of a pathway, genetic diversity among genes on the pathway may be modified by selection and constraints of various intensities. For instance, the genetic diversity of the starch pathway of maize may have been significantly reduced by human selection (
Whitt et al., 2002). As changes in one pathway may affect related pathways in the same genome and cause perturbations across transcriptome and metabolome (
Dauwe et al., 2007), pathway analysis may link different domains of the genome and allow genomic regulation in response to external environmental change to be interrogated.
Genome-wide polymorphisms via omics approaches
Rapid developments in omics approaches hold great potential for biological discovery. With liquid chromatography-quadrupole time-of-flight mass spectrometry (LC-QTOF MS), extensive variation in metabolites has been found in the seedlings of
A. thaliana accessions, and about 75% of the detected mass peaks could be assigned to genomic loci (
Keurentjes et al., 2006). The genome-wide approach may help to uncover unknown pathways and their potential regulators by correlation analysis, as exemplified by aliphatic glucosinolate formation. A genome-wide approach applied to a specific tissue or organ may also clarify the relationships between metabolites and phenotypic traits such as yield and harvest index, as shown for tomato (
Schauer et al., 2006).
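The correlation analysis mentioned above can be illustrated with a small Python sketch. It assumes each detected mass peak has an abundance measured across the same set of accessions and flags pairs of peaks whose abundances are strongly correlated, which is one simple way candidate pathway relationships might be proposed. The peak names, the abundance values, and the 0.9 threshold are hypothetical and are not taken from the cited studies.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_x * var_y)


def correlated_pairs(profiles, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    `profiles` maps a peak name to its abundances across accessions,
    listed in the same accession order for every peak.
    """
    names = sorted(profiles)
    hits = []
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            r = pearson(profiles[name_a], profiles[name_b])
            if abs(r) >= threshold:
                hits.append((name_a, name_b, r))
    return hits


# Hypothetical abundances of three mass peaks across five accessions.
toy_profiles = {
    "peak_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "peak_B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "peak_C": [5.0, 3.0, 4.0, 1.0, 2.0],
}
for name_a, name_b, r in correlated_pairs(toy_profiles):
    print(name_a, name_b, round(r, 3))
```

A strong correlation only suggests a relationship; it cannot by itself distinguish shared regulation from a direct precursor-product link, so such pairs are starting points for further analysis.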
The potential of combining genomic data with those of metabolomics (
Fiehn, 2002) is only beginning to emerge, as many cases that traditional methods could not resolve can now be tackled with omics approaches. For example, the impact of transgenic crops on human health can be evaluated
via these approaches. Elevated metabolites with known health hazards may then be singled out to ensure food safety. Food nutrition may also be monitored with detailed analyses and high accuracy (
Wishart et al., 2008). Many omics approaches promise to answer questions about biological development (
Hennig, 2007) once sampling and other inherent problems are corrected (
Lay et al., 2006). More standardized statistical techniques are needed to reduce errors and improve compatibility across different types of data.
Three major categories of techniques have been in use to identify sequence polymorphisms. One category is sequencing-based techniques including both Sanger sequencing and pyrosequencing (
Ronaghi et al., 1996;
Hudson, 2008;
Morozova and Marra, 2008). The second category is based on microarray techniques that use oligonucleotide labeling and various tiling strategies (
Mockler and Ecker, 2005;
Miller et al., 2007;
Gregory et al., 2008) to construct whole-genome arrays (WGAs). For instance,
Zhang et al. (2008) combined WGAs with an enzyme methylome approach to detect polymorphic GC methylation sites across the
Arabidopsis genome and documented a strong relationship between methylation and enzyme effect. The third category involves matrix-assisted laser-desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS), which directly measures the masses of DNA molecules shorter than 100 bp, but at high throughput (
Corona and Toffoli, 2004;
Ragoussis et al., 2006). Efforts to reduce the cost and improve the efficiency of sequencing techniques will yield the greatest return when genome-level polymorphisms can be routinely monitored in individuals.
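The mass-based category can be made concrete with a short Python sketch. Because each nucleotide has a different mass, two short DNA fragments that differ at a single position differ in total mass, and it is this difference that a mass spectrometer resolves. The residue masses and the terminal correction below are approximate average values of the kind used by common oligonucleotide calculators for single-stranded DNA without a 5' phosphate; they are assumptions for illustration and are not taken from the cited studies. The function names and example sequences are likewise hypothetical, and the sketch ignores real assay details such as salt adducts.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA chain.
# Assumed values for illustration, not taken from the cited studies.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

# Approximate correction (Da) for the chain termini of single-stranded DNA
# lacking a 5' phosphate. Also an assumed value.
TERMINAL_CORRECTION = -61.96


def oligo_mass(sequence):
    """Approximate average mass (Da) of a single-stranded DNA fragment."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported characters: {sorted(unknown)}")
    return sum(RESIDUE_MASS[base] for base in sequence) + TERMINAL_CORRECTION


def allele_mass_shift(reference, alternative):
    """Mass difference (Da) between two allelic fragments: alternative minus reference."""
    return oligo_mass(alternative) - oligo_mass(reference)


# Hypothetical fragments differing by a single G-to-A substitution.
reference_fragment = "ACGTGCA"
alternative_fragment = "ACATGCA"
print(round(oligo_mass(reference_fragment), 2))
print(round(allele_mass_shift(reference_fragment, alternative_fragment), 2))
```

Under these assumed masses, the smallest single-base difference is that between A and T (about 9 Da), which gives a sense of the mass resolution such genotyping requires and of why short fragments are preferred.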
Outlook
Rapid developments in high-throughput techniques and data-generating capacity have been challenging biologists, statisticians, and computer scientists to create new paradigms for presenting and interpreting their investigations. Collaborations across disciplines and among large research centers will continue to lead the exploration of genetic polymorphisms and underpin major discoveries. Individual laboratories may be better positioned to develop insights into small-scale systems that are manageable for one group while more portable and accessible equipment becomes available. Despite the various challenges, we are now in a position to ask more fundamental questions than ever. Newly obtained details, for instance, allow us to explore the meaning of character evolution, while the integrity of the genome as a whole makes one wonder whether the genome has been evolving merely as a collection of integrated characters (systems) or as more than the sum of its parts. Understanding the complexity, regularity, and evolution of a genome requires a balanced analysis of its local and global patterns. Systems biology will, one hopes, bring new developments for a deeper understanding of how living beings act and evolve in an ever-changing environment. Such knowledge could promote the welfare of ecosystems as well as human societies.
Higher Education Press and Springer-Verlag Berlin Heidelberg