INTRODUCTION
In the last 3–4 million years, brain volume within the human lineage has increased from less than 400 cm³ to 1,400 cm³, which sets us apart from other primates. Some evidence suggests that the human brain is still adaptively evolving [
1], and that positive selection may also play an essential role in the development of speech [
2], and in enabling adults to digest the lactose contained in milk [
3–
6]. Moreover, understanding how and why selection acts can help us to reveal the mechanisms that underlie diseases and microbial resistance [
7]. Thus, there is increasing interest in studies detecting recent positive Darwinian selection, or adaptation. This growing interest is reflected in the increasing number of citations of three key methodological papers [
8–
10] (Figure 1).
Positive selection is an evolutionary process that increases the frequency of mutations that confer a fitness advantage to individuals carrying those mutations. These selective events leave their footprints in the genomes of living organisms. Specifically, recent positive selection can reduce genetic diversity [
11–
15] and alter the pattern of intraspecific polymorphisms [
8,
10,
16–
18] of a neutral locus that is linked to the selected locus. These features can be used to detect positive selection. However, because demography and the population structure can cause the same patterns in DNA sequence variations, inferred selective events are often questionable, especially when there is no follow-up using a functional analysis to ensure the results [
19]. It is a challenging and severe problem. After the integration of the candidates for recent positive selection obtained from nine genome-wide scans in humans, Joshua Akey (2009) found that only 14% of loci (722/5110) are identified in two or more studies. We should be highly aware that the confounding effect of demography could be one of the major reasons for 86% of the unique candidates. This has consequences. Because the false positive rate is high, a considerably large amount of resources will be wasted in the following functional analysis. This may prevent us from thoroughly understanding the mechanisms of adaptive evolution.
There are a number of reviews about the methodologies for detecting recent positive selection in the genome of living organisms [
20–
30]. Here, rather than focusing on the methodologies themselves, we review the advantages and disadvantages of the different techniques used in model species. Unfortunately, few of the available methods are suitable for reliably detecting recent positive selection in the largely unstudied non-model species. The rapid advancement of next-generation sequencing technology has made it possible to re-sequence multiple individuals from non-model species. Therefore, it is crucial to develop statistical tests that can reliably detect recent positive selection in the absence of knowledge about demography and population structure.
TESTS BASED ON SNP OR MICROSATELLITE POLYMORPHISM DATA FROM A SINGLE LOCUS
When positive selection occurs recently, it often creates a star-like tree; that is, all of the lineages coalesce within a very short time period. This results in an excess of rare derived mutations compared with the standard neutral model, which has been used as the signature of selection in Tajima's D test [8], Fu & Li's D test [16] and the E test [18]. These statistical tests all have the form

$$ T = \frac{\hat{\theta}_1 - \hat{\theta}_2}{\sqrt{\widehat{\operatorname{Var}}\!\left(\hat{\theta}_1 - \hat{\theta}_2\right)}}, $$

where $\hat{\theta}_1$ and $\hat{\theta}_2$ are different unbiased estimators of the population mutation rate $\theta$, and an excess of rare derived mutations affects the two estimators differently. Under the standard neutral model, $\hat{\theta}_1$ equals $\hat{\theta}_2$ in expectation, so these summary statistics tend to be close to 0; when positive selection occurs, $\hat{\theta}_1$ tends to be smaller than $\hat{\theta}_2$, so the statistics are expected to be negative. However, population size expansion, bottlenecks and population structure can also create a star-like tree and lead to an excess of rare derived mutations, which results in a high false positive rate for all three tests [16,18,31].
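To make the form of these statistics concrete, here is a minimal Python sketch (ours, for illustration; the input data are hypothetical) of Tajima's D computed from the derived-allele counts at segregating sites, using $\pi$ as $\hat{\theta}_1$ and Watterson's estimator as $\hat{\theta}_2$:

```python
import math

def tajimas_d(n, derived_counts):
    """Tajima's D for n sampled chromosomes; derived_counts holds
    the derived-allele count (1..n-1) at each segregating site."""
    s = len(derived_counts)                      # segregating sites
    if s == 0:
        return 0.0
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    theta_w = s / a1                             # Watterson's theta
    pi = sum(2.0 * d * (n - d) for d in derived_counts) / (n * (n - 1))
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))

# An excess of singletons (rare derived mutations) drives D negative:
print(tajimas_d(10, [1] * 8 + [5]))              # about -1.4
```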
Recent positive selection creates not only an excess of rare derived mutations but also an excess of high-frequency derived mutations. The latter has been treated as the signature of selection in Fay & Wu's H test [10]. However, it has been demonstrated that Fay & Wu's H test is sensitive to a number of neutral scenarios, such as bottlenecks and population structure [32]. Moreover, the excess of high-frequency derived mutations may result from unequal sample sizes between different subpopulations in a hidden structured population [33], which was also confirmed by our coalescent-based simulations [34] using the ms program [35]. When two subpopulations are partially isolated (i.e., with a small scaled migration rate $4Nm$ and scaled mutation rate $\theta = 4N\mu$, where $N$ is the effective population size, $\mu$ is the mutation rate and $m$ is the migration rate) and when 80 and 20 chromosomes are sampled from the two subpopulations, respectively, the false positive rate of Fay & Wu's H test is 0.976. Therefore, to avoid falsely inferring positive selection due to unequal sample sizes in a hidden structured population, it is important to equalize the sample sizes from each sampling location.
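For readers who wish to reproduce this kind of check, the sketch below (ours; the ms parameters in the comment are illustrative assumptions, not the values used in [34]) computes the unstandardized Fay & Wu's H, which becomes strongly negative when high-frequency derived mutations are in excess:

```python
def fay_wu_h(n, derived_counts):
    """Unstandardized Fay & Wu's H = pi - theta_H for n chromosomes;
    theta_H puts heavy weight on high-frequency derived variants."""
    pi = sum(2.0 * d * (n - d) for d in derived_counts) / (n * (n - 1))
    theta_h = sum(2.0 * d * d for d in derived_counts) / (n * (n - 1))
    return pi - theta_h

# Neutral replicates under a hidden two-deme structure with unequal
# sampling (80 + 20 chromosomes) could be generated with, e.g.,
#   ms 100 1000 -t 5 -I 2 80 20 0.4    (hypothetical parameters)
# and fay_wu_h applied to each replicate to estimate the false
# positive rate of the test.
print(fay_wu_h(10, [1, 2, 9, 9]))   # excess of high-frequency derived
```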
The tests mentioned above are based on the mutation frequency spectrum, and a generalized form of these tests has been described [36]. Generally, the distributions of the summary statistics in these tests are obtained under the standard neutral model; thus, violations of the standard neutral assumptions may cause false positives when detecting selection. It has been suggested that, to reduce the number of false positives, one should use Tajima's D test and Fay & Wu's H test jointly [18], because the two tests detect different signals of positive selection.
A statistical test based on tree topology (the MFDM test, named after the maximum frequency of derived mutations) has been shown, both analytically and by simulation, to be free from the confounding impact of demography [37]. This is possible because a varying population size does not affect the probability of a tree topology, whereas selection does. The probability of a tree that is equally or more unbalanced under the neutral hypothesis [37] is

$$ P = \frac{2\min(i, j)}{n - 1}, $$

where $n$ is the number of leaves of the genealogical tree, and $i$ and $j$ are the sizes of the left and right basal branches ($i + j = n$).
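A minimal sketch of this calculation (ours; in the full MFDM test the basal branch sizes are inferred from the maximum frequency of derived mutations, which we simply take as given here):

```python
def mfdm_pvalue(n, m):
    """P-value of the MFDM test: probability, under neutrality, of a
    tree at least as unbalanced as observed. n = number of leaves;
    m = size of the larger basal branch (n/2 <= m <= n - 1), so the
    smaller basal branch has size n - m."""
    assert n / 2 <= m <= n - 1
    return 2.0 * (n - m) / (n - 1)

# A highly unbalanced tree is unlikely under neutrality:
print(mfdm_pvalue(50, 48))   # 2*(50-48)/49, about 0.082
```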
As an alternative to SNPs, microsatellites (simple/short tandem repeats) are useful markers for examining traces of positive selection in the genome. To date, there are only three methods based on single-locus microsatellite data for detecting departures from the standard neutral model. The first test, by Cornuet and Luikart [38], compares the observed and expected heterozygosity and is designed to detect population bottlenecks. Another test, by Schlötterer et al. [39], uses the number of alleles at a microsatellite locus and determines whether an excess of alleles is due to positive selection. However, as the authors have noted, the test requires a reliable locus-specific estimate of the scaled mutation rate, $\theta$, which is usually difficult to obtain. Moreover, its sensitivity to population demography is still unknown. Recently, it has been found that an unbalanced allele frequency spectrum is usually observed under the hitchhiking model; thus, positive selection can be detected by comparing the polymorphism pattern within the locus. This approach avoids the difficulty of estimating the mutation rate of microsatellite loci and is robust over a wide range of demographic parameters [40].
MULTIPLE-LOCUS TESTS TO IDENTIFY SELECTION
Instead of using DNA polymorphism data from a single locus, polymorphism data may also be collected from surrounding loci. When a beneficial mutation goes to fixation (i.e., its allele frequency increases to 1.0), the process not only changes the mutation frequency spectrum at the surrounding loci but also reduces their polymorphism. This locally reduced polymorphism due to positive selection is called a selective sweep. Two likelihood ratio tests have been proposed to detect local reductions of nucleotide variation along a recombining chromosome [13,15]. By contrasting the standard neutral model and the hitchhiking model for recent positive selection, these tests calculate the ratio of the maximum likelihoods between the two models (hypotheses) and choose between them. However, it is important to realize that these tests are based on the standard neutral model; violations of its assumptions, for example bottlenecks, may influence the results and favor the hitchhiking model [32,41].
To distinguish between selective sweep and demography, a goodness-of-fit (GOF) test has been proposed [
32] for instances when the standard neutral model is rejected by the KS composite likelihood ratio test [
13]. The null hypothesis of the GOF approach is that the data are drawn from a selective sweep model, and the alternative hypothesis claims that the data are not drawn from a selective sweep scenario. However, simulations suggest that the false positive rate of the GOF approach and the value of the cut-off still depend on the demographic scenario [
23]. Thus, results obtained using the GOF test should be interpreted carefully. Moreover, the local reductions of nucleotide variation along a chromosome can be caused by a low mutation rate. Therefore, when detecting selective sweeps, it is important to avoid mutation rate mis-specification by estimating the local mutation rate from divergence [
42].
Interestingly, strong linkage disequilibrium (LD) is also generated in the regions on either side of the beneficial mutation, whereas a lack of LD is expected across the two sides [43–45]. Therefore, besides using the spatial distribution of polymorphism, Kim and Nielsen [46] extended the previous KS composite likelihood ratio test to incorporate information regarding LD. However, the improvement made by including LD is rather small, suggesting that most of the relevant information regarding selective sweeps may be captured by the spatial distribution and the allele frequencies of polymorphisms. A neutrality test based on LD information has also been proposed to detect positive selection [47]; simulations have shown that its power to detect strong selection is approximately 50%–60%.
GENOME-WIDE APPROACHES BASED ON SNP DATA OR MICROSATELLITE POLYMORPHISM DATA
At least twenty-two genome-wide scans for recent and ongoing positive selection have been performed in humans [
48–
58]; see also [
59,
60]. These genome data sets have enhanced our understanding of recent positive selection in human evolutionary history. For example, more than 4,000 genes have been found to carry a signature of positive selection in humans across 16 genome-wide analyses [
60]. The whole-genome sequence analyses have suggested that approximately 1%–4% of non-African genomes and approximately 4%–6% of Melanesian genomes are derived from the archaic hominins Neanderthals and Denisovans, respectively [
57,
58].
EPAS1, a hypoxia pathway gene, shows the strongest signature of positive selection in Tibetans, linking it to adaptation to a hypoxic environment. In addition, a high-frequency EPAS1 haplotype in Tibetans was derived from Denisovans [61].
A number of genomic scans have also been carried out to map recent positive selection in several model species [
23], such as in mice [
62],
Drosophila [
21,
42,
63–
66] and
Arabidopsis [
67,
68]. Generally, these methods can be divided into two categories: the model-based approaches (i.e., neutrality tests) and the outlier approaches. Based on the empirical distribution of a summary statistic, an outlier approach will always identify 5% of the surveyed loci/regions as candidates for selection if the significance level is set at 0.05. That means that, under neutrality, no matter how the population size varied in the past, only 5% of the loci will be falsely identified as candidates for selection (and all of these candidates are false positives). Conversely, under a recurrent adaptive evolution model (assuming all loci are affected by selection), the power of the outlier approach to detect selection is as low as 5%.
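The mechanics of an outlier approach are simple; the sketch below (ours, fed with a hypothetical per-locus summary statistic) flags the extreme 5% tail of the empirical distribution, which by construction always reports ~5% of loci regardless of the true demography:

```python
import numpy as np

def outlier_candidates(stat_values, alpha=0.05):
    """Return indices of loci in the lower alpha tail of the
    empirical distribution of a summary statistic (e.g. Tajima's D).
    About alpha of loci are flagged whatever the demography --
    the caveat discussed in the text."""
    stats = np.asarray(stat_values, dtype=float)
    cutoff = np.quantile(stats, alpha)          # empirical cutoff
    return np.flatnonzero(stats <= cutoff)

d_per_locus = np.random.default_rng(1).normal(0.0, 1.0, 1000)
print(len(outlier_candidates(d_per_locus)))     # ~50 of 1000 loci
```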
Tests based on the mutation frequency spectrum and selective sweeps
Although demography has a confounding effect, the expected mutation frequency spectrum is the same across loci because they share the same demographic history. Selection, by contrast, alters the mutation frequency spectrum at individual loci. Therefore, by analyzing Tajima's D statistic from 151 loci, the effects of demography and selection in human history may be disentangled by the outlier approach [69]. Another application of Tajima's D statistic is that of Carlson et al. [52]. In particular, attention should be paid to the ascertainment bias of genotyped data when applying tests based on the mutation frequency spectrum [14]. This bias is introduced by current methods of SNP discovery: SNPs identified in a very small sample are subsequently typed in a large sample, so the frequency spectrum of genotyped SNPs is biased towards common or intermediate-frequency variants. Such a bias bears significantly on Tajima's D statistic, and several methods have been proposed to correct for it [14,41]. The accuracy of these corrections may depend on the available documentation describing how each SNP was initially identified. Moreover, it is known that the KS composite likelihood ratio test [13] suffers from a high false positive rate due to demography [32]. To address this difficulty, Li and Stephan [42] describe a statistical method that detects footprints of selection in chromosome- or genome-wide data while taking fluctuations in population size into account. After estimating the demography, a sliding-window analysis is performed to find genomic regions affected by recent positive selection. In their likelihood ratio test, the null hypothesis is the inferred bottleneck scenario alone, and the alternative is the inferred bottleneck scenario together with a selective sweep. Later simulations have shown that this strategy controls the false positive rate well [23,41].
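As an illustration of the sliding-window step only (ours; the window and step sizes are arbitrary assumptions, and the statistic is plugged in as a function such as the tajimas_d sketch above):

```python
def sliding_window_scan(n, positions, derived_counts, stat,
                        win=10_000, step=2_000):
    """Apply a frequency-spectrum statistic in sliding windows along
    a chromosome. positions and derived_counts describe the SNPs;
    stat(n, counts) is the per-window test statistic."""
    results = []
    start = min(positions)
    while start + win <= max(positions):
        in_win = [c for p, c in zip(positions, derived_counts)
                  if start <= p < start + win]
        results.append((start, stat(n, in_win)))
        start += step
    return results
```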
As an alternative, selective sweeps may be detected by a composite likelihood ratio test without inferring demography [14]. This approach modifies the method of Kim and Stephan [13] by using the background allele frequency spectrum, instead of the standard neutral model, to define the test statistic. However, it has been suggested that this approach is not robust under bottleneck scenarios [23].
Tests based on long haplotype structure
Another popular approach is the long-haplotype-based tests, originally proposed by Sabeti et al. [17] (see also reviews [22,24]). These tests make use of the haplotype profiles left by recombination events under partial selective sweeps. When a beneficial mutation rises rapidly in a population, recombination may not have time to break down its association with nearby alleles (Figure 2). Therefore, the long-range associations (i.e., the long haplotype structure) can be used to detect ongoing positive selection. The long-range associations can be measured simply by the extended haplotype homozygosity (EHH) [17], which summarizes information about the length of the haplotype block associated with the beneficial mutation. If the N chromosomes in a sample form G homozygous groups in the considered range, with each group i having $n_i$ chromosomes, EHH is defined as [55]:

$$ \mathrm{EHH} = \frac{\sum_{i=1}^{G} \binom{n_i}{2}}{\binom{N}{2}}. $$

Then, the EHH of the tested allele (the core haplotype, or the chromosomes carrying the beneficial mutation) is contrasted against that of another allele within the same population. This ratio of EHH values is then compared with simulated neutral sites or random genomic regions to determine statistical significance [17].
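A direct transcription of this definition into Python (ours; haplotypes are represented as allele strings over the considered range):

```python
from collections import Counter
from math import comb

def ehh(haplotypes):
    """EHH for a sample of haplotypes: identical strings over the
    considered range form one homozygous group; EHH is the chance
    that two randomly drawn chromosomes fall in the same group."""
    n = len(haplotypes)
    groups = Counter(haplotypes)                # G homozygous groups
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)

# Three of four chromosomes still share the same extended haplotype:
print(ehh(["ACA", "ACA", "ACA", "ACG"]))        # 3/6 = 0.5
```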
Based on a similar principle, several tests have been developed and applied in human genome-wide scans [53,70]. It has also been demonstrated that the integrated haplotype score (iHS) test is robust to multiple demographic scenarios [53]. The unstandardized score is

$$ \mathrm{iHS} = \ln\!\left(\frac{\mathrm{iHH}_A}{\mathrm{iHH}_D}\right), $$

where $\mathrm{iHH}_A$ stands for the integrated EHH of the ancestral allele, and $\mathrm{iHH}_D$ for that of the derived allele. Nonetheless, iHS has markedly low power for fixed or nearly fixed selective sweeps because it relies on a comparison between the alternative alleles of a SNP site in the same population.
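A sketch of the unstandardized score (ours; the EHH curves and genetic-map distances are assumed to be precomputed, and the standardization within derived-allele frequency bins used in [53] is omitted):

```python
import numpy as np

def _trapezoid(y, x):
    """Trapezoid-rule integral of y over x (kept explicit rather
    than relying on a particular NumPy integration helper)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2.0))

def unstandardized_ihs(gen_dist, ehh_ancestral, ehh_derived):
    """ln(iHH_A / iHH_D), where iHH is the area under an EHH curve
    integrated over genetic-map distance. Large negative values
    suggest unusually long haplotypes around the derived allele."""
    return float(np.log(_trapezoid(ehh_ancestral, gen_dist)
                        / _trapezoid(ehh_derived, gen_dist)))
```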
When the beneficial allele goes to fixation, the information required by the LRH and iHS tests vanishes together with the alternative allele. To solve this problem, two groups independently developed the lnRsb test [54] and the XP-EHH test [55], which are very similar in nature. In these approaches, the EHH values for a given SNP are compared between two populations [54,55]. The population under positive selection should have a much higher EHH than a neutrally evolving population. Therefore, both methods exhibit high power to detect selection when the beneficial allele is fixed or nearly fixed.
However, several concerns remain for the long-haplotype-based tests. First, these tests are based on a heuristic idea rather than on a rigorous mathematical derivation: the EHH-related statistics are functions of the recombination rate, the mutation rate and demography, yet their behavior has not been fully characterized. Second, as with many other neutrality tests, the EHH-related tests face the multiple-testing problem of large-scale data. For example, 51 human populations have been surveyed, with each population genotyped at ~650k SNPs [56]; in such cases, it is difficult to determine the cutoffs. Third, most of the EHH-related tests use an empirical p-value, derived by ranking the value of the candidate site against the empirical distribution (i.e., the outlier approach), because of the difficulty of obtaining the distribution of the EHH-related statistics under the actual demography. Like other outlier approaches, this only tells us how "extreme" a SNP is compared with the other SNPs across the genome. Nevertheless, this approach has successfully revealed a number of well-known adaptively evolving genes, such as the LCT gene, which affects lactose tolerance in adulthood; the SLC24A5 gene, a major determinant of skin pigmentation; and the CYP3A gene, a major drug metabolizer in the liver [5,71,72]. Finally, compared with the selective sweep approaches and the tests based on the mutation frequency spectrum, the long-haplotype-based tests are relatively less affected by ascertainment bias because the haplotype structure may not strongly depend on allele frequency. However, the long-haplotype-based tests require densely distributed SNPs across a whole genome or chromosome, which may not be easily achieved in non-model species.
Tests based on population differentiation, Fst
Adaptation can also be identified from large differences in allele frequencies among populations, summarized by an estimate of Fst [73,74]. A Bayesian approach has been developed that can identify loci subject to positive selection when the selection coefficient is at least five times the migration rate [75]. However, identifying statistically significant departures from neutrality is complicated by the confounding effects on the distribution of Fst estimates [76]. Moreover, because background selection in a structured population also influences Fst [77], it is hard to distinguish recent hitchhiking events from background selection [78]. Thus, attention should be paid to whether the neutral assumption has been severely violated. To address this problem, an outlier approach has been proposed that treats loci with extremely high Fst as candidates for selection [48,50]. Another concern about the Fst approach is the substantial heterogeneity among Fst values: it has been argued that Fst estimates of individual SNPs are too variable to be used as indicators of positive selection [79]. Thus, it has been suggested that 5-Mb windows be used to smooth out the very high variation in Fst at individual SNPs in human genomic scans [79].
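As an illustration, one widely used per-SNP estimator (a Hudson-style estimator; this choice is ours, and refs [73,74] describe alternatives) can be written as follows. Averaging its numerator and denominator separately across the SNPs of a window is a common way to implement the suggested smoothing:

```python
def hudson_fst(p1, n1, p2, n2):
    """Per-SNP Fst (Hudson-style estimator). p1, p2 are derived-
    allele frequencies in the two populations; n1, n2 are the
    numbers of sampled chromosomes."""
    num = ((p1 - p2) ** 2
           - p1 * (1 - p1) / (n1 - 1)
           - p2 * (1 - p2) / (n2 - 1))
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den

# Strong differentiation at one SNP:
print(hudson_fst(0.9, 100, 0.1, 100))           # about 0.78
```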
lnRV and lnRH tests
Based on microsatellite polymorphism data, the lnRV and lnRH tests are suitable methods for detecting selective sweeps. Given that microsatellite mutation rates differ substantially among loci [80,81], it is difficult to compare genetic diversity levels directly among loci. This problem can be circumvented by calculating the ratio of variability between two populations, which is believed to be independent of the mutation rate [21,82]. The ratio of variability in the two populations (RV or RH) provides an estimator with the same expectation for all neutral loci: if microsatellite variability is estimated from the variance in repeat number, the ratio is called RV; if it is measured by the expected heterozygosity, the ratio is called RH. The logarithm of the estimator (lnRV or lnRH) will be strongly positive or negative when a locus evolves adaptively in one population but neutrally in the other. By further using a normalized empirical distribution (i.e., an outlier-like approach) of lnRV or lnRH, the false positive rate due to demography can be controlled. The candidates for selection are the outliers with high variability in one population but low variability in the other. To use these approaches, polymorphism data from multiple or genome-wide independent microsatellite loci are needed to calculate the standard deviation of lnRV and lnRH among loci. To analyze polymorphism data from partially linked microsatellite loci, an extended lnRH test has been developed [83].
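A sketch of the lnRH calculation for a single locus (ours; θ is estimated from the expected heterozygosity under the stepwise mutation model, so the locus-specific mutation rate cancels in the ratio). In practice, lnRH values are standardized across many loci and outliers beyond, e.g., ±2.58 standard deviations are flagged:

```python
import math

def ln_rh(h1, h2):
    """lnRH for one microsatellite locus typed in two populations.
    h1, h2: expected heterozygosities (0 <= H < 1). Under the
    stepwise mutation model, theta = 0.5*((1/(1-H))**2 - 1)."""
    theta1 = 0.5 * ((1.0 / (1.0 - h1)) ** 2 - 1.0)
    theta2 = 0.5 * ((1.0 / (1.0 - h2)) ** 2 - 1.0)
    return math.log(theta1 / theta2)

# A sweep in population 1 (low H) gives a strongly negative lnRH:
print(ln_rh(0.2, 0.8))                          # about -3.8
```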
Joint analysis of multiple summary statistics
To reduce false positives, it is straightforward to use multiple signals to detect selection. In their pioneering work, Zeng and colleagues [18] proposed the joint use of Tajima's D and Fay & Wu's H because these two summary statistics rely on different signatures of selection. This work has been extended to analyze human genome-wide polymorphism data [84,85]. The joint use of Tajima's D and Fay & Wu's H does help, although it cannot completely avoid a high false positive rate [37]. We have further examined whether combining more signals is helpful for detecting selection. After combining four different signals (the excess of rare derived mutations, the excess of high-frequency derived mutations, the selective sweep, and the long haplotype structure), the false positive rate remains high in some bottleneck cases if the demographic history is ignored in the analysis [86]. This is a warning sign that the problem of high false positive rates may not be completely solved by combining different summary statistics, although such combinations have appeared in recent publications [85,87].
SOME CONFOUNDING EFFECTS
The confounding effect of mutation rate heterogeneity has been discussed extensively in the literature. It has been suggested that the estimate of the time to the most recent common ancestor of a sample is highly dependent on the mutation model used [88], and that heterogeneity of mutation rates can have the same effect on the distribution of observed pairwise differences (the mismatch distribution) as a population size expansion [89]. A method for detecting recent selective sweeps that accounts for mutation rate heterogeneity and background selection was developed recently [90].
It has been suggested that the genomic variants subject to recent positive selection have had less of an opportunity to be affected by recombination. Thus, the two processes have an intimately related impact on genetic variation, and inference of either may be vulnerable to confounding by the other [
91]. This argument has been further supported by the observation that genome-wide scans that detect loci under recent selection in humans have tended to highlight loci in regions of low recombination [
91]. Moreover, because the pattern of LD at the selected locus may sometimes be complex [
44,
45], simulations have shown that recent positive selection can create false hotspots of recombination [
92].
The EHH-based methods rely on measurements of LD; therefore, variation in the recombination rate across the genome may introduce bias. In view of this, most available methods take specific steps to remove the impact of variable recombination rates. This is usually done in two ways: the first is to contrast the EHH values of the two alleles at the same locus, thereby canceling out the local recombination rate [53–55]; the second is to use genetic distance instead of physical distance [53,55]. Such corrections rely on the assumption that individuals or populations share the same recombination rates across the genome. It should be noted that the EHH-based methods are not sensitive to fine-scale recombination rate variation because a haplotype must extend up to hundreds of kilobases to be considered indicative of positive selection.
CONCLUSIONS
Selective events often leave their footprints behind them. However, it is always difficult to distinguish these footprints from the confounding signatures of demography, such as population size expansion and bottleneck [
93,
94]. According to our work, the most reliable approach for detecting selection, besides functional analysis, may be a genome-wide approach. However, genome-wide approaches and functional analyses may not be easily applied to millions of non-model species. Moreover, even when genome-wide polymorphism data are available, there is no guarantee that researchers can reliably detect recent selective events with the available methodologies. For example, twenty-two genome-wide scans have been performed to detect recent positive selection in humans, yet the lists of candidates for selection are largely inconsistent among these studies. A map of positive selection integrating nine genome-wide scans has been constructed [59]. In total, 5,110 distinct regions were identified in one or more studies, encompassing 409 Mb of sequence (14% of the human genome). Strikingly, only 722 regions (14.1%) were identified in two or more studies. This suggests that attention should now be turned towards understanding the biological relevance and adaptive significance of the regions identified as being subject to recent positive selection [
19,
59,
95].
The issue of falsely accepting the adaptation hypothesis cannot be over-emphasized because we still know little about the demography and the molecular mechanisms of adaptation in most species. How, then, can we detect recent positive selection in millions of natural species? How can we study the molecular mechanisms of adaptation to recent environmental changes? Although we are in the age of genomics, most of the available DNA polymorphism data from these species still come from a single locus or a small number of loci. However, it is known that, based on single-locus data, the available methods, including Tajima's D test and Fay & Wu's H test, cannot reliably detect recent positive selection in populations of varying size. One approach to this problem is to utilize a priori information about the demography of the species to help researchers choose an appropriate test [18].
It is clear that a large amount of data is needed to estimate demographic parameters, and the estimation procedure demands considerable computational power. For example, based on the genome-wide joint mutation frequency spectrum (or multi-population allele frequency spectrum) of an African and a European Drosophila population, the parameters of the bottleneck experienced by the European population have been estimated [42]. Later studies have demonstrated that fitting demographic models to the joint mutation frequency spectrum is efficient [96–98].
Therefore, considering the low efficiency of combining neutrality tests with functional analysis, we suggest that priority should be given to developing more reliable tests for detecting selection. It should be noted that, when a new test is proposed, it is always difficult or even impossible to investigate all possible neutral scenarios by simulation because of the intricate nature of evolutionary history in natural populations. This may be one reason why several papers have claimed that their tests are specific for detecting recent positive selection, while later studies with wide-ranging simulations demonstrated that these tests are still sensitive to certain neutral scenarios. Here, to help investigate the robustness of new methods more efficiently in the future, we summarize two neutral scenarios that often lead to false positives when detecting selection. The first is the bottleneck scenario with a severity of approximately 1.0, where the severity is defined as $d N_0 / N_b$, $N_0$ is the current effective population size, $N_b$ is the population size during the bottleneck, and $d$ is the duration of the bottleneck; time is scaled such that one unit represents $4N_0$ generations. Under bottlenecks, the variance of summary statistics, such as Tajima's $D$, Watterson's $\theta_W$ and the mean number of nucleotide differences between two sequences ($\pi$), can be inflated, especially when the severity of the bottleneck is approximately 1.0. The second is population structure with different sampling schemes, which should be examined carefully; a reasonable range for the migration parameter $Nm$ is 0.001–10.
Future studies should aim at improving the robustness of tests with respect to demography. Ideally, new statistical tests should not only be free from the confounding impact of both demography and population structure, but should also have reasonable power to detect selection. For obvious practical reasons, they should rely on single-locus DNA polymorphism data. Developing such tests remains a great challenge for everyone working in this field. Recently, the (joint) analysis of demography and selection [25,26,42,90,93,99,100] has become a central issue, and we believe this is the right track. However, researchers should be aware that simulations have shown that the popular SweepFinder is not robust to certain demographic scenarios, for example severe and recent bottlenecks [23].