Genome-wide analysis reveals selection for Chinese Rongchang pigs

Livestock have undergone domestication and, consequently, strong selective pressure on genes or genomic regions that control desirable traits. To identify selection signatures in the genome of Chinese Rongchang pigs, we generated a total of about 170 Gb of DNA sequence data with about 6.4-fold coverage for each of six female individuals. By combining these data with the publicly available genome data of 10 Asian wild boars, we identified 449 protein-coding genes with selection signatures in Rongchang pigs, which are mainly involved in growth and hormone binding, nervous system development, and drug metabolism. The accelerated evolution of these genes may contribute to the dramatic phenotypic differences between Rongchang pigs and Chinese wild boars. This study illustrates how domestication and subsequent artificial selection have shaped patterns of genetic variation in Rongchang pigs and provides valuable genetic resources that can enhance the use of pigs in agricultural production and biomedical studies.


Introduction
Genome sequencing and assembly for European domestic Duroc pigs has greatly improved the genetic resources available for this important livestock species [1], and has enhanced the potential of this pig as a model organism for biomedical studies. Biological adaptability has enabled development of over 730 current pig breeds or lines that are distributed globally across a wide range of environments [2]. Modern domestic pigs have undergone strong genetic selection in specialized commercial populations; this has led to remarkable phenotypic changes and genetic adaptation, which makes these breeds an important world heritage and scientific resource for comparative genomic studies [3,4]. Recently, a list of 'domestication genes' has been compiled for silkworms [5], chickens [6], pigeons [7], rabbits [8], dogs [9], cattle [10] and pigs [11-14] by genomic sequencing. Rongchang pigs, a Chinese indigenous breed raised only in Southwest China with a center of production in the Sichuan basin, have been intensively selected for efficient accumulation of muscle and highly prized pork traits (i.e., juiciness, flavor, tenderness, pink hue and heavy marbling). The phenotypes of Rongchang pigs, characterized by their average-size head, concave and wrinkled face, well-developed limbs, concave back, tilted haunch and a big belly, are remarkably different from those of wild boars.
To identify genomic selection signatures in Rongchang pigs, we performed whole-genome resequencing of six female Rongchang pigs (about 170 Gb in total) and evaluated the genomic regions under selection. For comparison, we also used publicly available pig genome data [1,11,15,16].

SNP calling
We first filtered low-quality paired reads, which mainly resulted from base-calling duplicates and adapter contamination. The qualified paired-end reads were mapped to the pig reference genome assembly (Sscrofa10.2) [1] using BWA software [17]. After alignment, we performed SNP calling on a population scale for two groups (56 domestic pigs and 51 other pigs as detailed above and an African warthog) using a Bayesian approach implemented in SAMtools [18]. The genotype likelihoods from reads for each individual at each genomic location were calculated, and the allele frequencies were estimated. Only the high-quality SNPs (coverage depth ≥ 4 and ≤ 1000, RMS (root mean square) mapping quality ≥ 20, distance between adjacent SNPs ≥ 5 bp and missing ratio of samples within each group < 50%) were kept for the subsequent analysis.
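The filtering criteria listed above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Snp` record, its field names, and the function names are hypothetical, and two points that the text leaves open are treated as assumptions, namely that the spacing rule is applied after the per-site filters and that both SNPs of a pair lying closer than 5 bp are discarded.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """Hypothetical per-site summary for one group of samples."""
    chrom: str
    pos: int              # 1-based position on the chromosome
    depth: int            # coverage depth at the site
    rms_mapq: float       # root mean square mapping quality
    missing_ratio: float  # fraction of samples without a genotype call


def passes_site_filters(snp, min_depth=4, max_depth=1000,
                        min_mapq=20.0, max_missing=0.5):
    """Apply the depth, mapping-quality and missing-ratio thresholds."""
    return (min_depth <= snp.depth <= max_depth
            and snp.rms_mapq >= min_mapq
            and snp.missing_ratio < max_missing)


def filter_snps(snps, min_spacing=5):
    """Filter SNPs sorted by (chrom, pos).

    Assumption: a SNP is dropped if any surviving neighbour on the same
    chromosome lies closer than `min_spacing` bp (both members of a close
    pair are removed).
    """
    kept = [s for s in snps if passes_site_filters(s)]
    result = []
    for i, s in enumerate(kept):
        prev_close = (i > 0 and kept[i - 1].chrom == s.chrom
                      and s.pos - kept[i - 1].pos < min_spacing)
        next_close = (i + 1 < len(kept) and kept[i + 1].chrom == s.chrom
                      and kept[i + 1].pos - s.pos < min_spacing)
        if not (prev_close or next_close):
            result.append(s)
    return result
```

In practice, thresholds of this kind are usually applied directly to the variant caller's output; the sketch only makes the logic of each criterion explicit.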

Functional enrichment analysis
Functional enrichment of Gene Ontology (GO) terms, pathways, and InterPro domains was analyzed using the DAVID web server [19]. Genes were mapped to their respective human orthologs, and the lists were submitted to DAVID to test for significant overrepresentation of GO biological process (GO-BP) and molecular function (GO-MF) terms, KEGG pathways, and InterPro categories. In all tests, all known genes were assigned as the background, and P values (i.e., EASE scores), which indicate the significance of overlap between gene sets, were calculated using a Benjamini-corrected modified Fisher's exact test. Only terms with P < 0.05 were considered significant.
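The idea behind the modified Fisher's exact test and the Benjamini correction can be sketched as follows. This is a simplified illustration under stated assumptions, not DAVID's implementation: the EASE score is approximated here as a one-sided hypergeometric tail probability in which the observed overlap is reduced by one gene, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_list, n_term, n_background):
    """P(X >= k), where X is the number of term genes found in a random
    list of n_list genes drawn from n_background genes, n_term of which
    carry the term (one-sided Fisher's exact test)."""
    denom = comb(n_background, n_list)
    upper = min(n_list, n_term)
    numer = sum(comb(n_term, i) * comb(n_background - n_term, n_list - i)
                for i in range(max(k, 0), upper + 1))
    return numer / denom


def ease_score(k, n_list, n_term, n_background):
    """Conservative variant: the overlap count is reduced by one
    (assumed form of the 'modified' test)."""
    return hypergeom_upper_tail(k - 1, n_list, n_term, n_background)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Because the overlap is reduced by one, the score is never smaller than the ordinary one-sided Fisher P value, which penalizes terms supported by very few genes.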

Phylogenetic analyses
Phylogenetic relationships were inferred using the package TreeBeST (http://treesoft.sourceforge.net/treebest.shtml) under the p-distance model using SNPs at a population scale. We performed principal component analysis (PCA) with population-scale SNPs using EIGENSOFT4.2 [20]. The significance level of eigenvectors was determined using the Tracy-Widom test [20].
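For readers unfamiliar with the p-distance model, the following sketch shows how such a distance could be computed from SNP data. It does not reproduce TreeBeST; it assumes that each individual is represented by one character per SNP site and that sites with a missing call (`N` or `-`) in either individual are skipped. The function names are hypothetical.

```python
def p_distance(seq_a, seq_b, missing="N-"):
    """Proportion of compared sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in missing or b in missing:
            continue  # skip sites without a call in either individual
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(sequences):
    """Pairwise p-distances for a dict mapping sample name to sequence."""
    names = sorted(sequences)
    return {(x, y): p_distance(sequences[x], sequences[y])
            for x in names for y in names if x < y}
```

A matrix of this kind is the usual input to a neighbor-joining algorithm.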

Identification of selected regions
A sliding window approach (100-kb windows with 10-kb steps) was applied to quantify the pooled heterozygosity (Hp), genetic differentiation (FST), and selection statistics (Tajima's D, which is a measure of selection in the genome) between Rongchang pigs and Asian wild boars. To detect regions with significant selective sweep signatures, we Z-transformed the resultant distributions of Hp scores and FST values, and simultaneously selected windows with low Z(Hp) (< -2) in Rongchang pigs and high Z(FST) (> 2) as genomic regions with strong selective sweep signals that could harbor genes under selection.
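The window-based procedure can be outlined in a short Python sketch. Several details are assumptions rather than statements about the method used here: Hp is computed with the commonly used pooled formula Hp = 2·Σn_MAJ·Σn_MIN / (Σn_MAJ + Σn_MIN)², the Z transformation uses the population standard deviation, and incomplete windows at chromosome ends are dropped. All function names are hypothetical.

```python
from statistics import mean, pstdev


def sliding_windows(chrom_length, size=100_000, step=10_000):
    """Yield half-open (start, end) windows; partial windows are dropped."""
    start = 0
    while start + size <= chrom_length:
        yield (start, start + size)
        start += step


def pooled_heterozygosity(allele_counts):
    """Hp for one window from per-SNP (count_allele_1, count_allele_2)."""
    sum_major = sum(max(a, b) for a, b in allele_counts)
    sum_minor = sum(min(a, b) for a, b in allele_counts)
    total = sum_major + sum_minor
    return 2 * sum_major * sum_minor / total ** 2


def z_transform(values):
    """Standardize values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def select_windows(hp_values, fst_values, z_hp_max=-2.0, z_fst_min=2.0):
    """Indices of windows with Z(Hp) < z_hp_max and Z(FST) > z_fst_min."""
    z_hp = z_transform(hp_values)
    z_fst = z_transform(fst_values)
    return [i for i, (h, f) in enumerate(zip(z_hp, z_fst))
            if h < z_hp_max and f > z_fst_min]
```

Requiring both conditions at once targets windows that have lost diversity in the domestic breed and are also strongly differentiated from the wild population.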

Sequencing and SNP calling
We generated a total of about 170 Gb of paired-end reads, of which 85% (144.43 Gb) were high-quality reads that mapped to the pig reference genome assembly (Sscrofa10.2) with about 6.4-fold coverage for each individual (Table S1). In addition, we downloaded about 1037 Gb of genomic data from 101 publicly available pig genomes in the EMBL-EBI database and about 659 Gb in the NCBI SRA database [1,11,15,16] (Table S2).
We performed SNP calling on a population scale and identified 10.13 M SNPs from 107 individuals (Table S3). We then separately pooled and obtained SNP sets for each of two groups, which included 6.74 M from the 56 domestic pigs and 7.76 M from the 51 other individuals. We identified 6.50 M SNPs from six Rongchang pigs, of which 47205 were coding SNPs, including 15662 nonsynonymous nucleotide substitutions (15540 missense, 102 stop-gain and 20 stop-loss mutations) that were detected in 6910 genes (Table S4). These nonsynonymous SNP-containing genes in Rongchang pigs were mainly related to the G-protein coupled receptor protein signaling pathway (97 genes, P = 2.14 × 10^-10), especially sensory perception of chemical stimulus (68 genes, P = 1.02 × 10^-9), smell (61 genes, P = 1.02 × 10^-8), olfactory transduction (56 genes, P = 1.92 × 10^-8) and olfactory receptor (59 genes, P = 5.73 × 10^-8). Pigs have one of the largest repertoires of functional olfactory receptor genes that encode the G-protein coupled receptor superfamily [21]. In previous reports [1], similar rapid evolution of olfactory-related genes with extensive nucleotide variation has been found, reflecting the importance of smell in this scavenging animal and other odor-driven behaviors, such as individual recognition and mating preferences [22,23].

Genome-wide selective sweep signals
It is well documented that European and Chinese pigs diverged from each other about 1000000 years ago, and that domestic pigs in the two regions originated independently from different subspecies of ancestral wild boars around 10000 years ago [1,16,24]. To examine relatedness between Rongchang pigs and other pigs, we constructed a neighbor-joining tree (Fig. 1a) and conducted PCA (Fig. 1b; Table S5) using genomic SNPs, both of which revealed a deep phylogenetic split between European and Asian pigs. To accurately detect genomic footprints left by selection in Rongchang pigs and avoid genetic differences that resulted from geographic isolation of Europe from China, we specifically measured the genome-wide variations and frequency spectrum based on 8.32 M SNPs between six Rongchang pigs and 10 Asian wild boars (Table S6).
In total, 229772 100-kb windows (10-kb steps) across the pig genome each contained ≥ 100 SNPs; these windows covered 84.4% of the genome and were used to identify the regions that may have been affected by selection during domestication. We empirically set the thresholds at Z(Hp) < -2 in Rongchang pigs and Z(FST) > 2, because they represent the extreme tails of the distributions and are hence likely enriched for strong selective sweep signals along the genome, which could harbor genes that underwent a selective sweep. From this we identified a total of 44.86 Mb of genomic regions (1.61% of the genome, containing 449 genes) with strong selective sweep signals in Rongchang pigs (Fig. 2a), which also exhibited significant differences (P < 10^-16, Mann-Whitney U test) in Z(Hp), Z(FST), and Tajima's D when compared with the genomic background (Fig. 2b).
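Because 100-kb windows advanced in 10-kb steps overlap, the total length of the selected regions is smaller than the number of selected windows multiplied by 100 kb. One way to obtain a non-redundant total is to merge overlapping or adjacent windows before summing their lengths. The sketch below illustrates this; whether the total reported here was computed in exactly this way is an assumption, and the function names are hypothetical.

```python
def merge_windows(windows):
    """Merge overlapping or adjacent half-open windows.

    `windows` is an iterable of (chrom, start, end) tuples; the result is
    a sorted list of non-overlapping regions.
    """
    merged = []
    for chrom, start, end in sorted(windows):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region instead of opening a new one.
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_length(regions):
    """Total number of bases covered by non-overlapping regions."""
    return sum(end - start for _, start, end in regions)
```

For example, two selected windows on the same chromosome that are offset by one 10-kb step merge into a single 110-kb region rather than contributing 200 kb.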
We also constructed a phylogenetic neighbor-joining tree (Fig. 2c) and performed PCA (Fig. 2d) exclusively using the SNPs in regions with strong selective sweep signals. Although Rongchang pigs and Asian wild boars are genetically close based on the 8.32 M SNPs across the whole genome (Fig. 1), they form two distinct clusters with respect to the SNPs in these regions (0.62 M, 7.45%), which are potentially under adaptive evolution resulting from industrial agriculture (Fig. 2c, Fig. 2d).
The 449 genes embedded in selected regions were analyzed using DAVID to examine whether these genes were enriched for specific functional gene categories (Table 1). Our findings coincide with previous reports on genes related to pig domestication [1-4,11-15]: genes related to growth and hormone binding, which included seven terms, were observed to be under strong selective sweep in Rongchang pigs, and may have contributed to the rapid growth and enhanced muscle development of domestic pigs.
Of note, 10 genes putatively under selection are related predominantly to nervous system development, most with a single allele in Rongchang pigs: CNTN4, DLL3, GHSR, LHX5, MAP1B, MBP, METRN, NUMBL, TNFRSF12A and REST. Several of these genes affect brain development, neuronal functions, and behavior (Fig. 3). This result supports the view that altered behavior (such as reduced fear, higher levels of adult play, and tameness or aggression toward humans), in addition to the obvious dramatic changes in appearance and physiology, was also important during domestication, and that mutations affecting developmental genes may underlie these changes [25-29]. In addition, four genes (CYP2A6, GMPS, UPB1 and UPP2) in Rongchang pigs exhibited strong selective sweep signatures enriched for drug metabolism (Fig. S1). Given that alterations in these genes are associated with a variety of drug metabolism-associated diseases [30], their positive selection may be attributed to constant exposure of domestic pigs in modern industry to much higher dosages of chemicals/drugs and an increased number of environmental xenobiotics, which could have accelerated evolution of drug metabolism.
Fig. 1 (a) Neighbor-joining tree and (b) PCA of pigs based on genomic SNPs (Table S5).
Fig. 2 (a) Distributions of Hp and FST values and the corresponding Z transformations (Z(Hp) and Z(FST)), which were calculated in 100-kb windows with 10-kb steps (n = 229772, each containing ≥ 100 SNPs). Data points located to the right of the vertical line (where Z(FST) is 2) and below the horizontal line (where Z(Hp) is -2) were identified as selected regions in Rongchang pigs (red points). m, mean; s, standard deviation; (b) violin plot of Z(Hp) of Rongchang pigs, Z(FST), and |Tajima's D of Rongchang pigs - Tajima's D of Asian wild boars| in genomic regions with strong selective sweep signals for Rongchang pigs compared with the whole genome. Out of 229772 100-kb windows (10-kb steps) across the pig reference genome that contained ≥ 100 SNPs (gray violin), 1852 windows were picked out as regions with strong selective sweep signals (green violin). The width of each violin depicts a 90°-rotated kernel density trace and its reflection. Vertical black boxes denote the interquartile range between the first and third quartiles (25th and 75th percentiles, respectively) and the white point inside denotes the median. Vertical black lines denote the lowest and highest values within 1.5 times the interquartile range from the first and third quartiles, respectively. The statistical significance was calculated by the Mann-Whitney U test; (c) phylogenetic tree (scale bar represents p-distance); (d) two-way principal component plot of Rongchang pigs (n = 6) and Asian wild boars (n = 10) based on SNPs in regions with strong selective sweep signals, with 25.0% of variance explained for eigenvector 1 (P = 0.030, Tracy-Widom test) and 13.7% for eigenvector 2 (P = 0.277, Tracy-Widom test).

Conclusions
This study examined the genetic relationships among Chinese Rongchang and other pigs, and uncovered genetic footprints of domestication and selection that provide an important resource for further improvements of this important livestock species. We envision that the data presented here will provide a representative example on which to base future deciphering of genomic footprints left by livestock domestication and selection.
Table 1 note: P values (i.e., EASE scores), which indicate significance of the overlap between various gene sets, were calculated using a Benjamini-corrected modified Fisher's exact test. Only Gene Ontology (GO) biological process (GO-BP), GO-molecular function (GO-MF) and KEGG pathway terms with P < 0.05 were considered significant and listed.