Genome-wide search for candidate genes determining vertebrae number in pigs

Longer porcine carcasses may be expected to have more vertebrae. Therefore, vertebrae number in pigs is an economically important trait. To examine the genetic basis of this trait, we genotyped 578 F2 Large White Minzhu pigs using the Porcine SNP60K BeadChip. A genome-wide association study (GWAS) identified 36 significant single nucleotide polymorphisms (SNPs) on the chromosomes SSC1 (294.28–300.32 Mb) and SSC7 (102.22–109.39 Mb). A 6.04-Mb region that contained all 13 significant SNPs on SSC1 also contained the gene NR6A1, previously reported to influence the number of vertebrae in pigs. However, the reported putative casual mutation of NR6A1 c.748C>T showed no genome-wide significant association with the trait, suggesting it was not a causal mutation in our population. The remaining 23 significant SNPs on SSC7 were concentrated in a 7.17-Mb region, which was within a quantitative trait locus interval for number of vertebrae. TMED10 was the closest gene to the most significant SNP and might be a candidate. Haplotype sharing and block analysis refined the QTL to an interval of about 3 Mb containing 29 candidate genes. Of these 29 genes, the previously reported possible casual mutation of VRTN g.19034A>C was not found to be a causal mutation in our population. Exploration of these genes via additional genetic and functional studies in mammals revealed that TGFβ3 could be a good candidate on SSC7. A mutation of TGFβ3 c.1749G>Awas detected by GWAS and could be proposed as a candidate causal mutation, or as closely linked to a causal mutation, for the number of vertebrae in pigs.


Introduction
Backbones consist of the morphologically differentiated cervical, thoracic, lumbar, sacral and caudal vertebrae, and the number of these varies in pigs [1] . Wild boars (Sus scrofa), the ancestor of domestic pigs, all have 19 thoraciclumbar vertebrae, whereas most Chinese indigenous pig breeds have 19 or 20 vertebrae (thoracic-lumbar). By comparison, Western commercial pig breeds, such as Landrace and Large White, have 20-22 vertebrae [1] . The number of vertebrae is usually associated with carcass length and is an economically important trait, because a greater number of vertebrae in pigs is associated with higher economic value. According to a study by King and Roberts, longer carcasses may be expected to have a higher number of vertebrae [1] . Understanding the genetic basis for the number of vertebrae can offer insights into the mechanism for vertebral developmental in mammals and provide genetic markers to aid molecular breeding of pigs.
Locating quantitative trait loci (QTLs) and gene-mining for number of vertebrae have recently received research attention. Wada et al. first reported two QTLs on wild boar chromosomes SSC1 and SSC2 that are associated with the number of vertebrae [2] . Subsequently, using several different pig populations, two QTLs for the number of vertebrae were mapped to SSC1 and SSC7 [3] . Using three F 2 experimental families, fine mapping of the QTL on SSC1 was performed and showed that nuclear receptor subfamily 6, group A, member 1 (NR6A1), was a strong candidate gene that appeared to influence the number of vertebrae in pigs [4] . Gene-mining the other QTL on SSC7 suggested that the vertebrae development homolog (VRTN) could be a good candidate [5] . A genome-wide association study (GWAS), based on the high density of genome-wide single nucleotide polymorphisms (SNPs), has proven to be a more efficient method to not only identify QTLs but also mine major new genes. Additional methods and different populations are required to identify loci associated with vertebrae number and detect good candidate genes. The objectives of this study were to identify SNPs associated with vertebrae number using GWAS in an F 2 Large White Â Minzhu pig population.

Population and phenotypic data
All animals used in this study were treated according to the guidelines for experimental animals established by the Council of China. Animal experiments were approved by the Science Research Department of the Institute of Animal Science, Chinese Academy of Agricultural Sciences (Beijing, China). The F 0 population was generated using four Large White boars and 16 Minzhu sows. In the F 1 generation, nine boars and 46 sows were selected to mate to produce 578 F 2 individuals. All animals in the F 2 generation were born in three parities and 94 litters. Each F 2 male was castrated 3 d after birth. All animals were maintained in uniform housing and were fed the same fodder. When F 2 animals were 240AE7 d-old, they were slaughtered in 52 batches. The numbers of thoraciclumbar vertebrae together were counted from the carcasses and used as phenotypes.

Genotyping and quality control
Genomic DNA was extracted from ear tissue using standard methods [6] . The Illumina (San Diego, CA, USA) SNP60K chip for pigs was employed to genotype all individuals. In our previous study [7] , quality control parameters for single nucleotide polymorphisms included the following: call rate > 90%, minor allele frequency > 3%, and Hardy-Weinberg equilibrium with P > 10 -6 .

Genome-wide association study
The genome-wide rapid association using the mixed model and regression-genomic control approach [8,9] was used in the present study. Sex and litter were used as fixed and random effects, respectively. DMU and GenABEL software [9] in the R environment were employed to estimate the residuals of traits for each individual and perform the GWAS.
In the first step, the mixed model was used, and data were analyzed using DMU software, using the formula: where y is the vector of the phenotypes of all F 2 individuals, b is the vector of fixed effects of the sex, c is the vector of litter effect as a random effect, c~N (0, σ c 2 ), a is the vector of random additive genetic effects with a~N (0, Aσ c 2 ) (where A is the relationship matrix calculated from the corrected pedigree and σ c 2 is the additive genetic variance). X, T and Z are incidence matrices related to records of fixed and random effects in y, e is the vector of residual errors, e~N (0, Iσ e 2 ) (where I is the identity matrix and σ e 2 is the residual variance). The vector of residuals y* is estimated as: whereb,ĉ , andâ are estimates and predictors for b, c and a, respectively.
Second, the residuals were used as the dependent trait, via the formula: where y* is the vector of the adjusted phenotypes in the first step, g is the vector of the genotypes which were coded as "0", "1" and "2" corresponding to AA, AB, and BB, k is the regression coefficient, and e* is the vector of random residuals. Based on single locus regression analysis; the analysis was performed in the R statistical environment using the GenABEL package.
Finally, the unadjusted test statistic factor of the ith SNP T i 2 was calculated in the genomic control procedure as: wherek i and varðk i Þ are the estimate and sample variance of k, respectively. The deflation factor l is estimated as l = median (T 1 2 , T 2 2 , ..., T i 2 )/0.456, where 0.456 is simply the median of c 2 distribution with one degree of freedom [10] . The association of the ith SNP with the trait was examined by comparison of T 1 2 /l with c (1) 2 . When association studies are conducted with many SNPs, the tests performed on each SNP are usually not independent, depending on the correlation structure among the SNPs. This violation of the independence assumption limits the ability of the Bonferroni correction to effectively control type I error, and the point-wise error rate has to be adjusted in order to keep the experiment-wise error rate at a nominal level. Therefore, the number of independent tests, which is regarded as number of effective SNPs, can be correctly inferred and used in the standard Bonferroni correction to rapidly adjust for multiple testing. The number of effective SNPs, which was estimated using a simpleM method [11] , was 12039 (Table S1). A lower conventional Bonferroni P was calculated as 4.15Â10 -6 (0.05/12039) [12] and was applied to avoid missing QTLs.

Haplotype sharing and linkage disequilibrium analysis
The genotypes of nine F 1 boars were determined using marker-assisted segregation analysis (MASS) [13] . According to the two SNPs of INRA0027600 and H3GA0022821, the genotype of each boar was determined from a Z-score corresponding to the log 10 likelihood ratio LH1/LH0. LH1 corresponds to the likelihood of the pedigree data assuming that the boar is of Qq genotype, and LH0 corresponds to the likelihood of the pedigree data assuming that the boar is of QQ or qq genotype. Boars were considered to be Qq when Z > 2; QQ or qq when Z < -2; and of undetermined genotype if 2 > Z > -2. According to the MASS analysis, Q-bearing chromosomes in F 1 boars segregated and haplotype sharing analysis was performed using all 23 significant SNPs on SSC7. Haplotype block detection was performed on the chromosomal region which contained all the SNPs that were significantly associated with vertebrae number. The HAPLOVIEW V4.1 program [14] was used to detect and visualize the haplotype blocks in this work.

NR6A1 and VRTN candidate causal variants genotyping
According to previous reports [4,15] , NR6A1 c.748C > T, VRTN g.19034A > C and VRTN g.20311_20312ins291 were selected as candidate causal mutations to genotype for 578 F TGF individuals. A set of primers (Table S2) was used to amplify genomic DNA. Amplification was performed in a routine way with 1.5 mmol$L -1 of MgCl 2 and optimal annealing temperatures. All PCR products were bidirectionally sequenced with original PCR primers on the 3130XL Genetic Analyzer (Applied Biosystem, Carlsbad, CA, USA).

Resequencing of the TGFβ3 and polymorphism detecting and genotyping
To resequence the porcine transforming growth factor, beta 3 (TGFβ3), a set of primers (Table S2) were used to amplify genomic DNA of the 20 F 0 individuals. The resulting polymerase chain reaction (PCR) products covered all of the untranslated regions (UTR) and exons of genes. Similar to mutations of NR6A1 and VRTN genotyping, a total of 578 F 2 individuals were genotyped according to the detected mutations by PCR.

Results and discussion
3.1 Detection 36 genome-wide significant SNPs on SSC1 and SSC7 The pigs had 19 (n = 243), 20 (n = 292), 21 (n = 42), and 22 (n = 1) vertebrae with a mean of 19.7. The final data set that passed quality control screening and was used in the analysis contained 43837 SNPs and came from 578 F 2 individuals. The distribution of SNPs after quality control and the average distance between adjacent SNPs on each chromosome are shown in Table 1. The results of the GWAS for vertebrae number showed that a total of 36 genome-wide significant SNPs were detected on SSC1 and SSC7. The Manhattan plot obtained from GWAS is shown in Fig. 1. These results are consistent with previously reported QTLs [3] . When thoracic-lumbar vertebrae were used as the phenotypes instead of total vertebrae, the QTLs for number of thoracic vertebrae were focused on SSC7 [15] . Also in their research, a putative QTL for number of lumbar vertebrae was detected on SSC1 although it did not reach a significant level. Therefore, the total number of thoracic-lumbar vertebrae used as phenotype may be the possible reason for two QTLs being detected in the current study and that of Mikawa et al. [3] .
3.2 6.04-Mb region includes all 13 genome-wide significant SNPs on SSC1 On SSC1, there were 13 genome-wide significant SNPs associated with vertebrae number, within a 6.04-Mb (294.28-300.32 Mb) region. The most significant SNP could explain 25.6% of phenotypic variance. Several previous studies have mapped the QTL for vertebrae number to a similar interval on SSC1 [2][3][4][5]16,17] . Using MASS, five F 1 boars proved to be heterozygous Qq genotypes (Fig. 2). However, haplotype sharing analysis showed that all five chromosomes in the Q pool share an identical descent haplotype spanning all 13 SNPs and the analysis did not refine this 6.04-Mb region using the 13 significant SNPs (data not shown). Previous fine mapping to the QTL showed that NR6A1 was a strong candidate for the QTL and that the most likely causative mutation is a base substitution, NR6A1 c.748C > T, which results in a proline to leucine substitution at codon 192 [4] . The strongest selection signature was observed for a locus on chromosome 1, which includes NR6A1 [18] . The distribu-tion of NR6A1 c.748C > T in different pig breeds showed that the C and T alleles were almost fixed in wild boars and western commercial breeds, respectively [19,20] . All three genotypes are represented in European and Chinese indigenous pig breeds [19,21] . Although the T allele was fixed in all F 0 Large White in the current work, the allele frequency was over 50% in Minzhu (Table S3). In the current study, the association of NR6A1 c.748C > T with trait did not reach significant genome-wide association ( Table 2). From a previous report, the NR6A1 c.748C > T was predicted as causal mutation although its further functional identification is lacking [4] . Hence more work is required to identify the function of this gene in the future.

7.17-Mb region includes all 23 genome-wide significant SNPs on SSC7
The significant genome-wide SNPs on SSC7 are shown in Table 2. A total of 23 SNPs were significantly associated within a 7.17-Mb region (102.22-109.39 Mb) on SSC7. The most significant SNP was ASGA0035537 and could explain 20.6% of the phenotypic variance. Similar to this result, a previous study reported that the QTL on SSC7 for vertebrae number could explain 4.2%-37.3% of the phenotypic variance in several distinct pig populations [3] . Several previous studies have reported similar findings when they mapped the major QTL associated with the vertebrae number on SSC7 using different populations. This QTL was first reported to be on SSC7 in an F 2 Meishan Â Duroc resource population [16] . Subsequently, research in several populations derived from crosses between Western pig breeds (Large White, Duroc, Berkshire and Landrace) and Chinese indigenous pig breeds (Meishan and Jinhua), revealed that the identical QTL was located on SSC7 [3] . Moreover, this QTL on SSC7 was repeatedly identified in an F 2 population crossed from the commercial breeds Duroc and Pietrain [22] .  15Â10 -6 )).

Fig. 2
The marker-assisted segregation analysis on SSC1 for F 1 boars in a population of Large White Â Minzhu pigs. The markerassisted segregation analysis on SSC1 for F 1 boars. The graphs show for half-sib pedigrees for five F 1 boars (700105, 706601, 721205, 724001 and 728505) the phenotypic meanAEstandard errors of the offspring sorted in two groups according to the homolog inherited from the sire. The number of offspring in each group is given above the respective error bars. The graph corresponds to the boars that were shown to be heterozygous Qq and gives a Z-score for each pedigree. Q alleles associated with a positive allele substitution effect on vertebrae number are marked by a diamond, q alleles by a circle.  3 Phenotypic variation explained by the most significant SNPs on SSC 1 and 7; 4 The reported putative casual mutation of NR6A1 c.748C > T; 5 The reported putative casual mutation of VRTN 19034A > C; 6 The detected mutation in TGFβ3 in this study.

Haplotype sharing and candidate gene analysis of SSC7
Using MASS, five of the nine F 1 boars proved to be heterozygous Qq genotypes (Fig. 3). Further, we judged the QTL status of the five F 1 boars with Qq genotypes by multiple comparisons of targeted chromosomes with the reference Q (number-increase) or q (wild-type). Visual examination of the Q pools immediately reveals that all five chromosomes in the Q pool share an identical by descent haplotype spanning the ASGA0035536-ALGA00 44124 interval (Fig. 4). A total of 29 annotated genes in GenBank were contained within the 2.69-Mb region containing 13 significant SNPs on SSC7. Of these genes, transmembrane p24 trafficking protein 10 (TMED10) was the closest gene to the most significant SNP on SSC7. TMED10 belongs to the highly conserved p24 family of type I transmembrane proteins and its main function is to regulate anterograde and retrograde transport in the early secretory pathway between the endoplasmic reticulum and Golgi apparatus [23][24][25] . According to a previous study in mouse embryos, TMED10 was found to be expressed at embryonic day 10.5 [26] , which is the critical period for vertebral differentiation [27] . Although little is known concerning the influence of TMED10 on vertebral differentiation, this gene may be a candidate as it is the most significant SNP in the current study. The Finkel-Biskis-Jinkins murine osteosarcoma viral oncogene homolog (FOS), one of 29 genes, has an essential role in the osteoclastic differentiation of precursor cells and the upregulation of Receptor Activator of Nuclear Factor k B (RANK) expression in osteoclast precursors within the bone environment [28] . This gene has been reported to be a candidate for vertebrae number [17] . However, a candidate gene of interest is TGFβ3, in which a significant SNP, SIRI0001067 was located in this GWAS. The development of the vertebral column is a consequence of a segmentspecific balance between proliferation, apoptosis and differentiation of mesoderm cells in embryos [29] . In mammals, TGFβ3 promotes chondrogenesis in posterofrontal suture-derived mesenchymal cells, influencing different stages of chondrogenic differentiation and proliferation [30] . TGFβ3 protein, in combination with its downstream factor, TGF beta receptor type I (ALK5), regulates the differentiation and proliferation of the spinal column [31] . We resequenced all of UTRs and exons of TGFβ3 and found only one SNP, which was located in the 3′-UTR (1749 bp of NM_214198 in GenBank), i.e., c.1749G > A. Also, a fixed G allele (increasing number) was detected in each F 0 Large White pig (Table S3).
Rerunning GWAS suggested that TGFβ3 c.1749G > A had significant (P = 2.98Â10 -7 ) genome-wide association with vertebrae number in the F 2 population. These findings support the proposal that TGFβ3 c.1749G > A might be a candidate causal mutation, or is closely linked to a causal mutation, on SSC7 for vertebrae number in our population. However, VRTN, which has been reported to be a strong candidate gene [5,15,32] , was not contained in this shared region. Two putative causal mutations, i.e., g.19034A > C and g.20311_20312ins291 of VRTN, have been reported previously [15] . Genotyping the two SNPs in our population revealed that only VRTN g.19034A > C has polymorphisms. Rerunning GWAS suggested that VRTN g.19034A > C showed significant genome-wide association ( Table 2). Genotype detection in the F 0 pigs showed that none of alleles of VRTN g.19034A > C were fixed in either breed (Table S3). When the VRTN g.19034A > C was treated as a fixed effect to rerun GWAS, there was also an obvious peak (Fig. S1) for association, even if the top SNP (M1GA0010629, P = 8.85Â10 -5 ) did not reach significant chromosome-wide association. These results suggest VRTN g.19034A > C could be proposed as a casual mutation influencing the vertebral number.

Conclusions
In summary, this study focused on vertebrae numbers in pigs. A genome-wide search identified 13 significant SNPs within a 6.04-Mb region containing the reported causal gene NR6A1 on SSC1. An additional 23 SNPs were identified within a 7.17-Mb region on SSC7 that showed genome-wide association with vertebrae number in our population. By identifying a haplotype sharing and haplotype block analysis, the QTL was refined to a chromosome segment of about 3 Mb on SSC7. In addition Fig. 3 The marker-assisted segregation analysis on SSC7 for F 1 boars in a population of Large White Â Minzhu pigs. The markerassisted segregation analysis on SSC7 for F 1 boars. The graphs show for half-sib pedigrees of five F 1 boars (700105, 706601, 704003, 721205 and 724001) the phenotypic meanAEstandard errors of the offspring sorted in two groups according to the homolog inherited from the sire. The number of offspring in each group is given above the respective error bars. The graph corresponds to the boars that were shown to be heterozygous Qq and reports a Z-score for each pedigree. Q alleles associated with a positive allele substitution effect on vertebrae number are marked by a diamond, q alleles by a circle.
to the reported candidates, VRTN and TMED10, TGFβ3 on SSC7 could also be good candidate for determining vertebrae number in pigs.