Introduction
Backbones consist of the morphologically differentiated cervical, thoracic, lumbar, sacral and caudal vertebrae, and the number of these varies in pigs
[1]. Wild boars (
Sus scrofa), the ancestor of domestic pigs, all have 19 thoracic-lumbar vertebrae, whereas most Chinese indigenous pig breeds have 19 or 20 vertebrae (thoracic-lumbar). By comparison, Western commercial pig breeds, such as Landrace and Large White, have 20–22 vertebrae
[1]. The number of vertebrae is usually associated with carcass length and is an economically important trait, because a greater number of vertebrae in pigs is associated with higher economic value. According to a study by King and Roberts, longer carcasses may be expected to have a higher number of vertebrae
[1]. Understanding the genetic basis for the number of vertebrae can offer insights into the mechanism for vertebral developmental in mammals and provide genetic markers to aid molecular breeding of pigs.
Locating quantitative trait loci (QTLs) and gene-mining for number of vertebrae have recently received research attention. Wada et al. first reported two QTLs on wild boar chromosomes SSC1 and SSC2 that are associated with the number of vertebrae
[2]. Subsequently, using several different pig populations, two QTLs for the number of vertebrae were mapped to SSC1 and SSC7
[3]. Using three F
2 experimental families, fine mapping of the QTL on SSC1 was performed and showed that nuclear receptor subfamily 6, group A, member 1 (
NR6A1), was a strong candidate gene that appeared to influence the number of vertebrae in pigs
[4]. Gene-mining the other QTL on SSC7 suggested that the vertebrae development homolog (VRTN) could be a good candidate
[5]. A genome-wide association study (GWAS), based on the high density of genome-wide single nucleotide polymorphisms (SNPs), has proven to be a more efficient method to not only identify QTLs but also mine major new genes. Additional methods and different populations are required to identify loci associated with vertebrae number and detect good candidate genes. The objectives of this study were to identify SNPs associated with vertebrae number using GWAS in an F
2 Large White × Minzhu pig population.
Materials and methods
Population and phenotypic data
All animals used in this study were treated according to the guidelines for experimental animals established by the Council of China. Animal experiments were approved by the Science Research Department of the Institute of Animal Science, Chinese Academy of Agricultural Sciences (Beijing, China). The F0 population was generated using four Large White boars and 16 Minzhu sows. In the F1 generation, nine boars and 46 sows were selected to mate to produce 578 F2 individuals. All animals in the F2 generation were born in three parities and 94 litters. Each F2 male was castrated 3 d after birth. All animals were maintained in uniform housing and were fed the same fodder. When F2 animals were 240±7 d-old, they were slaughtered in 52 batches. The numbers of thoracic-lumbar vertebrae together were counted from the carcasses and used as phenotypes.
Genotyping and quality control
Genomic DNA was extracted from ear tissue using standard methods
[6]. The Illumina (San Diego, CA, USA) SNP60K chip for pigs was employed to genotype all individuals. In our previous study
[7], quality control parameters for single nucleotide polymorphisms included the following: call rate>90%, minor allele frequency>3%, and Hardy–Weinberg equilibrium with
P>10
-6.
Genome-wide association study
The genome-wide rapid association using the mixed model and regression-genomic control approach
[8,
9] was used in the present study. Sex and litter were used as fixed and random effects, respectively. DMU and GenABEL software
[9] in the R environment were employed to estimate the residuals of traits for each individual and perform the GWAS.
In the first step, the mixed model was used, and data were analyzed using DMU software, using the formula:
where
y is the vector of the phenotypes of all F
2 individuals,
b is the vector of fixed effects of the sex,
c is the vector of litter effect as a random effect, c ~
N (0, s
c2),
a is the vector of random additive genetic effects with a ~
N (0,
As
c2) (where
A is the relationship matrix calculated from the corrected pedigree and s
c2 is the additive genetic variance).
X,
T and
Z are incidence matrices related to records of fixed and random effects in
y,
e is the vector of residual errors, e ~
N (0,
Is
e2) (where I is the identity matrix and s
e2 is the residual variance). The vector of residuals
y* is estimated as:
where
,
, and
are estimates and predictors for
b,
c and
a, respectively.
Second, the residuals were used as the dependent trait, via the formula:
where
y* is the vector of the adjusted phenotypes in the first step,
g is the vector of the genotypes which were coded as “0,” “1” and “2” corresponding to AA, AB, and BB,
k is the regression coefficient, and
e* is the vector of random residuals. Based on single locus regression analysis; the analysis was performed in the R statistical environment using the GenABEL package.
Finally, the unadjusted test statistic factor of the ith SNP Ti2 was calculated in the genomic control procedure as:
where
and var
are the estimate and sample variance of
k, respectively. The deflation factor
l is estimated as
l = median(
T12,
T22 …
Ti2)/0.456, where 0.456 is simply the median of
c2 distribution with one degree of freedom
[10]. The association of the
ith SNP with the trait was examined by comparison of
T12/
with
c(1)2.
When association studies are conducted with many SNPs, the tests performed on each SNP are usually not independent, depending on the correlation structure among the SNPs. This violation of the independence assumption limits the ability of the Bonferroni correction to effectively control type I error, and the point-wise error rate has to be adjusted in order to keep the experiment-wise error rate at a nominal level. Therefore, the number of independent tests, which is regarded as number of effective SNPs, can be correctly inferred and used in the standard Bonferroni correction to rapidly adjust for multiple testing. The number of effective SNPs, which was estimated using a simpleM method
[11], was 12039 (Table S1). A lower conventional Bonferroni
P was calculated as 4.15×10
–6 (0.05/12039)
[12] and was applied to avoid missing QTLs.
Haplotype sharing and linkage disequilibrium analysis
The genotypes of nine F
1 boars were determined using marker-assisted segregation analysis (MASS)
[13]. According to the two SNPs of INRA0027600 and H3GA0022821, the genotype of each boar was determined from a Z-score corresponding to the log
10 likelihood ratio LH1/LH0. LH1 corresponds to the likelihood of the pedigree data assuming that the boar is of Qq genotype, and LH0 corresponds to the likelihood of the pedigree data assuming that the boar is of QQ or qq genotype. Boars were considered to be Qq when
Z>2; QQ or qq when
Z<- 2; and of undetermined genotype if 2>
Z>-2. According to the MASS analysis, Q-bearing chromosomes in F
1 boars segregated and haplotype sharing analysis was performed using all 23 significant SNPs on SSC7. Haplotype block detection was performed on the chromosomal region which contained all the SNPs that were significantly associated with vertebrae number. The HAPLOVIEW V4.1 program
[14] was used to detect and visualize the haplotype blocks in this work.
NR6A1 and VRTN candidate causal variants genotyping
According to previous reports
[4,
15],
NR6A1 c.748C>T,
VRTN g.19034A>C and
VRTN g.20311_20312ins291 were selected as candidate causal mutations to genotype for 578 F
TGF individuals. A set of primers (Table S2) was used to amplify genomic DNA. Amplification was performed in a routine way with 1.5 mmol·L
−1 of MgCl
2 and optimal annealing temperatures. All PCR products were bidirectionally sequenced with original PCR primers on the 3130XL Genetic Analyzer (Applied Biosystem, Carlsbad, CA, USA).
Resequencing of the TGFβ3 and polymorphism detecting and genotyping
To resequence the porcine transforming growth factor, beta 3 (TGFβ3), a set of primers (Table S2) were used to amplify genomic DNA of the 20 F0 individuals. The resulting polymerase chain reaction (PCR) products covered all of the untranslated regions (UTR) and exons of genes. Similar to mutations of NR6A1 and VRTN genotyping, a total of 578 F2 individuals were genotyped according to the detected mutations by PCR.
Results and discussion
Detection 36 genome-wide significant SNPs on SSC1 and SSC7
The pigs had 19 (
n = 243), 20 (
n = 292), 21 (
n = 42), and 22 (
n = 1) vertebrae with a mean of 19.7. The final data set that passed quality control screening and was used in the analysis contained 43837 SNPs and came from 578 F
2 individuals. The distribution of SNPs after quality control and the average distance between adjacent SNPs on each chromosome are shown in Table 1. The results of the GWAS for vertebrae number showed that a total of 36 genome-wide significant SNPs were detected on SSC1 and SSC7. The Manhattan plot obtained from GWAS is shown in Fig. 1. These results are consistent with previously reported QTLs
[3]. When thoracic-lumbar vertebrae were used as the phenotypes instead of total vertebrae, the QTLs for number of thoracic vertebrae were focused on SSC7
[15]. Also in their research, a putative QTL for number of lumbar vertebrae was detected on SSC1 although it did not reach a significant level. Therefore, the total number of thoracic-lumbar vertebrae used as phenotype may be the possible reason for two QTLs being detected in the current study and that of Mikawa et al.
[3].
6.04-Mb region includes all 13 genome-wide significant SNPs on SSC1
On SSC1, there were 13 genome-wide significant SNPs associated with vertebrae number, within a 6.04-Mb (294.28–300.32 Mb) region. The most significant SNP could explain 25.6% of phenotypic variance. Several previous studies have mapped the QTL for vertebrae number to a similar interval on SSC1
[2–
5,
16,
17]. Using MASS, five F
1 boars proved to be heterozygous Qq genotypes (Fig. 2). However, haplotype sharing analysis showed that all five chromosomes in the Q pool share an identical descent haplotype spanning all 13 SNPs and the analysis did not refine this 6.04-Mb region using the 13 significant SNPs (data not shown). Previous fine mapping to the QTL showed that
NR6A1 was a strong candidate for the QTL and that the most likely causative mutation is a base substitution,
NR6A1 c.748C>T, which results in a proline to leucine substitution at codon 192
[4]. The strongest selection signature was observed for a locus on chromosome 1, which includes
NR6A1[18]. The distribution of
NR6A1 c.748C>T in different pig breeds showed that the C and T alleles were almost fixed in wild boars and western commercial breeds, respectively
[19,
20]. All three genotypes are represented in European and Chinese indigenous pig breeds
[19,
21]. Although the T allele was fixed in all F
0 Large White in the current work, the allele frequency was over 50% in Minzhu (Table S3). In the current study, the association of
NR6A1 c.748C>T with trait did not reach significant genome-wide association (Table 2). From a previous report, the
NR6A1 c.748C>T was predicted as causal mutation although its further functional identification is lacking
[4]. Hence more work is required to identify the function of this gene in the future.
7.17-Mb region includes all 23 genome-wide significant SNPs on SSC7
The significant genome-wide SNPs on SSC7 are shown in Table 2. A total of 23 SNPs were significantly associated within a 7.17-Mb region (102.22–109.39 Mb) on SSC7. The most significant SNP was ASGA0035537 and could explain 20.6% of the phenotypic variance. Similar to this result, a previous study reported that the QTL on SSC7 for vertebrae number could explain 4.2%–37.3% of the phenotypic variance in several distinct pig populations
[3]. Several previous studies have reported similar findings when they mapped the major QTL associated with the vertebrae number on SSC7 using different populations. This QTL was first reported to be on SSC7 in an F
2 Meishan × Duroc resource population
[16]. Subsequently, research in several populations derived from crosses between Western pig breeds (Large White, Duroc, Berkshire and Landrace) and Chinese indigenous pig breeds (Meishan and Jinhua), revealed that the identical QTL was located on SSC7
[3]. Moreover, this QTL on SSC7 was repeatedly identified in an F
2 population crossed from the commercial breeds Duroc and Pietrain
[22].
Haplotype sharing and candidate gene analysis of SSC7
Using MASS, five of the nine F
1 boars proved to be heterozygous Qq genotypes (Fig. 3). Further, we judged the QTL status of the five F
1 boars with Qq genotypes by multiple comparisons of targeted chromosomes with the reference Q (number-increase) or q (wild-type). Visual examination of the Q pools immediately reveals that all five chromosomes in the Q pool share an identical by descent haplotype spanning the ASGA0035536-ALGA0044124 interval (Fig. 4). A total of 29 annotated genes in GenBank were contained within the 2.69-Mb region containing 13 significant SNPs on SSC7. Of these genes, transmembrane p24 trafficking protein 10 (
TMED10) was the closest gene to the most significant SNP on SSC7. TMED10 belongs to the highly conserved p24 family of type I transmembrane proteins and its main function is to regulate anterograde and retrograde transport in the early secretory pathway between the endoplasmic reticulum and Golgi apparatus
[23–
25]. According to a previous study in mouse embryos,
TMED10 was found to be expressed at embryonic day 10.5
[26], which is the critical period for vertebral differentiation
[27]. Although little is known concerning the influence of
TMED10 on vertebral differentiation, this gene may be a candidate as it is the most significant SNP in the current study. The Finkel-Biskis-Jinkins murine osteosarcoma viral oncogene homolog (
FOS), one of 29 genes, has an essential role in the osteoclastic differentiation of precursor cells and the upregulation of Receptor Activator of Nuclear Factor k B (
RANK) expression in osteoclast precursors within the bone environment
[28]. This gene has been reported to be a candidate for vertebrae number
[17]. However, a candidate gene of interest is, in which a significant SNP, SIRI0001067 was located in this GWAS. The development of the vertebral column is a consequence of a segment-specific balance between proliferation, apoptosis and differentiation of mesoderm cells in embryos
[29]. In mammals,
TGFβ3 promotes chondrogenesis in posterofrontal suture-derived mesenchymal cells, influencing different stages of chondrogenic differentiation and proliferation
[30]. TGFβ3 protein, in combination with its downstream factor, TGF beta receptor type I (
ALK5), regulates the differentiation and proliferation of the spinal column
[31]. We resequenced all of UTRs and exons of
TGFβ3 and found only one SNP, which was located in the 3′-UTR (1749 bp of NM_214198 in GenBank), i.e., c.1749G>A. Also, a fixed G allele (increasing number) was detected in each F
0 Large White pig (Table S3). Rerunning GWAS suggested that
TGFβ3 c.1749G>A had significant (
P = 2.98×10
−7) genome-wide association with vertebrae number in the F
2 population. These findings support the proposal that
TGFβ3 c.1749G>A might be a candidate causal mutation, or is closely linked to a causal mutation, on SSC7 for vertebrae number in our population. However,
VRTN, which has been reported to be a strong candidate gene
[5,
15,
32], was not contained in this shared region. Two putative causal mutations, i.e., g.19034A>C and g.20311_20312ins291 of
VRTN, have been reported previously
[15]. Genotyping the two SNPs in our population revealed that only
VRTN g.19034A>C has polymorphisms. Rerunning GWAS suggested that
VRTN g.19034A>C showed significant genome-wide association (Table 2). Genotype detection in the F
0 pigs showed that none of alleles of
VRTN g.19034A>C were fixed in either breed (Table S3). When the
VRTN g.19034A>C was treated as a fixed effect to rerun GWAS, there was also an obvious peak (Fig. S1) for association, even if the top SNP (M1GA0010629,
P = 8.85×10
−5) did not reach significant chromosome-wide association. These results suggest
VRTN g.19034A>C could be proposed as a casual mutation influencing the vertebral number.
Conclusions
In summary, this study focused on vertebrae numbers in pigs. A genome-wide search identified 13 significant SNPs within a 6.04-Mb region containing the reported causal gene NR6A1 on SSC1. An additional 23 SNPs were identified within a 7.17-Mb region on SSC7 that showed genome-wide association with vertebrae number in our population. By identifying a haplotype sharing and haplotype block analysis, the QTL was refined to a chromosome segment of about 3 Mb on SSC7. In addition to the reported candidates, VRTN and TMED10, TGFβ3 on SSC7 could also be good candidate for determining vertebrae number in pigs.
The Author(s) 2017. Published by Higher Education Press. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0)