Introduction
QTL linkage analyses and fine mapping studies have achieved remarkable results in recent decades [
1-
3]. However, the low density markers of the genetic variation in the complex economic traits cannot be captured using this method [
4-
7]. Genome wide association study (GWAS), which utilizes a large number of high-density genetic markers throughout the entire genome, provides a new approach to detect causal variations underlying complex traits [
8,
9]. So far, GWAS has been successfully applied to identify genes involved in human diseases [
10,
11], economical traits and various complex traits in animals [
12,
13]. Our previous GWAS with an illumina 50K chip detected 105 SNPs which were significantly associated with one or multiple milk production traits <FootNote>
The Author(s) 2014.This article is published with open access at http://engineering.cae.cn
</FootNote> in dairy cattle [
14]. As the first step in gene discovery [
15,
16], the results from GWAS still need further functional annotation and validation by use of genetic association studies. Thus, we selected 38 highly significant SNPs detected with high confidence by two statistical methods from these 105 significant SNPs. Through bioinformatics and comparative genomics analysis, a total of 26 genes were found to contain or be near to at least one of 38 significant SNPs, including the well-known
DGAT1 and
GHR genes [
17,
18]. Of these, the chromosome 14 open reading frame 33 ortholog (
C14H8orf33) gene had the nearest location to the significant SNP, Hapmap30383-BTC-005848 [
14], and was considered as a promising candidate gene for milk production traits.
The
C14H8orf33 gene is located on BTA14, which includes a large number of QTLs for milk production traits, i.e.
DGAT1 [
19-
22]. The bovine
C14H8orf33 gene spans 2054 bp and contains 6 exons and 5 introns. The cDNA consists of 1220 bp with an open reading frame encoding a 188-amino acid protein. It is 313 kb away from the causal mutation K232A of the
DGAT1 gene. However, until recently, almost no relevant reports have been available for the
C14H8orf33 gene. In this research, an association study was conducted to confirm our previous GWAS result and to search for potential variants of the
C14H8orf33 gene affecting milk production traits in dairy cattle.
Materials and methods
Bioinformatics and comparative genomics analysis
To further validate the exact physical location of the 38 SNPs selected from the 105 significant SNPs, we separately compared each of the 60 bp upstream and downstream nucleotide sequences with NCBI (http://www.ncbi.nlm.nih.gov) and UCSC (http://genome.ucsc.edu/) website Btau 3.1 databases. From the exact physical location, we inferred the gene that the SNP was located within or near to.
The potential biochemistry and physiology of the gene based on its genome sequence was predicted by searching for the homologous and similar sequences from cattle, human and mouse. For the purpose of precise and accurate prediction, we used the websites; NCBI (http://www.ncbi.nlm.nih.gov), Ensemble (http://asia.ensembl.org/index.html), Uniprot (http://www.uniprot.org), KEGG (http://www.genome.jp/KEGG), GeneCards (http://www.genecards.org) and wikipathways (http://www.wikipathways.org) to achieve functional annotations for each gene.
Animal resource and DNA extraction
A daughter design was employed in this study. A total of 742 daughters from 14 corresponding sires were selected to construct the study population. The numbers of daughters for each of the 14 sires ranged from 22 to 125. These daughters were from 15 dairy farms in Beijing Sanyuanlvhe Dairy Farming Center. The official estimated breeding values (EBVs) for the five milk production traits, including milk yield (MY), fat yield (FY), protein yield (PY), fat percentage (FP) and protein percentage (PP) were provided by the Dairy Data Center of the Dairy Association of China (DAC) (http://www.holstein.org.cn). Genomic DNA was isolated from whole blood samples of cows and frozen semen of sires. A DNA pool was constructed from the DNA of 14 sires at the same concentration of 50 ng·μL-1.
SNP identification and genotyping
A total of 18 pairs of PCR primers (Appendix A, Table S1) were designed with Primer Premier 3 (Premier, Canada), according to the genomic sequence of the bovine
C14H8orf33 gene, to amplify all exons plus 5′ and 3′ franking regions. The SNPs identified using the pooled DNA from daughters of 14 sires, further SNPs were genotyped for all experimental cows using the iPLEX MassArray system (Sequenom Inc.). In addition, the SNP Hapmap30383-BTC-005848 from a previous GWAS [
14] was genotyped for the purpose of replication in this study.
Statistical analyses
Allele and genotype frequencies were compared between the mutant and wild type through a chi-square test. The chi-square tests were also used to determine whether individual variants were in equilibrium at each locus by comparing the expected and observed genotype frequencies (Hardy–Weinberg equilibrium). Pedigrees of the population were traced back for three generations to create the relationship matrix. We calculated linkage disequilibrium between all pairs of biallelic loci using HAPLOVIEW 4.2. For single locus and haplotype analyses, the mixed procedure in SAS 9.1.3 with the animal model was fitted as follows:
Where
y is the vector of EBVs for each trait,
μ is the overall mean, b is the regression coefficient of EBVs on SNP genotypes,
x is the fixed effect vector, a is the vector of polygenetic effects with
a~N (0,
) (where A is the additive kinship matrix and
is the additive variance), and
e is the vector of residual errors distributed as
e~N (0,
) [
23].
Total RNA isolation and cDNA synthesis
Total RNA from 8 different tissues, i.e., heart, liver, small intestine, kidney, mammary gland, ovary, uterus and gluteus, was extracted using Trizol Reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocols and DNA contamination removed from RNA extracts with RNase-free DNase I for 30 min at 37°C. RNA integrity was checked by 1% agarose gels and the quantity was detected with NANODROP 2000 (Thermo Scientific, DE, USA). One microgram of total RNA for each tissue was reverse transcribed by PrimeScript® RT reagent Kit (TaKaRa, Ostn, Japan) to obtain the cDNA. Each cDNA sample was amplified to ascertain its quality with a pair of specific primers for GAPDH, which covers two partial adjacent exons and the whole intron between those two exons.
Real-time quantitative RT-PCR
With the primers as shown in Appendix A (table S1), quantitative real-time RT-PCR was carried out with a LightCycler 480 Real-Time PCR System (Roche, Hercules, CA, USA). The reaction condition were as follows: pre-denaturation at 95°C for 10 s; amplification 45 cycles of 95°C for 10 s, 60°C for 10 s, and 72°C for 10 s. The relative expression level was normalized by the GAPDH with 2ΔΔCT method as described previously (Livak and Schmittgen, 2001). All the measurements of C14H8orf33 gene expression in different tissues were performed in triplicate, and the average values obtained. These data were analyzed by a t-test using the SAS9.0 program (SAS Institute, Inc., Cary, NC, USA), with a P value of<0.05 considered significant.
Results
Function annotation
Based on the 38 SNPs selected with high confidence from the 105 significant SNPs identified by our initial GWAS, a total of 26 genes were obtained through bioinformatics and comparative genomics analysis, and their functions placed into seven major categories: including body metabolism and nutrient balance; cytoskeleton or extracellular matrix components; regulation of cell proliferation and apoptosis; cell signal transduction and salt ion channel composition; kinase activity; mRNA transcription and translation regulation (Table 1).
SNP identification and selection
By sequencing of pooled DNA from daughters of 14 unrelated sires in a Chinese Holstein population, a total of 18, including seven novel SNPs were identified (Table 2). Among them, nine SNPs were in the 5′ regulatory region, one in exon 6 and the remainder in the 3′ UTR and 3′ regulatory region. The SNP in exon 6 was a non-synonymous SNP with the amino acid alteration from proline (CCC) to histidine (CAC). Nine out of these identified SNPs were successfully genotyped by mass spectrometry and analyzed for association with five milk production traits in an independent resource population. Chi-square test showed all nine SNPs were in Hardy–Weinberg equilibrium (P>0.05). The genotypic and allele frequencies are shown in Table 3.
Associations analyses
The association results are shown in Table 4. These SNPs were significantly associated with protein yield [
P<(0.0001-0.0267)] but not associated with other milk production traits (
P>0.05) in present study. In addition, the SNP Hapmap30383-BTC-005848 identified in our initial GWAS [
14] was successfully confirmed to have significant associations with MY, PY, FP and PP in this independent dairy cattle population. This provided convincing statistical evidence for our previous study.
Linkage disequilibrium analysis
The LD block generated by all nine SNPs within 5 kb (Fig. 1), consisted of three haplotypes, TCACCGTTT, AACTGACAC and TCACCGCTT with frequencies of 0.44, 0.39 and 0.15, respectively. The statistical analysis of the haplotypes with EBVs of five milk production traits showed that the haplotypes were associated with PY (P = 2.31×104) (Table 5). The results were consistent with the associations of single SNPs.
Expression analysis of the bovine C14H8orf33 gene
The relative mRNA expression of C14H8orf33 in eight different tissues was determined by quantitative real-time PCR. The results revealed that C14H8orf33 was ubiquitous in these eight tissues, and at a relatively higher expression level in the mammary gland than in other tissues. In addition, the expression of C14H8orf33 in small intestine, kidney, ovary and uterus was also relatively higher than in three other tissues (Fig. 2).
Discussion
In this study, we annotated the function of 26 genes that correspond to 31 of 38 SNPs identified as highly significant via bioinformatics and comparative genomics analysis, and identified several novel C14H8orf33 variants associated with milk production traits.
In previous studies, the known functional genes
DGAT1 [
24],
ABCG2 [
25] and
SCD1 for milk production traits [
26] have been observed to have high expression in mammary tissue of mammals. In this study, we found that the
C14H8orf33 gene was also expressed at a relatively higher level in the mammary gland compared with seven other tissues, indicating its importance in mammary biologic processing in dairy cattle. Furthermore, our association data showed that the nine identified SNPs, including the SNP Hapmap30383-BTC-005848 identified by our initial GWAS [
14], in the
C14H8orf33 gene were significantly associated with at least one milk trait. Therefore, it was inferred that the
C14H8orf33 gene showed relatively independent effects on the milk traits. At the same time, our findings provided convincing evidence for our previous GWAS study by a replication study. In conjunction with association analyses, the SNP Hapmap30383-BTC-005848 in the 3′ UTR in the
C14H8orf33 gene could be selected to examine whether this mutation is involved in interaction with some miRNA in a follow-up investigation.
In addition, the C14H8orf33 gene is 313 kb away from the causal mutation K232A of the true QTL for milk composition, i.e., the DGAT1 gene. Although the significant associations of the C14H8orf33 gene with milk production traits were associated with higher expression in mammary gland of lactating cows, it is suspected that these significant associations could be due to the linkage disequilibrium (LD) between C14H8orf33 and DGAT1. We found a total of 25 genes between C14H8orf33 and DGAT1 and further investigations are needed in order to verify whether the strong LD exist between these two genes.
Conclusions
This study provided strong evidence for association of C14H8orf33 variants with milk yield and milk composition traits and may be applied in Chinese Holstein breeding programs.
Higher Education Press and Springer-Verlag Berlin Heidelberg