Introduction
Innate immunity is the first line of defense against pathogenic microorganisms. However, when pathogens breach this early defense, the adaptive humoral immune system produces specific antibodies (Abs) from terminally differentiated B lymphocytes; these antibodies bind to specific targets, or antigens, derived from the pathogen to defend against further infection. At the same time, long-term memory is developed for this pathogen, such that it will be recognized upon reinfection. The structure of conventional antibodies, which are assembled from two identical high-molecular-weight heavy (H)-chain and two identical low-molecular-weight light (L)-chain polypeptides, is highly conserved. The two heavy chains are linked to each other by disulfide bonds, and each heavy chain is linked to a single light chain by another disulfide bond[1]. Camelids, rather unusually, also produce heavy-chain antibodies (HCAbs), which are completely devoid of light chains and whose heavy chain lacks the first constant domain (CH1)[2]. HCAbs have also been found to occur in humans in the early years, where large and various parts of the VH and CH1 can be absent, leading to antibodies devoid of light chains[3]. New antigen receptors (NARs), which are similar to camelid HCAbs, have been reported in nurse and wobbegong sharks and in spotted ratfish[4]. These IgNARs exhibit structural and functional convergent evolution with camelid HCAbs[5].
<FootNote>
The Author(s) 2015. This article is published with open access at http://engineering.cae.cn
</FootNote>
HCAbs are highly stable, in some cases more so than conventional antibodies, and are able to respond to antigens at high temperatures and at high concentrations of denaturants[6]. Sequence and structure analyses have revealed several HCAb features that are distinct from those of their conventional counterparts. First, the highly conserved hydrophobic amino acids Val42, Gly49, Leu50 and Trp52, which occur in the second framework region (FR2) of typical VH domains, are frequently substituted by Phe42 (or Tyr42), Glu49, Arg50 (or Cys50) and Gly52 in the VHH domain[7]. The side chains of these four residues in the FR2 region of the VHH domain deviate from the corresponding positions in the VH domain, and the resulting more hydrophilic surface abrogates the conserved hydrophobic interaction with a VL[8]. Furthermore, these amino acid substitutions might enhance the solubility of the isolated VHH domain[6]. Second, the absence of a light chain in a VHH is compensated for by an extended complementarity-determining region 1 (CDR1) and a longer CDR3. Due to the increased length of the H3 loop, the antigen binding surface is enlarged in the VHH domain. This added flexibility enables greater diversity despite the lack of a VL[9]. In addition, unlike in VHs, an interloop disulfide bond, formed by two cysteines, is frequently observed between the CDR1 and CDR3 in dromedary VHHs. This interloop disulfide bond plays an important role in reducing the conformational flexibility of the long CDR3 loop[10]. Finally, CH1, which typically forms covalent bonds with CL domains in conventional Abs (cAbs), is present in the genome of the heavy chain isotype for the HCAbs but is absent after mRNA splicing. This finding might be due to a point mutation that changes the conserved GT of the splice signal at the 3′ end of the CH1 exon to AT[11]. Due to a relatively long and protruding H3 loop and the absence of light chains, HCAbs can interact with concave antigen binding surfaces and might be better adapted to target epitopes that are normally inaccessible to conventional antibodies[12]. Due to the long CDR3, HCAbs can also insert into the active site of certain enzymes to inhibit their activity[13].
Bactrian camels are phylogenetically very close to dromedaries and are important livestock in north-west China. Whole-genome sequencing of the Bactrian camel has been reported[14,15]. The existence of partial immunoglobulin genes and a comparative analysis of conventional Abs in the Bactrian camel and the dromedary have also been reported[16], but the analysis of HCAbs in Bactrian camels was not comprehensive. In this study, we provide a more thorough analysis of Bactrian camel IgH genes based on our own sequence data and on previous studies.
Materials and methods
Animals, RNA isolation, and reverse transcription
The spleen of a single camel (Camelus bactrianus) was obtained from Xilinhaote abattoir, Inner Mongolia, China. Total RNA was isolated from spleen tissue using a TRIzol kit (CWBio, Beijing, China) following the manufacturer’s instructions and was treated with RNase-free DNase I (Qiagen, Beijing, China). Reverse transcription was conducted using Moloney Murine Leukemia Virus Reverse Transcriptase according to the manufacturer’s instructions (Promega, Madison, WI, USA). The oligo (dT) adapter primer Not I-d (T)18 was used (Appendix A, Table S1).
Amplification of the IgM portion of the CH1 domain using gene-specific primers
Using the previously published IgM CH1 genomic sequences of dromedary and alpaca[14,17], we designed two gene-specific primers from sequences conserved in both species. RT-PCR with these primers, Bactrian camel spleen cDNA and Taq DNA polymerase (CWBio, Beijing, China) was conducted under the following conditions: 94°C for 5 min; 35 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 30 s; and a final extension at 72°C for 7 min. The resultant PCR product (approximately 200 bp) was cloned into the pMD19-T vector (Takara) for sequencing.
5′ and 3′ RACE PCR reaction and construction of Bactrian camel mini-IgH cDNA libraries
Primers derived from IgHM (IgM heavy chain) constant regions were used for 5′ RACE amplification, with spleen RNA as the template. The 5′ RACE System kit was used (Version 2.0, Invitrogen, NY, USA). After screening with the most frequently used IgH (immunoglobulin heavy chain) joining-segment coding sequence, 3′ RACE (Invitrogen, NY, USA) was performed. Every expressed IgG sub-isotype, IgM and IgE was screened with specific primers that were designed based on their unique regions. A second 5′ RACE System (Invitrogen, NY, USA) was used to acquire diverse, non-redundant VH and VHH sequences belonging to IgG1a/b and IgG3. The resultant PCR products were cloned into a pMD19-T vector and sequenced. First-strand cDNA was synthesized using Not I-d (T)18 primers (Appendix A, Table S1). One pair of primers, IgHMC1SP and IgHMC1AP, was designed using alpaca conserved IgM CH regions. Primers IgHMVGSP1 and IgHMVGSP2 were used to perform the 5′ RACE, whereas primers JHSP and RT-P1 were used to perform the 3′ RACE. Primers IgHMCSP and IgHMCAP were used to screen for IgM clones, and primers IgHGCAP, IgHG1aHSP, IgHG1bHSP, IgHG2aHSP, IgHG2cHSP and IgHG3HSP were used to screen for different isotypes of IgG.
According to the previously published constant domain sequences of alpaca and dromedary IgE, JHSP, IgHECAP, IgHECSP and RT-P1 were used to amplify the IgE constant region.
To investigate the different variable region libraries derived from HCAbs and cAbs, two additional 5′ RACE experiments were performed. Primer IgHGVGSP1 was designed to amplify the CH2 region. In one experiment, the γ1a and γ1b gene constant sequences were aligned, and a primer was designed based on the CH1 identity region (IgHG1a/bVGSP2). In the other 5′ RACE experiment, the specific primer IgHG3VGS-P2 was designed based on the hinge region encoded by the γ3 gene.
Genome sequence confirmation
High molecular weight genomic DNA was isolated from spleen using a standard phenol-chloroform extraction method. A sense primer (IgHGC1SP) was designed based on the Bactrian camel conserved IgG CH1 region (Appendix A, Table S1); IgHG1aHAP, IgHG1bHAP, IgHG2aHAP, IgHG2cHAP and IgHG3VGSP2 (anti-sense primers) were used to amplify part of the CH1 exon and part of the hinge exon. The resultant PCR products were cloned into pMD19-T and sequenced.
Sequence computations
Homologous sequences were identified using the Basic Local Alignment Search Tool (BLAST) against the National Center for Biotechnology Information database. DNA and protein sequence editing, alignments, and comparisons were performed using BioEdit software. Multiple sequence alignments were performed using ClustalW, with the assistance of MEGA 5.0. FUZZNUC software was used to search the genome sequences for recombination signal sequences (RSSs). Statistical analysis was performed using the R language and SPSS 21. Images were prepared using Origin 8.6, Photoshop 12.0.1 and GeneDoc 2.6.0.2. The online tools ExPASy and WebLogo (http://weblogo.berkeley.edu/logo.cgi) were used to analyze the acquired data.
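The kind of pattern search performed with FUZZNUC can be illustrated with a regular-expression scan. The sketch below is not the authors' search: it assumes the commonly cited consensus heptamer (CACAGTG) and nonamer (ACAAAAACC) separated by a 12- or 23-bp spacer, accepts exact matches only, and scans one strand in one orientation. The function name `find_rss` is hypothetical.

```python
import re

# Consensus RSS elements (assumed here; the pattern actually supplied
# to FUZZNUC is not stated in the text).
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"


def find_rss(seq, spacers=(12, 23)):
    """Return sorted (start, spacer_length) tuples for every exact
    heptamer-spacer-nonamer match on the given strand (0-based start)."""
    seq = seq.upper()
    hits = []
    for n in spacers:
        # A lookahead is used so that overlapping candidates are all reported.
        pattern = re.compile("(?=%s[ACGT]{%d}%s)" % (HEPTAMER, n, NONAMER))
        hits.extend((m.start(), n) for m in pattern.finditer(seq))
    return sorted(hits)
```

A real search would normally tolerate mismatches and would also scan the reverse complement, which this exact-match sketch does not attempt.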
Phylogenetic analysis
Phylogenetic trees were generated using MrBayes 3.1.2 and were viewed in FigTree 1.4.1. A phylogenetic tree of mammalian VH genes was constructed using the nucleotide sequences of FR1-FR3 as defined by the IMGT numbering system. One representative VH sequence was selected for each of the VH families based on our study of the IgH-specific mini-cDNA libraries in Bactrian camel. Functional or potentially functional germline sequences and cDNA sequences in other species were included. Nurse shark VH1 was used as an outgroup (Appendix A, Table S2). Another phylogenetic tree of Bactrian camel IgH constant genes was constructed using amino acid sequences. Nurse shark IgM was used as an outgroup in this instance (Appendix A, Table S2).
Results and discussion
Construction of IgH-specific mini-cDNA libraries
Construction of the variable region mini-cDNA library of IgM
The variable region library of IgM was investigated using 5′ RACE of spleen RNA that was isolated from one Bactrian camel. The conserved CH1 fragment was sequenced, and 98% identity was confirmed between alpaca and Bactrian camel by BLAST analysis. Two hundred and twelve unique clones were further analyzed after removing duplicate, incomplete and nonfunctional sequences.
Construction of immunoglobulin heavy chain gene constant region libraries
A 3′ RACE experiment was performed to obtain the maximum number of different immunoglobulin heavy chain gene isotypes. The most common JH sequence, 5′-GGCCAGGGGACCCAGGTCACCGTCTCCTCAGAG-3′, was used to design the sense primer for subsequent experiments. This experiment resulted in two PCR fragments (approximately 1000 and 1600 bp), which were subjected to further PCR analysis and sequencing. One μ-encoding gene, five γ-encoding genes and one α-encoding gene were screened from two IgH-specific mini-libraries, as described above. We also identified one ε-encoding gene. According to the published literature and the BLAST alignment, the five γ-encoding genes exhibit obvious differences in the hinge region and are considered to generate two conventional Abs (γ1a and γ1b) and three HCAbs (γ2a, γ2c and γ3) because their cDNA was devoid of the entire CH1 exon[18].
Construction of variable region libraries derived from HCAb and cAb
One hundred and ninety-two unique clones representing IgG3 and 188 unique clones representing IgG1a/b were used for further analysis.
Analysis of heavy chain variable region sequences
Family classification and phylogenetic analysis of VH/VHH segments
Phylogenetic analysis revealed two VH families associated with IgM, according to the 75% identity criterion[19]. Among 212 unique sequences, 169 exhibited homology to the human VH3 family, and the remaining 43 exhibited homology to the human VH4 family. In the variable region cDNA mini-library of IgG1a/b, the 188 unique sequences were likewise divided into two families; 139 sequences were allocated to the VH3 family, and the remaining sequences were allocated to the VH4 family.
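The 75% identity criterion can be made concrete with a short calculation. The sketch below is only an illustration of the idea: it assumes the two nucleotide sequences have already been aligned to equal length (for example with ClustalW), and the function names and the treatment of gaps are assumptions rather than details taken from this study.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap ('-') are ignored;
    a gap opposite a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(pairs)


def same_family(a, b, threshold=75.0):
    """Apply the family criterion: identity at or above the threshold."""
    return percent_identity(a, b) >= threshold
```

For instance, two aligned 8-nt fragments that differ at a single position share 87.5% identity and would be assigned to the same family under this rule.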
In the variable region cDNA mini-library of IgHG3, all sequences exhibited homology to the human VH3 family. However, 187 sequences were considered VHH3 sequences because four hallmark amino acids were found in FR2[7]. The remaining 5 clones exhibited conventional V gene features (<CitationRef CitationID="fig000301"/>). A previous study reported that HCAbs that contain normal VH sequences are functionally active against pathogens[20].
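The hallmark-based distinction between VH and VHH sequences can be sketched as a lookup on the four FR2 positions named in the Introduction (42, 49, 50 and 52). The example assumes the input has already been numbered, so that each position maps to a one-letter residue, and it assumes that all four hallmarks must agree before a call is made; both the function name `classify_fr2` and this all-or-nothing rule are illustrative choices, not the procedure used in this study.

```python
# Hallmark residues as listed in the text (position -> allowed residues).
VH_HALLMARKS = {42: {"V"}, 49: {"G"}, 50: {"L"}, 52: {"W"}}
VHH_HALLMARKS = {42: {"F", "Y"}, 49: {"E"}, 50: {"R", "C"}, 52: {"G", "F"}}


def classify_fr2(residues):
    """Classify a variable domain from its FR2 hallmark positions.

    residues: dict mapping position (int) to a one-letter amino acid.
    Returns 'VHH-like', 'VH-like' or 'ambiguous'.
    """
    vhh = sum(residues.get(p) in aa for p, aa in VHH_HALLMARKS.items())
    vh = sum(residues.get(p) in aa for p, aa in VH_HALLMARKS.items())
    if vhh == 4:
        return "VHH-like"
    if vh == 4:
        return "VH-like"
    return "ambiguous"
```

Sequences with a mixed pattern are reported as ambiguous rather than forced into either class, since the text does not state how such cases were handled.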
When analyzing the VH3, VHH3 and VH4 genes, we found that the Kozak sequences were highly conserved. Statistical analysis revealed that 97.6%, 92.6% and 92.7% of the clones of these three gene families, respectively, contain T/ACACC or GGAAG in the corresponding positions (<CitationRef CitationID="fig000302"/>).
As reported in previous genomic analyses, the V-regions containing the MELG leader sequence belong to the VH3 family, whereas the MRLL leader sequence is associated with V-regions of VH4 family members[21]. This was also observed in our study (Appendix A, Table S3). Furthermore, we observed that the Kozak sequence is unique to individual VH families and is significantly related to individual leader sequences. Clones containing the Kozak sequence T/ACACC were all VH3 family members and contained the MELG leader signal. Clones containing the GGAAG sequence were generally VH4 family members (only one member of the VH3 family contained GGAAG) and contained the MRLL leader signal (Appendix A, Table S4).
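The association between Kozak motif and leader sequence can be tabulated with a simple motif classifier. The sketch below assumes that the motif occupies the five nucleotides immediately upstream of the ATG start codon, which is an assumption made here for illustration because the text refers only to "the corresponding positions"; the function names are hypothetical.

```python
import re
from collections import Counter

# T/ACACC is read as TCACC or ACACC.
KOZAK_PATTERNS = {
    "T/ACACC": re.compile(r"[TA]CACC$"),
    "GGAAG": re.compile(r"GGAAG$"),
}


def kozak_class(upstream):
    """Classify the last 5 nt of the sequence upstream of the ATG."""
    s = upstream.upper()
    for name, pattern in KOZAK_PATTERNS.items():
        if pattern.search(s):
            return name
    return "other"


def tally(records):
    """Count (Kozak class, leader) combinations.

    records: iterable of (upstream_sequence, leader_prefix) pairs,
    e.g. ("GGTCACC", "MELG").
    """
    return Counter((kozak_class(u), leader) for u, leader in records)
```

The resulting counts form the kind of contingency table from which an association between motif and leader could then be tested.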
A phylogenetic tree of VH genes showed that the Bactrian VH3 family was closely related to the dromedary VH3 family and that both belong to clan III (<CitationRef CitationID="fig000303"/>). Bactrian VHH3 sequences also clustered most closely with clan III but were distinct from the Bactrian VH3 family. VH4 clones clustered most closely with the dromedary VH4 family and fell into clan II.
Hypervariability of Bactrian camel VH/VHH cDNA
Based on the special structure and function of VHH3 genes, the sequences of VH3 clones from the IgM or IgG1a/b libraries were compared separately with VHH3 sequences from IgG3. The most frequently occurring sequences were chosen, and the sizes of both CDR1 and CDR2 were not more than eight residues. An additional peak in the variability index was observed in the CDR1 of the VHH3 sequences at residues 27−30, where the value was higher than that for the VH3 sequences derived from IgM and IgG1a/b (<CitationRef CitationID="fig000304"/>). This result is in agreement with previously published data based on the Kabat system of amino acid numbering, which showed that an additional hypervariable region is present in the CDR1 of VHH sequences[22]. However, no major differences were found between the VH3 sequences associated with IgM and IgG1a/b (<CitationRef CitationID="fig000304"/>). Two-way ANOVA of the variability index revealed that the VHH3 sequences were significantly different from the VH3 sequences of IgM or IgG1a/b (Appendix A, Table S5).
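A per-position variability index of this kind can be computed directly from an alignment. The sketch below assumes the classical Wu-Kabat definition (the number of different amino acids at a position divided by the frequency of the most common one); the text does not state which formula was used, so this choice, like the function names, is an assumption.

```python
from collections import Counter


def variability_index(column):
    """Wu-Kabat variability for one alignment column (gaps ignored)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    most_common_freq = counts.most_common(1)[0][1] / len(residues)
    return len(counts) / most_common_freq


def variability_profile(aligned):
    """Variability at every position of equal-length aligned sequences."""
    return [variability_index(col) for col in zip(*aligned)]
```

An invariant position scores 1.0, whereas a position with three different residues whose most common residue occurs in half of the sequences scores 6.0.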
Analysis of CDR3 length
According to the IMGT CDR definition, the CDR3 length derived from VHH3 ranged from 9 to 31 aa (with the exception of one clone, whose CDR3 was 5 aa long) but was most commonly between 18 and 22 aa (<CitationRef CitationID="fig000305"/>). The average CDR3 derived from HCAbs (20 aa) was longer than the corresponding region of VH3 sequences derived from IgM (12.7 aa) or IgG1a/b (13.8 aa). The average length of the Bactrian camel CDR3 in VHH3 was also greater than that of the VHHs in dromedary (15 aa), llama (14.9 aa) or alpaca (17.8 aa)[23]. No major differences were detected in CDR3 length between the VH4 sequences from IgM (ranging from 10 to 23 aa) and IgG1a/b (ranging from 10 to 24 aa).
Analysis of the potential interloop disulfide bond
Two cysteine residues (one of which is usually located in the CDR1 region and the other of which is located in the CDR3 region) were included in a number of VHH3 sequences and are known to form an interloop disulfide bond that enhances structural stability. Platypus and shark antibodies also contain an interloop disulfide bond, which is located in the variable domains, to restrict the flexibility of the CDR loops[24,25]; however, this extra disulfide bond is observed less often in llama VHH[26]. We found that 133 sequences included two conserved Cys at the corresponding positions and that 59 VHH3 sequences contained no more than one Cys. A chi-square test was used to investigate the variability of the CDR3 length in relation to the presence or absence of the interloop disulfide bond. No correlation was found between CDR3 length and the presence of the interloop disulfide bond (Appendix A, Table S6). Among 212 VH sequences obtained from IgM, only two sequences contained two conserved Cys at the corresponding positions. No potential interloop disulfide bond was found among 188 VH sequences from IgG1a/b.
Analysis of positions that are critical for the variable domain immunoglobulin fold
A statistical analysis of the key positions that shape the structure of antibodies[27] was performed using WebLogo[28]. An alignment of the amino acid sequences of VH3DJ and VHH3DJ revealed that nearly all residues that are important for the variable domain structure are conserved. As indicated in <CitationRef CitationID="fig000306"/>, there is no major difference between the VH3DJ sequences from IgM and IgG1a/b. However, a clear difference was observed between the VH3DJ and VHH3DJ sequences (<CitationRef CitationID="fig000306"/>), in particular at positions 12, 15, 24, 30, 42, 49, 50, 52, 54, 96 and 106. Previous reports have shown that VH3 and VHH3 are distinguished by substitutions from highly conserved hydrophobic amino acids (Val42, Gly49, Leu50, Trp52) in classical Abs to amino acids (Phe or Tyr 42, Glu49, Arg or Cys 50, Gly or Phe52) in HCAbs; this is consistent with our results (<CitationRef CitationID="fig000306"/>).
Analysis of DH and JH genes
The whole genome of the Bactrian camel was recently published[15], and alpaca DH and JH genes and Bactrian JH sequences were used to search the Bactrian camel genome[17]. Seven DH sequences and six JH genes were located in scaffold 131 (accession number: NW_011509943) (<CitationRef CitationID="fig000307"/>). Analysis of these DH and JH sequences indicated that a high homology exists between Bactrian camel and alpaca, and the same DH and JH genes were observed (<CitationRef CitationID="fig000307"/>). We speculated that these genes might be derived from a common ancestor. Due to the occurrence of somatic hypermutation in these regions, the discrimination of specific DH/JH genes was difficult. However, JH3 and JH5 genes occurred most frequently.
Structural and phylogenetic analysis of the heavy chain constant region
μ gene
The Bactrian camel μ gene was found in scaffold 131 (<CitationRef CitationID="fig000307"/>) of the whole-genome sequence, and the secreted form was obtained using 3′ RACE. Sequence alignment showed that this gene had overall amino acid sequence identities of 64% and 60% to the human and mouse μ genes and of 97% and 95% to the dromedary and alpaca μ genes, respectively. The identity of the Bactrian camel μ gene was also confirmed by phylogenetic analysis (<CitationRef CitationID="fig000308"/>), the results of which suggested that the Bactrian camel μ gene was most closely related to its counterparts in dromedary and alpaca.
The alignment of the μ gene between Bactrian camel and other mammals showed that the distribution of cysteine and tryptophan residues is conserved. The Cys residues that form intra- and inter-H chain disulfide bonds and the covalent bonds with the L chains were also conserved (Supplemental Fig. S1). The Bactrian camel μ gene also presented the conserved Cys residues found in CH3 and in the secretory tail, which covalently polymerize pentameric IgM in mammals[29]. Using ExPASy software, five putative N-linked glycosylation sites (N-X-S/T) were discovered at the same positions in Bactrian camel and dromedary, and one of these sites was highly conserved in camelids.
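Putative N-linked glycosylation sites of the N-X-S/T form can be located with a short pattern scan. The sketch below is an illustration rather than the ExPASy procedure: it additionally assumes the usual convention that X may be any residue except proline, and the function name `n_glyc_sites` is hypothetical.

```python
import re

# N-X-S/T sequon with X != P (the proline exclusion is an assumption
# added here; the text states only N-X-S/T). The lookahead allows
# overlapping sequons to be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sites(protein):
    """Return 1-based positions of the Asn in each putative sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]
```

Positions are reported for the asparagine residue, so sites can be compared across species once the sequences are aligned.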
γ genes
Five isotypes of γ genes, which can be distinguished based on their hinge regions, were screened from 531 clones within the CH mini-libraries. Comparison of the Bactrian camel γ1a gene with the corresponding alpaca sequence showed 94.4% nucleotide sequence identity. Comparison of the Bactrian camel γ1b gene with the corresponding alpaca sequence showed 95.8% sequence identity. In the published genome sequences of the Bactrian camel[15], γ1a and γ1b could not be located in any scaffold; however, γ2a and γ2c were discovered in scaffold 27 (NW_011517098) and in scaffold 131. The hinge domains of Bactrian camel IgG2a and IgG2c were identical to those of dromedary IgG2a and IgG2c, respectively (<CitationRef CitationID="fig000309"/>).
The hinge exon of IgG3 was found in contig 15740 (JARL01015740) (<CitationRef CitationID="fig000307"/>). One hundred percent nucleotide sequence identity was found with the γ3 sequence that was screened from the cDNA mini-library. Comparison of the Bactrian camel γ3 gene with the corresponding dromedary and alpaca genes indicated that two amino acids differed (GG to EV) in the hinge region.
The phylogenetic trees of the constant region of these HCAbs and of conventional Abs showed that the camelid IgG isotypes are closely related both to each other and to bovine and sheep IgGs (<CitationRef CitationID="fig000308"/>).
To confirm whether the mutations that have occurred in the Bactrian camel genome correspond to the regions that have been reported in dromedary[11], the CH1-hinge genomic sequences of all five γ genes were amplified from a Bactrian camel. As indicated in Fig. 8, a comparison of the two conventional Ab genes (γ1a and γ1b) with the three other Bactrian camel γ genes (γ2a, γ2c and γ3) showed that, in the latter, a mutation has occurred at the canonical splice site located at the 3′ CH1/intron border, changing GTAAG to ATAAG (<CitationRef CitationID="fig000309"/>).
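The splice-site comparison amounts to inspecting the first two intron nucleotides at the CH1/intron border. A minimal sketch follows, assuming the genomic sequence and the position of the first intron nucleotide are already known; the function name and the toy sequences in the usage note are illustrative only and are not taken from the sequenced clones.

```python
def donor_site_status(genomic, intron_start):
    """Report whether the splice donor dinucleotide is the canonical GT.

    genomic: genomic DNA sequence spanning the exon/intron border.
    intron_start: 0-based index of the first intron nucleotide.
    """
    dinucleotide = genomic[intron_start:intron_start + 2].upper()
    if dinucleotide == "GT":
        return "canonical"
    return "disrupted (%s)" % dinucleotide
```

Applied to a toy border ending in GTAAG the function returns "canonical", whereas the same border with ATAAG is reported as "disrupted (AT)".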
α and ε genes
Of the sequenced 3′ RACE products, some were highly homologous to human and alpaca IgA, with 62.8% and 91.3% nucleotide sequence identities, respectively. The phylogenetic tree also indicated that this sequence represented the Bactrian camel α gene (<CitationRef CitationID="fig000308"/>).
Based on a previously published alpaca IgE heavy chain constant sequence, we confirmed that the Bactrian camel ε gene shared 98.1% and 55.2% nucleotide identity with the corresponding regions in alpaca and humans, respectively. According to the phylogenetic tree, the Bactrian camel ε gene is most closely related to the alpaca ε gene.
Conclusions
In summary, we analyzed the Bactrian camel immunoglobulin heavy chain genes and obtained new knowledge regarding their structure. Phylogenetic analysis indicated that Bactrian camel immunoglobulin genes are similar to dromedary genes.
Endnote The sequences presented in this article have been submitted to the National Center for Biotechnology Information’s GenBank database (http://www.ncbi.nlm.nih.gov/nuccore) under accession numbers KP999944 – KP999951.
Higher Education Press and Springer-Verlag Berlin Heidelberg