Introduction
In the humoral immune system, a diverse antibody repertoire is essential to produce effective and specific immune responses. During the development of B cells, the antibody (or B cell receptor, BCR) repertoire is generated by somatic joining of the variable (V), diversity (D), and joining (J) gene segments at immunoglobulin heavy (IgH) chain and light (IgL) chain loci [1]. Diversity of each chain is determined by various VH-D-JH or Vκ/λ-Jκ/λ combinations and unpredictable junctional sequences, which are created de novo by the imprecise joining of gene segments and the varied insertion of non-templated (N) and palindromic (P) nucleotides. The most diverse VH-D-JH junction encodes the third complementarity-determining region of the IgH chain variable region (CDRH3), in which both length and amino acid composition play a vital role in defining the BCR specificity [2].
In theory, the BCR repertoire has been estimated to surpass 10⁸ potential sequence variants. However, the majority of the initially generated IgH and/or IgL chains are successively removed from the whole repertoire as the developing B cells pass through several checkpoints that depend on positive and/or negative selection [3, 4]. During the first checkpoint, pre-B cells are positively selected for the expression of functional μ heavy chains (μHCs) encoded by the productively rearranged IgH alleles [5]. The μHC has the ability to pair with the surrogate light chain to form the pre-BCR, which is transported to the cell surface and induces the proliferation of the pre-B cells [6, 7]. A few studies have indicated that autoreactivity of the pre-BCR is crucially important for inducing the expansion of pre-B cells with a productive IgH chain rearrangement [8, 9]. The second checkpoint is usually referred to as central B cell tolerance, a process that negatively selects the autoreactive immature B cells in the bone marrow environment with high efficiency and stringency [10]. Receptor editing, anergy, and deletion are three known mechanisms utilized to establish the tolerance [11-15]. In wild-type mice, receptor editing at the IgL chain loci plays a major role in silencing autoreactive B cells (~50%), and a small proportion of low-avidity autoreactive B cells become anergic, whereas the very few remaining undergo deletion [16, 17]. When the newly formed immature B cells migrate to peripheral lymphoid tissues as transitional B cells, they continue to experience multiple checkpoints before finally entering the mature B cell pool [18]. In contrast to the detailed investigations of central B cell tolerance, relatively little is known about the mechanisms underlying peripheral B cell tolerance. However, several reports suggest that the transitional B cells probably experience both negative and positive selection [19-23].
The effect of positive and/or negative selection could be reflected in differences between BCR repertoires of different B cell populations. Several studies by Schroeder Jr et al. elucidated detailed features of the CDRH3 repertoire in various murine B cell populations, in terms of length, amino acid composition, and average hydrophobicity. Among them, the key feature of CDRH3 repertoire development in both BALB/c and C57BL/6 mice is an increase in average CDRH3 length with B cell maturation [24-26]. However, use of all the above features as accurate reflections of the entire BCR repertoire has been challenged, as they were deduced from the analysis of a limited number of μ chain transcripts containing only VH7183 family members.
In the last few years, high-throughput sequencing technologies have been widely utilized to describe the antibody and T cell receptor (TCR) repertoires in humans and zebrafish [27-33]. Here, to comprehensively understand the effects of selection on the BCR repertoire during B cell development, we describe and compare the IgH repertoire of pre-B, immature B and spleen B cells in C57BL/6 mice using 454 high-throughput pyrosequencing, with particular attention to the usage profiles of VH, D, and JH genes and the repertoire of CDRH3. From the viewpoint of comparative studies, these results will provide the basis for further investigation of BCR development in domestic animals.
Materials and methods
Mice
C57BL/6 mice of clean grade were purchased from Vital River Laboratories (Beijing, China) and housed and cared for in a barrier environment according to the national standard (GB14925-2001) issued by General Administration of Quality Supervision, Inspection and Quarantine of China.
B cell isolation and cell sorting
The pooled bone marrows and spleens were collected from eight 6-8-week-old mice (four male and four female). Single-cell suspensions were prepared by passing tissue samples through 200 μm nylon mesh and resuspending the filtered cells in phosphate buffered saline (PBS). Erythrocytes were lysed in red-blood-cell (RBC) lysis buffer (eBioscience, San Diego, CA, USA). The mixture of pre-B and immature B (IM-B) cells was isolated from the bone marrow single-cell suspension by depletion of CD43+ pro-B cells and non-B cells, using the MACS B cell Isolation Kit (Miltenyi Biotec, Bergisch Gladbach, Germany). The IM-B cells were then positively selected from the mixture with Anti-Mouse IgM MicroBeads (Miltenyi Biotec), and the pre-B cells were the negative fraction. Spleen total B (S-B) cells were positively selected from a spleen single-cell suspension using CD45R(B220) MicroBeads (Miltenyi Biotec). Cells (~10⁵) of each B cell subset were stained with PE-Cy5.5-anti-mouse B220, FITC-anti-mouse IgM, and PE-anti-mouse CD43, and the purity of each B cell subset was analyzed on a MoFlo High-performance cell sorter (DakoCytomation, Fort Collins, CO, USA).
mRNA preparation, cDNA synthesis, and PCR
Total RNA samples were isolated from each B cell subset using the mirVana™ miRNA Isolation Kit (Ambion, Austin, TX, USA), and reverse transcription was conducted using a mouse Cμ-specific primer (RTCμ) [34] and M-MLV Reverse Transcriptase (Promega, Madison, WI, USA) following the manufacturer’s instructions. Recombined IgH VDJ regions were amplified from each cDNA sample using a multiplex of 17 mouse VH family-specific upstream primers and a mouse Cμ-specific downstream primer (Cμ0) (a 454-adaptor sequence was added at the 5′ end of each primer) [34]. A touchdown PCR was performed with 25 μL 2 × Phusion HF PCR Master Mix (NEB, Beverly, MA, USA) and 10 μL cDNA per sample, using the following protocol: 95°C for 2 min, then touchdown PCR for 6 cycles (94°C for 1 min, annealing for 1 min at 67°C to 57°C, decreasing by 2°C per cycle, and 72°C for 1 min), followed by 14 cycles of 94°C for 1 min, 57°C for 1 min, and 72°C for 1 min, and a final extension step of 68°C for 10 min. To minimize PCR amplification bias, for each B cell subset sample, three Cμ-specific downstream primers with three different barcoding index sequences (Cμ1 to Cμ3) were separately utilized to repeat the preceding PCR. All primer sequences are given in Appendix-Table S1.
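The annealing schedule of the touchdown protocol above can be written out cycle by cycle. The following is a minimal Python sketch, not part of the original methods; the function name and its parameters are illustrative assumptions, and only the temperatures and cycle counts come from the text.

```python
def touchdown_schedule(start_c=67, end_c=57, step_c=2, plateau_cycles=14):
    """Return the annealing temperature (in °C) used in each PCR cycle.

    Hypothetical helper: the touchdown phase steps down from start_c to
    end_c, and the plateau phase then repeats end_c for plateau_cycles.
    """
    touchdown = list(range(start_c, end_c - 1, -step_c))
    return touchdown + [end_c] * plateau_cycles


schedule = touchdown_schedule()
print("touchdown phase:", schedule[:6])
print("total cycles:", len(schedule))
```

Stepping from 67°C to 57°C in 2°C decrements gives exactly the six touchdown cycles stated in the protocol, so the whole amplification comprises 20 cycles.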
Amplicon purification and sequencing
PCR amplicons (350-450 bp) were purified by 1.2% agarose gel electrophoresis and the MinElute Gel Extraction Kit (Qiagen, Hilden, Germany). For each B cell subset sample, the four separate amplicons generated with Cμ0 to Cμ3 were mixed in equal amounts before sequencing on the 454 GS FLX sequencer (Roche, Basel, Switzerland). For sequencing, the DNA libraries were processed according to the operation manual.
Sequence analysis
A local BLAST database was constructed using the C57BL/6 reference germline VH, D, and JH sequences [35-37], and the BLAST algorithm was configured following the NCBI IgBLAST (http://www.ncbi.nlm.nih.gov/igblast/). All reads were aligned by the local BLAST to each germline VH and JH gene, and the highest-scoring VH and JH genes were used for the following analysis. The algorithm to determine the VH-D-JH junctions and to identify the 3′ V-regions, 5′ J-regions, D-regions, P-regions, and N-regions was designed on the basis of IMGT/JunctionAnalysis [38]. Reads were considered to be informative antibody sequences if they passed the following quality control criteria: a minimum length of 300 bp; identified VH and JH genes; and absence of ambiguous nucleotides in the junction region. A non-redundant rearrangement library was constructed by collapsing clonally related sequences. From these unique sequences, the VHDJH combination repertoire, antibody repertoire and CDRH3 characteristics were calculated. The definitions of CDRH3 and CDRH3-loop follow those described previously [24, 39].
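The quality control criteria and the reduction of clonally related sequences described above could be sketched as follows. This is an illustrative Python outline under stated assumptions, not the authors' pipeline: the `Read` record, its field names, and the choice to keep the first read of each clone (the study picked one at random) are all hypothetical.

```python
from typing import NamedTuple, Optional


class Read(NamedTuple):
    """Hypothetical record for one annotated 454 read."""
    sequence: str            # full read, nucleotides
    v_gene: Optional[str]    # best-scoring VH gene, or None if unassigned
    d_gene: Optional[str]    # D gene, or None if not identifiable
    j_gene: Optional[str]    # best-scoring JH gene, or None if unassigned
    junction: str            # CDRH3 junction, nucleotides


def passes_qc(read, min_length=300):
    """Apply the three stated criteria: length, V/J assignment, clean junction."""
    return bool(
        len(read.sequence) >= min_length
        and read.v_gene is not None
        and read.j_gene is not None
        and read.junction
        and set(read.junction.upper()) <= set("ACGT")
    )


def collapse_clones(reads):
    """Keep one read per (VH, D, JH, junction) group.

    Assumption: the first read seen represents the clone.
    """
    unique = {}
    for read in reads:
        key = (read.v_gene, read.d_gene, read.j_gene, read.junction.upper())
        unique.setdefault(key, read)
    return list(unique.values())


reads = [
    Read("A" * 300, "VHJ558.55", "DFL16.1", "JH2", "GCAAGAGGG"),
    Read("C" * 320, "VHJ558.55", "DFL16.1", "JH2", "gcaagaggg"),  # same clone
    Read("A" * 299, "VHJ558.55", "DFL16.1", "JH2", "GCAAGAGGG"),  # too short
    Read("A" * 300, "VHJ558.55", "DFL16.1", "JH2", "GCANGAGGG"),  # ambiguous base
]
informative = [r for r in reads if passes_qc(r)]
print(len(informative), len(collapse_clones(informative)))
```

Grouping on the gene assignments plus the junction nucleotide sequence mirrors the definition of clonally related sequences given in the Results.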
Statistical analysis
Data were combined and subsequently analyzed in Excel (Microsoft, USA). Differences between B cell populations were assessed where appropriate by two-tailed Student’s t-test and χ² test. Correlation analyses were performed by weighting either the read abundance for a particular VH, VH family, D, JH, and VHDJH class or the total number of each amino acid in the CDRH3 loop.
Results
Deep sequencing of the mouse IgH repertoire in different B cell populations
Pre-B (B220+, CD43–, and IgM–), IM-B (B220+, CD43–, and IgM+) and S-B (B220+) cells from eight C57BL/6 mice were isolated using MACS. FACS analysis of the purified cells demonstrated that more than 90% of cells in each population were consistent with the corresponding phenotypes (Appendix-Fig. S1). A total of 397654, 333360, and 298987 reads identified by barcoding index were obtained from pre-B, IM-B, and S-B cells, respectively. After selection according to the quality control criteria, 256532 (64.5%) pre-B cell sequences, 174593 (52.4%) IM-B cell sequences, and 145764 (48.8%) S-B cell sequences with a complete and identifiable set of VHDJH gene assignments were utilized for the following analysis.
The number of functional VH (110), D (10, including two identical DSP2.x segments), and JH (4) gene segments of the C57BL/6 mice could theoretically produce 3960 (110 × 9 × 4) possible VHDJH combinations. Saturation analysis of the VHDJH combination diversity demonstrated that the VHDJH repertoire was close to saturation (nearly 3000 VHDJH combinations) when the reads increased to as many as 150000 in all three cell populations (Appendix-Fig. S2a), indicating that the data covered almost all actual VHDJH combinations in the C57BL/6 mice repertoire.
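The theoretical combination count and the idea behind the saturation analysis can be illustrated with a short Python sketch. The exact procedure used for Appendix-Fig. S2a is not described in the text, so the random-subsampling approach below, along with all function and variable names, is an assumption made for illustration only.

```python
import random

# Gene segment counts from the text; the two identical DSP2.x segments
# are counted once, leaving 9 distinguishable D segments.
N_VH, N_D_DISTINCT, N_JH = 110, 9, 4
THEORETICAL_COMBINATIONS = N_VH * N_D_DISTINCT * N_JH


def saturation_curve(combinations, step, seed=0):
    """Count distinct VHDJH combinations seen as reads accumulate.

    combinations: one (VH, D, JH) tuple per read.
    Returns (reads_sampled, distinct_combinations) points every `step`
    reads, plus a final point covering all reads.
    """
    shuffled = list(combinations)
    random.Random(seed).shuffle(shuffled)
    seen = set()
    curve = []
    for i, combo in enumerate(shuffled, start=1):
        seen.add(combo)
        if i % step == 0 or i == len(shuffled):
            curve.append((i, len(seen)))
    return curve


print(THEORETICAL_COMBINATIONS)
toy_reads = [("V1", "D1", "J1")] * 5 + [("V2", "D1", "J1")] * 3 + [("V3", "D2", "J2")]
print(saturation_curve(toy_reads, step=3))
```

A curve that flattens as reads are added indicates that further sequencing would uncover few new combinations, which is the sense in which the repertoire is described as close to saturation.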
In all three B cell populations, the majority of the rearranged IgH sequences were productive (pre-B 219469, 85.6%; IM-B 146227, 83.8%; and S-B 117643, 80.7%). Productively rearranged IgH sequences with the same VH, D, and JH segment usage and identical CDRH3 nucleotide sequences were identified as clonally related sequences, which probably arose from PCR amplification of a single sequence or from multiple cDNA copies from clonally expanded cells. Clonally related sequences therefore encoded almost identical IgH variable regions. The frequency of clonally related sequences gradually decreased with increasing clone size, and unique sequences with no clonally related sequences (clone size of 1) were predominant in all three B cell populations (Appendix-Fig. S3). However, compared to the IM-B and S-B cell populations, the pre-B cell population had a significantly smaller clone size when the number of reads approached 30000 (Appendix-Fig. S3). As the reads increased, clone size showed a similar trend in all three B cell populations (data not shown). To avoid counting the same antibody repeatedly, only one sequence was chosen at random from each group of clonally related sequences to represent a unique (non-redundant) antibody molecule. More than 84% (99165) of the productive IgH sequences from the S-B cell population were unique IgH rearrangements, compared to 77.9% (170933) in pre-B cells and 75.0% (109741) in IM-B cells. The modest increase in the percentage of unique rearrangements in S-B cells is attributable to antigen-experienced B cells that underwent somatic hypermutation (SHM) in CDRH3. Saturation analysis of the antibody diversity with the number of unique rearrangements demonstrated that the antibody repertoire was far from saturation in all three B cell populations (Appendix-Fig. S2b). This indicated that nucleotide addition and deletion at the VH-D and D-JH junctions during VDJ recombination greatly enriched the antibody repertoire.
The usage profile of the VH, D, and JH genes was not altered with B cell development
The VH gene segment usage was highly uneven among the 110 functional VH genes, with a few VH segments having greater opportunities to participate in the VHDJH recombination than the majority (Appendix-Fig. 1a). In each B cell population, the 20 most frequently utilized VH genes accounted for more than 60% of the total antibody repertoire (pre-B 63.7%, IM-B 67.6%, and S-B 68.0%). There was a strong overlap (16/20) between the top 20 VH genes in the three B cell populations. Of the overlapping VH genes, nine were from VHJ558, two from VHQ52, two from VH7183, and the remaining three from VH36-60, VHS107, and VHSM7, respectively. In addition, 10 functional VH segments (VHJ558.9, VHJ558.64, VHQ52.11, VHJ606.2, VHJ606.3, VH3609.1, VH3609.3, VH3609.11, VHVGAM3.8-4, and VH16.1) were not found to participate in recombination in any of the three B cell populations. It is noteworthy that some pseudo-VH genes participated in the rearrangement, although the proportion was extremely low (data not shown). In nearly half of the expressed VH genes (45/100), χ² analysis showed a significantly different proportion among the three B cell populations (P < 10⁻⁵, Appendix-Table S2).
The usage proportion of the VH family J558 was predominant and proportionally correlated with its germline complexity (52/110), but in fact only nine VHJ558 genes (VHJ558.50, VHJ558.52, VHJ558.53, VHJ558.55, VHJ558.59, VHJ558.67, VHJ558.72, VHJ558.75, and VHJ558.77) contributed about 80% of the VHJ558 repertoire (Fig. 1a and Fig. 1b). Two D-proximal VH families, VH7183 and VHQ52, also showed a usage proportion compatible with their family size (10/110 for VH7183 and 9/110 for VHQ52) (Fig. 1b). Both families have been reported to be overrepresented in adult bone marrow relative to adult spleen, but they showed different trends in our analysis. In agreement with previous studies, VH7183 usage decreased significantly from pre-B to S-B cells (P < 10⁻³³), but for VHQ52 the situation was just the reverse (pre-B vs. S-B, P < 10⁻¹⁴) (Fig. 1b). In contrast to the above three VH families, the contributions of VH36-60 and VH3609 were notably in inverse proportion to their family size; VH36-60 had a complexity similar to that of VH3609 (6/110 vs. 8/110), yet it was the second most highly expressed family in IM-B and S-B cells, and about ten times more frequent than VH3609 (average 11.6% vs. 1.05%) (Fig. 1b). Notably, the most frequently utilized VH gene in IM-B and S-B cells was VH36-60.6, which constituted more than 60% of the VH36-60 repertoire (Fig. 1a).
In all three B cell populations, JH gene usage, from highest to lowest, was JH2, JH4, JH3 and JH1. Compared with the pre-B cells, the JH2 usage continuously increased with B cell development (pre-B vs. IM-B, P < 10⁻⁶; and IM-B vs. S-B, P < 10⁻¹⁴), but the JH3 usage was significantly decreased (pre-B vs. IM-B, P < 10⁻¹¹; and IM-B vs. S-B, P < 10⁻¹¹) (Fig. 1c). Since the D genes could be greatly shortened by nucleotide deletion at both ends, D-regions could not be identified in about 8% of unique sequences from the three B cell populations (Fig. 1d). In the remaining unique sequences, the longest D gene, DFL16.1, was most frequently present in all three B cell populations (> 30%), although its expression was dramatically decreased with B cell development (pre-B vs. IM-B, P < 10⁻³⁰; and IM-B vs. S-B, P < 10⁻²⁸) (Fig. 1d). In terms of the DH families, the usage proportion was DSP2 > DFL16 > DQ52 > DST4 in all three B cell populations.
Although the majority of VH genes, VH families (11/16), JH (4/4) and D (6/9) genes were observed to have significantly different usage frequencies (P < 10⁻⁵) between the three B cell populations, the usage profile of the VH, D, and JH genes showed a high correlation (correlation coefficient r > 0.94 in all pairs) (Fig. 1 and Fig. 2). The VH- and JH-profiles of IM-B and S-B cells were more similar than those of pre-B and IM-B cells, and the D-profiles were almost the same among the three B cell populations (Fig. 2).
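The two comparisons used above, a per-gene χ² test on read counts and a correlation of whole usage profiles, answer different questions, which is why individual genes can differ significantly while the profiles remain highly correlated. The Python sketch below implements a plain Pearson coefficient and a Pearson χ² statistic; it is illustrative only, since the study's weighting scheme in Excel is not fully specified, and the toy numbers are not data from the paper.

```python
import math


def pearson_r(x, y):
    """Unweighted Pearson correlation between two usage profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts.

    Rows could be 'this gene' vs 'all other genes'; columns are populations.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Toy example with made-up counts for four genes in two populations.
pop_a = [5000, 3000, 1500, 500]
pop_b = [4600, 3300, 1500, 600]
print(pearson_r(pop_a, pop_b))
print(chi_square_statistic([[5000, 4600], [5000, 5400]]))
```

With tens of thousands of reads per population, even a shift of a few percentage points yields a large χ² statistic, whereas the correlation coefficient only reflects whether the overall ranking and shape of the profile are preserved.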
VHDJH recombination profile was not altered during B cell development
The number of VHDJH combination patterns detected in pre-B, IM-B, and S-B cells was 2999, 2848, and 2791, respectively, which covered more than 70% of the possible VHDJH combinations. A total of 2576 VHDJH combination patterns were shared among all three B cell populations, whereas the number expressed specifically in a single B cell population was only 146, 59, and 59 in pre-B, IM-B, and S-B cells, respectively (Fig. 3a). From the 3D representations of the VHDJH repertoires, it is evident that the overall profiles of the VHDJH combinations in the three B cell populations were markedly similar: the vast majority of the VHDJH repertoires were occupied by only a few VHDJH combinations, which were almost identical among the B cell populations (Fig. 4). The similarity was also reflected in the high degree of correlation (r > 0.92 in all pairs), with the combination profiles of the IM-B and S-B cells showing the most commonalities (Fig. 2).
The nine combination patterns: VHJ558.53/DFL16.1/JH2, VHJ558.55/DFL16.1/JH2, VHJ558.67/DFL16.1/JH2, VHJ558.67/DFL16.1/JH1, VHJ558.53/DFL16.1/JH2, VH36-60.6/DFL16.1/JH2, VH36-60.6/DFL16.1/JH1, VH36-60.6/DFL16.1/JH4 and VHJ558.75/DFL16.1/JH4, each contributed more than 0.5% of the antibody repertoire in all three B cell populations. Of these, the most diverse recombination was VHJ558.55/DFL16.1/JH2 (1.06%) identified in pre-B cells, with as many as 1662 unique CDRH3 sequences identified among 2167 sequences. In contrast to the high similarity of the combination profiles, the antibody repertoires of the three B cell populations differed dramatically. The sequenced antibody repertoires were not saturated, and the number of unique antibodies shared among all three B cell populations was only 564, with the vast majority of unique antibodies expressed specifically in a single B cell population (pre-B 158690, IM-B 97769, and S-B 94828) (Fig. 3b).
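The shared and population-specific counts reported for Fig. 3a and Fig. 3b are set operations over the items detected in each population. A minimal Python sketch follows, assuming each population is represented as a collection of hashable items such as (VH, D, JH) tuples or CDRH3-level antibody keys; the function name and the toy data are hypothetical.

```python
def venn_counts(pre_b, im_b, s_b):
    """Count items shared by all three populations or unique to one."""
    pre_b, im_b, s_b = set(pre_b), set(im_b), set(s_b)
    return {
        "shared_all": len(pre_b & im_b & s_b),
        "pre_b_only": len(pre_b - im_b - s_b),
        "im_b_only": len(im_b - pre_b - s_b),
        "s_b_only": len(s_b - pre_b - im_b),
    }


# Toy combinations, not data from the study.
pre_b = [("V1", "D1", "J1"), ("V2", "D1", "J2"), ("V3", "D2", "J2")]
im_b = [("V1", "D1", "J1"), ("V2", "D1", "J2")]
s_b = [("V1", "D1", "J1"), ("V4", "D3", "J4")]
print(venn_counts(pre_b, im_b, s_b))
```

Applied to combination patterns, most items fall into the shared category; applied to unique antibodies, the same function would place nearly everything in the population-specific categories, which is the contrast drawn in the text.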
CDRH3 length shortened when B cells enter the periphery
To investigate how the size of CDRH3 changed during mouse B cell development, we compared the distribution of CDRH3 lengths in unique productive rearrangements in the three B cell populations (Fig. 5a). In general, the average length of CDRH3 showed a small, non-significant increase (about 0.13 nt) between pre-B and IM-B cells (P > 10⁻⁵), but shortened considerably on average (about 0.8 nt) during the development from IM-B to S-B cells (P < 10⁻⁹⁰). When compared to pre-B cells, the variance in CDRH3 length of IM-B cells narrowed. The prevalence of both short (≤ 27 nt) and long (≥ 45 nt) CDRH3 decreased, but the prevalence of intermediate-length CDRH3 (30-39 nt) increased. When the B cells entered the periphery, the distribution of CDRH3 lengths changed again. During the progression from IM-B to S-B cells, the prevalence of CDRH3 with 33 nt or less increased, but the prevalence of CDRH3 with 39 nt or more decreased.
Decrease in CDRH3 length reflects deletion in N region and altered usage of shorter D gene segments
Deconstruction of productively rearranged unique CDRH3 sequences that contained identifiable D gene segments (Table 1) allowed further evaluation of the relative contribution of VDJ germline sequence and N (including P) nucleotides to CDRH3 length. From pre-B to IM-B cells, the average CDRH3 length was unchanged. Minor decreases in the D germline contribution (–0.06 nt) and N addition at the 5′ and 3′ junctions (–0.04 nt and –0.06 nt) were offset by the increase in the JH germline contribution (+0.16 nt). The decrease of N addition at the 3′ junction and the increase of the JH germline contribution reached a statistically significant level (P < 10⁻⁵ and P < 10⁻¹⁹). From IM-B to S-B cells, the average CDRH3 length significantly decreased by 0.76 nt (P < 10⁻¹⁰⁴). The marked decrease in N addition at the 5′ and 3′ junctions (–0.24 nt and –0.27 nt) accounted for two-thirds of the total decrease (P < 10⁻⁵⁴ and P < 10⁻⁹³). Also, the contribution of D germline sequence decreased by 0.11 nt, which was statistically significant (P < 10⁻⁷). From pre-B to S-B cells, increased use of DQ52, the shortest D segment, and decreased use of DFL16.1, the longest D segment, was a major factor in the decrease of the contribution of D germline sequence (Fig. 1d). However, the VH and JH germline sequences did not appear to contribute to the decrease in CDRH3 length during mouse B cell development.
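The arithmetic behind the two statements above, that the pre-B to IM-B changes cancel out and that N addition accounts for about two-thirds of the IM-B to S-B decrease, can be checked directly. The short Python sketch below uses only the rounded values quoted in the text; the dictionary keys and function names are illustrative.

```python
# Reported changes in mean contribution to CDRH3 length (nt), from the text.
PRE_B_TO_IM_B = {
    "D germline": -0.06,
    "N at 5' junction": -0.04,
    "N at 3' junction": -0.06,
    "JH germline": +0.16,
}
IM_B_TO_S_B_N_ADDITION = {
    "N at 5' junction": -0.24,
    "N at 3' junction": -0.27,
}
IM_B_TO_S_B_TOTAL = -0.76


def net_change(components):
    """Sum the component changes, rounded to the precision of the inputs."""
    return round(sum(components.values()), 2)


def share_of_total(components, total):
    """Fraction of the total change explained by the given components."""
    return sum(components.values()) / total


print(net_change(PRE_B_TO_IM_B))
print(round(share_of_total(IM_B_TO_S_B_N_ADDITION, IM_B_TO_S_B_TOTAL), 2))
```

The first value is zero to two decimal places, matching the unchanged average length, and the second is about 0.67, matching the two-thirds figure.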
In unique nonproductive sequences of all three B cell populations, the frequency of three CDRH3 lengths decreased successively in the order 3n-2, 3n-1 and 3n
Because nonproductive sequences are not expressed as functional IgH proteins on the cell surface, they can be used to estimate the CDRH3 length distribution of IgH rearrangements before selection. The number of unique nonproductive rearrangements obtained from pre-B, IM-B and S-B cell populations was 31899, 24979 and 25956, respectively, which is far fewer than the number of productive rearrangements, indicating that the nonproductive rearrangements might be inefficiently transcribed or degraded more rapidly than productive rearrangements. Comparison of the average CDRH3 length between productive and nonproductive rearrangements in each B cell population revealed no significant differences (pre-B, 35.44 vs. 35.39, P = 0.34; IM-B, 35.57 vs. 35.38, P = 0.00098; and S-B, 34.77 vs. 34.87, P = 0.11). Comparison of the average CDRH3 length of nonproductive rearrangements among the three B cell populations revealed no significant difference between pre-B and IM-B cells (35.39 nt vs. 35.38 nt, P = 0.88). However, as with productive rearrangements, nonproductive rearrangements in S-B cells contained shorter CDRH3 sequences than those in IM-B cells (mean 34.87 nt vs. 35.38 nt, P < 10⁻¹⁰). Unexpectedly, the CDRH3 length distribution of unique nonproductive rearrangements did not follow a Gaussian distribution. In all three B cell populations, the frequency of CDRH3 length decreased successively in the order 3n-2, 3n-1, and 3n (Fig. 5b), suggesting a direct correlation between mRNA stability and the CDRH3 length patterns of nonproductive rearrangements.
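The three length classes referred to above follow from the CDRH3 length modulo 3. A minimal Python sketch of this classification is given below; the function names are illustrative and the example lengths are made up.

```python
from collections import Counter


def length_class(length_nt):
    """Classify a CDRH3 length (nt) as '3n', '3n-2' or '3n-1'.

    A remainder of 0 is an in-frame length (3n); a remainder of 1
    corresponds to 3n-2 and a remainder of 2 to 3n-1.
    """
    return ("3n", "3n-2", "3n-1")[length_nt % 3]


def class_frequencies(lengths):
    """Relative frequency of each length class among the given lengths."""
    counts = Counter(length_class(n) for n in lengths)
    total = sum(counts.values())
    return {name: counts[name] / total for name in ("3n-2", "3n-1", "3n")}


print(class_frequencies([34, 34, 35, 36]))
```

Nonproductive rearrangements with a 3n length are in frame but contain stop codons, whereas the 3n-2 and 3n-1 classes are out of frame, which is why the three classes can differ in where the first premature termination codon falls.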
Increased use of polar amino acids and decreased use of positively charged and nonpolar amino acids in CDRH3 loops during B cell development
Although the majority of amino acids (except glutamine and methionine) were observed to have significantly different usage proportions (P < 10⁻⁶) among the three B cell populations, the overall usage profiles of the 20 amino acids showed high correlations (correlation coefficient r > 0.997 in all pairs) (Fig. 2). In all three B cell populations, the top five preferred amino acids in CDRH3 loops were tyrosine, glycine, serine, asparagine, and alanine, which contributed more than 60% of the total amino acids (Fig. 6a). Division of the amino acids into four subsets according to the chemical properties of their side chains and comparison of their proportions in the three B cell populations are shown in Fig. 6b. The proportion of positively charged amino acids was significantly decreased during B cell development (pre-B vs. IM-B, P < 10⁻⁴⁸; and IM-B vs. S-B, P < 10⁻⁵⁸), and the same was true for nonpolar amino acids (pre-B vs. IM-B, P < 10⁻⁴³; and IM-B vs. S-B, P < 10⁻¹¹⁴). Conversely, a significant increase in the proportion of polar amino acids was observed with the development of B cells (pre-B vs. IM-B, P < 10⁻¹⁸; and IM-B vs. S-B, P < 10⁻¹⁶¹). A greater change in proportion was observed between the repertoires of IM-B and S-B cells compared with those of pre-B and IM-B cells. In addition, the proportion of negatively charged amino acids increased during the development from pre-B to IM-B cells (P < 10⁻⁸¹), but did not change significantly between IM-B and S-B cells (P = 0.099).
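The division of CDRH3-loop residues into four side-chain classes can be sketched in Python as follows. The exact grouping used for Fig. 6b is not listed in the text, so the assignment below (for example, histidine as positively charged and glycine, tyrosine and cysteine as polar) is an assumption, as are the function name and the toy loop sequences.

```python
from collections import Counter

# Assumed grouping of the 20 amino acids by side-chain chemistry.
CLASS_MEMBERS = {
    "positive": "KRH",
    "negative": "DE",
    "polar": "STNQYCG",
    "nonpolar": "AVLIMFWP",
}
SIDE_CHAIN_CLASS = {
    aa: name for name, members in CLASS_MEMBERS.items() for aa in members
}


def class_proportions(loops):
    """Proportion of each side-chain class across all CDRH3-loop residues."""
    counts = Counter(SIDE_CHAIN_CLASS[aa] for loop in loops for aa in loop)
    total = sum(counts.values())
    return {name: counts[name] / total for name in CLASS_MEMBERS}


# Toy loop sequences, not data from the study.
print(class_proportions(["GYYGSSY", "RDYYAMD"]))
```

Computing these proportions per population and comparing them pairwise corresponds to the comparison summarized in Fig. 6b; a different grouping of borderline residues would shift the absolute proportions.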
Discussion
A strain-dependent developmental difference in VH gene usage has been intensively studied in BALB/c and C57BL/6 mice. In BALB/c, the two most D-proximal VH families, VH7183 and VHQ52, were overrepresented in fetal liver B cells and early B cell development in adult bone marrow, compared with their germline complexity, but this preference was lost when the B cells matured and migrated to the periphery [26, 40-42]. In contrast, usage bias of the 3′ VH genes in the C57BL/6 strain was clearly observed in fetal liver B cells [42], but in our research was not apparent in the precursor B cells during B cell development in adult bone marrow. Mouse pre-B cells can be further divided into three subclasses, pre-B-I (c-kit+, CD25– and CD43+), large pre-B-II (c-kit–, CD25+ and partially CD43+), and small pre-B-II (c-kit–, CD25+ and CD43–), of which the last is predominant (60-70%) [43, 44]. The research of ten Boekel et al. demonstrated that in normal C57BL/6 mice a VH repertoire shift occurs as cells mature from pre-B-I to pre-B-II, since some VH domains encoded by the VH7183 and VHQ52 families are not able to form a pre-BCR with the surrogate light chain, or the pre-BCR formed is incapable of efficiently inducing pre-B cell clonal proliferation [45]. In our present study, the overall expression profile of the VH families did not show significant changes during B cell maturation in adult C57BL/6 mice. The detection of the unbiased VH repertoire in our study is probably due to the pre-B cell population we used, which mainly belonged to the pre-B-II subclass (CD43–), which develops from pre-BCR+ cells and has already passed the positive selection step.
Because B cells carrying nonproductive rearrangements on both IgH alleles die, nonproductive rearrangements could only be detected from B cells carrying one productive and one nonproductive allele. Thus the presence of a given VH gene in the nonproductive rearrangements cannot contribute positively or negatively to selection. In contrast, the proportion of a given VH gene in the productive rearrangements could be determined by the efficiency of participation in rearrangement as well as by its susceptibility to selection. Therefore, in a previous study, the selection of the VH repertoire during B cell development was measured by the shift in the proportion of productive to nonproductive rearrangements or the shift in the fractions of in-frame rearrangements (IF fractions) of a given VH gene (or VH family) in various B cell populations [46]. Alternatively, whatever the stage of development, a B cell can express only one kind of H chain with unique specificity. Consequently, in our study, the effect of selection, either on a VH gene or on a VHDJH combination, was determined by the diversity change, which was calculated from the substantial sequence data composed of the expressed productive VHDJH. Although most of the pre-B cells utilized here were pre-B-II, compared with the other VH families, the combination repertoire of the VH7183 family did show more obvious variation between the pre-B and IM-B cells than between IM-B and S-B cells (r = 0.79 vs. r = 0.95). This finding is consistent with previous reports that the VH7183 family was more susceptible to positive selection in early B cell development, especially the most D-proximal VH7183.2 (equivalent to VH81x in the BALB/c strain). In both adult BALB/c and C57BL/6 mice, more than 70% of the VH7183.2 rearrangements derived from sIgM– bone marrow B cells were non-functional due to out-of-frame rearrangement or in-frame stop codons in the D region. It is noteworthy that the combination repertoire of the VH36-60 family seemed not to be selected against during the overall development of the adult B cells (r = 0.995 vs. r = 0.999). This is consistent with the finding of Decker et al. (1991), who observed an unexpectedly high representation (more than 80%) of productive rearrangements of VHM315, a member of the VH36-60 family in the BALB/c strain [47]. In the sequence of the entire Igh V locus of strain C57BL/6, the VH gene segment corresponding to VHM315 is VH36-60.6, which was always predominantly utilized in the combination repertoire of the VH36-60 family in our investigation. Meng et al. suggested that the IF fraction of a rearranged VH gene could be used as a measure of VH selection at the pro-B to pre-B cell transition [46]. However, for a VH gene, a high IF fraction does not always mean a high contribution of diversity to the whole repertoire; nearly all of the VHJ558 genes had a high IF fraction.
In contrast to the conclusions drawn from previous observations, which were based on a limited number of VH7183DJCμ transcripts, the average CDRH3 length generated by the rearrangement machinery was found to be reduced when B cells entered the periphery, and similar results have been observed in the human [29, 48]. Of the individual components of CDRH3, the D and JH elements comprised the bulk of the length of the region by contributing more than 10 nt each, whereas the N (both 5′ and 3′ of the D segment) and VH elements added about 7 nt and 5 nt, respectively. Notably, the reduction in average CDRH3 length was largely due to fewer N additions rather than a decrease of the germline contribution, indicating that the selection of B cells might more frequently favor short CDRH3s encoded by the nonrandom germline sequences. Furthermore, there was no significant difference in the length of CDRH3 between productive and nonproductive rearrangements in each B cell population, and the CDRH3 of nonproductive rearrangements was also significantly shorter in S-B cells than in IM-B cells. These results indicated that the longer CDRH3 seemed to be removed ahead of the transcription process.
It is well known that somatic rearrangement of VH, D, and JH genes usually results in the acquisition of premature translation-termination codons (PTCs) in the Ig genes. In our cDNA database, the nonproductive rearrangements were far fewer than the productive rearrangements. This result is in line with previous publications indicating that the PTC-containing mRNAs of TCR and Ig genes are degraded more efficiently by the process of nonsense-mediated mRNA decay (NMD) [49, 50]. In mammalian cells, the widely accepted mechanism for PTC recognition is the exon junction complex (EJC) model, which proposes that only a PTC located at least 55 nt upstream from the terminal intron can trigger NMD [51, 52]. However, this rule is not applicable to some genes, such as the TCRβ and Igμ transcripts, which were downregulated even when the PTCs were located downstream of the –55 nt boundary [49, 53]. Moreover, a polar effect of the NMD efficiency was also observed in both TCRβ and Igμ genes. The efficiency of NMD increased gradually as the PTC moved further downstream in the Igμ gene [49], but the effect is opposite for TCRβ [53]. Importantly, in our investigations, the novel feature of the CDRH3 length distribution of the nonproductive rearrangements was the regular and distinct reduction of the frequency in the order 3n-2, 3n-1 and 3n. An earlier report indicated that the productively (PTC–) and nonproductively (PTC+) rearranged Igμ heavy chain alleles seemed to be equally well transcribed [54]. Hence, the difference in the frequency of the three kinds of CDRH3 lengths (3n-2, 3n-1 and 3n) probably reflects the difference in the sensitivity to NMD. Due to the usage of reading frame 3 (RF3) in DH segments, PTC(s) in the overwhelming majority of the nonproductive rearrangements with 3n-length CDRH3 are located several nt downstream of the VH-D junction, where they can trigger strong NMD [49]. Each of the other two length types (3n-2 and 3n-1) causes a frameshift downstream of the D-J junction, resulting in the accumulation of multiple PTCs in Cμ exons. The first PTC appears ~180 nt or ~60 nt downstream of the J-C junction when the length of CDRH3 is 3n-2 or 3n-1, respectively. Therefore, the polarity of NMD efficiency observed in our study is similar to that reported for TCRβ transcripts: 5′ PTCs triggered more effective NMD than did 3′ PTCs [53].
The presence of excess positively charged amino acids in the CDRH3 loop is an important feature of autoreactive antibodies, especially the dsDNA binding antibodies [10, 55, 56]. Sequences of CDRH3 containing positively charged amino acids have been reported to be sequentially removed from the population during B cell development [10], and another study also showed that mice forced to express an antibody repertoire enriched with positively charged amino acids showed impaired B cell development and antibody production [57]. Consistent with these previous studies, we observed that the repertoire moved toward less positively charged residues in the CDRH3 loop in the transition from pre-B to S-B cells, accompanied by an increase in the negatively charged amino acids, which might be beneficial for B cell development. Furthermore, a shift in average hydrophobicity of the CDRH3 loops from nearly neutral to mildly hydrophilic has been observed from early pre-B to mature B cells when the VH7183DJHCμ transcripts were analyzed [39]. A decrease of aliphatic index in CDRH3 in the development from transitional to naive B cells has also been observed in the human [29]. Here we found a significant reduction in nonpolar amino acids and an increase in polar amino acids during B cell development, which was compatible with the previous findings.
Conclusions
The data presented show that the expressed μ chain repertoire, including gene segment usage, VHDJH combination profile, and especially the length and amino acid composition of CDRH3, is fine-tuned during B cell development in order to establish an optimal humoral immune response to antigen.
Higher Education Press and Springer-Verlag Berlin Heidelberg