Development of the expressed immunoglobulin μ chain repertoire during maturation of mice B cells

In the bone marrow and spleen, the developing B cell populations undergo both negative and positive selections to shape their B cell receptor repertoire. To gain insight into the shift of the immunoglobulin heavy (IgH) chain repertoire during B cell development, we undertook large scale Ig μ chain repertoire analysis of pre-B, immature B and spleen B cell populations. We found that the majority of VH gene segments, VH families, JH and D gene segments, were observed to have significantly different usage frequencies when three B cell populations were compared, but the usage profile of the VH, D, and JH genes between different B cell populations showed high correlations. In both productive and nonproductive rearrangements, the length of CDRH3 shortened significantly on average when B cells entered the periphery. However, the CDRH3 length distribution of nonproductive rearrangements did not follow a Gaussian distribution, but decreased successively in the order 3n – 2, 3n – 1 and 3n, suggesting a direct correlation between mRNA stability and CDRH3 length patterns of nonproductive rearrangements. Further analysis of the individual components comprising CDRH3 of productive rearrangements indicated that the decrease in CDRH3 length was largely due to the reduction of N addition at the 5′ and 3′ junctions. Moreover, with development, the amino acid content of CDRH3 progressed toward fewer positively charged and nonpolar residues but more polar residues. All these data indicated that the expressed Ig μ chain repertoire, especially the repertoire of CDRH3, was fine-tuned when B cells passed through several checkpoints of selection during the process of maturation.


Introduction
In the humoral immune system, a diverse antibody repertoire is essential to produce effective and specific immune responses. During the development of B cells, the antibody (or B cell receptor, BCR) repertoire is generated by somatic joining of the variable (V), diversity (D), and joining (J) gene segments at immunoglobulin heavy (IgH) chain and light (IgL) chain loci [1]. Diversity of each chain is determined by various V H -D-J H or V κ/λ -J κ/λ combinations and unpredictable junctional sequences, which are created de novo by the imprecise joining of gene segments and the varied insertion of non-templated (N) and palindromic (P) nucleotides. The most diverse V H -D-J H junction encodes the third complementarity-determining region of the IgH chain variable region (CDRH3), in which both length and amino acid composition play a vital role in defining the BCR specificity [2].
In theory, the BCR repertoire has been estimated to surpass 10^8 potential sequence variants. However, the majority of the initially generated IgH and/or IgL chains are successively removed from the whole repertoire when the developing B cells pass through several checkpoints depending on positive and/or negative selections [3,4]. During the first checkpoint, pre-B cells are positively selected for the expression of functional μ heavy chains (μHCs) encoded by the productively rearranged IgH alleles [5]. The μHC has the ability to pair with the surrogate light chain to form the pre-BCR, which is transported to the cell surface and induces the proliferation of the pre-B cells [6,7]. A few studies have indicated that autoreactivity of the pre-BCR is crucially important for inducing the expansion of pre-B cells with a productive IgH chain rearrangement [8,9]. The second checkpoint is usually referred to as central B cell tolerance, a process that negatively selects the autoreactive immature B cells in the bone marrow environment with high efficiency and stringency [10]. Receptor editing, anergy, and deletion are three known mechanisms utilized to establish the tolerance [11][12][13][14][15]. In wild-type mice, receptor editing at the IgL chain loci plays a major role in silencing of autoreactive B cells (~50%), and a small proportion of low avidity autoreactive B cells become anergic, whereas the very few remaining undergo deletion [16,17]. When the newly formed immature B cells migrate to peripheral lymphoid tissues as transitional B cells, they continue to experience multiple checkpoints before finally entering the mature B cell pool [18]. In contrast to the detailed investigations of central B cell tolerance, relatively little is known about the mechanisms underlying peripheral B cell tolerance. However, several reports suggest that the transitional B cells probably experience both negative and positive selection [19][20][21][22][23].
The effect of positive and/or negative selection could be reflected in differences between BCR repertoires of different B cell populations. Several studies by Schroeder Jr et al. elucidated detailed features of the CDRH3 repertoire in various murine B cell populations, in terms of length, amino acid composition, and average hydrophobicity. Among them, the key feature of CDRH3 repertoire development in both BALB/c and C57BL/6 mice is an increase in average CDRH3 length with B cell maturation [24][25][26]. However, use of all the above features as accurate reflections of the entire BCR repertoire has been challenged, as they were deduced from the analysis of a limited number of μ chain transcripts containing only V H 7183 family members.
In the last few years, high-throughput sequencing technologies have been widely utilized to describe the repertoire of antibody and T cell receptor (TCR) in humans and zebrafish [27][28][29][30][31][32][33]. Here, to comprehensively understand the effects of selections on BCR repertoire during the B cell development, we describe and compare the IgH repertoire of pre-, immature and spleen B cells in C57BL/6 mice using 454 high-throughput pyrosequencing with particular attention to the usage profiles of V H, D, and J H genes and the repertoire of CDRH3. From the viewpoint of comparative studies, these results will provide the basis for further investigation of the BCR development in domestic animals.

Materials and methods

Mice
C57BL/6 mice of clean grade were purchased from Vital River Laboratories (Beijing, China) and housed and cared for in a barrier environment according to the national standard (GB14925-2001) issued by General Administration of Quality Supervision, Inspection and Quarantine of China.

B cell isolation and cell sorting
The pooled bone marrows and spleens were collected from eight 6- to 8-week-old mice (four male and four female). Single-cell suspensions were prepared by passing tissue samples through 200 μm nylon mesh and resuspending the filtered cells in phosphate buffered saline (PBS). Erythrocytes were lysed in red-blood-cell (RBC) lysis buffer (eBioscience, San Diego, CA, USA). The mixture of pre-B and immature B (IM-B) cells was isolated from the bone marrow single-cell suspension by depletion of CD43 + pro-B cells and non-B cells, using the MACS B cell Isolation Kit (Miltenyi Biotec, Bergisch Gladbach, Germany). The IM-B cells were then positively selected from the mixture with Anti-Mouse IgM MicroBeads (Miltenyi Biotec), and the pre-B cells were the negative fraction. Spleen total B (S-B) cells were positively selected from a spleen single-cell suspension using CD45R(B220) MicroBeads (Miltenyi Biotec). Cells (~10^5) of each B cell subset were stained with PE-Cy5.5-anti-mouse B220, FITC-anti-mouse IgM, and PE-anti-mouse CD43, and the purity of each B cell subset was analyzed on a MoFlo High-performance cell sorter (DakoCytomation, Fort Collins, CO, USA).

mRNA preparation, cDNA synthesis, and PCR
Total RNA samples were isolated from each B cell subset using the mirVana™ miRNA Isolation Kit (Ambion, Austin, TX, USA), and reverse transcription was conducted using a mouse C μ -specific primer (RTC μ ) [34] and M-MLV Reverse Transcriptase (Promega, Madison, WI, USA) following the manufacturer's instructions. Recombined IgH VDJ regions were amplified from each cDNA sample using a multiplex of 17 mouse V H family-specific upstream primers and a mouse C μ -specific downstream primer (C μ 0) (a 454-adaptor sequence was added at the 5′ end of each primer) [34]. A touchdown PCR was performed with 25 μL 2× Phusion HF PCR Master Mix (NEB, Beverley, MA, USA) and 10 μL cDNA per sample, using the following protocol: 95°C for 2 min; 6 touchdown cycles of 94°C for 1 min, annealing for 1 min at a temperature decreasing from 67°C to 57°C by 2°C per cycle, and 72°C for 1 min; 14 cycles of 94°C for 1 min, 57°C for 1 min, and 72°C for 1 min; and a final extension step of 68°C for 10 min. To minimize PCR amplification bias, for each B cell subset sample, three C μ -specific downstream primers with three different barcoding index sequences (C μ 1-C μ 3) were separately utilized to repeat the preceding PCR. All primer sequences are given in Appendix-Table S1.
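The annealing schedule of the touchdown protocol can be written out explicitly. The following is a minimal sketch in Python, not part of the original workflow; the function name `touchdown_schedule` and its parameters are hypothetical, and only the temperatures and cycle counts come from the protocol above.

```python
def touchdown_schedule(start=67, stop=57, step=2, plateau_cycles=14):
    """Return the annealing temperature (in °C) used in each PCR cycle.

    The touchdown phase steps from `start` down to `stop` in decrements of
    `step`; the plateau phase then repeats `stop` for `plateau_cycles` cycles.
    """
    touchdown = list(range(start, stop - 1, -step))  # 67, 65, 63, 61, 59, 57
    return touchdown + [stop] * plateau_cycles


schedule = touchdown_schedule()
for cycle, temperature in enumerate(schedule, start=1):
    print(f"cycle {cycle:2d}: anneal at {temperature}°C for 1 min")
```

With the stated values, the touchdown phase spans exactly six cycles and the whole amplification comprises 20 cycles.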

Amplicon purification and sequencing
PCR amplicons (350-450 bp) were purified by 1.2% agarose gel electrophoresis and MinElute Gel Extraction Kit (Qiagen, Hilden, Germany). For each B cell subset sample, 4 separate amplicons using C μ 0-C μ 3 were mixed in equal amount before sequencing on the 454 GS FLX sequencer (Roche, Basel, Switzerland). For sequencing, the DNA libraries were treated according to the operation manual.

Sequence analysis
A local BLAST database was constructed using the C57BL/6 reference germline V H , D, and J H sequences [35][36][37], and the BLAST algorithm was prepared according to the NCBI IgBLAST (http://www.ncbi.nlm.nih.gov/igblast/). All reads were aligned by the local BLAST to each germline V H and J H gene, and the highest scored V H and J H genes were used for the following analysis. The algorithm to determine the V H -D-J H junctions and to identify the 3′ V-regions, 5′ J-regions, D-regions, P-regions, and N-regions was designed on the basis of IMGT/JunctionAnalysis [38]. Reads were considered to be informative antibody sequences if they passed the following quality control criteria: a minimum length of 300 bp; identified V H and J H genes; and absence of ambiguous nucleotides in the junction region. A non-redundant rearrangement library was constructed by collapsing the clonally related sequences. From these unique sequences, the V H DJ H combination repertoire, antibody repertoire and CDRH3 characteristics were calculated. The definitions of CDRH3 and CDRH3-loop are described in references [24,39].

In all three B cell populations, the majority of the rearranged IgH sequences were productive (pre-B 219469, 85.6%; IM-B 146227, 83.8%; S-B 117643, 80.7%). Productively rearranged IgH sequences with the same V H , D, and J H segment usage and identical CDRH3 nucleotide sequences were identified as clonally related sequences, which probably arose from PCR amplification of a single sequence or from multiple cDNA copies from clonally expanded cells. Thus, clonally related sequences actually encoded almost identical IgH variable regions. The frequency of clonally related sequences gradually decreased with increasing clone size, and unique sequences with no clonally related sequences (clone size of 1) were predominant in all three B cell populations (Appendix-Fig. S3). 
However, compared to the IM-B and S-B cell populations, the pre-B cell population had a significantly smaller clone size when the reads approached 30000 (Appendix-Fig. S3). As the reads increased, clone size showed a similar trend in all three B cell populations (data not shown). To avoid repeatedly counting the same antibody, only one sequence was chosen at random from each group of clonally related sequences to represent a unique (non-redundant) antibody molecule. More than 84% (99165) of the productive IgH sequences from the S-B cell population were unique IgH rearrangements, compared to 77.9% (170933) in pre-B cells and 75.0% (109741) in IM-B cells. The modest increase in the percentage of unique rearrangements in S-B cells is due to the antigen-encountered B cells, which underwent somatic hypermutation (SHM) in CDRH3. Saturation analysis of the antibody diversity with the number of unique rearrangements demonstrated that the antibody repertoire was far from saturation in all three B cell populations (Appendix-Fig. S2b). This indicated that during VDJ recombination, nucleotide addition and deletion in the V H -D and D-J H junctions greatly enriched the antibody repertoire.
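The quality control criteria and the collapsing of clonally related sequences described above can be illustrated with a short sketch. This is a hedged illustration in Python and not the authors' pipeline: the record layout (dictionaries with `seq`, `vh`, `d`, `jh`, `junction` and `cdrh3` keys), the helper `make_read` and the toy reads are all assumptions made for the example; only the three QC criteria and the clone definition (same V H , D, J H and identical CDRH3 nucleotide sequence) come from the text.

```python
import random
from collections import defaultdict

MIN_LENGTH = 300  # minimum read length in bp, from the QC criteria in the text


def passes_qc(read):
    """Length >= 300 bp, assigned V H and J H genes, no ambiguous junction bases."""
    junction_ok = set(read["junction"].upper()) <= set("ACGT")
    return (len(read["seq"]) >= MIN_LENGTH
            and read["vh"] is not None
            and read["jh"] is not None
            and junction_ok)


def collapse_clones(reads, seed=0):
    """Keep one randomly chosen read per (V H, D, J H, CDRH3) group."""
    rng = random.Random(seed)
    clones = defaultdict(list)
    for read in reads:
        clones[(read["vh"], read["d"], read["jh"], read["cdrh3"])].append(read)
    return [rng.choice(members) for members in clones.values()]


def make_read(vh, d, jh, cdrh3, length=320):
    """Hypothetical helper that builds a toy read record."""
    return {"seq": "A" * length, "vh": vh, "d": d, "jh": jh,
            "junction": cdrh3, "cdrh3": cdrh3}


# Toy data only; gene assignments and sequences are made up for illustration.
reads = [
    make_read("VH36-60.6", "DFL16.1", "JH2", "GCAAGAGATTAC"),
    make_read("VH36-60.6", "DFL16.1", "JH2", "GCAAGAGATTAC"),  # clonally related
    make_read("VH7183.2", "DSP2", "JH4", "GCAAGATGGGAC"),
    make_read("VH7183.2", "DSP2", "JH4", "GCAAGANGGGAC"),      # ambiguous base
    make_read("VHQ52", "DQ52", "JH3", "GCCAGAAAC", length=250),  # too short
]

informative = [r for r in reads if passes_qc(r)]
unique = collapse_clones(informative)
print(len(informative), "informative reads,", len(unique), "unique rearrangements")
```

Fixing the random seed makes the choice of representative reproducible; because clonally related reads share the same key, the choice does not affect any statistic computed from V H , D, J H or CDRH3.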

The usage profile of the V H, D, and J H genes was not altered with B cell development
The V H gene segment usage was highly uneven among the 110 functional V H genes, with a few V H segments having greater opportunities to participate in the V H DJ H recombination than the majority (Appendix- Fig. 1a). In each B cell population, the top 20 frequently utilized V H genes occupied more than 60% of the total antibody repertoire (pre-B 63.7%, IM-B 67.6%, and S-B 68.0%). There was a strong overlap (16/20) .1) were not found to participate in recombination in any of the three B cell populations. It is noteworthy that some pseudo-V H genes participated in the rearrangement, although the proportion was extremely low (data not shown). In nearly half of the expressed V H genes (45/100), χ 2 analysis showed a significantly different proportion among three B cell populations (P < 10 -5 , Appendix- Table S2).
The usage proportion of the V H family J558 was overdominant and proportionally correlated with its germline complexity (52/110), but in fact only nine V H J558 genes (V H J558. 50 Fig. 1a and Fig. 1b). Two D-proximal V H families, V H 7183 and V H Q52, also showed a usage proportion compatible with their family size (10/110 for V H 7183 and 9/110 for V H Q52) (Fig. 1b). Both families were found to be overrepresented in adult bone marrow relative to adult spleen, but showed different changing trends in our analysis. In agreement with the previous studies, V H 7183 was significantly less frequently used between pre-B and S-B cells (P < 10 -33 ), but for V H Q52 the situation was just the reverse (pre-B vs. S-B, P < 10 -14 ) (Fig. 1b). In contrast to the above three V H families, the contributions of V H 36-60 and V H 3609 were notably in inverse proportion to their family size; the V H 36-60 had a complexity similar to V H 3609 (6/110 vs. 8/110), yet it was the second most highly expressed family in IM-B and S-B cells, and ten times more frequent than V H 3609 (average 11.6 vs. 1.05%) (Fig. 1b). Notably, the most frequently utilized V H gene in IM-B and S-B cells was V H 36-60.6, which constituted more than 60% of the V H 36-60 repertoire (Fig. 1a).
In all three B cell populations, the J H genes proportional usage from high to low was J H 2, J H 4, J H 3 and J H 1. Compared with the pre-B cells, the J H 2 usage continuously increased with B cell development (pre-B vs. IM-B, P < 10 -6 ; and IM-B vs. S-B, P < 10 -14 ), but the J H 3 usage was significantly decreased (pre-B vs. IM-B, P < 10 -11 , and IM-B vs. S-B, P < 10 -11 ) (Fig. 1c). Since the D genes could be greatly shortened by nucleotide deletion at both ends, D-regions could not be identified in about 8% of unique sequences from three B cell populations (Fig. 1d). In the remaining unique sequences, the longest D gene, DFL16.1, was most frequently present in all three B cell populations ( > 30%), although its expression was dramatically decreased with B cell development (pre-B vs. IM-B, P < 10 -30 , and IM-B vs. S-B, P < 10 -28 ) (Fig. 1d). In terms of the D H families, the usage proportion was DSP2 > DFL16 > DQ52 > DST4 in all three B cell populations.
Although the majority of V H genes, V H families (11/16), J H (4/4) and D (6/9) genes were observed to have significantly different usage frequencies (P < 10 -5 ) between the three B cell populations, the usage profile of the V H, D, and J H genes showed a high correlation (correlation coefficient r > 0.94 in all pairs) ( Fig. 1 and Fig. 2). The V H -and J H -profiles of IM-B and S-B cells were more similar than those of pre-B and IM-B cells, and the D-profiles were almost the same among three B cell populations (Fig. 2).
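The distinction between significantly different individual frequencies and a highly correlated overall profile can be made concrete with a small calculation. The sketch below assumes that r is the Pearson correlation coefficient between usage proportions, which the text does not state explicitly; the counts are toy values chosen for illustration and are not data from this study.

```python
import math


def usage_profile(counts, genes):
    """Convert raw counts per gene into usage proportions in a fixed gene order."""
    total = sum(counts.get(g, 0) for g in genes)
    return [counts.get(g, 0) / total for g in genes]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


genes = ["JH1", "JH2", "JH3", "JH4"]
# Hypothetical counts: JH2 rises and JH3 falls, but the ranking is preserved.
toy_pre_b = {"JH1": 10, "JH2": 40, "JH3": 22, "JH4": 28}
toy_s_b = {"JH1": 9, "JH2": 45, "JH3": 18, "JH4": 28}

r = pearson_r(usage_profile(toy_pre_b, genes), usage_profile(toy_s_b, genes))
print(f"r = {r:.3f}")
```

With large read numbers, even shifts of a few percentage points in a single gene can reach very small P values in a χ 2 test, while the correlation of the whole profile stays close to 1.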

V H DJ H recombination profile was not altered during B cell development
The number of V H DJ H combination patterns detected in pre-B, IM-B, and S-B cells was 2999, 2848, and 2791, respectively, which covered more than 70% of the possible V H DJ H combinations. A total of 2576 V H DJ H combination patterns were shared among all three B cell populations, whereas the number expressed specifically in a single B cell population was only 146, 59, and 59 in pre-B, IM-B, and S-B cells, respectively (Fig. 3a). From the 3D representations of the V H DJ H repertoires, it is evident that the overall profiles of the V H DJ H combinations in the three B cell populations were clearly similar: the vast majority of each V H DJ H repertoire was occupied by only a few V H DJ H combinations, which were almost identical among the B cell populations (Fig. 4). The similarity was also reflected in the high degree of correlation (r > 0.92 in all pairs), with the combination profiles of the IM-B and S-B cells showing the most commonalities (Fig. 2).
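The coverage figure can be checked with simple arithmetic. The sketch below assumes that the number of possible combinations is the product of the 110 functional V H genes, 9 D genes and 4 J H genes mentioned in the text (3960 in total); this denominator is an assumption made here and is not stated by the authors. The observed, shared and population-specific counts are taken from the paragraph above.

```python
# Assumed germline repertoire sizes (110 functional V H, 9 D, 4 J H).
N_VH, N_D, N_JH = 110, 9, 4
possible = N_VH * N_D * N_JH  # 3960 combinations under this assumption

observed = {"pre-B": 2999, "IM-B": 2848, "S-B": 2791}
shared_by_all = 2576
specific = {"pre-B": 146, "IM-B": 59, "S-B": 59}

# Fraction of the assumed combination space detected in each population.
coverage = {pop: n / possible for pop, n in observed.items()}

# Patterns that are neither shared by all three nor population-specific
# must be shared with exactly one other population.
shared_with_one_other = {
    pop: observed[pop] - shared_by_all - specific[pop] for pop in observed
}

for pop in observed:
    print(f"{pop}: coverage {coverage[pop]:.1%}, "
          f"shared with one other population {shared_with_one_other[pop]}")
```

Under this assumption the coverage is above 70% in every population, in line with the statement in the text.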

CDRH3 length shortened when B cells enter the periphery
To investigate how the size of CDRH3 changed during mouse B cell development, we compared the distribution of CDRH3 lengths in unique productive rearrangements in the three B cell populations (Fig. 5a). In general, the average length of CDRH3 showed a small, nonsignificant increase (about 0.13 nt) between pre-B and IM-B cells (P > 10 -5 ), but shortened considerably on average (~0.8 nt) during the development from IM-B to S-B cells (P < 10 -90 ). When compared to pre-B cells, the variance in CDRH3 length of IM-B cells narrowed. The prevalence of both short (≤27 nt) and long (≥45 nt) CDRH3 decreased, but the prevalence of middle CDRH3 (30-39 nt) increased. When the B cells entered the periphery, the distribution of CDRH3 lengths changed again. During the progression from IM-B to S-B cells, the prevalence of CDRH3 with 33 nt or less increased, but the prevalence of CDRH3 with 39 nt or more decreased.

Decrease in CDRH3 length reflects deletion in N region and altered usage of shorter D gene segments
Deconstruction of productively rearranged unique CDRH3 sequences that contained identifiable D gene segments (Table 1) allowed further evaluation of the relative contribution of VDJ germline sequence and N (including P) to CDRH3 length. From pre-B to IM-B cells, the average CDRH3 length was unchanged. Minor decreases in the D germline contribution ( -0.06 nt) and N addition at the 5′ and 3′ junctions ( -0.04 nt and -0.06 nt) were offset by the increase in the J H germline contribution (+ 0.16 nt). The decrease of N addition at the 3′ junction and the increase of the J H germline contribution reached a statistically significant level (P < 10 -5 and P < 10 -19 ). From IM-B to S-B cells, the average CDRH3 length significantly decreased by 0.76 nt (P < 10 -104 ). The marked decrease in N addition at the 5′ and 3′ junctions (-0.24 nt and -0.27 nt) reflected two-thirds of the total decrease (P < 10 -54 and P < 10 -93 ). Also, the contribution of D germline sequence decreased by 0.11 nt, which was statistically significant (P < 10 -7 ). From pre-B to S-B cells, increased use of DQ52, the shortest D segment, and decreased use of DFL16.1, the longest D segment, was a major factor in the decrease of contribution of D germline sequence (Fig. 1c). However, the V H and J H germline sequences did not appear to contribute to the decrease in CDRH3 length during mouse B cell development.

In unique nonproductive sequences of all three B cell populations, the frequency of three CDRH3 lengths decreased successively in the order 3n -2, 3n -1 and 3n
Because nonproductive sequences are not expressed as functional IgH proteins on the cell surface, they can be used to estimate the CDRH3 length distribution of IgH rearrangements before selection. The number of unique nonproductive rearrangements obtained from pre-B, IM-B and S-B cell populations was 31899, 24979 and 25956, respectively, which is much fewer than the number of productive rearrangements, indicating that the nonproductive rearrangements might be inefficiently transcribed or degraded more rapidly than productive rearrangements. Comparison of the average CDRH3 length between   of CDRH3 length decreased successively in the order 3n -2, 3n -1, and 3n (Fig. 5b), indicative of a direct correlation between mRNA stability and CDRH3 length patterns of nonproductive rearrangements.

Note: ① The number of sequences analyzed for pre-B, IM-B and S-B cells was 154365, 99629 and 89653, respectively; ② 5′ of D and 3′ of D indicate the N (and including P) nucleotides added between V H and D and between D and J H , respectively; ③ Standard errors in parentheses; ④ Different superscript letters (a, b and c) indicate statistically significant difference at P < 10 -5 within each line.

Fig. 5 The CDRH3 length distribution of unique productive (a) and nonproductive (b) rearrangements transcribed from pre-B, IM-B and S-B cells. Each bar in subfigure a and point in subfigure b indicates the percentage of the unique productive (or nonproductive) rearrangements containing the corresponding length of CDRH3 relative to the total unique productive (or nonproductive) rearrangements. In subfigure a, an asterisk indicates significant differences (P < 10 -5 ) compared with two corresponding B cell populations.
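The three length classes of nonproductive rearrangements (3n -2, 3n -1 and 3n) correspond to the remainder of the CDRH3 nucleotide length after division by 3. A minimal sketch of this classification follows; the function names and the toy lengths are hypothetical and serve only to show how the class frequencies plotted in Fig. 5b could be tabulated.

```python
from collections import Counter


def length_class(cdrh3_nt_length):
    """Map a CDRH3 length in nt to its class: 3n, 3n-1 or 3n-2."""
    remainder = cdrh3_nt_length % 3
    # 3n leaves remainder 0, 3n-1 leaves remainder 2, 3n-2 leaves remainder 1.
    return {0: "3n", 2: "3n-1", 1: "3n-2"}[remainder]


def class_frequencies(lengths):
    """Return the fraction of sequences falling into each length class."""
    counts = Counter(length_class(n) for n in lengths)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("3n-2", "3n-1", "3n")}


# Toy CDRH3 lengths (nt) of nonproductive rearrangements, for illustration only.
toy_lengths = [34, 37, 40, 31, 35, 38, 32, 36, 34]
freqs = class_frequencies(toy_lengths)
for label, fraction in freqs.items():
    print(f"{label}: {fraction:.2f}")
```

A nonproductive rearrangement can still have a 3n length when the junction is in frame but contains a stop codon, which is why all three classes occur among nonproductive sequences.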

Increased use of polar amino acids and decreased use of positively charged and nonpolar amino acids in CDRH3 loops during B cell development
Although the majority of amino acids (except glutamine and methionine) were observed to have significantly different usage proportions (P < 10 -6 ) among the three B cell populations, the overall usage profiles of the 20 amino acids showed high correlations (correlation coefficient r > 0.997 in all pairs) (Fig. 2). In all three B cell populations, the top five preferred amino acids in CDRH3 loops were tyrosine, glycine, serine, asparagine, and alanine, which contributed to more than 60% of the total amino acids (Fig. 6a).
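Amino acid usage by physicochemical class can be tabulated from CDRH3 loop sequences as in the sketch below. The grouping into positively charged, negatively charged, polar and nonpolar residues follows one common textbook convention and is an assumption made here; the exact grouping used in the study is not given in this section, and the loop sequences are toy examples.

```python
from collections import Counter

# Assumed grouping of the 20 amino acids (one common convention).
AA_CLASSES = {
    "positive": "KRH",
    "negative": "DE",
    "polar": "STNQYC",
    "nonpolar": "AVLIMFWPG",
}
AA_TO_CLASS = {aa: cls for cls, aas in AA_CLASSES.items() for aa in aas}


def class_fractions(loops):
    """Return the fraction of residues in each class across all loops."""
    counts = Counter(AA_TO_CLASS[aa] for loop in loops for aa in loop)
    total = sum(counts.values())
    return {cls: counts[cls] / total for cls in AA_CLASSES}


# Toy CDRH3 loop sequences, for illustration only.
toy_loops = ["RDYYGSSY", "GGYAMDY"]
fractions = class_fractions(toy_loops)
for cls, fraction in fractions.items():
    print(f"{cls}: {fraction:.1%}")
```

Comparing such class fractions between B cell populations is one way to summarize the shifts in charge and polarity described in this section.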

Discussion
A strain-dependent developmental difference in V H gene usage has been intensively studied in BALB/c and C57BL/6 mice. In BALB/c, the two most D-proximal V H families, V H 7183 and V H Q52, were overrepresented in fetal liver B cells and early B cell development in adult bone marrow, compared with their germline complexity, but this preference was lost when the B cells matured and migrated to the periphery [26][40][41][42]. In contrast, usage bias of the 3′ V H genes in the C57BL/6 strain was clearly observed in fetal liver B cells [42], but in our study it was not apparent in the precursor B cells during B cell development in adult bone marrow. Mouse pre-B cells can be further divided into three subclasses, pre-B-I (c-kit + , CD25 - and CD43 + ), large pre-B-II (c-kit - , CD25 + and partially CD43 + ), and small pre-B-II (c-kit - , CD25 + and CD43 - ), of which the last is predominant (60%-70%) [43,44]. Work by ten Boekel et al. demonstrated that in normal C57BL/6 mice a V H repertoire shift happens as cells mature from pre-B-I to pre-B-II, since some V H domains encoded by the V H 7183 and V H Q52 families are not able to form a pre-BCR with the surrogate light chain, or the pre-BCR formed is incapable of efficiently inducing pre-B cell clonal proliferation [45]. In our present study, the overall expression profile of the V H families did not show significant changes during B cell maturation in adult C57BL/6 mice. The detection of the unbiased V H repertoire in our study is probably due to the pre-B cell population we used, which mainly belonged to the pre-B-II subclass (CD43 - ), which develops from pre-BCR + cells and has already passed the positive selection step. Due to the death of the V H DJ -H /V H DJ -H B cells, nonproductive rearrangements could only be detected from the V H DJ + H /V H DJ -H B cells. Thus the presence of a given V H gene in the nonproductive rearrangements cannot contribute positively or negatively to selection. 
In contrast, the proportion of a given V H gene in the productive rearrangements could be determined by the efficiency of participation in rearrangement as well as by its propensity to be selected. Therefore, in a previous study, the selection of the V H repertoire during B cell development was measured by the shift in the proportion of productive to nonproductive rearrangements or the shift in the fractions of in-frame rearrangements (IF fractions) of a given V H gene (or V H family) in various B cell populations [46]. Alternatively, whatever the stage of development, a B cell can only express one kind of H chain with unique specificity. Consequently, in our study, the effect of selection, either on a V H gene or on a V H DJ H combination, was determined by the change in diversity, which was calculated from the substantial sequence data composed of the expressed productive V H DJ H rearrangements. Although most of the pre-B cells utilized here are pre-B-II, compared with the other V H families, the combination repertoire of the V H 7183 family does show more obvious variation between the pre-B and IM-B cells than between the IM-B and S-B cells (r = 0.79 vs. r = 0.95). This finding is consistent with a previous report that the V H 7183 family was more susceptible to positive selection in early B cell development, especially the most D-proximal V H 7183.2 (equivalent to V H 81x in the BALB/c strain) [45]. It is noteworthy that the combination repertoire of the V H 36-60 family seemed not to be selected against during the overall development of the adult B cells (r = 0.995 vs. r = 0.999). This is consistent with the finding of Decker et al. (1991), who observed an unexpectedly high representation (more than 80%) of productive rearrangements of V H M315, a member of the V H 36-60 family in the BALB/c strain [47]. 
In the sequence of the entire Igh V locus of strain C57BL/6, the V H gene segment corresponding to V H M315 is V H 36-60.6, which was always predominantly utilized in the combination repertoire of the V H 36-60 family in our investigation. Meng et al. suggested that the IF fraction of a rearranged V H gene could be used as a measure of the V H selection at the pro-B to pre-B cell transition [46]. However, for a V H gene, a high IF fraction does not always mean a high contribution of diversity to the whole repertoire; nearly all of the V H J558 genes had a high IF fraction.
In contrast to previous observations based on a limited number of V H 7183DJCμ transcripts, the average CDRH3 length generated by the rearrangement machinery was found to be reduced when B cells entered the periphery, and similar results have been observed in humans [29,48]. Of the individual components of CDRH3, the D and J H elements comprised the bulk of the length of the region by contributing more than 10 nt each, whereas the N (both 5′ and 3′ of the D segment) and V H elements added about 7 nt and 5 nt each. Notably, the reduction in average CDRH3 length was largely due to fewer N additions rather than a decrease of the germline contribution, indicating that the selection for B cells might more frequently favor the short CDRH3 encoded by the nonrandom germline sequences. Furthermore, there was no significant difference in the length of CDRH3 between productive and nonproductive rearrangements in each B cell population, and the CDRH3 of nonproductive rearrangements was also significantly shorter in S-B cells than in IM-B cells. These results indicated that the longer CDRH3 seemed to be removed ahead of the transcription process.
It is well known that somatic rearrangement of V H , D, and J H genes usually results in the acquisition of premature translation-termination codons (PTCs) in the Ig genes. In our cDNA database, the nonproductive rearrangements were much fewer than the productive rearrangements. This result is in line with previous publications indicating that the PTC-containing mRNAs of TCR and Ig genes are degraded more efficiently by the process of nonsense-mediated mRNA decay (NMD) [49,50]. In mammalian cells, the widely accepted mechanism for PTC recognition is the exon junction complex (EJC) model, which proposes that only a PTC located at least 55 nt upstream from the terminal intron can trigger NMD [51,52]. However, this rule is not applicable to some genes, such as the TCRβ and Igμ transcripts, which were downregulated even when the PTCs were located downstream of the -55 nt boundary [49,53]. Moreover, a polar effect of the NMD efficiency was also observed in both TCRβ and Igμ genes. The efficiency of NMD increased gradually as the PTC moved further downstream in the Igμ gene [49], but the effect is opposite for TCRβ [53]. Importantly, in our investigations, the novel feature of the CDRH3 length distribution of the nonproductive rearrangements was the regular and distinct reduction of the frequency in the order 3n -2, 3n -1 and 3n. An earlier report indicated that the productively (PTC-) and nonproductively (PTC +) rearranged Igμ heavy chain alleles seemed to be equally well transcribed [54]. Hence, the difference in the frequency of the three kinds of CDRH3 lengths (3n -2, 3n -1 and 3n) probably reflects the difference in the sensitivity to NMD. Due to the usage of RF3 in D H segments, PTC(s) in the overwhelming majority of the nonproductive rearrangements with 3n-length-CDRH3 are located several nt downstream of the V H -D junction, where they can just trigger strong NMD [49]. 
Each of the other two length types (3n -2 and 3n -1) causes frameshifts downstream of the D-J junction, resulting in the accumulation of multiple PTCs in Cμ exons. The first PTC appears ~180 nt or 60 nt downstream of the J-C junction when the length of CDRH3 is 3n-2 or 3n-1, respectively. Therefore, the polarity of NMD efficiency observed in our study is similar to that reported for TCRβ transcripts: 5′ PTCs triggered more effective NMD than did 3′ PTCs [53].
The presence of excess positively charged amino acids in the CDRH3 loop is an important feature of autoreactive antibodies, especially the dsDNA binding antibodies [10,55,56]. Sequences of CDRH3 containing positively charged amino acids have been reported to be sequentially removed from the population during B cell development [10], and another study also showed that mice forced to express an antibody repertoire enriched with positively charged amino acids showed impaired B cell development and antibody production [57]. Consistent with these previous studies, we observed that the repertoire moved toward less positively charged residues in the CDRH3 loop in the transition from pre-B to S-B cells, accompanied by an increase in the negatively charged amino acids, which might be beneficial for B cell development. Furthermore, a shift in average hydrophobicity of the CDRH3 loops from nearly neutral to mildly hydrophilic has been observed from early-pre B to mature B cells when the V H 7183DJ H Cμ transcripts were analyzed [39]. A decrease of aliphatic index in CDRH3 in the development from transitional to naive B cells has also been observed in the human [29]. Here we found a significant reduction in nonpolar amino acids and an increase in polar amino acids during B cell development, which was compatible with the previous findings.

Conclusions
The data presented show that the expressed μ chain repertoire, including gene segment usage, the V H DJ H combination profile, and especially the length and amino acid composition of CDRH3, is fine-tuned during B cell development in order to establish an optimal humoral immune response to antigen.