Introduction
In mammalian B cells, immunoglobulin (Ig) genes undergo two DNA alteration events, somatic hypermutation (SHM) and class switch recombination (CSR), to enhance antibody diversity [
1–
3]. SHM and CSR each require the activation-induced deaminase (AID) [
4]. AID deaminates cytosine (C) in DNA and converts it into uracil (U), resulting in U:G mismatch lesions [
5]. These AID-initiated DNA lesions are subsequently converted into point mutations during SHM or into DNA double-stranded breaks (DSBs) during CSR [
6–
8]. AID has also been implicated in generating chromosomal translocations of both Ig and non-Ig loci in leukemia and lymphoma [
3,
9–
11].
Although AID has the ability to deaminate any transcribed substrate
in vitro and could potentially access the genome widely to induce genomic instability in B cells, its physiological targets during SHM and CSR are almost exclusively restricted to variable (V) or switch (S) regions of
Ig loci. A major unresolved question is how AID-mediated DNA alterations are specifically targeted to
Ig loci, yet refrain from causing genome-wide damage in B lymphocytes. We recently reviewed the progress in the AID field, focusing on the specificity of AID targeting and its role in genomic instability [
3]. In this review, we discuss how AID-initiated lesions are generated and repaired.
Somatic hypermutation and class switch recombination in B lymphocytes
We encounter millions of different antigens daily, including infectious pathogens, which are constantly recognized by our antibody molecules. This tremendous diversity of antibody molecules is first achieved during B cell development via V(D)J recombination [
12]. Although V(D)J recombination generates an almost infinite primary repertoire of antibodies, a secondary diversification process in mature B cells is still essential for generating antigen-specific high-affinity switched antibodies [
2]. In mammalian B cells, this secondary diversification process includes SHM and CSR (Fig. 1). During SHM, point mutations are introduced into V region exons and immediate downstream intronic J regions, thereby enhancing DNA sequence diversity and allowing selection of B cell clones with higher affinity for antigen [
5]. During CSR, the constant regions of the
Igh locus are switched and B cells acquire different effector functions. Newly generated naïve B cells initially express IgM encoded by Cμ exons. Upon CSR, the assembled V(D)J exon maintains its antigen-specificity but is juxtaposed next to one of the sets of downstream C
H exons (referred to as C
H genes) to produce different classes of antibodies (e.g., IgG, IgE, or IgA), which are encoded by different C
H genes (e.g., Cγ, Cϵ, and Cα) [
7] (Fig. 1). CSR is a specific DNA recombination process that occurs between highly repetitive and evolutionarily conserved sequences termed switch (S) regions [
13]. S regions are located 5′ of each set of C
H exons except Cδ [
13] and undergo AID-mediated DSB generation [
14]. The broken upstream donor Sμ and downstream acceptor S regions are rejoined via non-homologous end-joining (NHEJ), while the intervening DNA sequence is excised as a circle (Fig. 1) [
15]. CSR does not affect antigen specificity of antibody molecules since V region exons are not altered during CSR, but it generates different classes of antibodies that interact with different effector molecules [
3].
T cell-dependent antigens induce B cells to form specialized structures termed germinal centers (GCs) [
16]. In GC B cells, robust SHM targets the assembled V region exons of the
Igh and
Igl loci and S regions of the
Igh locus [
17,
18]. CSR can be induced by T cell-dependent and -independent antigens
in vivo, and thus can occur in both GC and extra-follicular B cells [
19]. In addition, different combinations of activators and cytokines such as anti-CD40 or bacterial lipopolysaccharide (LPS) and IL-4 can induce CSR in B cells
in vitro by enabling the accessibility of a given S region for recombination [
7,
20]. Moreover, the S regions in the cytokine-activated B cells can also harbor a relatively high level of point mutations [
21,
22]. Since B cells activated with different stimuli undergo distinct differentiation pathways and display unique gene expression signatures [
23], it is likely that the process that generates AID-mediated point mutations or DSBs is differentially regulated in distinct B cell subpopulations.
AID-initiated DNA lesions and their processing repair pathways
When AID was originally discovered, it was proposed to function as an RNA editing enzyme [
4]. Although it remains likely that AID might target cellular or viral RNAs to mediate deamination [
24], convincing genetic and biochemical evidence has shown that AID functions as a DNA deaminase during SHM/CSR to convert cytosine (C) to uracil (U) [
25], thus creating U:G mismatch lesions in DNA (Fig. 2). Furthermore, AID only acts on single-stranded (ss) DNA and cannot access double-stranded (ds) DNA [
26–
32]. During SHM, it has been proposed that ssDNA is probably generated during transcription in the form of transcription bubbles [
27]. During CSR, ssDNA might be generated via a special structure called an “R-loop” [
33,
34]. R-loops are nucleic acid structures in which an RNA strand forms an RNA:DNA hybrid by displacing one strand of a duplex DNA molecule over a limited length. The formation of R-loop structures is observed at sequences that generate a G-rich transcript such as prokaryotic origins of replication [
35] or mitochondrial origins of replication [
36]. Mammalian S regions are unusually G-rich on the non-template strand [
7,
20], thereby producing G-rich RNA transcripts that can stably associate with the template strand of the DNA to form R-loops, in which the non-template DNA strand is displaced and exists as ssDNA [
14,
33,
34]. It has been proposed that evolutionarily conserved mammalian S regions are prone to form ssDNA and thus serve as the main targets of AID [
7].
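As a rough illustration of the G-richness described above, the following Python sketch computes the G fraction of a DNA strand in sliding windows. The function name, window size, and toy sequence are assumptions made for illustration only and are not taken from the cited studies; base composition alone is not a predictor of R-loop formation.

```python
def g_fraction_windows(seq, window=20, step=10):
    """Return (start, G fraction) for each sliding window over seq."""
    seq = seq.upper()
    results = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        if chunk:  # skip empty input
            results.append((start, chunk.count("G") / len(chunk)))
    return results


# Hypothetical non-template strand: a G-rich stretch followed by an AT-rich stretch
toy = "GGGGTGGGCTGGGGAGCTGG" + "ATATTACATTAATCATATTA"
windows = g_fraction_windows(toy, window=20, step=20)
for start, frac in windows:
    print(start, round(frac, 2))
```

In this toy input the first window is markedly G-rich and the second contains no G, which mirrors, in a very simplified way, the contrast between an S region-like sequence and flanking DNA.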
AID-initiated U:G mismatches can be resolved through several competing pathways (Fig. 2) [
5,
37–
42]: (1) The general replication machinery can interpret the U as if it were a thymine (T), so that one of the daughter cells acquires a C→T transition mutation; (2) Uracil-DNA glycosylase (UNG) can remove the U, leaving behind an abasic site. Error-prone polymerases such as Rev1 can then incorporate any nucleotide opposite the abasic site, leading to transitions or transversions at C:G base pairs; (3) MSH2/MSH6 (mutS homolog 2/6), components of the mismatch repair (MMR) pathway, can recognize the U:G mismatch. The U-bearing strand is excised and, at loci that undergo SHM, error-prone polymerases are recruited to fill the gap, leading to transition or transversion mutations at A:T base pairs. Thus, many of the mutations in the V region are not the direct result of AID deamination, but rather depend on UNG and MMR recognition and processing of the AID-induced mismatches; (4) After MMR or UNG recognition, error-free repair could also correct the U:G mismatched DNA lesions. In the absence of MSH2 and UNG, AID-initiated U:G mismatches cannot be recognized by either pathway and are converted to C→T or G→A mutations during replication. Thus, in MSH2
-/- UNG
-/- mice, almost all the mutations are either C→T or G→A transitions that represent the footprint of AID deamination [
39,
43].
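The mutation classes discussed above can be tallied with a short script. The following Python sketch is illustrative only: the function names and the example mutation list are hypothetical, and the "footprint" fraction simply counts C→T and G→A substitutions as described in the text.

```python
from collections import Counter

BASES = set("ACGT")
TRANSITIONS = {("C", "T"), ("T", "C"), ("G", "A"), ("A", "G")}
AID_FOOTPRINT = {("C", "T"), ("G", "A")}


def classify(ref, alt):
    """Classify one substitution by base pair (C:G or A:T) and by type."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    pair = "C:G" if ref in "CG" else "A:T"
    kind = "transition" if (ref, alt) in TRANSITIONS else "transversion"
    return pair, kind


def summarize(mutations):
    """Return class counts and the fraction of C>T / G>A substitutions."""
    mutations = [(r.upper(), a.upper()) for r, a in mutations]
    counts = Counter(classify(r, a) for r, a in mutations)
    footprint = sum(1 for m in mutations if m in AID_FOOTPRINT)
    fraction = footprint / len(mutations) if mutations else 0.0
    return counts, fraction


# Hypothetical list of (reference base, mutated base) calls
example = [("C", "T"), ("G", "A"), ("C", "G"), ("A", "G"), ("T", "A")]
spectrum, footprint_fraction = summarize(example)
print(dict(spectrum), footprint_fraction)
```

Under the model in the text, a sample lacking both MSH2 and UNG would be expected to give a footprint fraction close to 1, whereas a wild-type sample would also show transversions at C:G pairs and mutations at A:T pairs.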
It is not completely understood how the recognition of AID-initiated U:G mismatches results in DSBs. Uracils can be removed by UNG, which generates abasic sites, and it is presumed that apurinic/apyrimidinic (AP) endonucleases 1 and 2 (APE1 and APE2) create nicks on the DNA strands at these abasic sites [
5,
8,
20]. Data from knockout or knockdown studies have been controversial since it is difficult to directly assess APE1’s role in CSR [
44–
46]. This is because APE1 deletion leads to embryonic lethality in mice [
47,
48]. Recently, it was reported that deletion of the
Ape1 gene in a mouse B cell line (CH12F3) did not affect cell viability or growth, yet it dramatically decreased the level of CSR, thereby proving an essential role for APE1 in DSB generation and CSR [
49]. In addition, it was shown that deletion of
Ape2 had no effect on CSR in either APE1-proficient or -deficient cells [
49]. U:G mismatches can also be recognized by the MMR pathway [
5]. MLH1 is a component of the MMR pathway with ATPase activity [
6]. A recent study showed that MLH1 may function to regulate whether a U:G mismatch progresses toward point mutations or DSBs [
50]. Mice deficient in ATPase activity of MLH1 show a significant reduction in CSR, which is mainly caused by decreased DSB generation; in contrast, these mice have a normal level of SHM [
50]. Therefore, these data demonstrate a specific role of the ATPase domain of MLH1 in CSR, distinguishing it from other members of the MMR pathway such as MSH2, in that MLH1’s ATPase activity influences how AID-initiated lesions are processed and promotes the generation of DSBs.
The role of target DNA sequences in regulating AID targeting specificity and efficiency
A central unresolved question in SHM is how AID is specifically targeted to the V region exons of
Igh and
Igl loci. Although AID indeed targets a group of non-Ig genes during SHM, the mutation frequency of these genes is several orders of magnitude lower than that of V regions [
5,
51–
58]. The specificity and efficiency of AID targeting to the V region may be regulated at multiple levels [
8,
59], including but not limited to regulation by specific sequence motifs,
cis regulatory elements, histone modification patterns, and AID cofactors. Correlative studies have long suggested that certain hotspot motifs such as RGYW or AGCT may influence mutation frequency [
5,
60,
61]. However, it remains unclear whether and how target DNA sequences regulate mutation frequency. Prior studies using artificial substrates driven by an Ig specific promoter and enhancer suggested that nucleotide sequence context might influence mutation frequency [
62–
64], whereas others showed that sequences might not play an important role in targeting mutations to V regions since several tested non-Ig genes mutated at a similar frequency to the V region exon [
65]. However, due to technical limitations of the transgenic approach employed [
66], these studies were, to some extent, inconclusive and left the role of nucleotide sequence in SHM unresolved.
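To make the hotspot motifs mentioned above concrete, the following Python sketch locates RGYW, its reverse complement WRCY, and AGCT in a sequence using the standard IUPAC codes (R = A/G, Y = C/T, W = A/T). The function name and test sequence are hypothetical, and finding a motif says nothing about whether it is actually mutated.

```python
import re

# IUPAC ambiguity codes needed for the motifs discussed in the text
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]"}


def motif_positions(seq, motif):
    """Return 0-based start positions of all (overlapping) motif matches."""
    pattern = "".join(IUPAC[ch] for ch in motif.upper())
    # A lookahead lets overlapping occurrences be reported
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]


toy_seq = "TTAGCTAA"  # hypothetical sequence containing one AGCT
for motif in ("RGYW", "WRCY", "AGCT"):
    print(motif, motif_positions(toy_seq, motif))
```

Because AGCT is palindromic and satisfies both RGYW and WRCY, the same position is reported for all three motifs in this toy sequence.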
Our recent studies attempted to address these fundamental questions by establishing a knock-in model via a gene-targeting approach to introduce a core Sγ1 region into the first intron of
Bcl6 [
67]. We found that, consistent with previous analysis of a similar region [
68], the mouse
Bcl6 first intron region mutated at a frequency of approximately 2×10⁻⁴ [
67]. More importantly, the mutation frequency of the inserted Sγ1 region was found to be 10-fold higher than that of the adjacent endogenous
Bcl6 sequence [
67]. Thus, our studies demonstrate that S region sequence per se, independent of
Igh cis regulatory elements, enhances AID targeting efficiency [
67]. We conclude that nucleotide sequence of a gene locus itself directly influences its mutability. Mechanistically, we showed that the enhanced recruitment of RNA polymerase II (RNAPII) and AID might explain the observed mutational phenotypes of the inserted S region [
67]. More interestingly, we found that the higher level of RNAPII accumulation is not only closely correlated with the inserted Sγ1 region but also detected at both the 5′ and 3′ ends of this region [
67]. This would suggest a higher density of RNAPII across the entire knock-in S region, which appears to be highly consistent with the previous findings regarding the endogenous S region in the
Igh locus [
69]. We propose that RNAPII pausing at S regions likely facilitates the repositioning of repressive nucleosomes to create a permissive chromatin architecture that allows AID to access target DNA sequences [
67]. Consistent with our hypothesis, the occupancies of RNAPII, Spt5, and RPA were enriched at the
Igh locus in cytokine-activated B cells [
70]; this was not only detected in the S regions but also in some of the C regions such as Cμ [
70]. Thus, we propose that these AID cofactors including RNAPII, Spt5, and RPA are more likely to act as “accessibility factors” that facilitate the establishment of a permissive chromatin architecture for AID’s access to the
Igh locus [
67]. It remains to be addressed whether these factors are actively bringing AID to the
Igh locus. It is possible that there are unidentified AID targeting factors that might act more specifically at
Ig loci since these “accessibility factors” are prevalently involved in the transcriptional control of many other loci. Our hypothesis is also in line with involvement of the histone chaperone “facilitates chromatin transcription” (FACT) complex [
71] and distinct chromatin modifications in CSR [
69,
71–
73]. Overall, these studies reveal a complex picture of chromatin modification patterns in the
Igh locus during CSR, with active or repressive histone marks associated with S or C
H regions [
69,
71–
73]. Thus, we propose that a specific combination of histone modifications may be responsible for the observed mutational phenotypes in the inserted Sγ1 region [
67].
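The mutation frequencies quoted above can be related by simple arithmetic. The sketch below assumes that mutation frequency is expressed as mutations per base pair sequenced, which is a common convention but is not spelled out in this section; the counts are hypothetical and were chosen only to reproduce the reported orders of magnitude.

```python
def mutation_frequency(n_mutations, n_bases_sequenced):
    """Mutations per base pair sequenced."""
    if n_bases_sequenced <= 0:
        raise ValueError("n_bases_sequenced must be positive")
    return n_mutations / n_bases_sequenced


# Hypothetical counts, not data from the cited study
endogenous = mutation_frequency(20, 100_000)   # about 2e-4
inserted = mutation_frequency(200, 100_000)    # about 2e-3
fold = inserted / endogenous                   # approximately 10
print(endogenous, inserted, fold)
```

On these assumptions, a 10-fold increase over a background of roughly 2×10⁻⁴ corresponds to a frequency on the order of 2×10⁻³ for the inserted region.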
Taken together, we propose that nucleotide sequences, as the targets of AID, function actively to determine their own mutability, possibly by forming higher-order structures, recruiting sequence-specific co-factors, altering chromatin context, and/or regulating the transcriptional process. In addition, the nucleotide sequence preference may serve as an additional layer of AID regulation by restricting its mutagenic activity to specific sequences such as evolutionarily conserved S regions. This regulatory mechanism may ensure that AID deamination frequency remains relatively low at most loci in the genome, below the threshold of repair capacity, so that these AID-initiated lesions can be efficiently repaired, thereby protecting the integrity of the B cell genome. Such a notion is also consistent with the observation that although AID is recruited to 5910 target genes in
in vitro activated B cells [
74], most of these loci would not display mutations in the presence of a normal repair mechanism.
Cis regulatory elements in SHM targeting
Cis-acting elements of
Ig loci have been investigated extensively for their effects in SHM [
52,
75–
78]. The transcription of the assembled V region is absolutely required for SHM since deletion of V region promoter abolishes SHM [
79]. However, the V region promoter could be replaced by an RNA Pol I-dependent promoter, which supported about 60% of the mutation level induced by the endogenous RNA Pol II promoter [
79]. These studies imply that transcription driven by any promoter might support SHM. This notion is consistent with later studies suggesting that the level of transcription is not necessarily correlated with the level of mutations [
80,
81]. It has long been proposed that the mutator enzyme (now known as AID) may associate with RNAP II to mediate SHM [
82]. However, it remains unclear which aspects of transcription regulate SHM and through what mechanisms. A recent study knocked a transcription terminator into an
Ig gene V region in the DT40 chicken B cell line [
83]. The knock-in human β-globin terminator reduced mutations downstream of the poly(A) signal, which was accompanied by efficient inhibition of downstream transcription [
83]. These data suggest that target DNA sequences gain better access to AID when RNAP II is in the elongating rather than terminating mode [
83]. Transcription may also guide AID in targeting SHM to
Ig genes via regulating histone-exchanging dynamics at these loci [
84]. Previously, the FACT complex was shown to be required for a normal level of CSR [
71]. In a more recent study, the FACT complex was found to promote SHM [
84]. In addition, the most abundant deposition of FACT and H3.3 has been identified at the assembled V region, the 5′ flanking sequence of the Sμ region, and the light chain Jκ5 segment region of the
Ig loci [
84], which are the most efficient targets of SHM. Therefore, these data suggest an important role for transcription in regulating chromatin dynamics in SHM.
Deletion of the
Igh Eμ intronic enhancer and a portion of the 3′
Igh regulatory region showed that these elements are not required for SHM [
52,
78,
85]. However, recent studies showed that deletion of the 30 kb region encompassing the entire 3′
Igh regulatory region almost completely abrogated SHM [
86]. Thus, this study clarified a long-standing controversy over the role of
cis elements in targeting SHM [
59]. Furthermore, novel
cis-acting elements were identified in the
Igl locus of the DT40 chicken B cell line that directly regulate AID-mediated sequence diversification [
87–
90]. In addition, it was shown that the E box motif CAGGTG in the context of
Ig enhancers was sufficient and essential to target mutations to a nearby transcribed gene in transgenic DT40 cell lines [
91]. In this regard, it has been reported that the context sequences of E box motifs play a more important role in targeting SHM [
87]. Overall, these studies highlight an important role of
cis regulatory elements in AID targeting to
Ig loci [
77]. Our recent studies also suggest that such elements might be important contributors to the mutability of other preferred non-Ig targets of AID, such as
Bcl6 [
67].
Locus-dependent repair of AID-initiated DNA lesions
AID initiates U:G mismatches in DNA, which are converted into point mutations in the V regions and DSBs in the S regions of
Ig loci [
5]. In addition, AID can target non-Ig gene loci to mediate point mutations or DSBs [
92]. These AID-initiated lesions must be processed appropriately to prevent them from causing genome-wide damage. Prior studies proposed a differential DNA repair mechanism that protects non-Ig loci from an excessive amount of mutations [
53]. For example, the mutation frequency of the
c-myc locus is much higher in MSH2
-/-UNG
-/- or UNG
-/- Peyer’s patch (PP) GC B cells than in wildtype (wt) controls [
53]. These data suggest that AID-initiated lesions at certain non-Ig loci such as
c-myc are normally processed by error-free repair and only manifest in the absence of MSH2 and UNG or UNG alone in GC B cells [
53]. Consistently, mutational analysis of other non-Ig loci in MSH2
-/-UNG
-/- or single deficient mice revealed that error-prone or error-free repair could be differentially targeted to distinct loci in GC B cells [
53]. In particular, AID-initiated lesions are repaired in an error-free manner in most of the tested non-Ig loci, including
c-myc, whereas a few of them appear to undergo error-prone repair [
53], including
Bcl6 and
Cd83, which are among the non-Ig genes initially identified to undergo SHM. The error-free repair mechanism operating at most non-Ig loci probably protects the genome of GC B cells.
However, it remains unknown how the differential DNA repair of AID-initiated U:G lesions is regulated at non-Ig loci. It has been well documented that B cells activated with different stimuli (e.g., T cell-dependent or independent antigens or cytokines) undergo distinct differentiation pathways and display unique signatures of gene expression [
23]. Thus, we propose that the AID-initiated lesions may be differentially processed in distinct B cell sub-populations. In addition, as discussed above, locus-specific regulatory elements might regulate not only SHM targeting efficiency but also the mode of repair of AID-initiated lesions. Further investigations of such regulatory mechanisms will help us better understand how non-Ig loci are targeted for error-prone repair. Elucidating the locus-dependent repair mechanism is highly relevant to genomic instability and lymphomagenesis of B lymphocytes, since many of these loci such as
Bcl6 are frequently targeted by mutations or translocations in human B cell lymphomas [
93].
c-myc is also a frequent translocation partner of
Ig loci in human mature B cell lymphomas [
94], the majority of which are thought to derive from GC B cells [
95]. However, extensive sequencing studies of the
c-myc locus in human memory B cells showed little SHM activity [
55,
57,
96]. Although the mutation frequency is higher in mouse PP GC B cells and even higher in MSH2
-/-UNG
-/- GC B cells [
53], the frequency of AID deamination at the
c-myc locus is still several orders of magnitude lower than that at the
Igh locus. Thus, these data suggest that
c-myc is not an efficient AID target in GC B cells, consistent with prior studies [
55,
57,
96]. If so, then how can
c-myc become a frequent target of translocations, and are those translocations truly derived from GC B cells? Recent studies have shown that
c-myc is expressed in B cell subpopulations in immature and mature GCs, and genetic deletion of the
c-myc gene demonstrated that it plays essential roles in the formation and maintenance of GCs [
97]. These data suggest that this particular subset of c-MYC
+ GC B cells might be susceptible to translocation and subsequent malignant transformation [
97]. It is likely that the DNA repair mechanisms at the
c-myc locus become dysregulated in GC B cells, and instead of error-free repair, the AID-initiated lesions are repaired in an error-prone way. Another possibility is that AID-initiated lesions at the
c-myc locus are generated in a sub-population of B cells other than GC B cells and that these lesions undergo error-prone repair leading to DSBs/translocations in this particular subset of B cells.
Extensive studies of the
Igh locus demonstrate that AID-initiated lesions are processed in an error-prone manner that leads to mutations or DSBs at the
Igh locus [
3,
9]. This error-prone processing seems to be independent of B cell populations because both GC and cytokine activated B cells harbor frequent mutations at the S regions and undergo efficient CSR that requires DSB generation [
39,
43]. Studies using mice deficient for DNA repair factors have shown that
Igh locus DSBs progress into chromosomal breaks/translocations in the absence of a normal DSB response (see below). However, almost all of these studies assess the
Igh locus abnormalities in cytokine-activated B cells. It remains unknown whether GC B cells indeed harbor an increased level of
Igh locus abnormalities when DSB responses are defective.
DSB responses during CSR
AID-initiated lesions can be converted into DSBs at the
Igh locus [
20]. These AID-dependent DSBs are the essential intermediates of CSR and are rapidly sensed and processed by DNA damage response (DDR) factors [
9,
10]. One prominent DDR factor is ATM, a phosphatidylinositol 3-kinase-related kinase, which rapidly phosphorylates histone H2AX, MDC1, 53BP1, and NBS1 [
98,
99]. Subsequently, these activated DDR factors form large γH2AX-dependent foci that are potentially involved in cell cycle checkpoints, recruitment/activation of repair proteins, and/or tethering broken DNA ends for repair [
9,
10,
100–
102]. The DSB response is required for normal CSR, as mice deficient for DDR factors are impaired, to variable extents, in CSR [
9,
10]. In this regard, more players in DDR have been found to be essential for a normal level of CSR, such as Rif1 [
103–
105]. Moreover, in the absence of a normal DSB response, AID-dependent S region DSBs separate and progress into chromosomal breaks and translocations [
106,
107]. Of note, all present analyses of DSB formation have been performed in cytokine-activated B cells
in vitro. Whether specific antigen-stimulated GC B cells harbor increased chromosomal breaks/translocations in the absence of a normal DSB response remains to be determined.
Among the DDR factors involved in CSR, 53BP1 has attracted attention because 53BP1 deficient B cells have the most dramatic reduction in CSR [
108,
109]. Interestingly, in the absence of 53BP1, intra-switch recombination (ISR) within a single S region is preferred over the long-range joining of two distant S regions, as evidenced by a much higher level of ISR in 53BP1 deficient B cells compared to wt B cells or B cells deficient in other DSB repair factors [
110]. Although unusual rearrangements (a high level of insertions) occur in 53BP1
-/- B cells, sequence analysis of endogenous Sµ-Sγ1 junctions revealed no significant differences in the extent of homology employed in the donor and acceptor S sequences [
110]. Based on these data, it was concluded that absence of 53BP1 favors short-range over long-range recombination [
110], consistent with the dramatic reduction of CSR in 53BP1
-/- B cells [
108,
109], whereas end joining of endogenous S regions seems to be “qualitatively unaffected” [
110]. Later studies employing the I-SceI-based system suggest a role for 53BP1 in protecting DNA ends against resection [
111,
112], a process by which 5′-3′ nucleolytic degradation generates ssDNA overhangs [
113], albeit the I-SceI-mediated switching seems to be much less robust than the endogenous S region-mediated CSR [
111,
112]. In the absence of 53BP1, the extent of end-resection during CSR is increased [
111,
112], which depends on CtBP-interacting protein (CtIP, Rbbp8) and exonuclease 1 [
114]. Inhibition of CtIP by shRNA partially rescues the CSR defect in 53BP1-deficient B lymphocytes, which, along with other data, leads to the conclusion that CtIP-mediated DSB resection is in part responsible for the profound CSR defect in 53BP1-deficient B cells [
114]. However, shRNA knockdown of CtIP has no effect on the CSR level in wt B cells; consistently, genetic deletion of
CtIP does not affect CSR level or the extent of microhomology at Sμ-Sγ1 junctions based on analysis of wt and CD19
cre/+CtIP
co/- primary B cells [
114]. In addition, recent studies highlight a role of 53BP1 in recruiting other DSB response factors such as RIF1 to the broken DNA ends during CSR [
103–
105].
Rif1-deficient B cells have a profound CSR defect due to the inability to prevent end resection [
103–
105]. Finally, 53BP1 also plays a critical role in defining DSB repair pathway choice in G
1 and S/G
2 cell-cycle phases [
113] (see below). Of note, in addition to 53BP1 and its interacting partners, there are components of the MMR pathway that also contribute to DNA end-processing during CSR, which was extensively reviewed previously [
115].
Non-homologous end joining of DSBs during CSR
DSBs are the essential intermediates during antigen receptor diversification, including V(D)J recombination and CSR [
9]. DSBs can also be induced by exogenous or endogenous stimuli such as irradiation or reactive oxygen species. To minimize the potential deleterious effects of DSBs on genome integrity, eukaryotic cells have evolved two distinct DSB repair pathways, non-homologous end joining (NHEJ) [
116] and homologous recombination (HR), to repair DSBs [
113]. NHEJ catalyzes direct ligation of minimally processed DNA ends with little or no homology, thus functioning independently of homologous sequence template [
113]. NHEJ is required for immune system development, including V(D)J recombination and CSR [
15]. It is generally thought that NHEJ is inherently error-prone and responsible for generating chromosomal translocations formed by joining DSBs from heterologous chromosomes [
116–
118]. NHEJ can act throughout the cell cycle but is preferentially active in G
1 phase. In contrast, HR requires a homologous sequence as a repair template, and is thus a more faithful repair mechanism. HR is mainly active in the S/G
2 phases of the cell cycle [
119].
The choice of DNA repair pathways is critical for the cell due to the distinct outcomes of NHEJ and HR pathways. End resection seems to play a critical role in determining the choice of repair pathway [
120]. Since NHEJ only acts on minimally processed DNA ends that do not harbor long ssDNA overhangs, NHEJ is inhibited by end resection [
113]. In contrast, end resection is a prerequisite for HR, as it generates the ssDNA needed for homology searching. Like the choice of DNA repair pathway, DSB end resection is also regulated during the cell cycle [
120]. During G
1 phase of the cell cycle, 53BP1 protects the DNA ends by antagonizing BRCA1, which is activated during S/G
2 phase to mediate end resection [
103,
105,
121]. Recent studies reveal that 53BP1 recruits RIF1 to DSBs in a phosphorylation-dependent manner that suppresses the accumulation of BRCA1 at DSBs; this 53BP1-RIF1 interaction is required for protecting DNA ends during G
1 phase [
103–
105,
121]. During S/G
2 phase, RIF1 is prevented from accumulating at DSBs by BRCA1 and its interacting protein CtIP [
103,
105,
121]. Thus, these studies suggest a model in which a cell cycle-regulated network composed of 53BP1-RIF1 and BRCA1-CtIP controls DSB repair pathway choice; the former dominates during G
1 and favors NHEJ while the latter is active during S/G
2 and stimulates HR [
105].
The components of NHEJ include Ku70 (XRCC6), Ku80 (XRCC5), DNA-dependent protein kinase catalytic subunit (DNA-PKcs), Artemis, polymerase μ, polymerase λ, XRCC4-like factor (XLF, also called Cernunnos), XRCC4, and DNA ligase IV (Lig 4) [
117]. The DNA-binding subunits Ku70 and Ku80, which together with DNA-PKcs form the DNA-PK holoenzyme (DNA-PK), are responsible for recognizing DSBs [
116,
122]. The protected DSBs are eventually ligated by a complex of XRCC4 and Lig4, probably together with XLF, which catalyzes end-ligation specifically in NHEJ [
116]. During DNA end resection, Ku proteins are removed from DNA ends by the CtIP/MRE11-RAD50-NBS1 (MRN) complex to generate ssDNA overhangs that are refractory to canonical NHEJ and favor HR [
113].
NHEJ catalyzes the joining step of CSR [
7], which has been extensively reviewed recently [
15]. Classical NHEJ (C-NHEJ), defined as end-joining dependent on Ku proteins and the other known NHEJ factors, repairs both programmed DSBs generated during V(D)J recombination and general DSBs induced by DNA damaging agents such as ionizing radiation [
123]. In the absence of C-NHEJ factors such as XRCC4 or Lig4, the level of CSR is reduced in cytokine activated primary B cells [
124]. Furthermore,
Xrcc4- or
Lig4-deficient B cells harbor a high level of
Igh locus chromosomal breaks and translocations due to inability to repair AID-initiated DSBs [
124–
127]. These data demonstrate that C-NHEJ is required for a normal level of CSR and for the maintenance of genomic stability in B cells. Interestingly, about 20%–50% of the wt level of CSR is still observed in
Xrcc4- or
Lig4-deficient primary B cells; this demonstrates that, apart from the C-NHEJ pathway, there is an alternative end-joining (A-EJ) pathway that could catalyze the end-ligation during CSR [
124]. Additionally, in the absence of both Ku70 and Lig4, primary B cells can undergo CSR at a level similar to that of Ku70- or Lig4-deficient cells [
126]. These data convincingly show that A-EJ is a distinct repair pathway from C-NHEJ at both stages of DSB recognition and joining [
126].
Consistent with the phenotypes of single deficient C-NHEJ B cells [
124,
125], the Ku70
-/- Lig4
-/- double deficient B cells harbor an extremely high level of genomic instability upon
in vitro cytokine activation [
127]. Thus, although NHEJ is inherently error-prone compared to HR, it is also required to maintain genome stability. Deletion of C-NHEJ factors and p53 simultaneously in B lymphocytes often leads to B cell lymphoma development accompanied by clonal translocations of
Ig loci [
9,
11,
128–
131]. It has been proposed that the translocations observed in the NHEJ deficient pro-B [
131] and peripheral B cell lymphomas [
130] are catalyzed by the A-EJ pathway. Apart from functioning in tumor cells, it was found that the A-EJ pathway can operate in primary non-transformed B cells to promote translocations of
Igh and
Igλ loci [
125]. Junctions catalyzed by the A-EJ pathway display a signature of microhomology (MH); therefore, this process is often referred to as microhomology-mediated end-joining (MMEJ) [
132]. However, MH is not a prerequisite for the A-EJ pathway because not all of the A-EJ catalyzed junctions have MH and a fraction of them are actually direct joins [
124–
127]. The molecular components of the A-EJ pathway remain incompletely characterized. Several factors have been reported to function in chromosomal A-EJ, including Nbs1 [
133], Mre11 [
134–
136], and CtIP [
137,
138]. As discussed above, all of these factors are involved in DNA end resection to uncover MH, thereby promoting A-EJ. Since A-EJ is not dependent on Lig4, Lig3 and its co-factor XRCC1 have been widely assumed to perform the end-ligation during A-EJ [
115,
139–
141]. XRCC1 and Lig3 form a complex similar to XRCC4/Lig4, which operates in short-patch base excision repair and single-strand break repair [
142]. Based on analysis of
Xrcc1+/- heterozygous B cells, it was concluded that XRCC1 functions in A-EJ in the context of CSR and formation of
Igh-c-myc translocations [
140]. In contrast, a later study definitively showed that XRCC1 is not essential for A-EJ repair of I-SceI-induced DSBs in
Xrcc4-deficient pro-B cell lines [
143]. Furthermore, conditional inactivation of
Xrcc1 in
Xrcc4-deficient primary B cells did not affect the formation of CSR junctions catalyzed by A-EJ or the frequency of
Igh-
c-myc translocations [
143]. Thus, XRCC1 is not an indispensable factor for major pathways of chromosomal A-EJ [
143].
AID in human cancers
The primary antibody diversification process, namely V(D)J recombination, generates an almost infinite repertoire of antibodies [
12,
144]. However, B lymphocytes still need to undergo a secondary diversification process to produce the high affinity antigen-specific class-switched antibodies that are essential for battling infectious diseases. In mammalian B cells, the secondary diversification of antibody is initiated by AID [
4]. In this regard, B cells pay a high price for the benefits of utilizing AID to generate point mutations or DSBs, since about 95% of human lymphomas are B cell-derived [
95]. This evolutionarily conserved mechanism is beneficial because the efficient AID-mediated protection against pathogen infection is probably far more important for species survival, thereby outweighing its negative impact on the B cell genome as a mutator. However, dysregulated AID targeting or impaired DSB responses likely result in cancer development in individuals.
Human mature B cell lymphomas often harbor clonal translocations involving
Ig gene loci and oncogenes such as
c-myc or
Bcl6 [
93]. Mechanisms promoting such translocations have been extensively reviewed [
9,
11,
117,
145]. AID has a requisite role in initiating
Igh locus breaks/translocations [
125,
146]. Many of these translocations are proposed to occur during the SHM/CSR process [
93]; consistently, the translocation junctions at the
Igh locus often fall within S regions [
93]. Data from mouse models seem to support this hypothesis in that the junctions of
Igh-c-myc translocations often occur within or around S regions, such as in the Xrcc4/p53 deficient peripheral B cell lymphomas [
130]. In addition to
Ig loci as the physiological targets of AID, non-Ig loci such as
Bcl6,
c-myc, and
Pim1 are targeted by SHM in human B cell lymphomas [
147], which is presumably induced by AID. In addition, human B cell lymphomas harbor translocations between non-Ig genes such as
Bcl6 with various partners [
148,
149]. How AID induces DSBs in non-Ig gene partners is not completely understood. Prior studies suggest that AID could target the
c-myc locus for somatic mutations [
53] and DSBs at a very low frequency [
150]. In addition, overexpression of AID using an Igκ promoter and enhancer leads to an increased genomic instability in B lymphocytes [
151]. Several more recent studies clearly show that AID can access the genome widely to induce DSBs or translocations in
in vitro cytokine activated B cells [
74,
152–
154]. Taken together, these data support a role for AID in compromising the B cell genome. Although AID may play a critical role in generating DSBs at these loci, it remains unclear at which stage of B cell differentiation such translocations originate. In addition, it is puzzling how AID induces DSBs in non-Ig loci such as
c-myc in which the AID-initiated lesions are normally repaired in an error-free manner in GC B cells [
53].
AID expression was originally detected in GC B cells [
155]. Studies using an AID reporter mouse showed that AID could be expressed at variable levels in different stages of B cell differentiation, predominantly in activated B cells [
156]. More interestingly, AID could be induced in bone marrow (BM) B cells (specifically pro-to-pre-B cells) by the Abelson murine leukemia virus (Ab-MLV) [
157]. In this context, AID protects pro-B cells against transformation by Ab-MLV [
157]. The
Abl1 (V-abl Abelson murine leukemia viral oncogene homolog 1) gene encodes a protein tyrosine kinase [
158,
159]. The
Abl1 gene can be fused to the Breakpoint Cluster Region (
Bcr) to form a fusion gene
Bcr-Abl1 by the t(9;22) translocation, present in many cases of chronic myelogenous leukemia (CML) [
160]. The oncogenic BCR-ABL1 kinase induced aberrant AID expression in pre-B acute lymphoblastic leukemia (ALL) and lymphoid CML blast crisis [
161,
162]. Transduction of human CML cell lines with AID seems to confer Imatinib resistance, which presumably is caused by AID-mediated mutations in the
bcr-abl1 fusion gene that render the BCR-ABL1 kinase refractory to Imatinib treatment [
162]. Using a murine model of BCR-ABL1-induced ALL, the same group showed that AID deficiency prolonged the survival of recipient mice transplanted with BCR-ABL1-transduced BM cells [
163]. This observation is consistent with AID’s role in promoting genomic instability, which is further evidenced by a lower frequency of amplifications/deletions in the genome and mutations in non-Ig genes of the AID
-/- leukemia cells [
163]. Therefore, it is likely that AID also contributes to leukemogenesis. However, it is not well understood how aberrant AID expression is regulated in a pathological context.
It has also been proposed that AID might play a role in the development of cancers with non-B cell origin. Indeed, AID is capable of inducing mutations in fibroblasts [
164] or contributing to cancer development in multiple tissues such as the lung when constitutively expressed via the chicken β-actin promoter [
165]. However, these studies only show a sufficient role of exogenous AID in promoting cancers in non-B cells. It remains less well characterized whether non-B cells express AID endogenously at a level high enough to be biologically significant. Solid tumors are also found to harbor cancer type-specific chromosomal translocations, such as the
TMPRSS2-ETS translocation in prostate cancer [
166] and
ALK translocations in lung cancer [
167]. AID has been implicated in promoting the
TMPRSS2 translocation in a prostate cancer cell line [
168] and point mutations in
TP53 in gastric cells
in vitro [
169]. However, we did not find any detectable level of AID transcripts in AGS cell lines upon testing a similar TNF treatment employed previously [
169]. Furthermore, we failed to consistently detect AID transcripts via RT-PCR in a collection of cancer cell lines from NCI60 panels, whose microarray data show a relatively high level of AID expression. Thus, additional data are required to establish a definitive role of AID in promoting genomic instability in non-B cells.
AID has also been implicated in reprogramming or epigenetic regulation of transcription [
170–
174]. However, a recent study shows that the transcriptome alterations in AID deficient mice may not be directly caused by AID deficiency, but may instead be related to a CBA mouse strain derived region around the targeted
Aicda locus [
175]. In addition, it is likely that deletion of the
Aicda locus might affect long-range enhancers present in this locus that could regulate the expression of distantly located genes. Thus, these confounding parameters might provide alternative interpretations for AID’s role in the genotype-phenotype correlations previously reported in numerous studies using the AID
-/- mouse strain.
Recent controversies in the field of AID targeting
Given the dangerous role of AID as a mutator, its occupancy in the B cell genome has always been a critical question in the AID field. A recent study by Yamane
et al. used a ChIP-seq approach to show that 5910 transcribed genes are targeted by AID in the
in vitro activated B cells [
74]. Furthermore, it was shown that many of these non-Ig loci harbor a high level of mutations in these
in vitro activated B cells [
74]. However, a correspondence published by Hogenbirk
et al. in 2012 [
176] challenged the conclusions drawn by Yamane
et al. [
74]. According to the response by Yamane
et al., the inconsistency between the two groups’ findings seems to be related to “the inappropriate use of linear normalization, which underestimates enrichment for AID relative to background and makes it impossible to measure the binding of AID at genes encoding immunoglobulins or across the genome” [
176]. Since we do not have prior experience with ChIP-seq experiments and subsequent data analysis, we cannot comment on which normalization method (linear vs. library-size) is more appropriate in this situation. However, we note that, with regard to AID recruitment to different genomic loci, previous studies have shown that AID is enriched in switch regions at
Ig loci as evidenced by conventional ChIP analysis [
69,
177]. In addition, we found that AID was enriched at the inserted S region targeted into the first intron of
Bcl6 [
67]. Therefore, specific sequences of
Ig loci such as S regions can likely enhance AID recruitment. Both groups agree that AID probably associates with chromatin weakly, as stated in the authors’ reply by Yamane
et al., “That weak association of AID with chromatin was consistent with measurements obtained through the use of conventional ChIP analysis of AID with the same or similar antibodies and probably reflected the low concentration of nuclear AID in B lymphocytes” [
176]. Therefore, the global assessment of AID recruitment by ChIP-seq analysis may be helpful to provide a general landscape of AID occupancy in the genome on a large scale. However, the situation at specific loci might differ and require more detailed analysis by independent methodology.
With regard to SHM at non-Ig loci, Yamane
et al. found ample evidence of SHM at these loci because they employed Igκ-AID Ung
-/- B cells [
74]. These mutant B cells overexpress AID driven by the Igκ promoter and enhancer [
151] and are defective in the repair of AID-initiated lesions due to the absence of
Ung. Therefore, the mutation frequency at the non-Ig loci was found to be dramatically higher in these mutant B cells than in wt B cells (without any genetic modification). Of course, it can be argued that ectopically expressed AID might not target non-Ig loci in the same way as endogenously expressed AID. However, other independent studies showed that some of the genes mutated in Igκ-AID Ung
-/- cultures (such as Pax5, Mir142, Pim1, etc.) are also translocated in activated B cells that express physiological amounts of AID [
152], and translocation frequency seems to correlate with mutation frequency [
153]. Therefore, it is possible that at least a fraction of these mutated genes in Igκ-AID Ung
-/- B cells can indeed be targeted by endogenously expressed AID. Although AID might display a promiscuous pattern in terms of accessing the B cell genome, we propose that most of these non-Ig loci targeted by AID are protected by several layers of regulatory mechanisms to prevent genome-wide damage in B lymphocytes, including DNA repair pathways [
53], nucleotide sequence preference [
67], post-translational regulation of AID [
177–
179], or other unidentified mechanisms.
One of our previous studies initially proposed that mechanistic factors including DSB frequency and spatial proximity play an important role in promoting recurrent translocations [
125]. However, only a few specific loci were investigated in that particular study [
125]. Later studies addressed the role of mechanistic factors in translocation formation at a global level, including Hakim
et al. [
180] and Rocha
et al. [
181], although inconsistent conclusions were drawn by these two studies. Hakim
et al. concluded that “In the absence of recurrent DNA damage, translocations between
Igh or
Myc and all other genes are directly related to their contact frequency. Conversely, translocations associated with recurrent site-directed DNA damage are proportional to the rate of DNA break formation” [
180]. In contrast, Rocha
et al. concluded that “Our studies indicate that the vast majority of known AID-mediated
Igh translocation partners are found in chromosomal domains that contact this locus during class switching” [
181]. Again, the inconsistency seems to be related to the analysis of 4C-seq data. The discrepancies were discussed by Rocha
et al., “Hakim
et al. used nonoverlapping 200 kb fixed windows to analyze the 4C-seq signal, while we used a TSS-centric as well as a domain-centric approach” [
181]. As discussed above, we are not in a position to judge which analysis approach is more appropriate for their 4C-seq data. However, we did notice that when a TSS-centric analysis approach was employed, the data only suggest a relatively weak correlation between proximity and
Igh translocations (Fig. 1D in Ref. [
181]), as the authors concluded “These data indicate a trend linking proximity with
Igh translocations.” Moreover, when the authors focused specifically on validated AID targets, they “found that only 32%–50% interact significantly with
Igh in the nucleus of class-switching B cells” (Fig. 1E in Ref. [
181]). Therefore, these data do not seem to be contradictory to the data presented in Hakim
et al. [
180] since their analysis also showed that a large fraction of loci carrying
Igh translocation hotspots were involved in frequent long range interactions with this locus in activated B cells (Fig. 3b, c in Ref. [
180]). However, Hakim
et al. also identified “thousands of genes that interacted repeatedly with
Igh but that were not associated with translocation hotspots” (Fig. 3c in Ref. [
180]). Thus, in order to identify a better predictor for translocation, the authors suggest an alternative strategy that uses replication protein A (RPA)-seq as a surrogate to measure AID-mediated DNA damage across the B cell genome [
180]. Their data show that translocations correlated significantly more strongly with RPA-seq than with
Igh 4C-seq [
180]. In the end, one can certainly question whether RPA accumulation indeed reflects AID-induced DNA damage in the B cell genome. Nevertheless, it has been well established that AID plays a critical role in promoting translocations in B cells [
3,
9–
11]. In addition, conclusions drawn by Hakim
et al. [
180] appear to be consistent with another independent study [
182].
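The difference between the two 4C-seq analysis strategies quoted above can be illustrated with a minimal sketch. It assumes toy read coordinates on a single chromosome, hypothetical gene names (geneA, geneB), and an arbitrary ±50 kb flank around each TSS; only the 200 kb fixed window size comes from the description of Hakim et al. It is not the pipeline used by either group, only an illustration of how the choice of binning can change the apparent contact signal.

```python
from collections import Counter

WINDOW = 200_000  # non-overlapping 200 kb windows, as described for Hakim et al.
FLANK = 50_000    # hypothetical flank for the TSS-centric view (an assumption)


def fixed_window_counts(positions, window=WINDOW):
    """Count reads per non-overlapping fixed window, keyed by window index."""
    return Counter(pos // window for pos in positions)


def tss_centric_counts(positions, tss_sites, flank=FLANK):
    """Count reads lying within +/- flank of each transcription start site."""
    return {
        gene: sum(1 for pos in positions if abs(pos - tss) <= flank)
        for gene, tss in tss_sites.items()
    }


# Toy data: four reads cluster around a TSS that sits on a window boundary.
reads = [150_000, 190_000, 210_000, 230_000, 650_000]
tss = {"geneA": 200_000, "geneB": 600_000}  # hypothetical genes

fixed = fixed_window_counts(reads)
centric = tss_centric_counts(reads, tss)

print(dict(fixed))
print(centric)
```

In this toy case the cluster of reads around geneA is split between two fixed windows, whereas the TSS-centric count pools it into a single value, which is one way the two approaches could rank the same locus differently.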
A recent study from Alt’s group revealed the global landscape of translocation and spatial proximity by employing elegant experimental systems that allow the well-controlled induction of DSBs [
182]. Consistent with our previous study [
125], DSB frequency and spatial proximity were both shown to be important in translocation generation. In particular, DSB frequency was a rate-limiting factor in promoting translocation. When ample DSBs were induced randomly by ionizing radiation (IR), translocation frequency correlated well with spatial proximity [
182]. This conclusion is in line with the data presented in Hakim
et al. [
180], which showed that, in the absence of AID damage, translocation correlated well with
Igh contact frequency. Taken together, these studies indicate that, when DSB frequency is not rate-limiting, either because abundant DSBs are generated at random locations via IR [
182] or because DSBs induced by a specific factor such as AID are eliminated [
180], spatial proximity probably determines the translocation frequency. However, when AID is active and induces DNA lesions at specific loci, it is conceivable that the translocation landscape will be determined by AID-initiated lesions in the B cell genome. This notion is supported by the data shown in Hakim
et al. [
180].
In our opinion, the rationale is relatively weak for the hypothesis that nuclear organization influences the “off-target” activity of AID. It is unclear to us why the non-Ig loci targeted by AID need to be close to the
Igh locus. Such a hypothesis would imply that AID accumulated at the
Igh locus could diffuse to other loci in close proximity. As discussed above, nuclear AID concentration is very low and AID can access the B cell genome in a rather promiscuous manner. Thus, it is difficult to conceive that a certain sub-compartment of the nucleus can retain a high local concentration of AID. Indeed, another independent study showed that there is no correlation between proximity to
Ig genes and levels of AID targeting or gene mutation; thus, it was concluded that proximity to
Ig loci is unlikely to be a major determinant of AID targeting or mutation of non-Ig genes [
183].
Compliance with ethics guidelines
Zhangguo Chen and Jing H. Wang declare no conflict of interest. This manuscript is a review article and does not involve a research protocol requiring approval by the relevant institutional review board or ethics committee.
Higher Education Press and Springer-Verlag Berlin Heidelberg